Crunching Big Data with Java

One Team, One Month, One JVM

I target customers who have large data processing needs. These come in various forms, but generically look like this: the customer gets huge data drops in some form or another and must process the data and output results in a very specific time frame. The customer has written some scripts, maybe some code and SQL. They have attempted some optimizations that helped a little, but they're not meeting their timeline. They have opportunities to take on even larger processing jobs, but don't have the capacity. They need help, now!

This is not an uncommon scenario. What to do? And what does this have to do with Java? Good questions. Hold onto the Java question; I'll get to that next. First, there are many products and frameworks for processing large amounts of data (such as relational database management systems, or RDBMSes). But the vast majority of the data that I see from day to day is not in a database. Data, even very large data, is usually exchanged between businesses in the form of files. Processing this data with single-threaded scripts or other code is just not working anymore. Making many passes through the data with SQL isn't working either.
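To make the contrast with a single-threaded script concrete, here is a minimal sketch in plain Java that aggregates a flat file across all available cores in one pass. It uses only the standard library and is not DataRush code. The class name `ParallelFileTotals`, the method `totalsByKey`, and the two-column `key,amount` line format are assumptions made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class, not part of any product mentioned in the article.
public class ParallelFileTotals {

    // Sums the amount column per key for a file of "key,amount" lines
    // (an assumed format). The stream is marked parallel, so lines are
    // parsed and aggregated on the common fork/join pool in a single pass.
    static Map<String, Long> totalsByKey(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines
                .parallel()
                .filter(line -> !line.trim().isEmpty())   // skip blank lines
                .map(line -> line.split(",", 2))          // [key, amount]
                .collect(Collectors.groupingByConcurrent(
                    fields -> fields[0].trim(),
                    Collectors.summingLong(fields -> Long.parseLong(fields[1].trim()))));
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny sample "data drop" to a temp file and aggregate it.
        Path file = Files.createTempFile("drop", ".csv");
        Files.write(file, Arrays.asList("east,10", "west,5", "east,7"));
        System.out.println(totalsByKey(file));
        Files.delete(file);
    }
}
```

A parallel stream is only the simplest way to spread one pass over several cores. How much it helps depends on the file, the per-line work, and the disk, so it is worth measuring on real data before relying on it.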

More Stories By Jim Falgout

Jim Falgout has 20+ years of large-scale software development experience and is active in the Java development community. As Chief Technologist for Pervasive DataRush, he’s responsible for setting innovative design principles that guide the company’s engineering teams as they develop new releases and products for partners and customers. He applied dataflow principles to help architect Pervasive DataRush.

Prior to Pervasive, Jim held senior positions with NexQL, Voyence, Net Perceptions/KD1, Convex Computer, Sequel Systems and E-Systems. Jim has a B.Sc. (Cum Laude) in Computer Science from Nicholls State University. He can be reached at [email protected]

Most Recent Comments
Eman 04/05/08 10:33:42 AM EDT

Funny, Cos, you are pointing out how Java isn't all that "free & open" like its corp. creator claims it is... the beauty of open source + patent law = morass of bear traps

Frankly, I haven't seen any Java framework that holds a match to this DataRush thing... download and see for yourself.

Cos 03/27/08 08:05:17 PM EDT

Daah! Check US Patent 7,020,699
Filed: December 19, 2001