Announcement: Creating Big Data Solutions with Impala

I am proud to announce that my latest screencast on Apache Impala, Creating Big Data Solutions with Impala, has been released on O’Reilly. This caps off a long relationship with Impala that began well before its public release. That relationship started when I first joined Cloudera. I started learning about the […]

Unit Testing Spark with Java

Unit testing, Apache Spark, and Java are three things you’ll rarely see together. And yes, all three are possible and work well together. Why Unit Test With Spark? I’m not an advocate of TDD (Test-Driven Development), except when I’m writing Big Data code. You can create as small a dataset as you want, but […]

Three Top Themes From Strata+Hadoop World

I spoke on Kafka at Strata+Hadoop World two weeks ago. I came away from the conference with three main themes: real-time Big Data is the (present) future, we should be using intermediary libraries instead of programming directly to an API, and applied AI is the (present) future. Real-time Big Data Companies are realizing […]

Kafka 0.9.0 Changes For Developers

Kafka 0.9.0 brings with it a bevy of goodness. Many of those features are aimed at operations, like security. Some are developer features, like a new Consumer API. This post will focus on the new features of 0.9.0 for developers. New Consumer API The most notable change is a brand-new consumer API. Unfortunately, there isn’t […]

The ROI of the Right Training

An investment in knowledge pays the best interest.

Benjamin Franklin

Benjamin Franklin is saying that investing in yourself or your team’s k…

Hadoop Cheat Sheet

Hadoop has a vast and vibrant developer community. Following the lead of Hadoop’s name, the projects in the Hadoop ecosystem all have names that don’t correlate to their function. This makes it really hard to figure out what each piece does or is used for. This is a cheat sheet to help you keep track of things. […]

The Hidden Costs of the Wrong Training

You’re sitting there thinking about a training purchase. You look at the price again. That $750 sounds like a lot for an online course. That $20,000 sounds like a lot for an in-person course for the…

Is My Developer Team Ready for Big Data?

“Is my developer team ready for big data?” This is the most common question I’m asked by business leaders. Executives realize that big data projects can build their enterprise, but aren’t sure if their current development teams have the skills to actually create the solutions. Read the rest of my guest post on O’Reilly Ideas.

Rackspace Orchestration Image Names

If you are using a Rackspace orchestration and get the error “The Image (Ubuntu 12.04 LTS (Precise Pangolin)) could not be found.” or something similar, you won’t find any documentation of it. For some reason, Rackspace decided to change the image names for at least Ubuntu, and possibly others. For us, this orchestration image […]