Announcement: Creating Big Data Solutions with Impala
I am proud to announce that my latest screencast on Apache Impala called Creating Big Data Solutions with Impala was released on O’Reilly. This caps off a long relationship with Impala that started well before it was released publicly. My relationship with Impala started off when I first joined Cloudera. I started learning about the […]
Unit Testing Spark with Java
Unit testing, Apache Spark, and Java are three things you’ll rarely see together. And yes, all three are possible and work well together. Why Unit Test With Spark? I’m not an advocate of TDD (Test-Driven Development), except when I’m writing Big Data code. You can create as small a dataset as you want, but […]
Three Top Themes From Strata+Hadoop World
I spoke at Strata+Hadoop World two weeks ago on Kafka. There were three main themes from the conference that I came away with: real-time Big Data is the (present) future, we should be using intermediary libraries instead of programming directly to an API, and applied AI is the (present) future. Real-time Big Data Companies are realizing […]
Kafka 0.9.0 Changes For Developers
Kafka 0.9.0 brings with it a bevy of goodness. Many of those features are aimed at operations, like security. Some are developer features, like a new Consumer API. This post will focus on the new features of 0.9.0 for developers. New Consumer API The most notable change is a brand-new consumer API. Unfortunately, there isn’t […]
The ROI of the Right Training
An investment in knowledge pays the best interest.
Benjamin Franklin
Benjamin Franklin is saying that investing in yourself or your team’s k…
Hadoop Cheat Sheet
Hadoop has a vast and vibrant developer community. Following the lead of Hadoop’s name, the projects in the Hadoop ecosystem all have names that don’t correlate to their function. This makes it really hard to figure out what each piece does or is used for. This is a cheat sheet to help you keep track of things. […]
Identifying Great Training Even If You Know Nothing About the Subject
It’s easy for someone in the training industry to identify great training. They live, eat, and breathe it. What about everyone else? How can you identify great training even if you aren’t in …
The Hidden Costs of the Wrong Training
You’re sitting there thinking about a training purchase. You look at the price again. That $750 sounds like a lot for an online course. That $20,000 sounds like a lot for an in-person course for the…
Is My Developer Team Ready for Big Data?
“Is my developer team ready for big data?” This is the most common question I’m asked by business leaders. Executives realize that big data projects can build their enterprise, but aren’t sure if their current development teams have the skills to actually create the solutions. Read the rest of my guest post on O’Reilly Ideas.
Rackspace Orchestration Image Names
If you are using a Rackspace orchestration and getting the error “The Image (Ubuntu 12.04 LTS (Precise Pangolin)) could not be found.” or something similar, there isn’t any documentation of it. For some reason, Rackspace decided to change the names of the images for at least Ubuntu, and possibly others. For us, this orchestration image […]