We Live, Eat, and Breathe This Stuff

The NFL ran a commercial a few years back. It featured professional football players doing things you wouldn’t otherwise believe. One showed a quarterback shooting trap with his football instead of a shotgun. I’ve shot trap, and it’s hard enough with a shotgun, let alone a football. I see a similar […]

Spark and Java – Yes, They Work Together

“Person who chases two rabbits catches neither.” – Confucius. This really applies to learning: learning two new and different technologies at the same time means you catch neither. I’ve seen so many students try to learn Big Data and a new programming language at the same time. A few succeed, but most fail. Why Two? […]

SSH With Google Cloud

Let’s just say that Google Cloud’s SSH instructions aren’t the greatest. Here are the steps to SSH into your instance. They assume that you’ve already installed the gcloud command-line tool, and they are written for Mac OS X and Linux. We start off by creating a new SSH key:

$ ssh-keygen -t rsa -f ~/.ssh/google_compute_engine -C yourgooglecloudemailaddress@example.com

The ssh-keygen […]

Announcement: Creating Big Data Solutions with Impala

I am proud to announce that my latest screencast on Apache Impala, Creating Big Data Solutions with Impala, has been released by O’Reilly. This caps off a long relationship with Impala that started well before it was released publicly. It began when I first joined Cloudera. I started learning about the […]

Unit Testing Spark with Java

Unit testing, Apache Spark, and Java are three things you’ll rarely see together. And yes, all three are possible and work well together. Why Unit Test With Spark? I’m not an advocate of TDD (Test-Driven Development), except when I’m writing Big Data code. You can create as small a dataset as you want, but […]

Three Top Themes From Strata+Hadoop World

I spoke on Kafka at Strata+Hadoop World two weeks ago. I came away from the conference with three main themes: real-time Big Data is the (present) future, we should be using intermediary libraries instead of programming directly to an API, and applied AI is the (present) future. Real-time Big Data Companies are realizing […]

Kafka 0.9.0 Changes For Developers

Kafka 0.9.0 brings with it a bevy of goodness. Many of those features are aimed at operations, like security. Some are developer features, like a new Consumer API. This post will focus on the new features of 0.9.0 for developers. New Consumer API The most notable change is a brand-new consumer API. Unfortunately, there isn’t […]

The ROI of the Right Training

An investment in knowledge pays the best interest.

Benjamin Franklin

Benjamin Franklin is saying that investing in yourself or your team’s k…

Hadoop Cheat Sheet

Hadoop has a vast and vibrant developer community. Following the lead of Hadoop’s name, the projects in the Hadoop ecosystem all have names that don’t reflect their function. This makes it really hard to figure out what each piece does or is used for. This is a cheat sheet to help you keep track of things. […]