We Live, Eat, and Breathe This Stuff
The NFL ran a commercial a few years back. It featured various professional athletes from the NFL doing things you wouldn’t otherwise believe. One showed a quarterback shooting trap with his football instead of a shotgun. I’ve shot trap and it’s hard enough with a shotgun, much less a football. I see a similar […]
Spark and Java – Yes, They Work Together
Person who chases two rabbits catches neither. – Confucius This really applies to learning. Learning two new and different technologies at the same time makes you catch neither. I’ve seen so many students trying to learn Big Data and a new programming language at the same time. A few succeed where most fail. Why Two? […]
SSH With Google Cloud
Let’s just say that Google Cloud’s SSH instructions aren’t the greatest. Here are the steps to SSH into your instance. These steps assume that you’ve installed the gcloud program. These instructions are for Mac OS X and Linux. We start off by creating a new SSH key. $ ssh-keygen -t rsa -f ~/.ssh/google_compute_engine -C yourgooglecloudemailaddress@example.com The ssh-keygen […]
Announcement: Creating Big Data Solutions with Impala
I am proud to announce that my latest screencast on Apache Impala called Creating Big Data Solutions with Impala was released on O’Reilly. This caps off a long relationship with Impala that started well before it was released publicly. My relationship with Impala started off when I first joined Cloudera. I started learning about the […]
Unit Testing Spark with Java
Unit testing, Apache Spark, and Java are three things you’ll rarely see together. And yes, all three are possible and work well together. Why Unit Test With Spark? I’m not an advocate of TDD (Test-Driven Development), except when I’m writing Big Data code. You can create as small a dataset as you want, but […]
Three Top Themes From Strata+Hadoop World
I spoke at Strata+Hadoop World two weeks ago on Kafka. There were three main themes from the conference that I came away with: real-time Big Data is the (present) future, we should be using intermediary libraries instead of programming directly to an API, and applied AI is the (present) future. Real-time Big Data Companies are realizing […]
Kafka 0.9.0 Changes For Developers
Kafka 0.9.0 brings with it a bevy of goodness. Many of those features are aimed at operations, like security. Some are developer features, like a new Consumer API. This post will focus on the new features of 0.9.0 for developers. New Consumer API The most notable change is a brand-new consumer API. Unfortunately, there isn’t […]
The ROI of the Right Training
An investment in knowledge pays the best interest.
Benjamin Franklin
Benjamin Franklin is saying that investing in yourself or your team’s k…
Hadoop Cheat Sheet
Hadoop has a vast and vibrant developer community. Following the lead of Hadoop’s name, the projects in the Hadoop ecosystem all have names that don’t correlate to their function. This makes it really hard to figure out what each piece does or is used for. This is a cheat sheet to help you keep track of things. […]
Identifying Great Training Even If You Know Nothing About the Subject
It’s easy for someone in the training industry to identify great training. They live, eat, and breathe it. What about everyone else? How can you identify great training even if you aren’t in …