How much do companies lose before training?
Sometimes companies will start writing code or designing a solution before I train there. This is usually a bad idea. It really shows the difference between Big Data and small data. Making a mistake with small data isn’t costly and doesn’t take long to fix. Making a mistake with Big Data is very costly and […]
Maven Tips
Working with complex and multi-module Maven projects can be a handful. These are a few tips to make that easier. I’m going to use Apache Beam as an example of a multi-module Maven project. The first helpful command is to list all of the modules. To […]
Apache Beam Regex
In a previous post, I showed how to use Beam’s Regex class to split up a string. In this post, I’m going to show some other features of the Regex class. The Regex class gives you a distributed way to work with strings. I tried to make the interface very familiar to Java […]
Beam’s Pico WordCount
There’s a friendly game among Big Data frameworks: what’s the fewest lines of code it takes to do WordCount? I’m a committer on Apache Beam and most of my time is dedicated to making things easier for developers to use Beam. I also help explain Beam in articles and in conference sessions. One of […]
Are you attaining your goals?
We’re coming up on that time of year when many people make their goals for the next year. Before you do that, reflect on how you did this year. If you accomplished a goal, how did you do it? If you didn’t accomplish a goal, what happened? Many people wrote in to me with the goal […]
Unit Testing Kafka Consumers
Unit testing your Kafka code is incredibly important. It’s transporting your most important data. This is especially true for your Consumers. They are the end point for using the data. There are often many different Consumers using the data. You’ll want to unit test all of them. In a previous post, I showed you how […]
Unit Testing Kafka
Unit testing your Kafka code is incredibly important. It’s transporting your most important data. As of 0.9.0 there’s a new way to unit test with mock objects. Refactoring Your Producer First of all, you’ll need to be able to change your Producer at runtime. Instead of using the KafkaProducer object directly, you’ll use the Producer […]
What will become of Big Data?
I’m often asked what I think will happen to Big Data over the next five to ten years. From a Developer’s point of view, they’re asking if investing their time in becoming a Data Engineer will pay off. We’re going to see a continuing maturity of Big Data technologies. There will be better stories on […]
Why should or shouldn’t you become a Data Engineer?
You’re considering a change to become a Data Engineer. Why should you do it? Why shouldn’t you do it? Let’s consider some reasons. Should: There is a major shortage of qualified Data Engineers. There is a high demand and low supply of qualified Data Engineers. You can make an extra $20,000 to $60,000 per year […]
Q and A: How can I tear out Informatica and MySQL and put in Big Data?
Today’s blog post comes from a question from a subscriber on my mailing list. The question comes from G.P.: I need to gain a hands on understanding of these technologies. I’m going to have to build some demonstration pilots before I would get any traction. I’m the VP of Analytics, so the engineering team think […]