Should You Even Do Big Data?

There’s an elephant in the room with Big Data. If an organization tries to half-ass their way through a Big Data project, they’re going to fail (the odds of success are usually 5-10%). Given this really low success rate, should you even do Big Data? When I worked at a Big Data vendor, I couldn’t tell […]

Unit Testing Kafka Streams

Unit testing your Kafka code is incredibly important. I’ve already written about integration testing, consumer testing, and producer testing. Now, I’m going to share how to unit test your Kafka Streams code. To start off with, you will need to change your Maven pom.xml file. You’ll need to include the test libraries for Kafka Streams […]

Are Your Programming Skills Ready for Big Data?

As people start with Big Data, they go through the list of necessary skills. One of those crucial skills is programming. The question arises: how good do a person’s programming skills need to be? This matters because programming skills are on a wide spectrum. There are people who are: Brand new to programming […]

The Veteran Skill on a Data Engineering Team

In my book Data Engineering Teams, I talk about a skill that’s often overlooked and unknown to data engineering teams. Teams often don’t know they need a veteran, think they can’t afford a veteran, or don’t understand why they need a veteran on the team. In Chapter 3 “Data Engineering Teams,” I give my definition […]

How Much More Complicated Is Real-Time Big Data?

In my seminal post On Complexity in Big Data, I talked about how much complexity increases with Big Data. The post itself focused on Big Data batch systems. I didn’t really cover real-time complexity increases when dealing with Big Data. In the post, I argue that Big Data batch is 10x more complex than […]

Getting Into Big Data as a Consultant

I’m often asked how someone who is a consultant can get into Big Data. This is an important subject because it will define your success as a consultant in the field. More importantly, it will define how successful your customers will be. Learning: If Big Data is brand new to you, learning should be […]

Q and A: How do I improve my skills to become a Data Engineer?

Today’s blog post comes from a question from a subscriber on my mailing list. The questions come from Vaughn S.: How is programming used in data engineering? What do I have to offer at meetups? How can I round out my skillset? How is programming used in data engineering? I really really want to improve […]

Why You Shouldn’t Write Your Own Distributed System

Writing your own distributed system shouldn’t be a task you undertake lightly. Too often, I’m seeing teams create their own distributed system. In my experience, this is because they don’t know or think about all of the ramifications of creating their own distributed system. I say all of these things as someone who’s created 3 […]

What Is Big Data?

You’re starting to learn about Big Data or you’re wanting to learn more about Big Data. You start off by googling “what is Big Data?” You get an answer that doesn’t quite make sense. The site talks about 3 Vs, or sometimes 4 Vs or even 5 Vs. These 3 Vs are usually defined […]

You’re Probably Not a Distributed Systems Engineer

As I’ve worked with software teams, I’ve found some interesting views on distributed systems. Some teams think they’re creators of distributed systems. They usually aren’t. I think there are three main groups of teams that interact with distributed systems: users of end data products, users of existing distributed system frameworks, and creators of distributed systems […]