What Is Big Data?
You’re starting to learn about Big Data, or you want to learn more about it. You start off by googling “what is Big Data?” You get an answer that doesn’t quite make sense. The site talks about 3 Vs, or sometimes 4 Vs or even 5 Vs. These 3 Vs are usually defined […]
You’re Probably Not a Distributed Systems Engineer
As I’ve worked with software teams, I’ve found some interesting views on distributed systems. Some teams think they’re creators of distributed systems. They usually aren’t. I think there are three main groups of teams that interact with distributed systems: users of end data products, users of existing distributed system frameworks, and creators of distributed systems […]
On Cheating with Big Data
To achieve the scales of Big Data, you have to cheat in some way. Sometimes people call these tradeoffs. In Big Data, I prefer to call them cheats. A tradeoff makes it sound like a small thing, but the reality is that Big Data tradeoffs can make a use case possible or impossible. I don’t […]
When You Have the Wrong Team for Big Data
In my book, Data Engineering Teams, I talk about the right skills and people to be on a data engineering team. The right skills and people are incredibly important to the success, or failure, of a Big Data project. Sometimes it’s easier to understand this point with some real examples. Instead of telling you what […]
Integration Testing for Kafka
We’re creating more and more complicated data pipelines and systems with Kafka. These interactions are becoming even more complex as we create microservices. As we create these complex systems, we aren’t thinking about how to test, debug, or fix them. These 3 parts are the defining factors of a project’s ongoing success. What Are Integration […]
How Training is Delivered – From the Beginning to the End
Teams will often tell me how much better my training classes are than what they’ve had before. They go on to tell me how the training they’ve attended previously was useless. My students are surprised that I can answer programming questions, no matter how difficult they are. I want to share some of the behind […]
Two Halves Don’t Make a Whole
In Chapter 3 of my Data Engineering Teams book, I show you how to do a skill gap analysis. During the analysis of the team, you either say the person has the skill or not. It’s a very binary decision. Some people have written me asking if it can be a fraction. Instead of a […]
Apache Kafka and Amazon Kinesis
This post will focus on the key differences a Data Engineer or Architect needs to know between Apache Kafka and Amazon Kinesis. Cloud vs DIY Some of the contenders for Big Data messaging systems are Apache Kafka, Amazon Kinesis, and Google Cloud Pub/Sub (discussed in this post). While similar in many ways, there are enough […]
This is Useless (Without Use Cases)
Sometimes I’ll write a post and the comments will say something to the effect of “this is useless.” Other times I’ll be finishing up a class and a student will ask me why I didn’t cover what they’re trying to do. I’ve written example code and people will ask me why I didn’t write it on something […]
The Blame Game
When a Big Data project fails, there’s plenty of blame to go around. When I do retrospectives with teams who are failing or about to fail, their blame is often misplaced. There’s a focus on blaming the technology. The more difficult consideration of looking inward at the team itself is often skipped. The teams […]