I Come Not To Bury Cloudera But To Praise It

It’s been a tumultuous past few weeks for big data vendors. First, MapR was having problems (their update). Now, Cloudera is having problems.
As of today, Cloudera closed at $5.21 (June …

Reducing System Complexity with Event Sourcing

When I start working with a team, one of the first questions I ask is “how much time do you spend creating new features versus making sure those new features don’t break something else?” …

Saving Money with Apache Pulsar Tiered Storage

As companies start to look at rolling out real-time messaging systems, it’s important to look at the overall hardware costs. With some forward planning, companies can save as much as 85% on their overall storage costs. Before we start getting into the cost comparisons, let me briefly show how Apache Kafka and Apache Pulsar store […]

Q and A: Viewpoints on Open Source

There are diverse viewpoints on open source and its usage as a service. I’ve attempted to give a synopsis of the issues and some background – but that’s only my viewpoint. I’m bringing in other people to share their viewpoints and give a more well-rounded picture. This stems from this Twitter thread. The […]

The Three Components of a Big Data Data Pipeline

There’s a common misconception in Big Data that you only need one technology to do everything that’s necessary for a data pipeline – and that’s incorrect. Data Engineering != Spark: The misconception that Apache Spark is all you’ll need for your data pipeline is common. The […]

Advice for Small Teams and Startups on Data Engineering

Small data engineering teams require different tactics. Much of my writing is geared towards larger companies and teams. How should a startup or small data engineering team in a big company be set up and work? What, if anything, should be done differently? Your First Data Engineer: Your first data engineering hire is a crucial […]

Creating a Data Engineering Culture

At DataEngConf Barcelona, I premiered a new talk about the importance of creating a data engineering culture. I share what a data engineering culture is and what management needs to do to be successful with Big Data.
Here is the video from the conferen…