Why I Recommend My Clients NOT Use KSQL and Kafka Streams
Update: Confluent has renamed KSQL to ksqlDB
It’s that Kafka Summit time of year again. There are lots of announcements. Some are good and some you have to sift through in order to figure out…
Getting Your Programming Skills Ready for Data Engineering
Note: This post was guest written by John Desmond.
My preparation for the course began before I knew about the course, and before I realized that I wanted to specialize in data engineering. …
I Come Not To Bury Cloudera But To Praise It
It’s been a tumultuous past few weeks for big data vendors. First, MapR is having problems (their update). Now, Cloudera is having problems.
As of today, Cloudera closed at $5.21 (June …
Reducing Operational Overhead with Pulsar Functions
It’s been fascinating watching the operational world change over the years. We started out by racking and stacking anything that needed to run. We wised up a bit and started using vi…
Reducing System Complexity with Event Sourcing
When I start working with a team, one of the first questions I ask is “how much time do you spend creating new features versus making sure those new features don’t break something else?” …
Saving Money with Apache Pulsar Tiered Storage
As companies start to look at rolling out real-time messaging systems, it’s important to look at the overall hardware costs. With some forward planning, companies can save as much as 85% on their overall storage costs. Before we start getting into the cost comparisons, let me briefly show how Apache Kafka and Apache Pulsar store […]
Q and A: Viewpoints on Open Source
There are diverse viewpoints on open source and its usage as a service. I’ve attempted to give a synopsis of the issues and some background, but that’s only my viewpoint. I’m bringing in other people to share their viewpoints and give a more well-rounded picture. This stems from this Twitter thread. The […]
The Three Components of a Big Data Data Pipeline
There’s a common misconception in Big Data that you only need one technology to do everything that’s necessary for a data pipeline, and that’s incorrect. Data Engineering != Spark The misconception that Apache Spark is all you’ll need for your data pipeline is common. The […]
Advice for Small Teams and Startups on Data Engineering
Small data engineering teams require different tactics. Much of my writing is geared towards larger companies and teams. How should a startup or small data engineering team in a big company be set up and work? What, if anything, should be done differently? Your First Data Engineer Your first data engineering hire is a crucial […]
Creating a Data Engineering Culture
At DataEngConf Barcelona, I premiered a new talk about the importance of creating a data engineering culture. I share what a data engineering culture is and what management needs to do to be successful with Big Data.
Here is the video from the conferen…