Data Engineering, Big Data Institute, Page 9

Crawl, Walk, Run with Big Data

Attacking a Big Data project with an all-or-nothing mindset leads to absolute failure. I highly suggest breaking the overall project into more manageable phases. These phases are called crawl, walk, and run. Crawling: In this phase, you’re doing the absolute minimum to start using Big Data. This might […]

Q and A: Big Data strategy

How good a Big Data strategy can be defined by someone that doesn’t know the technology behind it? Today’s blog post comes from a question from a subscriber to my mailing list. The question comes from André M: How good a Big Data strategy can be defined by someone that doesn’t know the technology behind […]

Is Big Data Cheap?

Companies and individuals often come into Big Data thinking everything is cheap. After all, the entire stack is open source, right? Well, some things are cheap and some things are more expensive. Software: One of the important distinctions with Hadoop is that it isn’t an open source knockoff of a better closed source framework. […]

Apache Kafka and Google Cloud Pub/Sub

Some of the contenders for Big Data messaging systems are Apache Kafka, Google Cloud Pub/Sub, and Amazon Kinesis (not discussed in this post). While similar in many ways, there are enough subtle differences that a Data Engineer needs to know them. These can range from “nice to know” to “we’ll have to switch.” Cloud vs DIY: […]

Kafka 0.10 Changes for Developers

Kafka 0.10 is out. Here are the changes that developers need to know about. Here is the new URL to the Kafka 0.10 JavaDoc. KafkaConsumer: The KafkaConsumer had a minor change that allows you to specify a maximum number of messages to return. You can set this by using the max.poll.records property to a […]

Question and Answers with the Apache Beam Team

Apache Beam just had its first release. Now that we’re working towards the second release, 0.2.0-incubating, I’m catching up with the committers and users to ask some of the common questions about Beam. Each committer and user is sharing their own opinion and not necessarily that of their company. Our interviewees are: Neville Li (NL) […]

Ability Gap – Why We Need Data Engineers

I had a conversation with another person in the Big Data field. We were discussing whether Data Engineer would become a more common job title and migrate out of Silicon Valley. I told him yes. Big Data is downright complicated on many levels. There are too many new technologies and changes within technologies where […]

Big Data’s Required and Recommended Technical Skills

A common question beginners ask about Hadoop is which technical skills are needed to get started. This helps level set what skills you need before you embark on a big data journey. For developers and administrators, I divide up the skills into those that are required and those that are nice to have or recommended. Developer Skills: […]

My Big Data Journey

Everyone’s Big Data journey starts somewhere. We’re often given stories of outright mastery, but I want to tell you how I got started with Big Data. Each of these stories about mastery forgets or omits its humble beginnings. This is my story from my humble beginnings. Distributed Systems: My specialty in programming has always been […]

The Case for Heron

For the past few months, I’ve been teaching at companies that are heavy users of Apache Storm. They’re also undertaking massive projects to move off of Storm. During that time, I’d say that something new was coming that might convince them to consider an alternative. Now, I’m free to talk about that alternative. Twitter has […]

Have a question?

Send us a message

or give us a call at +1 775.393.9122

© 2025 Big Data Institute
