Getting Stuck Crawling with Big Data

I always encourage companies to break down their Big Data projects into smaller pieces. I call this process crawl, walk, run. There is an interesting corollary to this process. Some companies get stuck at the crawl phase and don’t progress on to the walk and run phases. The first time I saw this, I was […]

Strata+Hadoop World and Trends

Last week, I gave two talks about Strata+Hadoop World. These talks covered some of the up-and-coming technologies in Big Data. I describe Strata as the Super Bowl of Big Data conferences. This is where you’ll find the best minds talking about the present and future conditions of Big Data. My first session was […]

Solving the First and Last Mile Problem With Kafka Part 2

In the first post in the series, I talked about Big Data’s first and last mile problems. I showed how the first mile problems could be solved with Kafka. In this post, I’m going to talk about the last mile problems. Big Data Last Mile With Big Data we’re faced with finding value in large […]

Solving the First and Last Mile Problem With Kafka Part 1

In telecommunications, there is the term “last mile”. It refers to getting the connection to the customer. It’s the last mile between the company’s infrastructure and the customer’s location. We have similar issues in Big Data. We don’t just have a last mile problem; we have a first mile problem too. We have an issue with […]

How Programmers Should Start Viewing Training

There’s a sad thing that’s limiting our growth as programmers. We (programmers) don’t invest in ourselves like other professions. The business person will happily spend $3,000 to attend a class to improve a part of their business. The marketing person will spend $3,000 to attend a class to improve their marketing and sales copy. A […]

On complexity in big data

After years of teaching Big Data, I’ve come up with the best explanation of why it isn’t easy, cheap, or quick. I wrote the in-depth piece published on O’Reilly.

Q and A: Is a Data Engineer the same thing as a BI or DBA?

Today’s blog post comes from a question from a subscriber to my mailing list. The question comes from Alpesh D.: I have been getting your emails and they all seem to make sense. However, did I understand it correct that you believe all big data engineers need to be to use Java? I come from […]

Crawl, Walk, Run with Big Data

Attacking a Big Data project with an all-or-nothing mindset leads to an absolute failure. I highly suggest breaking the overall project into more manageable phases. These phases are called crawl, walk, and run. Crawling In this phase, you’re doing the absolute minimum to start using Big Data. This might […]

Q and A: Big Data strategy

How good a Big Data strategy can be defined by someone that doesn’t know the technology behind it? Today’s blog post comes from a question from a subscriber to my mailing list. The question comes from André M: How good a Big Data strategy can be defined by someone that doesn’t know the technology behind […]

Is Big Data Cheap?

Companies and individuals often come into Big Data thinking everything is cheap. After all, the entire stack is open source, right? Well, some things are cheap and some things are more expensive. Software One of the important distinctions with Hadoop is that it isn’t an open source knockoff of a better closed source framework. […]