About Professional Hadoop Development

This class takes a participant from no knowledge of Hadoop to being able to develop with Hadoop professionally. It covers the main technologies of Hadoop, HDFS and MapReduce, and gives in-depth coverage of essential Big Data and Hadoop ecosystem technologies. The class ends with a consideration of how to architect Big Data solutions with Hadoop and its ecosystem.

Duration: 4 days

Intended Audience: Technical, Software Engineers, QA, Analysts

Prerequisites: Intermediate-Level Java

You Will Learn

What exists in the Big Data ecosystem so you can use the right tool for the right job.
An understanding of how HDFS works and how to interact with it.
An understanding of how MapReduce works and what happens in each phase.
The basics of coding a MapReduce job with Java to build your Big Data foundation.
The advanced features of the MapReduce API that only the true experts know.
How Apache Crunch gives you a more Java-centric API that is very different from MapReduce.
How to use Apache Crunch to do things that are difficult in MapReduce, like joining datasets and performing secondary sorts.
The simple and advanced SQL-like commands available in Hive.
How to extend Hive commands with custom non-Java code to do company or use case specific queries.
How to move data between Hadoop/Spark and relational databases like MySQL and Oracle using Apache Sqoop.
How to move files and network data from many different computers to Hadoop using Apache Flume.
What Hue is and how it aids in creating browser-based data products.
How Apache Oozie makes it possible to create repeatable workflows that enterprises need.
How all of these technologies come together as a solution for ETL, click stream, and sessionization use cases.
The steps and iterations to take when creating a Big Data solution.
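
To give a feel for the MapReduce phases named in the list above, here is a minimal sketch of a word count written with plain Java collections. It is an illustration under stated assumptions, not course material and not the Hadoop API: the class name WordCountPhases and its methods map, shuffle, reduce, and run are hypothetical, and everything runs in memory on one machine rather than on a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: in-memory collections stand in for the cluster.
// None of these names come from the Hadoop API.
class WordCountPhases {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Shuffle and sort phase: group all values by key, with keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the list of values for one key into a single count.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Chains the three phases over a small list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {big=2, data=2, ideas=1, moves=1}
        System.out.println(run(Arrays.asList("big data big ideas", "data moves")));
    }
}
```

In a real Hadoop job the framework performs the shuffle and sort between the map and reduce steps and distributes the work across many machines; the sketch only mirrors the order of the phases.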

Course Outline

Professional Hadoop Development
Thinking in Big Data
Introducing Big Data
What is Hadoop?
The Ecosystem
Introduction to HDFS
Introduction to MapReduce
Coding with MapReduce
Java API
Streaming API
Using Eclipse
Regular Expressions
Using Apache Maven
Advanced MapReduce
Advanced MapReduce Classes
Unit Testing
Avro
MapReduce and Avro
Coding With Crunch
Using Crunch
Crunch API Pipelines
Advanced Crunch
Joins
Crunch Operations
Secondary Sorts
Unit Testing
Using Hive
Hive Overview
Hive Queries
Advanced Queries
Augmenting Hive With UDFs and Transforms
Hive Transforms
Hive UDFs
Pig Overview
Pig
Moving and Accessing Data
Sqoop
Flume
Creating Workflows
Hue
Oozie
Hue and Oozie
Hadoop Architectures
ETL
Click Stream
Other Architectures
Conclusion

Technologies Covered

Apache Hadoop
Apache Spark
Apache Hive
Apache Pig
Apache HBase
Apache Impala
Apache Kafka
Apache Crunch
Hue
Apache Oozie

Have a question?

Send us a message

or give us a call at +1 775.393.9122

© 2025 Big Data Institute

Privacy
