About Professional Spark Development
Takes a participant from no knowledge of Apache Spark to being able to develop with Spark professionally. It begins with the core Hadoop technologies Spark builds on, HDFS and MapReduce, then covers Spark's Java API, Spark SQL, and Spark Streaming in depth. The class ends with how to integrate Spark with the rest of your Big Data systems.
Duration: 3 days
Intended Audience: Technical, Software Engineers, QA, Analysts
Prerequisites: Intermediate-Level Java
You Will Learn
- What exists in the Big Data ecosystem, so you can use the right tool for the job.
- How HDFS works and how to interact with it.
- How MapReduce works, including what happens in each phase.
- How Spark works, including what happens in each phase of a job.
- What Java 8 lambdas are and how they make your Spark code more readable.
- The basics of coding a Spark job in Java to build your Big Data foundation (see the first sketch after this list).
- The various API methods in Spark and what each one does.
- How SQL can be used within a Spark job and when it vastly improves your productivity and code.
- How to write Java code that runs as a function inside a Spark SQL query, so you can reuse existing Java code or express use-case-specific queries (see the second sketch after this list).
- How to process data in real time with Spark (see the third sketch after this list).
- How to integrate Spark with the rest of your Big Data systems.
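As a taste of what the coding portion covers, here is a minimal sketch of a word-count Spark job written with Java 8 lambdas. It assumes Spark 2.x; the `input.txt` and `counts` paths are hypothetical placeholders, not course materials.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public final class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Read the input file as an RDD of lines.
            JavaRDD<String> lines = sc.textFile("input.txt");

            // Java 8 lambdas keep the transformations readable:
            // split lines into words, pair each word with 1, then sum the counts.
            JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);

            // Write one output file per partition.
            counts.saveAsTextFile("counts");
        }
    }
}
```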
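Along the same lines, here is a minimal sketch of registering existing Java logic as a Spark SQL UDF, again assuming Spark 2.x. The `normalize` function, `users.json` file, and `users` table are hypothetical examples.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

public final class UdfExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("UdfExample")
            .master("local[*]")
            .getOrCreate();

        // Register existing Java logic as a SQL function named "normalize".
        spark.udf().register("normalize",
            (UDF1<String, String>) s -> s == null ? null : s.trim().toLowerCase(),
            DataTypes.StringType);

        // Hypothetical input: a JSON file of user records with a "name" column.
        Dataset<Row> users = spark.read().json("users.json");
        users.createOrReplaceTempView("users");

        // The UDF is now callable from plain SQL.
        Dataset<Row> result = spark.sql("SELECT normalize(name) AS name FROM users");
        result.show();

        spark.stop();
    }
}
```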
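Finally, a minimal sketch of consuming a Kafka topic with Spark Streaming, assuming Spark 2.x and the spark-streaming-kafka-0-10 integration. The broker address, consumer group, and `events` topic are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public final class KafkaStreamExample {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("KafkaStreamExample").setMaster("local[*]");
        // Process incoming data in 5-second micro-batches.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // Hypothetical Kafka settings: broker address, deserializers, consumer group.
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "spark-course");
        Collection<String> topics = Arrays.asList("events");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // Count the records in each batch and print the result.
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```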
Course Outline
Professional Spark Development
Thinking in Big Data
Introducing Big Data
What is Hadoop?
The Ecosystem
Introduction to HDFS
Introduction to MapReduce
Coding With Spark
About Spark
Using Eclipse
Using Apache Maven
Functional Programming
Java API
Built-In Transformations and Actions
Advanced Spark
Advanced API
Shuffles
Caching
Avro
Spark and Avro
Unit Testing
Spark SQL
Introduction to Spark SQL
Spark SQL API
Spark SQL UDFs
Spark Streaming
Introduction to Spark Streaming
Streaming API
Advanced Streaming
Integrating Spark
Real-time Systems
Using Spark With Hadoop MapReduce
Replacing Other Systems
Conclusion
Technologies Covered
- Apache Spark
- Apache Hadoop
- Apache Kafka