About Real-time Systems with Spark Streaming and Kafka
This course takes participants through the benefits and challenges of real-time Big Data systems. We cover real-time Big Data services, both open source projects and managed services from cloud providers. The class focuses on Apache Kafka and Apache Spark Streaming. It shows how to create producers and consumers in Kafka, then how to use Spark Streaming to process the data in Kafka and write the results back to Kafka. Finally, the data is visualized in real time on a web page using Kafka REST.
Duration: 2 days
Intended Audience: Technical, Software Engineers, QA, Analysts
Prerequisites: Intermediate-Level Java
You Will Learn
- How to create large-scale real-time systems using both Apache Kafka and Apache Spark Streaming.
- How real-time distributed systems are different from batch systems.
- How to create Kafka producers and consumers.
- How to process data in Kafka with Spark Streaming and place the results back into Kafka.
- How to visualize data in real time on a web page.
Course Outline
Real-time Data Pipelines
- Real-time Technologies
- Real-time Pipelines
- Pros and Cons of Real-time
Using the Cloud
- Cloud Providers
- Real-time Technologies
- Choosing a Provider
Ingesting Data
- Real-time Ingestion
- Real-time ETL
Kafka
- About Kafka
- Kafka Internals
- Kafka API
Processing Data
- Real-time Data Processing
- Real-time Processing Technologies
Spark Streaming
- Spark Streaming
- Streaming API
- Advanced Streaming
Data Products
- Analysis of Data
- Dashboarding
Technologies Covered
In-depth coverage:
- Apache Spark Streaming
- Apache Kafka
Covered:
- Amazon Web Services
- Microsoft Azure
- Google Cloud
- IBM SoftLayer
- Amazon Kinesis
- Azure Event Hubs
- Google Cloud Pub/Sub
- Apache NiFi
- Apache Flink
- Apache Apex
- Apache Storm
- Heron
- Azure Stream Analytics
- Google Cloud Dataflow
- Apache Beam