Course Summary: Advanced Data Analytics with Spark
Our Advanced Data Analytics with Spark cohort is a four-week evening course.
Apache Spark is a fast and general engine for large-scale data processing.
Spark was developed as an alternative to the traditional MapReduce processing paradigm. By keeping
data in memory, Spark can run up to 100x faster than Hadoop MapReduce, and up to 10x faster when
running on disk. Spark is preferred for iterative processing, which many machine learning algorithms rely on.
Spark runs on top of Hadoop, as a standalone platform, or in the cloud. It is easy to use, fast, and has a
powerful stack of libraries, including SQL and DataFrames. The course requires some experience
programming in Python.
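To make the point about in-memory storage and iterative processing concrete, here is a minimal plain-Python sketch. It does not use Spark: `ToyDataset`, `load_points`, and `gradient_descent_mean` are hypothetical names invented for illustration, and the "disk read" is only a counter. It shows why an iterative job benefits when the dataset is kept in memory after the first read instead of being reloaded on every pass. In PySpark, the analogous step is calling `cache()` on an RDD or DataFrame that is reused across iterations.

```python
class ToyDataset:
    """Toy stand-in for a dataset; counts simulated reads from slow storage."""

    def __init__(self, loader, cached=False):
        self._loader = loader      # function that "reads from disk"
        self._cached = cached      # keep the data in memory after the first read?
        self._memory = None
        self.disk_reads = 0

    def records(self):
        # Serve from memory when caching is on and the data is already loaded.
        if self._cached and self._memory is not None:
            return self._memory
        self.disk_reads += 1
        data = list(self._loader())
        if self._cached:
            self._memory = data
        return data


def load_points():
    # Hypothetical data; imagine an expensive read from HDFS here.
    return [1.0, 2.0, 3.0, 4.0, 10.0]


def gradient_descent_mean(dataset, steps=20, learning_rate=0.1):
    """Iteratively estimate the mean by minimizing half the mean squared error."""
    estimate = 0.0
    for _ in range(steps):
        points = dataset.records()  # every iteration needs the full dataset
        gradient = sum(estimate - p for p in points) / len(points)
        estimate -= learning_rate * gradient
    return estimate


uncached = ToyDataset(load_points, cached=False)
cached = ToyDataset(load_points, cached=True)
gradient_descent_mean(uncached)
gradient_descent_mean(cached)
print(uncached.disk_reads, cached.disk_reads)  # prints "20 1"
```

Both runs compute the same estimate; the only difference is how often the data is reloaded, which is the cost an in-memory engine avoids for iterative workloads.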
Course Details: Advanced Data Analytics with Spark
Week 1: Spark Fundamentals
- C: Introduction to Spark
- C: Why Spark?
- C: Introduction to RDDs
- C: Data sharing
- C: Data Partitioning
Week 2: Spark SQL
- C: Working with the Spark Shell
- C: What is Spark SQL?
- C: Spark SQL vs Spark Core
- C: DataFrames API
Week 3: Spark Streaming
- C: DStreams
- C: Transformations: Stateless and Stateful Transformation
- C: Checkpointing and Output Operations
- C: Tuning and Debugging Spark
Learning Objectives: Advanced Data Analytics with Spark
- Become familiar with Spark fundamentals. Learn about the different components of Spark.
- Use Spark on an HDFS cluster. Gain experience working with RDDs.
- Learn how to tune and debug Spark.
- Tools used: Python, Spark
- Drop us a note to schedule an interview and see if this course is a good fit for you.
- January 10th, 2017 - February 2nd, 2017
Tuesday and Thursday: 6:30 PM to 9:30 PM
Financing Options available with: