Spark in Scala
December 21, 2022
This course takes advantage of the Scala programming language to make the most of the Spark framework. It offers a deep dive into distributed programming with Apache Spark and Scala, explaining the nuts and bolts of the Spark computational model, and the intricacies of the Spark APIs. In particular, the course provides insights on how to analyze program performance using the Spark UIs, and how to solve common optimization problems through practical exercises.
- Non-Scala programmers who want to jump on the Scala bandwagon and make the most of the Spark framework through Scala
- Spark programmers working in Java or Python who want to start using the framework from Scala
- Big data programmers already using Spark who want to improve their skills in the framework and gain a full understanding of performance problems and their solutions
- Topics: Spark, Catalyst, Data Frames, Scala, Optimization
- References: https://www.oreilly.com/library/view/spark-the-definitive/9781491912201/
What students will learn
- Feel comfortable using Spark with Scala
- Get acquainted with the features of the Scala API of Spark
- Understand Scala signatures built from generics, implicits, etc.
- Get used to the strongly typed discipline of Scala
- Apply the basic FP techniques needed to develop efficient and modular ETLs in Spark
- Identify and resolve common problems in Spark, with a special focus on performance
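To give a flavour of the signatures mentioned in the list above, here is a minimal sketch in plain Scala, with no Spark dependency. The `Describe` type class, its instances, and the `SignatureDemo` object are hypothetical names introduced only for illustration. They stand in for the kind of implicit evidence that Spark's typed API asks for, as in `Dataset.as[U : Encoder]`.

```scala
// Hypothetical type class: evidence that values of type A can be described.
// It plays the role that an implicit Encoder plays in Spark's typed API.
trait Describe[A] {
  def describe(a: A): String
}

object Describe {
  // Instances live in the companion object, so the compiler finds them
  // without any import.
  implicit val intDescribe: Describe[Int] = new Describe[Int] {
    def describe(a: Int): String = s"Int($a)"
  }
  implicit val stringDescribe: Describe[String] = new Describe[String] {
    def describe(a: String): String = s"String($a)"
  }
}

object SignatureDemo {
  // Generic in A, with an explicit implicit parameter list.
  def describeAll[A](items: Seq[A])(implicit d: Describe[A]): Seq[String] =
    items.map(d.describe)

  // The same requirement written as a context bound, the form that
  // appears in many library signatures.
  def describeFirst[A: Describe](items: Seq[A]): Option[String] =
    items.headOption.map(implicitly[Describe[A]].describe)

  def main(args: Array[String]): Unit = {
    println(describeAll(Seq(1, 2, 3)))
    println(describeFirst(Seq("a", "b")))
  }
}
```

Calling `describeAll(Seq(1.5))` would not compile in this sketch, because no `Describe[Double]` instance is in scope. That compile-time failure is the strongly typed discipline the list refers to.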
Module 1: Spark features I: the computational model (4 hours)
The basic principles and concepts behind Spark as a framework for distributed processing.