Spark in Scala
December 21, 2022
This course takes advantage of the Scala programming language to make the most of the Spark framework. It offers a deep dive into distributed programming with Apache Spark and Scala, explaining the nuts and bolts of the Spark computational model and the intricacies of the Spark APIs. In particular, the course shows how to analyze program performance using the Spark UI and how to solve common optimization problems through practical exercises.
Audience:
- Non-Scala programmers who want to jump on the Scala bandwagon and make the most of the Spark framework through Scala
- Spark programmers in Java or Python who want to start using the framework from Scala
- Big data programmers in Spark interested in improving their skills in the framework and gaining a full understanding of performance problems and their solutions
Course Topics
- Topics: Spark, Catalyst, DataFrames, Scala, Optimization
- References: https://www.oreilly.com/library/view/spark-the-definitive/9781491912201/
What students will learn
- Feel comfortable using Spark with Scala
- Get acquainted with the features of the Scala API of Spark
- Understand Scala signatures made up of generics, implicits, etc. (see the sketch after this list)
- Get used to the strongly typed discipline of Scala
- Learn the basic FP techniques needed to develop efficient and modular ETLs in Spark
- Identify and resolve common problems in Spark, with a special focus on performance
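As a taste of the kind of signature students will learn to read, here is a minimal sketch modelled on the shape of Spark's `Dataset.map`; the trait name `MyDataset` is hypothetical, while `Encoder` is Spark's real serialization type class.

```scala
import org.apache.spark.sql.Encoder

// A hypothetical, simplified trait in the style of Spark's Dataset API.
// `T` and `U` are generic type parameters; the implicit Encoder[U]
// tells Spark how to serialize values of the result type U.
trait MyDataset[T] {
  def map[U](f: T => U)(implicit enc: Encoder[U]): MyDataset[U]
}
```

Reading a signature like this means seeing both the generics (`T`, `U`) and the implicit evidence the compiler must supply.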
Module 1: Spark features I: the computational model (4 hours)
The basic principles and concepts behind Spark as a framework for distributed processing.
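As a first taste of that model (a minimal sketch, assuming a local SparkSession), transformations are lazy and only an action triggers a distributed job:

```scala
import org.apache.spark.sql.SparkSession

object LazyModelDemo {
  def main(args: Array[String]): Unit = {
    // Local master for illustration; on a real cluster this would be
    // the resource manager's URL instead.
    val spark = SparkSession.builder().master("local[*]").appName("lazy-demo").getOrCreate()

    val numbers = spark.sparkContext.parallelize(1 to 1000000)
    // filter is a transformation: it only builds the execution plan.
    val evens = numbers.filter(_ % 2 == 0)
    // count is an action: this is where the distributed job actually runs.
    println(evens.count())

    spark.stop()
  }
}
```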
Module 2: Spark features II: Spark APIs (4 hours)
How do we process massive distributed data sets in a cluster? With high-level APIs! Spark puts two major alternatives at your fingertips: a statically typed API (Datasets) and a dynamically typed one (DataFrames).
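A minimal sketch of the contrast, assuming a local SparkSession and a toy `Person` case class:

```scala
import org.apache.spark.sql.SparkSession

final case class Person(name: String, age: Int)

object TwoApisDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("two-apis").getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Ana", 34), Person("Luis", 17)).toDS()

    // Statically typed (Dataset): field access is checked at compile time.
    val adultsTyped = people.filter(p => p.age >= 18)

    // Dynamically typed (DataFrame): columns are resolved at runtime,
    // so a misspelled column name only fails when the query executes.
    val adultsUntyped = people.toDF().filter($"age" >= 18)

    adultsTyped.show()
    adultsUntyped.show()
    spark.stop()
  }
}
```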
Module 3: Spark features III: Reading and writing in Spark (4 hours)
The previous module focused on transformations; this one focuses on the data side: formats, optimizations, management, etc.
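For instance, a minimal sketch (with placeholder paths) of reading CSV and writing a columnar format such as Parquet:

```scala
import org.apache.spark.sql.SparkSession

object ReadWriteDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("read-write").getOrCreate()

    // Placeholder input path; header and schema inference are options
    // of Spark's built-in CSV source.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("data/input.csv")

    // Columnar formats such as Parquet store the schema with the data
    // and enable optimizations like column pruning and predicate pushdown.
    df.write.mode("overwrite").parquet("data/output.parquet")

    spark.stop()
  }
}
```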
Module 4: Spark optimizations (4 hours)
Learn how to make the most of Spark's built-in optimizations for free.
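One source of those free optimizations is the Catalyst optimizer listed in the course topics; here is a minimal sketch (with a placeholder Parquet path) of inspecting what Catalyst did to a query:

```scala
import org.apache.spark.sql.SparkSession

object CatalystDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("catalyst").getOrCreate()
    import spark.implicits._

    // Placeholder path: any Parquet data set with "name" and "age" columns.
    val people = spark.read.parquet("data/people.parquet")
    val adults = people.filter($"age" >= 18).select("name")

    // explain(true) prints the parsed, analyzed, optimized and physical
    // plans, showing e.g. the filter pushed down into the Parquet scan.
    adults.explain(true)

    spark.stop()
  }
}
```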
Module 5: Best practices on performance & modular design (4 hours)
Learn the best ways to optimize and organize your Spark code to make it more robust and performant.
- Partitioning issues: unpartitioned data and over-partitioning (see the sketch after this list)
- Fixing memory problems
- Solving serialization issues
- Caching: when it improves your process, and when it is just extra work
- Tasks that never finish: detecting why this happens
- Workflow structure: design patterns to properly modularize your ETLs and improve testability
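As a preview of the module, a minimal sketch (with placeholder paths, column names, and partition counts) of the partitioning and caching trade-offs listed above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PerfBasicsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("perf").getOrCreate()

    // Placeholder path; 200 is an illustrative partition count that must
    // be tuned to the data volume and cluster size to avoid both
    // unpartitioned data and over-partitioning.
    val events = spark.read.parquet("data/events.parquet")
    val repartitioned = events.repartition(200)

    // Caching pays off only when the data set is reused; if it is read
    // once, materializing it is just extra work.
    repartitioned.persist(StorageLevel.MEMORY_AND_DISK)
    println(repartitioned.count())                     // materializes the cache
    println(repartitioned.filter("value > 0").count()) // reuses it ("value" is a placeholder column)

    repartitioned.unpersist()
    spark.stop()
  }
}
```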