Apache Spark APIs for Data Processing

Unlock the power of big data with our self-paced, hands-on course by CERN IT.

Welcome

Course Introduction

Welcome to Apache Spark APIs for Data Processing, a self-paced, interactive course developed by CERN IT. This course offers a practical introduction to Spark’s powerful architecture and essential capabilities, combining theoretical insights with engaging demonstrations and hands-on exercises.

  • Comprehensive API Coverage: Dive into Spark DataFrames, SQL queries, real-time Streaming, and Machine Learning to harness Spark's full potential.
  • Interactive Hands-On Tutorials: Build practical experience with guided exercises in Python using Jupyter notebooks.
  • Real-World Deployments: Explore how to deploy Apache Spark at scale, with examples from CERN's accelerator logging systems, IT infrastructure monitoring, and physics research.

By the end of the course, you will be equipped to apply Apache Spark confidently to large-scale data processing scenarios.

Accompanying Notebooks

Course Lectures & Tutorials

Bonus Material

Credits

Author & Contact: Luca.Canali@cern.ch
Presented by: CERN-IT Data Analytics Services
Contributors: R. Castellotti, P. Kothuri
License: CC BY-SA 4.0
Published: November 2022 | Last modified: March 2025