Module 4: PySpark Programming: Environment, Basics, RDD

In this module, we will learn PySpark programming. We will start by setting up a PySpark environment in Google Colab, and then cover the basics of PySpark programming, focusing only on the Resilient Distributed Dataset (RDD).

Notebooks for practice