Scale machine learning workflows and optimize distributed model training using Apache Spark on Databricks

This course covers Spark architecture for machine learning, distributed model training, hyperparameter tuning with frameworks such as Optuna, model deployment with MLflow and Unity Catalog, and the pandas APIs on Spark.

  • Why get trained: Learn how to build and scale machine learning workflows using Apache Spark, Spark ML, MLflow, Unity Catalog and distributed tuning frameworks such as Optuna and Hyperopt.
  • Why it matters: Scalable machine learning capabilities enable organizations to process large datasets efficiently and deploy production-grade models across distributed environments.
  • Who should attend: Data scientists, machine learning engineers and advanced analytics professionals looking to scale ML workloads and operationalize models on Databricks.

Build the capability to scale machine learning workflows and deploy distributed models on Databricks with Trainocate. This course is HRD Corp claimable.

Overview

In this course, you will gain theoretical and practical knowledge of Apache Spark’s architecture and its application to machine learning workloads within Databricks. You will learn when to use Spark for data preparation, model training, and deployment, while also gaining hands-on experience with Spark ML and pandas APIs on Spark.

The course also introduces advanced concepts such as hyperparameter tuning and scaling Optuna with Spark. It builds on features and concepts introduced in the associate-level course, such as MLflow and Unity Catalog, for model packaging and governance.

Skills Covered

  • Machine Learning Development with Spark
  • Distributed Model Tuning on Databricks
  • Deploying Machine Learning Models with Spark
  • Pandas on Spark

Prerequisites

This course was designed for participants with the following skills and knowledge:

  • A beginner-level understanding of Python.
  • A basic understanding of data science and machine learning concepts (e.g., classification and regression models), common model metrics (e.g., F1-score), and Python libraries (e.g., scikit-learn and XGBoost).

Target Audience

  • Anyone interested in scaling machine learning workloads on Databricks

Course Curriculum

Module 1: Machine Learning Development with Spark

  • A Brief Overview of Spark Architecture for Machine Learning
  • Introduction to Spark ML for Model Development
  • Model Tracking and Packaging with MLflow and Unity Catalog on Databricks
  • Model Development with Spark

Module 2: Distributed Model Tuning on Databricks

  • Overview of Hyperparameter Tuning
  • Scalable Hyperparameter Optimization (HPO) Frameworks on Databricks
  • Optuna and Hyperopt with Spark ML
  • HPO with Ray Tune

Module 3: Deploying Machine Learning Models with Spark

  • Deployment with Spark
  • Inference with Spark
  • Model Deployment with Spark
  • Optimization Strategies with Spark and Delta Lake

Module 4: Pandas on Spark

  • Scaling with Pandas APIs
  • Pandas UDFs and Function APIs
  • Pandas APIs

Dates & Locations

Let’s make it work for you

Can’t find a date that fits? Need to train your whole team? Looking for a discount?
Speak to one of our learning experts today.

June 24, 2026

Location: Online
Mode: VILT (virtual instructor-led training)
Availability: TBC
Exam:
RM 900

July 24, 2026

Location: Online
Mode: VILT (virtual instructor-led training)
Availability: TBC
Exam:
RM 900

Exam & Certification

Databricks Certified Machine Learning Professional.

The Databricks Certified Machine Learning Professional certification exam assesses an individual’s ability to design, implement, and manage enterprise-scale machine learning solutions using advanced Databricks platform capabilities. This includes proficiency in building scalable ML pipelines with Spark ML, implementing distributed training and hyperparameter tuning, leveraging advanced MLflow features, and utilizing Feature Store concepts for automated feature pipelines.

The certification exam evaluates expertise in MLOps practices, including testing strategies, environment management with Databricks Asset Bundles, automated retraining workflows, and monitoring using Lakehouse Monitoring for drift detection.

Additionally, test-takers are assessed on their ability to implement deployment strategies, custom model serving, and model rollout management. Individuals who pass this certification exam can be expected to perform advanced machine learning engineering tasks at enterprise scale, implementing production-ready ML systems with comprehensive monitoring, testing, and deployment practices using the full feature set of Databricks.

This exam covers:

  • Model Development – 44%
  • MLOps – 44%
  • Model Deployment – 12%

Speak to a Training Consultant

All courses are HRD Corp claimable.
Get in touch with our team via the form or WhatsApp us on +6011-5119 6631
