DTB-MLS: Machine Learning at Scale

Scale machine learning workflows and optimize distributed model training using Apache Spark on Databricks

This course covers Spark architecture for machine learning, distributed model training, hyperparameter tuning with frameworks such as Optuna, model deployment with MLflow and Unity Catalog, and pandas APIs on Spark.

Why get trained: Learn how to build and scale machine learning workflows using Apache Spark, Spark ML, MLflow, Unity Catalog and distributed tuning frameworks such as Optuna and Hyperopt.
Why it matters: Scalable machine learning capabilities enable organizations to process large datasets efficiently and deploy production-grade models across distributed environments.
Who should attend: Data scientists, machine learning engineers and advanced analytics professionals looking to scale ML workloads and operationalize models on Databricks.

Build the capability to scale machine learning workflows and deploy distributed models using Databricks with Trainocate. HRD Corp Claimable.

Overview

In this course, you will gain theoretical and practical knowledge of Apache Spark’s architecture and its application to machine learning workloads within Databricks. You will learn when to use Spark for data preparation, model training, and deployment, while also gaining hands-on experience with Spark ML and pandas APIs on Spark.

The course also introduces advanced concepts such as hyperparameter tuning and scaling Optuna with Spark. It builds on features and concepts introduced in the associate course, such as MLflow and Unity Catalog, for comprehensive model packaging and governance.

Skills Covered

Machine Learning Development with Spark
Distributed Model Tuning on Databricks
Deploying Machine Learning Models with Spark
Pandas on Spark

Prerequisites

The content was developed for participants with these skills/knowledge/abilities:

A beginner-level understanding of Python.
A basic understanding of data science and machine learning concepts (e.g., classification and regression models), common model metrics (e.g., F1-score), and Python libraries (e.g., scikit-learn and XGBoost).

Target Audience

Everyone who is interested

Course Curriculum

Module 1: Machine Learning Development with Spark

A Brief Overview of Spark Architecture for Machine Learning
Introduction to Spark ML for Model Development
Model Tracking and Packaging with MLflow and Unity Catalog on Databricks
Model Development with Spark

Module 2: Distributed Model Tuning on Databricks

Overview of Hyperparameter Tuning
Scalable HPO Frameworks on Databricks
Optuna and Hyperopt with Spark ML
HPO with Ray Tune

Module 3: Deploying Machine Learning Models with Spark

Deployment with Spark
Inference with Spark
Model Deployment with Spark
Optimization Strategies with Spark and Delta Lake

Module 4: Pandas on Spark

Scaling with Pandas APIs
Pandas UDFs and Function APIs
Pandas APIs

Dates & Locations

Let’s make it work for you

Can’t find a date that fits? Need to train your whole team? Looking for a discount?
Speak to one of our learning experts today.

July 24, 2026

Location: Online

Delivery mode: VILT (virtual instructor-led training)

Availability: TBC

Exam:

RM 900

Exam & Certification

Databricks Certified Machine Learning Professional.

The Databricks Certified Machine Learning Professional certification exam assesses an individual’s ability to design, implement, and manage enterprise-scale machine learning solutions using advanced Databricks platform capabilities. This includes proficiency in building scalable ML pipelines with Spark ML, implementing distributed training and hyperparameter tuning, leveraging advanced MLflow features, and utilizing Feature Store concepts for automated feature pipelines.

The certification exam evaluates expertise in MLOps practices, including testing strategies, environment management with Databricks Asset Bundles, automated retraining workflows, and monitoring using Lakehouse Monitoring for drift detection.

Additionally, test-takers are assessed on their ability to implement deployment strategies, custom model serving, and model rollout management. Individuals who pass this certification exam can be expected to perform advanced machine learning engineering tasks at enterprise scale, implementing production-ready ML systems with comprehensive monitoring, testing, and deployment practices using the full feature set of Databricks.

This exam covers:

Model Development – 44%
MLOps – 44%
Model Deployment – 12%

Speak to a Training Consultant

All courses are HRD Claimable.
Get in touch with our team via the form or WhatsApp us on +6011-5119 6631
