Overview

This course covers how to implement various flavors of production ML systems, including:

  • Static, dynamic, and continuous training
  • Static and dynamic inference
  • Batch and online processing

You will examine TensorFlow's abstraction levels, compare options for distributed training, and learn to write distributed training models using custom estimators.

Skills Covered

Upon completion of this course, learners will be able to:

  • Differentiate between static, dynamic, and continuous training pipelines.
  • Implement static and dynamic inference for production models.
  • Choose between batch and online processing based on use case requirements.
  • Navigate TensorFlow abstraction levels (from high-level Keras to low-level custom ops).
  • Set up and manage distributed training jobs on Google Cloud.
  • Write distributed training models using custom estimators.
  • Apply best practices for productionizing ML systems.

Prerequisites

  • Completion of Course 1 in the Advanced Machine Learning on Google Cloud series (recommended)
  • Working knowledge of TensorFlow (including Keras)
  • Familiarity with Python and basic cloud concepts

Target Audience

  • Cloud Architects designing ML pipelines
  • Intermediate Machine Learning Engineers
  • Data Scientists moving models to production
  • Learners who have completed the first course in the Advanced Machine Learning on Google Cloud series

Course Curriculum

Module 1: Introduction to Production ML Systems

  • Production challenges
  • Static vs. dynamic vs. continuous training
  • Batch vs. online processing

Module 2: Inference in Production

  • Static inference (precomputed)
  • Dynamic inference (on-demand)
  • Latency and throughput considerations
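
To make the static/dynamic distinction in this module concrete, here is a minimal sketch in plain Python. It is not taken from the course materials: `toy_model` and the helper names are hypothetical stand-ins, and a real system would back the lookup table with a database or key-value store rather than a dictionary.

```python
from typing import Callable, Dict, Iterable, Optional


def toy_model(x: int) -> float:
    """Hypothetical stand-in for a trained model (assumption, not a real API)."""
    return 2.0 * x + 1.0


def build_static_table(
    model: Callable[[int], float], known_inputs: Iterable[int]
) -> Dict[int, float]:
    """Static inference: score every expected input offline and cache the results."""
    return {x: model(x) for x in known_inputs}


def serve_static(table: Dict[int, float], x: int) -> Optional[float]:
    """Serving is a cheap lookup, but inputs that were not precomputed return None."""
    return table.get(x)


def serve_dynamic(model: Callable[[int], float], x: int) -> float:
    """Dynamic inference: run the model at request time, so any input can be scored."""
    return model(x)


table = build_static_table(toy_model, range(5))
static_hit = serve_static(table, 3)     # precomputed input
static_miss = serve_static(table, 10)   # input outside the precomputed set
dynamic = serve_dynamic(toy_model, 10)  # computed on demand
```

The trade-off the sketch illustrates: the static path has low, predictable serving latency but cannot answer for unseen inputs, while the dynamic path handles any input at the cost of running the model per request.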

Module 3: TensorFlow Abstraction Layers

  • TensorFlow 2.x ecosystem
  • Estimators, Keras, and custom loops
  • When to use which abstraction

Module 4: Distributed Training Fundamentals

  • Why distribute training
  • Data parallelism vs. model parallelism
  • MirroredStrategy, TPUStrategy, MultiWorkerMirroredStrategy
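
The idea behind synchronous data parallelism can be sketched without any framework. The example below is an illustration only, not course material: it fits a one-parameter model `y ≈ w * x`, splits each batch evenly across simulated replicas, and averages their gradients, which is the role an all-reduce plays in a strategy such as `MirroredStrategy`. All function names are hypothetical, and the sketch assumes the batch size is divisible by the number of replicas.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (x, y) pair


def gradient(w: float, shard: List[Example]) -> float:
    """Gradient of mean squared error for y ≈ w * x over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def split(batch: List[Example], num_replicas: int) -> List[List[Example]]:
    """Split a batch into equal shards (assumes len(batch) % num_replicas == 0)."""
    size = len(batch) // num_replicas
    return [batch[i * size:(i + 1) * size] for i in range(num_replicas)]


def data_parallel_step(
    w: float, batch: List[Example], num_replicas: int, lr: float
) -> float:
    """One synchronous step: per-replica gradients, then an averaged update."""
    local_grads = [gradient(w, shard) for shard in split(batch, num_replicas)]
    avg_grad = sum(local_grads) / len(local_grads)  # stands in for all-reduce
    return w - lr * avg_grad


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # generated by y = 2x

# With equal shards, the averaged gradient matches the full-batch gradient.
parallel_w = data_parallel_step(0.0, batch, num_replicas=2, lr=0.01)
single_w = 0.0 - 0.01 * gradient(0.0, batch)

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, batch, num_replicas=2, lr=0.01)
```

Model parallelism, by contrast, splits the model itself across devices rather than the data; it is not shown here.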

Module 5: Custom Estimators for Distributed Training

  • Writing custom estimators
  • Model functions and input functions
  • Lab: Distributed training with custom estimators

Module 6: Production Pipeline Architecture

  • Continuous training pipelines
  • Model versioning and rollback
  • Monitoring and alerting

Module 7: Challenge Lab (Skills Badge)

  • Jump directly to a challenge lab
  • Demonstrate production ML skills without completing all modules

Dates & Locations

Let’s make it work for you

Can’t find a date that fits? Need to train your whole team? Looking for a discount?
Speak to one of our learning experts today.

July 7, 2026 - July 8, 2026

Location: Kuala Lumpur
Mode: ILT (instructor-led training)
Availability: TBC

July 7, 2026 - July 8, 2026

Location: Online
Mode: VILT (virtual instructor-led training)
Availability: TBC

September 8, 2026 - September 9, 2026

Location: Kuala Lumpur
Mode: ILT
Availability: TBC

September 8, 2026 - September 9, 2026

Location: Online
Mode: VILT
Availability: TBC

November 3, 2026 - November 4, 2026

Location: Kuala Lumpur
Mode: ILT
Availability: TBC

November 3, 2026 - November 4, 2026

Location: Online
Mode: VILT
Availability: TBC


All courses are HRD Claimable.
Get in touch with our team via the form or WhatsApp us on +6011-5119 6631
