Empower data-driven decisions with the Google Professional Data Engineer certification.

  • Leverage data and gain real-time insights that improve your decision-making and accelerate innovation.
  • Learn how to design and build data processing systems.
  • Get an introduction to designing data processing systems, building end-to-end data pipelines, and analyzing data.
  • Lift and shift Hadoop workloads using Dataproc.
  • Process batch and streaming data on Dataflow.
  • Manage data pipelines with Cloud Data Fusion and Cloud Composer, and more.

This course is HRDC claimable, and Malaysian Bumiputeras are eligible for the Yayasan Peneraju Financing Scheme. T&Cs apply.

Overview

Data Engineers design solutions that ensure maximum flexibility and scalability, while meeting all required security controls.

Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning.

This Google Cloud course covers structured, unstructured, and streaming data.

Limited Time Offer: Get up to 45% OFF selected Google Cloud courses in H2 2025 via our Google Cloud Certified program.

Skills Covered

  • Design and build data processing systems on Google Cloud.
  • Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.
  • Derive business insights from extremely large datasets using BigQuery.
  • Leverage unstructured data using Spark and ML APIs on Dataproc.
  • Enable instant insights from streaming data.

Prerequisites

To get the most out of this course, participants should have:

  • Prior Google Cloud experience using Cloud Shell and accessing products from the Google Cloud console.
  • Basic proficiency with a common query language such as SQL.
  • Experience with data modeling and ETL (extract, transform, load) activities.
  • Experience developing applications using a common programming language such as Python.

Target Audience

This class is intended for experienced developers who are responsible for managing big data transformations, including:

  • Extracting, loading, transforming, cleaning, and validating data.
  • Designing pipelines and architectures for data processing.
  • Creating and maintaining machine learning and statistical models.
  • Querying datasets, visualizing query results, and creating reports.

Course Curriculum

Module 1: Data engineering tasks and components
Topics

  • The role of a data engineer
  • Data sources versus data sinks
  • Data formats
  • Storage solution options on Google Cloud
  • Metadata management options on Google Cloud
  • Share datasets using Analytics Hub

Objectives

  • Explain the role of a data engineer.
  • Understand the differences between a data source and a data sink.
  • Explain the different types of data formats.
  • Explain the storage solution options on Google Cloud.
  • Learn about the metadata management options on Google Cloud.
  • Understand how to share datasets with ease using Analytics Hub.
  • Understand how to load data into BigQuery using the Google Cloud console and/or the gcloud CLI.

Module 2: Data replication and migration
Topics

  • Replication and migration architecture
  • The gcloud command line tool
  • Moving datasets
  • Datastream

Objectives

  • Explain the baseline Google Cloud data replication and migration architecture.
  • Understand the options and use cases for the gcloud command line tool.
  • Explain the functionality and use cases for the Storage Transfer Service.
  • Explain the functionality and use cases for the Transfer Appliance.
  • Understand the features and deployment of Datastream.

Module 3: The extract and load data pipeline pattern
Topics

  • Extract and load architecture
  • The bq command line tool
  • BigQuery Data Transfer Service
  • BigLake

Objectives

  • Explain the baseline extract and load architecture diagram.
  • Understand the options of the bq command line tool.
  • Explain the functionality and use cases for the BigQuery Data Transfer Service.
  • Explain the functionality and use cases for BigLake as a non-extract-load pattern.

Module 4: The extract, load, and transform data pipeline pattern
Topics

  • Extract, load, and transform (ELT) architecture
  • SQL scripting and scheduling with BigQuery
  • Dataform

Objectives

  • Explain the baseline extract, load, and transform architecture diagram.
  • Understand a common ELT pipeline on Google Cloud.
  • Learn about BigQuery’s SQL scripting and scheduling capabilities.
  • Explain the functionality and use cases for Dataform.

Module 5: The extract, transform, and load data pipeline pattern
Topics

  • Extract, transform, and load (ETL) architecture
  • Google Cloud GUI tools for ETL data pipelines
  • Batch data processing using Dataproc
  • Streaming data processing options
  • Bigtable and data pipelines

Objectives

  • Explain the baseline extract, transform, and load architecture diagram.
  • Learn about the GUI tools on Google Cloud used for ETL data pipelines.
  • Explain batch data processing using Dataproc.
  • Learn to use Dataproc Serverless for Spark for ETL.
  • Explain streaming data processing options.
  • Explain the role Bigtable plays in data pipelines.

Module 6: Automation techniques
Topics

  • Automation patterns and options for pipelines
  • Cloud Scheduler and Workflows
  • Cloud Composer
  • Cloud Run functions
  • Eventarc

Objectives

  • Explain the automation patterns and options available for pipelines.
  • Learn about Cloud Scheduler and Workflows.
  • Learn about Cloud Composer.
  • Learn about Cloud Run functions.
  • Explain the functionality and automation use cases for Eventarc.

Module 7: Introduction to data engineering
Topics

  • Data engineer’s role
  • Data engineering challenges
  • Introduction to BigQuery
  • Data lakes and data warehouses
  • Transactional databases versus data warehouses
  • Effective partnership with other data teams
  • Management of data access and governance
  • Building production-ready pipelines
  • Google Cloud customer case study

Objectives

  • Discuss the challenges of data engineering and how building data pipelines in the cloud helps address them.
  • Review and understand the purpose of a data lake versus a data warehouse, and when to use which.

Module 8: Build a Data Lake
Topics

  • Introduction to data lakes
  • Data storage and ETL options on Google Cloud
  • Building a data lake using Cloud Storage
  • Secure Cloud Storage
  • Store all sorts of data types
  • Cloud SQL as your OLTP system

Objectives

  • Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.
  • Explain how to use Cloud SQL for a relational data lake.

Module 9: Build a data warehouse
Topics

  • The modern data warehouse
  • Introduction to BigQuery
  • Get started with BigQuery
  • Loading data into BigQuery
  • Exploring schemas
  • Schema design
  • Nested and repeated fields
  • Optimization with partitioning and clustering

Objectives

  • Discuss the requirements of a modern data warehouse.
  • Explain why BigQuery is the scalable data warehousing solution on Google Cloud.
  • Discuss the core concepts of BigQuery and review options of loading data into BigQuery.
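Nested and repeated fields let a single BigQuery row hold, for example, an order together with an array of its line items, so queries can avoid a join. As a rough illustration only, the plain-Python sketch below mimics the flattening that a query such as `SELECT o.order_id, item.sku FROM orders AS o, UNNEST(o.items) AS item` performs. The orders table, the items field, and the unnest helper are hypothetical, and the sketch does not model BigQuery's storage or execution.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def unnest(rows: List[Row], array_field: str, alias: str) -> List[Row]:
    """Flatten one repeated field into one output row per array element.

    Mirrors a comma (cross) join with UNNEST: a parent row whose array is
    empty or missing produces no output rows.
    """
    flat: List[Row] = []
    for row in rows:
        # Copy the non-repeated columns once per parent row.
        parent = {k: v for k, v in row.items() if k != array_field}
        for element in row.get(array_field) or []:
            flat.append({**parent, alias: element})
    return flat


if __name__ == "__main__":
    # Hypothetical orders table: each row nests its line items in a repeated field.
    orders = [
        {"order_id": 1, "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]},
        {"order_id": 2, "items": []},
    ]
    for r in unnest(orders, "items", "item"):
        print(r["order_id"], r["item"]["sku"], r["item"]["qty"])
```

Order 2 disappears from the output because its array is empty; to keep such rows, BigQuery queries typically use `LEFT JOIN UNNEST(...)` instead, which returns the parent row with a NULL element.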

Module 10: Introduction to building batch data pipelines
Topics

  • EL, ELT, ETL
  • Quality considerations
  • Ways of executing operations in BigQuery
  • Shortcomings
  • ETL to solve data quality issues

Objectives

  • Review different methods of loading data into your data lakes and warehouses: EL, ELT, and ETL.

Module 11: Execute Spark on Dataproc
Topics

  • The Hadoop ecosystem
  • Run Hadoop on Dataproc
  • Cloud Storage instead of HDFS
  • Optimize Dataproc

Objectives

  • Review the Hadoop ecosystem.
  • Discuss how to lift and shift your existing Hadoop workloads to the cloud using Dataproc.
  • Explain when you would use Cloud Storage instead of HDFS storage.
  • Explain how to optimize Dataproc jobs.

Module 12: Serverless data processing with Dataflow
Topics

  • Introduction to Dataflow
  • Reasons why customers value Dataflow
  • Dataflow pipelines
  • Aggregating with GroupByKey and Combine
  • Side inputs and windows
  • Dataflow templates

Objectives

  • Identify features customers value in Dataflow.
  • Discuss core concepts in Dataflow.
  • Review the use of Dataflow templates and SQL.
  • Write a simple Dataflow pipeline and run it both locally and on the cloud.
  • Identify Map and Reduce operations, execute the pipeline, and use command line parameters.
  • Read data from BigQuery into Dataflow and use the output of a pipeline as a side input to another pipeline.
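Module 12 frames Dataflow aggregation in Map and Reduce terms. As a rough illustration only, the plain-Python sketch below mimics the semantics of a word count built from a map step, a GroupByKey-style shuffle, and a per-key Combine (here, `sum`). It does not use the Apache Beam SDK and says nothing about how Dataflow parallelizes the work; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def map_words(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def group_by_key(pairs: Iterable[Tuple[K, V]]) -> Dict[K, List[V]]:
    """Shuffle step: gather all values that share a key, as GroupByKey does."""
    groups: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def combine_per_key(groups: Dict[K, List[V]],
                    combine_fn: Callable[[List[V]], V]) -> Dict[K, V]:
    """Reduce step: collapse each key's values into a single result."""
    return {key: combine_fn(values) for key, values in groups.items()}


if __name__ == "__main__":
    lines = ["the quick fox", "The lazy dog", "quick quick"]
    counts = combine_per_key(group_by_key(map_words(lines)), sum)
    print(sorted(counts.items()))
```

A design point the sketch cannot show: because a combining function such as `sum` is associative and commutative, a runner can apply it to partial groups before the shuffle, which is one reason Beam's Combine transforms are usually preferred over a GroupByKey followed by a hand-written reduction.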

Module 13: Manage data pipelines with Cloud Data Fusion and Cloud Composer
Topics

  • Build batch data pipelines visually with Cloud Data Fusion
    • Components
    • UI overview
    • Building a pipeline
    • Exploring data using Wrangler
  • Orchestrate work between Google Cloud services with Cloud Composer
    • Apache Airflow environment
    • DAGs and operators
    • Workflow scheduling
    • Monitoring and logging

Objectives

  • Discuss how to manage your data pipelines with Cloud Data Fusion and Cloud Composer.
  • Summarize how Cloud Data Fusion allows data analysts and ETL developers to wrangle data and build pipelines in a visual way.
  • Describe how Cloud Composer can help to orchestrate the work across multiple Google Cloud services.

Module 14: Introduction to processing streaming data
Topics

  • Process streaming data

Objectives

  • Explain streaming data processing.
  • Identify the Google Cloud products and tools that can help address streaming data challenges.

Module 15: Serverless messaging with Pub/Sub
Topics

  • Introduction to Pub/Sub
  • Pub/Sub push versus pull
  • Publishing with Pub/Sub code

Objectives

  • Describe the Pub/Sub service.
  • Explain how Pub/Sub works.
  • Simulate real-time streaming sensor data using Pub/Sub.

Module 16: Dataflow streaming features
Topics

  • Streaming data challenges
  • Dataflow windowing

Objectives

  • Describe the Dataflow service.
  • Build a stream processing pipeline for live traffic data.
  • Demonstrate how to handle late data using watermarks, triggers, and accumulation.
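To make the vocabulary of windows and watermarks concrete, here is a plain-Python sketch (not the Beam API) of fixed, or tumbling, event-time windows with a toy heuristic watermark. It assumes the watermark is simply the largest event time seen so far minus a fixed `max_delay`, and treats an event as late once that watermark has reached the end of its window. Dataflow estimates real watermarks itself, and the triggers and accumulation modes that decide what happens to late data are not modeled; `process_stream` and `max_delay` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Window = Tuple[int, int]  # [start, end) in seconds of event time
Event = Tuple[int, int]   # (event_time_seconds, value)


def fixed_window(event_time: int, size: int) -> Window:
    """Assign an event time to its fixed (tumbling) window."""
    start = event_time - (event_time % size)
    return (start, start + size)


def process_stream(events: Iterable[Event], size: int,
                   max_delay: int) -> Tuple[Dict[Window, int], List[Event]]:
    """Sum values per fixed window in arrival order, setting aside late events.

    Toy heuristic watermark (an assumption, not Dataflow's algorithm):
    the largest event time seen so far minus max_delay.
    """
    totals: Dict[Window, int] = defaultdict(int)
    late: List[Event] = []
    max_seen: Optional[int] = None
    for event_time, value in events:
        window = fixed_window(event_time, size)
        watermark = None if max_seen is None else max_seen - max_delay
        if watermark is not None and watermark >= window[1]:
            # The watermark already passed this window's end: the event is late.
            late.append((event_time, value))
        else:
            totals[window] += value
        max_seen = event_time if max_seen is None else max(max_seen, event_time)
    return dict(totals), late


if __name__ == "__main__":
    # Arrival order differs from event-time order.
    events = [(5, 1), (65, 2), (50, 4), (130, 1), (55, 8)]
    totals, late = process_stream(events, size=60, max_delay=10)
    print(totals)  # {(0, 60): 5, (60, 120): 2, (120, 180): 1}
    print(late)    # [(55, 8)]
```

The event at time 50 arrives out of order but is still counted, because the watermark (55) has not yet reached the end of its window (60); the event at time 55 arrives after the watermark has advanced to 120, so it is set aside as late.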

Module 17: High-throughput BigQuery and Bigtable streaming features
Topics

  • Streaming into BigQuery and visualizing results
  • High-throughput streaming with Bigtable
  • Optimizing Bigtable performance

Objectives

  • Describe how to perform ad-hoc analysis on streaming data using BigQuery and dashboards.
  • Discuss Bigtable as a low-latency solution.
  • Describe how to architect for Bigtable and how to ingest data into Bigtable.
  • Highlight performance considerations for the relevant services.

Module 18: Advanced BigQuery functionality and performance
Topics

  • Analytic window functions
  • GIS functions
  • Performance considerations

Objectives

  • Review some of BigQuery’s advanced analysis capabilities.
  • Discuss ways to improve query performance.
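Analytic (window) functions compute a value for every row from a set of related rows, without collapsing rows the way GROUP BY does. The plain-Python sketch below mimics a running total such as `SUM(amount) OVER (PARTITION BY store ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)`; the sales rows and the `running_total` helper are hypothetical, and the sketch ignores BigQuery's execution and NULL handling.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List

Row = Dict[str, Any]


def running_total(rows: List[Row], partition_key: str, order_key: str,
                  value_key: str, out_key: str = "running_total") -> List[Row]:
    """Mimic SUM(value) OVER (PARTITION BY p ORDER BY o
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).

    Unlike GROUP BY, every input row is kept; each gains one computed column.
    """
    # Sort by partition, then by the ordering column within each partition.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    result: List[Row] = []
    for _, partition in groupby(ordered, key=itemgetter(partition_key)):
        total = 0
        for row in partition:
            total += row[value_key]
            result.append({**row, out_key: total})
    return result


if __name__ == "__main__":
    sales = [
        {"store": "B", "day": 1, "amount": 5},
        {"store": "A", "day": 2, "amount": 7},
        {"store": "A", "day": 1, "amount": 3},
        {"store": "B", "day": 2, "amount": 1},
    ]
    for r in running_total(sales, "store", "day", "amount"):
        print(r["store"], r["day"], r["running_total"])
```

The explicit ROWS frame matters: with ties in the ORDER BY column, SQL's default RANGE frame would give tied rows the same total, whereas this sketch (like a ROWS frame) accumulates them one at a time, here in the input order preserved by Python's stable sort.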

Dates & Locations

Let’s make it work for you

Can’t find a date that fits? Need to train your whole team? Looking for a discount?
Speak to one of our learning experts today.

July 7, 2026 - July 10, 2026

Location: Kuala Lumpur
Mode: ILT
Availability: TBC

July 7, 2026 - July 10, 2026

Location: Online
Mode: VILT
Availability: TBC

September 22, 2026 - September 25, 2026

Location: Kuala Lumpur
Mode: ILT
Availability: TBC

September 22, 2026 - September 25, 2026

Location: Online
Mode: VILT
Availability: TBC

November 17, 2026 - November 20, 2026

Location: Kuala Lumpur
Mode: ILT
Availability: TBC

November 17, 2026 - November 20, 2026

Location: Online
Mode: VILT
Availability: TBC

Exam & Certification

Google Cloud Professional Data Engineer Certification

A Google Professional Data Engineer enables data-driven decision making by collecting, transforming, and publishing data. A data engineer should be able to design, build, operationalize, secure, and monitor data processing systems with a particular emphasis on security and compliance; scalability and efficiency; reliability and fidelity; and flexibility and portability. A data engineer should also be able to leverage, deploy, and continuously train pre-existing machine learning models.

Training & Certification Guide

The Professional Data Engineer exam assesses your ability to:

  • Design data processing systems
  • Build and operationalize data processing systems
  • Operationalize machine learning models
  • Ensure solution quality

Exam details:

  • Length: 2 hours
  • Registration fee: $200 (plus tax where applicable)
  • Languages: English, Japanese
  • Recommended experience: 3+ years of industry experience, including 1+ years designing and managing solutions using Google Cloud

Google Cloud’s role-based certifications measure an individual’s proficiency at performing a specific job role using Google Cloud technology. The knowledge, skills, and abilities for each job role are assessed using rigorously developed industry-standard methods.

Google Cloud certifications empower individuals to advance their careers and give organizations the confidence to build highly skilled, effective teams.

Frequently Asked Questions

Google Cloud certifications help you advance your professional skills and demonstrate your value to hiring managers. In addition, once you become Google Cloud certified, you unlock the following benefits:

  • Distinguish yourself with a digital badge by sharing it on your social profile or resume.
  • Showcase your achievement in the publicly accessible Google Cloud Certified Directory.
  • Get exclusive Google Cloud Certified swag for Professional certifications.
  • Network and exchange ideas with others in the Google Cloud Certified community.
  • Get access to global cloud virtual and in-person events hosted by Google Cloud.

A skill badge measures one’s knowledge of a specific product or service and tests the ability to apply that knowledge in an interactive, hands-on environment.

A certification measures an individual’s proficiency at performing a specific job role using Google Cloud technology. A certification exam tests one’s knowledge of the wide range of products and services needed to perform a job role, rather than a single product or service. To prepare for a Google Cloud certification, it is recommended that an individual have multiple years of experience in the role, in addition to completing the recommended online training and skill badges.

Unless explicitly stated in the detailed exam descriptions, all Google Cloud certifications are valid for two years from the date certified. Candidates must recertify in order to maintain their certification status.

Yes. You will receive renewal notifications 90, 60, and 30 days prior to your expiration date. Reminder emails will be sent to the email address used during exam registration.

Speak to a Training Consultant

All courses are HRD Claimable.
Get in touch with our team via the form or WhatsApp us on +6011-5119 6631
