Empower data-driven decisions with the Google Professional Data Engineer certification.
- Leverage data and gain real-time insights that improve your decision-making and accelerate innovation.
- Learn how to design and build data processing systems.
- Get an introduction to designing data processing systems, building end-to-end data pipelines, and analyzing data.
- Lift and shift Hadoop workloads using Dataproc.
- Process batch and streaming data on Dataflow.
- Manage data pipelines with Data Fusion and Cloud Composer, and more.
This course is HRDC claimable, and Malaysian Bumiputeras are eligible for the Yayasan Peneraju Financing Scheme. T&Cs apply.

Overview
Data Engineers design solutions that ensure maximum flexibility and scalability, while meeting all required security controls.
Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning.
This Google Cloud course covers structured, unstructured, and streaming data.
Limited Time Offer: Get up to 45% OFF selected Google Cloud courses in H2 2025 via our Google Cloud Certified program.
Skills Covered
- Design and build data processing systems on Google Cloud.
- Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.
- Derive business insights from extremely large datasets using BigQuery.
- Leverage unstructured data using Spark and ML APIs on Dataproc.
- Enable instant insights from streaming data.
Prerequisites
To get the most out of this course, participants should have:
- Prior Google Cloud experience using Cloud Shell and accessing products from the Google Cloud console.
- Basic proficiency with a common query language such as SQL.
- Experience with data modeling and ETL (extract, transform, load) activities.
- Experience developing applications using a common programming language such as Python.
Target Audience
This class is intended for experienced developers who are responsible for managing big data transformations, including:
- Extracting, loading, transforming, cleaning, and validating data.
- Designing pipelines and architectures for data processing.
- Creating and maintaining machine learning and statistical models.
- Querying datasets, visualizing query results, and creating reports.

Module 1: Data engineering tasks and components
Topics
- The role of a data engineer
- Data sources versus data sinks
- Data formats
- Storage solution options on Google Cloud
- Metadata management options on Google Cloud
- Share datasets using Analytics Hub
Objectives
- Explain the role of a data engineer.
- Understand the differences between a data source and a data sink.
- Explain the different types of data formats.
- Explain the storage solution options on Google Cloud.
- Learn about the metadata management options on Google Cloud.
- Understand how to share datasets with ease using Analytics Hub.
- Understand how to load data into BigQuery using the Google Cloud console and/or the gcloud CLI.
Module 2: Data replication and migration
Topics
- Replication and migration architecture
- The gcloud command line tool
- Moving datasets
- Datastream
Objectives
- Explain the baseline Google Cloud data replication and migration architecture.
- Understand the options and use cases for the gcloud command line tool.
- Explain the functionality and use cases for the Storage Transfer Service.
- Explain the functionality and use cases for the Transfer Appliance.
- Understand the features and deployment of Datastream.
Module 3: The extract and load data pipeline pattern
Topics
- Extract and load architecture
- The bq command line tool
- BigQuery Data Transfer Service
- BigLake
Objectives
- Explain the baseline extract and load architecture diagram.
- Understand the options of the bq command line tool.
- Explain the functionality and use cases for the BigQuery Data Transfer Service.
- Explain the functionality and use cases for BigLake as a non-extract-load pattern.
Module 4: The extract, load, and transform data pipeline pattern
Topics
- Extract, load, and transform (ELT) architecture
- SQL scripting and scheduling with BigQuery
- Dataform
Objectives
- Explain the baseline extract, load, and transform architecture diagram.
- Understand a common ELT pipeline on Google Cloud.
- Learn about BigQuery’s SQL scripting and scheduling capabilities.
- Explain the functionality and use cases for Dataform.
Module 5: The extract, transform, and load data pipeline pattern
Topics
- Extract, transform, and load (ETL) architecture
- Google Cloud GUI tools for ETL data pipelines
- Batch data processing using Dataproc
- Streaming data processing options
- Bigtable and data pipelines
Objectives
- Explain the baseline extract, transform, and load architecture diagram.
- Learn about the GUI tools on Google Cloud used for ETL data pipelines.
- Explain batch data processing using Dataproc.
- Learn to use Dataproc Serverless for Spark for ETL.
- Explain streaming data processing options.
- Explain the role Bigtable plays in data pipelines.
Module 6: Automation techniques
Topics
- Automation patterns and options for pipelines
- Cloud Scheduler and Workflows
- Cloud Composer
- Cloud Run functions
- Eventarc
Objectives
- Explain the automation patterns and options available for pipelines.
- Learn about Cloud Scheduler and Workflows.
- Learn about Cloud Composer.
- Learn about Cloud Run functions.
- Explain the functionality and automation use cases for Eventarc.
Module 7: Introduction to data engineering
Topics
- Data engineer’s role
- Data engineering challenges
- Introduction to BigQuery
- Data lakes and data warehouses
- Transactional databases versus data warehouses
- Effective partnership with other data teams
- Management of data access and governance
- Building of production-ready pipelines
- Google Cloud customer case study
Objectives
- Discuss the challenges of data engineering, and how building data pipelines in the cloud helps to address these.
- Review and understand the purpose of a data lake versus a data warehouse, and when to use which.
Module 8: Build a data lake
Topics
- Introduction to data lakes
- Data storage and ETL options on Google Cloud
- Building of a data lake using Cloud Storage
- Secure Cloud Storage
- Store all sorts of data types
- Cloud SQL as your OLTP system
Objectives
- Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.
- Explain how to use Cloud SQL for a relational data lake.
Module 9: Build a data warehouse
Topics
- The modern data warehouse
- Introduction to BigQuery
- Get started with BigQuery
- Loading of data into BigQuery
- Exploration of schemas
- Schema design
- Nested and repeated fields
- Optimization with partitioning and clustering
Objectives
- Discuss the requirements of a modern data warehouse.
- Explain why BigQuery is the scalable data warehousing solution on Google Cloud.
- Discuss the core concepts of BigQuery and review options of loading data into BigQuery.
Module 10: Introduction to building batch data pipelines
Topics
- EL, ELT, ETL
- Quality considerations
- Ways of executing operations in BigQuery
- Shortcomings
- ETL to solve data quality issues
Objectives
- Review different methods of loading data into your data lakes and warehouses: EL, ELT, and ETL.
Module 11: Execute Spark on Dataproc
Topics
- The Hadoop ecosystem
- Run Hadoop on Dataproc
- Cloud Storage instead of HDFS
- Optimize Dataproc
Objectives
- Review the Hadoop ecosystem.
- Discuss how to lift and shift your existing Hadoop workloads to the cloud using Dataproc.
- Explain when you would use Cloud Storage instead of HDFS storage.
- Explain how to optimize Dataproc jobs.
Module 12: Serverless data processing with Dataflow
Topics
- Introduction to Dataflow
- Reasons why customers value Dataflow
- Dataflow pipelines
- Aggregating with GroupByKey and Combine
- Side inputs and windows
- Dataflow templates
Objectives
- Identify features customers value in Dataflow.
- Discuss core concepts in Dataflow.
- Review the use of Dataflow templates and SQL.
- Write a simple Dataflow pipeline and run it both locally and on the cloud.
- Identify Map and Reduce operations, execute the pipeline, and use command line parameters.
- Read data from BigQuery into Dataflow and use the output of a pipeline as a side input to another pipeline.
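As a rough illustration of the "Aggregating with GroupByKey and Combine" topic, the sketch below contrasts the two aggregation styles in plain Python. It does not use the Apache Beam SDK or Dataflow; the function names and the sample records are hypothetical and exist only for this example. The idea it shows is that a group-by-key step collects every value for a key before aggregating, while a combine step folds each value into a running accumulator, which is why an associative, commutative combine function can be applied incrementally.

```python
from collections import defaultdict

# Hypothetical (key, value) records, for example (sensor_id, reading).
records = [("a", 3), ("b", 5), ("a", 7), ("b", 1), ("a", 2)]


def group_by_key(pairs):
    """Collect every value for a key into a list (GroupByKey-style)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def combine_per_key(pairs, combine_fn):
    """Fold each value into one accumulator per key (Combine-style)."""
    acc = {}
    for key, value in pairs:
        acc[key] = combine_fn(acc[key], value) if key in acc else value
    return acc


# Style 1: group first, then aggregate each list of values.
grouped = group_by_key(records)
totals_from_groups = {key: sum(values) for key, values in grouped.items()}

# Style 2: aggregate while iterating, never materializing the lists.
totals_combined = combine_per_key(records, lambda x, y: x + y)

print(grouped)
print(totals_from_groups)
print(totals_combined)
```

Both styles give the same per-key totals here; the difference is how much intermediate data has to be held per key before the aggregate is known.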
Module 13: Manage data pipelines with Cloud Data Fusion and Cloud Composer
Topics
- Build batch data pipelines visually with Cloud Data Fusion
- Components
- UI overview
- Building a pipeline
- Exploring data using Wrangler
- Orchestrate work between Google Cloud services with Cloud Composer
- Apache Airflow environment
- DAGs and operators
- Workflow scheduling
- Monitoring and logging
Objectives
- Discuss how to manage your data pipelines with Cloud Data Fusion and Cloud Composer.
- Summarize how Cloud Data Fusion allows data analysts and ETL developers to wrangle data and build pipelines in a visual way.
- Describe how Cloud Composer can help to orchestrate the work across multiple Google Cloud services.
Module 14: Introduction to processing streaming data
Topics
- Process streaming data
Objectives
- Explain streaming data processing.
- Identify the Google Cloud products and tools that can help address streaming data challenges.
Module 15: Serverless messaging with Pub/Sub
Topics
- Introduction to Pub/Sub
- Pub/Sub push versus pull
- Publishing with Pub/Sub code
Objectives
- Describe the Pub/Sub service.
- Explain how Pub/Sub works.
- Simulate real-time streaming sensor data using Pub/Sub.
Module 16: Dataflow streaming features
Topics
- Streaming data challenges
- Dataflow windowing
Objectives
- Describe the Dataflow service.
- Build a stream processing pipeline for live traffic data.
- Demonstrate how to handle late data using watermarks, triggers, and accumulation.
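To give a feel for the late-data objective above, here is a minimal plain-Python simulation of fixed windows with a watermark and an allowed-lateness bound. It is a simplified sketch, not Dataflow's actual behavior: the watermark is assumed to be the largest event time seen so far, triggers are not modeled, results simply accumulate per window, and the constants, function names, and sample events are all hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 60       # seconds per fixed window (assumed value)
ALLOWED_LATENESS = 30  # seconds a window accepts data after the watermark passes it (assumed)


def window_start(event_time):
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % WINDOW_SIZE)


def process(events):
    """Sum values per window; drop events that arrive after the lateness bound.

    events: iterable of (event_time, value) pairs in arrival order.
    The watermark is approximated as the maximum event time seen so far.
    """
    sums = defaultdict(int)
    dropped = []
    watermark = 0
    for event_time, value in events:
        watermark = max(watermark, event_time)
        start = window_start(event_time)
        window_end = start + WINDOW_SIZE
        if window_end + ALLOWED_LATENESS <= watermark:
            dropped.append((event_time, value))  # too late: window is closed
        else:
            sums[start] += value                 # on time, or late but allowed
    return dict(sums), dropped


# Arrival order differs from event-time order: 40 and 20 arrive out of order.
events = [(5, 1), (50, 1), (70, 1), (40, 1), (130, 1), (20, 1)]
sums, dropped = process(events)
print(sums)
print(dropped)
```

In this run the event at time 40 arrives after the watermark has reached 70 but is still within the allowed lateness, so it is counted; the event at time 20 arrives after the watermark has reached 130 and is dropped.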
Module 17: High-throughput BigQuery and Bigtable streaming features
Topics
- Streaming into BigQuery and visualizing results
- High-throughput streaming with Bigtable
- Optimizing Bigtable performance
Objectives
- Describe how to perform ad-hoc analysis on streaming data using BigQuery and dashboards.
- Discuss Bigtable as a low-latency solution.
- Describe how to architect for Bigtable and how to ingest data into Bigtable.
- Highlight performance considerations for the relevant services.
Module 18: Advanced BigQuery functionality and performance
Topics
- Analytic window functions
- GIS functions
- Performance considerations
Objectives
- Review some of BigQuery’s advanced analysis capabilities.
- Discuss ways to improve query performance.
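The "Analytic window functions" topic can be previewed with a small sketch. The example below runs locally against an in-memory SQLite database through Python's standard sqlite3 module rather than against BigQuery, and assumes a SQLite build with window-function support (3.25 or later). The sales table and its values are hypothetical. The OVER (PARTITION BY ... ORDER BY ...) clauses used here are also valid in BigQuery's SQL dialect, although other syntax and functions differ between the two systems.

```python
import sqlite3

# In-memory database with a small, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", 1, 10), ("east", 2, 30), ("east", 3, 20),
        ("west", 1, 5), ("west", 2, 15),
    ],
)

# A running total and a rank, each computed per region without
# collapsing the rows the way GROUP BY would.
query = """
SELECT region, day, amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales
ORDER BY region, day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each output row keeps its original columns and gains two computed ones, which is the main difference between an analytic window function and a grouped aggregate.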
Dates & Locations
July 7, 2026 - July 10, 2026
September 22, 2026 - September 25, 2026
November 17, 2026 - November 20, 2026

Exam & Certification
Google Cloud Professional Data Engineer Certification
A Google Professional Data Engineer enables data-driven decision making by collecting, transforming, and publishing data. A data engineer should be able to design, build, operationalize, secure, and monitor data processing systems with a particular emphasis on security and compliance; scalability and efficiency; reliability and fidelity; and flexibility and portability. A data engineer should also be able to leverage, deploy, and continuously train pre-existing machine learning models.
Speak to a Training Consultant
All courses are HRD Claimable.
Get in touch with our team via the form or WhatsApp us on +6011-5119 6631























