GCPDE: Data Engineering on Google Cloud

Design and operate scalable data pipelines on Google Cloud.

Modern data platforms must collect, process, store, and analyze large volumes of data while maintaining reliability, security, and operational efficiency.

This course provides the technical knowledge and hands-on experience required to build data pipelines, manage analytical workloads, and implement data engineering solutions using Google Cloud services.

Why get trained: Learn how to build batch and streaming data pipelines, manage data lakes and data warehouses, implement ETL and ELT workflows, process real-time data with Pub/Sub and Dataflow, orchestrate workloads with Cloud Composer, and analyze data using BigQuery.
Why it matters: Data Engineers play a critical role in delivering trusted, accessible, and well-managed data for analytics, artificial intelligence, and business applications. Understanding how Google Cloud data services work together helps organizations process information efficiently, improve data quality, and support reliable analytical workloads.
Who should attend: Data Engineers, Data Architects, Cloud Engineers, Data Platform Engineers, Database Administrators, Analytics Engineers, ETL Developers, and IT professionals responsible for designing, implementing, or maintaining Google Cloud data platforms.

The Google Professional Data Engineer certification validates data engineering skills that support reliable data pipelines, scalable analytics platforms, and production-ready data solutions on Google Cloud. HRD Corp Claimable.

Overview

Data Engineers design solutions that ensure maximum flexibility and scalability, while meeting all required security controls.

Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning.

This Google Cloud course covers structured, unstructured, and streaming data.

Gain industry-recognized Google Cloud certifications and future-proof your career in 2026.

Skills Covered

Design and build data processing systems on Google Cloud.
Process batch and streaming data by implementing autoscaling data pipelines on Dataflow.
Derive business insights from extremely large datasets using BigQuery.
Leverage unstructured data using Spark and ML APIs on Dataproc.
Enable instant insights from streaming data.

Prerequisites

To get the most out of this course, participants should have:

Prior Google Cloud experience using Cloud Shell and accessing products from the Google Cloud console.
Basic proficiency with a common query language such as SQL.
Experience with data modeling and ETL (extract, transform, load) activities.
Experience developing applications using a common programming language such as Python.

Target Audience

This class is intended for experienced developers who are responsible for managing big data transformations including:

Extracting, loading, transforming, cleaning, and validating data.
Designing pipelines and architectures for data processing.
Creating and maintaining machine learning and statistical models.
Querying datasets, visualizing query results, and creating reports.

Course Curriculum

Module 1: Data engineering tasks and components
Topics

The role of a data engineer
Data sources versus data sinks
Data formats
Storage solution options on Google Cloud
Metadata management options on Google Cloud
Share datasets using Analytics Hub

Objectives

Explain the role of a data engineer.
Understand the differences between a data source and a data sink.
Explain the different types of data formats.
Explain the storage solution options on Google Cloud.
Learn about the metadata management options on Google Cloud.
Understand how to share datasets with ease using Analytics Hub.
Understand how to load data into BigQuery using the Google Cloud console and/or the gcloud CLI.

Module 2: Data replication and migration
Topics

Replication and migration architecture
The gcloud command line tool
Moving datasets
Datastream

Objectives

Explain the baseline Google Cloud data replication and migration architecture.
Understand the options and use cases for the gcloud command line tool.
Explain the functionality and use cases for the Storage Transfer Service.
Explain the functionality and use cases for the Transfer Appliance.
Understand the features and deployment of Datastream.

Module 3: The extract and load data pipeline pattern
Topics

Extract and load architecture
The bq command line tool
BigQuery Data Transfer Service
BigLake

Objectives

Explain the baseline extract and load architecture diagram.
Understand the options of the bq command line tool.
Explain the functionality and use cases for the BigQuery Data Transfer Service.
Explain the functionality and use cases for BigLake as a non-extract-load pattern.

Module 4: The extract, load, and transform data pipeline pattern
Topics

Extract, load, and transform (ELT) architecture
SQL scripting and scheduling with BigQuery
Dataform

Objectives

Explain the baseline extract, load, and transform architecture diagram.
Understand a common ELT pipeline on Google Cloud.
Learn about BigQuery’s SQL scripting and scheduling capabilities.
Explain the functionality and use cases for Dataform.

Module 5: The extract, transform, and load data pipeline pattern
Topics

Extract, transform, and load (ETL) architecture
Google Cloud GUI tools for ETL data pipelines
Batch data processing using Dataproc
Streaming data processing options
Bigtable and data pipelines

Objectives

Explain the baseline extract, transform, and load architecture diagram.
Learn about the GUI tools on Google Cloud used for ETL data pipelines.
Explain batch data processing using Dataproc.
Learn to use Dataproc Serverless for Spark for ETL.
Explain streaming data processing options.
Explain the role Bigtable plays in data pipelines.

Module 6: Automation techniques
Topics

Automation patterns and options for pipelines
Cloud Scheduler and Workflows
Cloud Composer
Cloud Run functions
Eventarc

Objectives

Explain the automation patterns and options available for pipelines.
Learn about Cloud Scheduler and Workflows.
Learn about Cloud Composer.
Learn about Cloud Run functions.
Explain the functionality and automation use cases for Eventarc.

Module 7: Introduction to data engineering
Topics

Data engineer’s role
Data engineering challenges
Introduction to BigQuery
Data lakes and data warehouses
Transactional databases versus data warehouses
Effective partnership with other data teams
Management of data access and governance
Building of production-ready pipelines
Google Cloud customer case study

Objectives

Discuss the challenges of data engineering, and how building data pipelines in the cloud helps to address these.
Review and understand the purpose of a data lake versus a data warehouse, and when to use which.

Module 8: Build a Data Lake
Topics

Introduction to data lakes
Data storage and ETL options on Google Cloud
Building of a data lake using Cloud Storage
Secure Cloud Storage
Store all sorts of data types
Cloud SQL as your OLTP system

Objectives

Discuss why Cloud Storage is a great option for building a data lake on Google Cloud.
Explain how to use Cloud SQL for a relational data lake.

Module 9: Build a data warehouse
Topics

The modern data warehouse
Introduction to BigQuery
Get started with BigQuery
Loading of data into BigQuery
Exploration of schemas
Schema design
Nested and repeated fields
Optimization with partitioning and clustering

Objectives

Discuss the requirements of a modern data warehouse.
Explain why BigQuery is the scalable data warehousing solution on Google Cloud.
Discuss the core concepts of BigQuery and review options of loading data into BigQuery.

Module 10: Introduction to building batch data pipelines
Topics

EL, ELT, ETL
Quality considerations
Ways of executing operations in BigQuery
Shortcomings
ETL to solve data quality issues

Objectives

Review different methods of loading data into your data lakes and warehouses: EL, ELT, and ETL.
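
As a rough illustration of how ETL and ELT differ, the sketch below uses Python's built-in sqlite3 module as a stand-in for a warehouse. It is not BigQuery code and is not taken from the course labs; the table names, sample rows, and helper functions are all hypothetical. In the ETL path, rows are cleaned in application code before loading. In the ELT path, raw rows are loaded first and then cleaned with SQL inside the database.

```python
import sqlite3

# Hypothetical raw records: (order_id, amount as text, country code).
RAW_ROWS = [
    (1, " 10.50", "my"),
    (2, "n/a", "SG"),   # invalid amount, filtered out by both patterns
    (3, "7.25 ", "My"),
]


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def etl(conn):
    """ETL: transform in application code, then load only the clean rows."""
    clean = [
        (order_id, float(amount), country.upper())
        for order_id, amount, country in RAW_ROWS
        if is_number(amount)
    ]
    conn.execute(
        "CREATE TABLE orders_etl (order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", clean)
    return conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()


def elt(conn):
    """ELT: load the raw rows as-is, then transform with SQL in the database."""
    conn.execute(
        "CREATE TABLE orders_raw (order_id INTEGER, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               CAST(TRIM(amount) AS REAL) AS amount,
               UPPER(country) AS country
        FROM orders_raw
        WHERE TRIM(amount) GLOB '[0-9]*'  -- crude check: starts with a digit
        """
    )
    return conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    print("ETL:", etl(demo))
    print("ELT:", elt(demo))
```

Both paths end with the same cleaned table. The difference is where the transformation runs, and the ELT path also keeps the untouched raw table available for later reprocessing.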

Module 11: Execute Spark on Dataproc
Topics

The Hadoop ecosystem
Run Hadoop on Dataproc
Cloud Storage instead of HDFS
Optimize Dataproc

Objectives

Review the Hadoop ecosystem.
Discuss how to lift and shift your existing Hadoop workloads to the cloud using Dataproc.
Explain when you would use Cloud Storage instead of HDFS storage.
Explain how to optimize Dataproc jobs.

Module 12: Serverless data processing with Dataflow
Topics

Introduction to Dataflow
Reasons why customers value Dataflow
Dataflow pipelines
Aggregating with GroupByKey and Combine
Side inputs and windows
Dataflow templates

Objectives

Identify features customers value in Dataflow.
Discuss core concepts in Dataflow.
Review the use of Dataflow templates and SQL.
Write a simple Dataflow pipeline and run it both locally and on the cloud.
Identify Map and Reduce operations, execute the pipeline, and use command line parameters.
Read data from BigQuery into Dataflow and use the output of a pipeline as a side input to another pipeline.

Module 13: Manage data pipelines with Cloud Data Fusion and Cloud Composer
Topics

Build batch data pipelines visually with Cloud Data Fusion
- Components
- UI overview
- Building a pipeline
- Exploring data using Wrangler
Orchestrate work between Google Cloud services with Cloud Composer
- Apache Airflow environment
- DAGs and operators
- Workflow scheduling
- Monitoring and logging

Objectives

Discuss how to manage your data pipelines with Cloud Data Fusion and Cloud Composer.
Summarize how Cloud Data Fusion allows data analysts and ETL developers to wrangle data and build pipelines in a visual way.
Describe how Cloud Composer can help to orchestrate the work across multiple Google Cloud services.

Module 14: Introduction to processing streaming data
Topics

Process streaming data

Objectives

Explain streaming data processing.
Identify the Google Cloud products and tools that can help address streaming data challenges.

Module 15: Serverless messaging with Pub/Sub
Topics

Introduction to Pub/Sub
Pub/Sub push versus pull
Publishing with Pub/Sub code

Objectives

Describe the Pub/Sub service.
Explain how Pub/Sub works.
Simulate real-time streaming sensor data using Pub/Sub.

Module 16: Dataflow streaming features
Topics

Streaming data challenges
Dataflow windowing

Objectives

Describe the Dataflow service.
Build a stream processing pipeline for live traffic data.
Demonstrate how to handle late data using watermarks, triggers, and accumulation.
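
To give a feel for the late-data concepts named in the last objective, here is a toy model in plain Python. It does not use the Dataflow or Apache Beam API, and real watermarks are estimated quite differently; the window size, allowed lateness, and the "watermark equals the largest event time seen" heuristic are simplifying assumptions made for this sketch only. It shows fixed windows, an on-time firing when the watermark passes the end of a window, an accumulating re-firing for late data, and dropping of data that arrives past the allowed lateness.

```python
from collections import defaultdict

WINDOW = 60            # fixed window size in seconds (hypothetical)
ALLOWED_LATENESS = 30  # seconds after window end that late data is still accepted


def window_start(event_time):
    """Map an event time to the start of its fixed window."""
    return event_time - event_time % WINDOW


def process(events):
    """Consume (event_time, value) pairs in arrival order.

    Returns (emitted, dropped), where emitted is a list of
    (window_start, running_total, label) and dropped lists rejected events.
    """
    totals = defaultdict(int)  # running sum per window (accumulating mode)
    fired = set()              # windows that already produced an on-time result
    emitted, dropped = [], []
    watermark = 0              # toy heuristic: largest event time seen so far

    for event_time, value in events:
        start = window_start(event_time)
        if start + WINDOW + ALLOWED_LATENESS <= watermark:
            # Too late: the window is closed for good.
            dropped.append((event_time, value))
        else:
            totals[start] += value
            if start in fired:
                # Late firing: re-emit the accumulated total for the window.
                emitted.append((start, totals[start], "late"))

        watermark = max(watermark, event_time)

        # On-time firing: emit each window whose end the watermark has passed.
        for s in sorted(totals):
            if s not in fired and s + WINDOW <= watermark:
                fired.add(s)
                emitted.append((s, totals[s], "on-time"))

    return emitted, dropped


if __name__ == "__main__":
    stream = [(5, 1), (50, 2), (65, 3), (40, 4), (130, 5), (20, 6)]
    for line in process(stream):
        print(line)
```

In this sample stream, the event at time 40 arrives after its window has fired but within the allowed lateness, so the window is re-emitted with the accumulated total. The event at time 20 arrives after the allowed lateness has expired and is dropped. The last window never fires because the toy watermark never passes its end before the input runs out.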

Module 17: High-throughput BigQuery and Bigtable streaming features
Topics

Streaming into BigQuery and visualizing results
High-throughput streaming with Bigtable
Optimizing Bigtable performance

Objectives

Describe how to perform ad-hoc analysis on streaming data using BigQuery and dashboards.
Discuss Bigtable as a low-latency solution.
Describe how to architect for Bigtable and how to ingest data into Bigtable.
Highlight performance considerations for the relevant services.

Module 18: Advanced BigQuery functionality and performance
Topics

Analytic window functions
GIS functions
Performance considerations

Objectives

Review some of BigQuery’s advanced analysis capabilities.
Discuss ways to improve query performance.

Dates & Locations

Let’s make it work for you

Can’t find a date that fits? Need to train your whole team? Looking for a discount?
Speak to one of our learning experts today.

Talk To Us

September 22, 2026 - September 25, 2026

Location: Kuala Lumpur

Modal: ILT

Availability: TBC

September 22, 2026 - September 25, 2026

Location: Online

Modal: VILT

Availability: TBC

November 17, 2026 - November 20, 2026

Location: Kuala Lumpur

Modal: ILT

Availability: TBC

November 17, 2026 - November 20, 2026

Location: Online

Modal: VILT

Availability: TBC

Exam & Certification

Google Cloud Professional Data Engineer Certification

A Google Professional Data Engineer enables data-driven decision making by collecting, transforming, and publishing data. A data engineer should be able to design, build, operationalize, secure, and monitor data processing systems with a particular emphasis on security and compliance; scalability and efficiency; reliability and fidelity; and flexibility and portability. A data engineer should also be able to leverage, deploy, and continuously train pre-existing machine learning models.

Training & Certification Guide

Skills Measured

The Professional Data Engineer exam assesses your ability to:

Design data processing systems
Build and operationalize data processing systems
Operationalize machine learning models
Ensure solution quality

About this certification exam

Length: 2 hours
Registration fee: $200 (plus tax where applicable)
Languages: English, Japanese.
Recommended experience: 3+ years of industry experience including 1+ years designing and managing solutions using GCP.

Google Cloud Certification

Google Cloud’s role-based certifications measure an individual’s proficiency at performing a specific job role using Google Cloud technology. The knowledge, skills, and abilities for each job role are assessed using rigorously developed industry-standard methods.

Google Cloud certifications empower individuals to advance their careers and give organizations the confidence to build highly skilled, effective teams.

Recommended Google Data Engineering Learning Pathways

Frequently Asked Questions

Why should I get Google Cloud certified?

Google Cloud certifications help you advance your professional skills and demonstrate your value to hiring managers. Once you become Google Cloud certified, you also unlock the following benefits:

Distinguish yourself with a digital badge by sharing it on your social profile or resume.
Showcase your achievement in the publicly accessible Google Cloud Certified Directory.
Get exclusive Google Cloud Certified swag for Professional certifications.
Network and exchange ideas with others in the Google Cloud Certified community.
Get access to global cloud virtual and in-person events hosted by Google Cloud.

Difference between Skill Badges and Certifications

A skill badge measures one’s knowledge of a specific product or service and tests the ability to apply that knowledge in an interactive, hands-on environment.

A certification measures an individual’s proficiency at performing a specific job role using Google Cloud technology. A certification exam tests one’s knowledge of the wide range of products and services needed to perform a job role, rather than a single product or service. To prepare for a Google Cloud certification, it is recommended that an individual has multiple years of experience in the role, in addition to completing the recommended online training and skill badges.

How long is a Google Cloud certification valid?

Unless explicitly stated in the detailed exam descriptions, all Google Cloud certifications are valid for two years from the date certified. Candidates must recertify in order to maintain their certification status.

Will I be notified when I need to recertify?

Yes. You will receive renewal notifications 90, 60, and 30 days prior to your expiration date. Reminder emails will be sent to the email address used during exam registration.

Speak to a Training Consultant

All courses are HRD Claimable.
Get in touch with our team via the form or WhatsApp us on +6011-5119 6631


Trainocate: A Global Leader in Technology, Business, and People Development
