Course Summary
Learn how to build scalable and efficient batch data pipelines on Google Cloud Platform with this comprehensive course. With hands-on labs and real-world examples, you'll gain the skills needed to design and deploy data processing systems on GCP.
Key Learning Points
- Understand the basics of batch data processing and how to use GCP tools to build pipelines.
- Learn how to design and implement data processing systems for different use cases.
- Explore advanced concepts such as fault tolerance, scalability, and monitoring.
Learning Outcomes
- Design and deploy batch data processing systems on GCP.
- Implement fault tolerant and scalable data pipelines.
- Monitor and troubleshoot batch data processing systems.
Prerequisites and Recommended Knowledge
- Familiarity with programming concepts and SQL.
- Basic knowledge of cloud computing.
Course Difficulty Level
Intermediate
Course Format
- Online self-paced course
- Hands-on labs and real-world examples
Similar Courses
- Data Engineering on Google Cloud Platform
- Building Batch Data Pipelines on AWS
- Apache Beam on Google Cloud Dataflow
Description
Data pipelines typically fall under one of the Extract-Load, Extract-Load-Transform, or Extract-Transform-Load paradigms. This course describes which paradigm should be used, and when, for batch data. It also covers several Google Cloud Platform technologies for data transformation, including BigQuery, executing Spark on Cloud Dataproc, pipeline graphs in Cloud Data Fusion, and serverless data processing with Cloud Dataflow. Learners get hands-on experience building data pipeline components on Google Cloud Platform using Qwiklabs.
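For a concrete sense of the EL-then-T flow the course contrasts with ETL, below is a minimal sketch using the google-cloud-bigquery Python client: raw data is loaded from Cloud Storage as-is, then transformed with SQL inside the warehouse. The project, dataset, table, and bucket names are placeholders, not part of the course materials.

```python
# Minimal ELT sketch with the google-cloud-bigquery client.
# All project / dataset / bucket names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="YOUR_PROJECT")

# E + L: load the raw CSV from Cloud Storage into a staging table as-is.
load_job = client.load_table_from_uri(
    "gs://YOUR_BUCKET/raw/orders.csv",
    "YOUR_PROJECT.staging.orders_raw",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    ),
)
load_job.result()  # block until the load finishes

# T: transform inside BigQuery after loading (the "T" of ELT).
client.query("""
    CREATE OR REPLACE TABLE `YOUR_PROJECT.analytics.orders_clean` AS
    SELECT order_id, LOWER(customer_email) AS customer_email, amount
    FROM `YOUR_PROJECT.staging.orders_raw`
    WHERE amount IS NOT NULL
""").result()
```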
Outline
- Introduction
- Course Introduction
- Getting Started with Google Cloud and Qwiklabs
- Introduction to Batch Data Pipelines
- EL, ELT, ETL
- Quality considerations
- How to carry out operations in BigQuery
- Shortcomings
- ETL to solve data quality issues
- EL, ELT, ETL
- Executing Spark on Cloud Dataproc
- The Hadoop ecosystem
- Running Hadoop on Cloud Dataproc
- GCS instead of HDFS
- Optimizing Dataproc
- Optimizing Dataproc Storage
- Optimizing Dataproc Templates and Autoscaling
- Optimizing Dataproc Monitoring
- Lab Intro: Running Apache Spark jobs on Cloud Dataproc
- Summary
- Executing Spark on Cloud Dataproc
- Manage Data Pipelines with Cloud Data Fusion and Cloud Composer
- Introduction
- Components of Data Fusion
- Building a Pipeline
- Exploring Data using Wrangler
- Lab: Building and executing a pipeline graph in Cloud Data Fusion
- Orchestrating work between GCP services with Cloud Composer
- Apache Airflow Environment
- DAGs and Operators
- Workflow scheduling
- Monitoring and Logging
- Lab: An Introduction to Cloud Composer
- Cloud Data Fusion and Cloud Composer
- Serverless Data Processing with Cloud Dataflow
- Cloud Dataflow
- Why customers value Dataflow
- Building Cloud Dataflow Pipelines in code
- Key considerations with designing pipelines
- Transforming data with PTransforms
- Lab: Building a Simple Dataflow Pipeline
- Aggregating with GroupByKey and Combine
- Lab: MapReduce in Cloud Dataflow
- Side Inputs and Windows of data
- Lab: Practicing Pipeline Side Inputs
- Creating and re-using Pipeline Templates
- Cloud Dataflow SQL pipelines
- Data Processing with Cloud Dataflow
- Summary
- Course Summary
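The Serverless Data Processing with Cloud Dataflow module above centers on the Apache Beam programming model (PTransforms, GroupByKey and Combine, side inputs, windowing). As a rough illustration only, here is a minimal Beam pipeline sketch in Python; the bucket paths and the assumed "department,amount" CSV layout are placeholders, not taken from the course labs.

```python
# Minimal Apache Beam pipeline: read CSV lines, sum amounts per department,
# write results. Paths and the "department,amount" layout are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_line(line):
    department, amount = line.split(",")
    return department, float(amount)


with beam.Pipeline(options=PipelineOptions()) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://YOUR_BUCKET/input/sales.csv")
        | "Parse" >> beam.Map(parse_line)
        | "SumPerDepartment" >> beam.CombinePerKey(sum)
        | "Format" >> beam.MapTuple(lambda dept, total: f"{dept},{total}")
        | "Write" >> beam.io.WriteToText("gs://YOUR_BUCKET/output/totals")
    )
```

Run it locally with the default DirectRunner, or on Dataflow by passing --runner=DataflowRunner together with project, region, and staging options.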
Summary of User Reviews
Discover how to build and operate effective batch data pipelines on Google Cloud Platform with the Batch Data Pipelines GCP course on Coursera. Users found this course to be comprehensive and well-structured, with clear explanations and practical examples.
Key Aspect Users Liked About This Course
Comprehensive and well-structured course with clear explanations and practical examples
Pros from User Reviews
- Hands-on practice with GCP tools
- Great instructor with a deep understanding of the topic
- Easy to follow and understand
- Real-world examples provided
- Exercises and quizzes reinforce understanding
Cons from User Reviews
- Some lectures are too basic
- Not enough emphasis on best practices and optimization
- Course may not be suitable for advanced users
- Lack of depth on some topics
- Could benefit from more hands-on projects