Site Reliability Engineering: Measuring and Managing Reliability
Course Summary
This course teaches the principles of Site Reliability Engineering (SRE) and how to use Service Level Objectives (SLOs) to manage and improve service reliability.
Key Learning Points
- Learn the fundamentals of Site Reliability Engineering (SRE)
- Understand how to use Service Level Objectives (SLOs) to measure and improve service reliability
- Gain practical experience with real-world case studies
Learning Outcomes
- Understand the principles and best practices of Site Reliability Engineering
- Learn how to use Service Level Objectives (SLOs) to improve service reliability
- Gain practical experience through real-world case studies
Prerequisites or recommended knowledge before taking this course
- Basic knowledge of software engineering principles
- Familiarity with the Linux command line
Course Difficulty Level
Intermediate
Course Format
- Online
- Self-paced
Similar Courses
- Google Cloud Platform Fundamentals: Core Infrastructure
- Introduction to DevOps: Transforming and Improving Operations
Related Education Paths
- Google Cloud Platform Certification
- AWS Certified DevOps Engineer
- Microsoft Certified: Azure DevOps Engineer Expert
Notable People in This Field
- Ben Treynor Sloss
- Niall Murphy
Description
This course teaches the theory of Service Level Objectives (SLOs), a principled way of describing and measuring the desired reliability of a service. Upon completion, learners should be able to apply these principles to develop the first SLOs for services they are familiar with in their own organizations.
Knowledge
- How to make systems reliable
- Understanding SLIs, SLOs and SLAs
- Quantifying risks to and consequences of SLOs
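As a rough illustration of how an SLI, an SLO target, and an error budget relate, here is a minimal sketch. It assumes the common "good events divided by valid events" form of an SLI; the function names and the sample numbers are hypothetical and are not taken from the course materials.

```python
def sli(good_events: int, valid_events: int) -> float:
    """Return the fraction of valid events that were good (0.0 to 1.0)."""
    if valid_events <= 0:
        raise ValueError("valid_events must be positive")
    return good_events / valid_events


def error_budget_remaining(good_events: int, valid_events: int,
                           slo_target: float) -> float:
    """Return the unspent fraction of the error budget.

    The budget is the share of events allowed to be bad: 1 - slo_target.
    A negative result means the SLO was missed for this window.
    """
    if not 0.0 < slo_target < 1.0:
        # A 100% target leaves no budget at all, so it is rejected here.
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_bad = (1.0 - slo_target) * valid_events
    actual_bad = valid_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 500 of them failed.
    print(f"SLI: {sli(999_500, 1_000_000):.4%}")
    print(f"Budget left: {error_budget_remaining(999_500, 1_000_000, 0.999):.1%}")
```

With a 99.9% target, the hypothetical window above may have up to about 1,000 bad requests, so 500 failures consume roughly half of the budget.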
Outline
- Introduction to SRE
- Course structure
- Introduction
- Intro
- CRE's Three Reliability Principles
- Reliability in the Cloud
- How SLOs help your business make decisions
- How SLOs help you build features faster
- How SLOs help you balance operational and project work
- Making SLOs work for your organization
- DevOps/SRE
- Targeting Reliability
- Introduction
- SLOs vs SLAs
- The happiness test
- How do we measure reliability?
- Edge cases
- 100% is the wrong target
- Iterating
- A working service
- SLOs and SLAs
- Reliability and iterating
- Targeting Reliability Assessment
- Operating for Reliability
- Introduction
- Error budgets
- Everything is a trade-off
- Error budgets: advanced concepts
- Axes of improvement
- Operational approach to increasing reliability
- Module summary
- Error budgets
- Increasing reliability
- Operating for Reliability Assessment
- Choosing a Good SLI
- Introduction
- User happiness in metric form
- The properties of good SLI metrics
- Ways of measuring SLIs
- The SLI menu
- The SLI equation
- Request / Response SLIs
- Data processing SLIs
- "But my system is really complex!"
- Managing complexity with aggregation
- Managing complexity with bucketing
- Achievable SLOs
- Aspirational SLOs
- Continuous improvement
- Measuring happiness
- Commonly used SLIs
- Correctness and Coverage
- Developing SLOs and SLIs
- Introduction
- The 4 step process
- Our example game
- Loading the profile page
- Refining SLI specifications
- Looking for observability gaps
- Failure modes
- Postmortem!
- Setting Achievable SLO targets
- Quantifying Risks to SLOs
- Introduction
- Is your error budget realistic?
- Modeling risks in our spreadsheet
- Analyzing risk
- Consequences of SLO Misses
- Introduction
- No surprises
- A dashboard example
- Why an error budget policy?
- Fundamentals of an error budget policy
- How to draft an error budget policy
- Example policy thresholds
- A hypothetical policy scenario
- Course conclusion and video wrap up
- Error budget policies
- Error budget policy -- considerations
- Consequences of SLO Misses
Summary of User Reviews
Learn site reliability engineering and service level objectives with Coursera. Students highly recommend this course, praising its real-world relevance and practical application. However, some users note that the course may be too basic for experienced engineers.
Key Aspect Users Liked About This Course
Real-world relevance and practical application
Pros from User Reviews
- Course content is relevant and applicable to real-world scenarios
- Instructors are knowledgeable and provide clear explanations
- Hands-on labs and exercises reinforce learning
Cons from User Reviews
- May be too basic for experienced engineers
- Some technical difficulties with the Coursera platform
- Limited interaction with instructors and other students