Although service-level objectives (SLOs) continue to grow in importance, there's a distinct lack of information about how to implement them. Practical advice that does exist usually assumes that your team already has the infrastructure, tooling, and culture in place. In this book, recognized SLO expert Alex Hidalgo explains how to build an SLO culture from the ground up.
Ideal as a primer and daily reference for anyone creating both the culture and the tooling necessary for SLO-based approaches to reliability, this guide provides detailed analysis of advanced SLO and service-level indicator (SLI) techniques. Armed with mathematical models and the statistical knowledge to get the most out of an SLO-based approach, you'll learn how to build systems capable of measuring meaningful SLIs, with buy-in from every department in your organization.
- Define SLIs that meaningfully measure the reliability of a service from a user's perspective
- Choose appropriate SLO targets, including how to perform statistical and probabilistic analysis
- Use error budgets to help your team have better discussions and make better data-driven decisions
- Build supportive tooling and resources required for an SLO-based approach
- Use SLO data to present meaningful reports to leadership and your users
Alex Hidalgo is a Site Reliability Engineer and an expert in all things related to Service Level Objectives. He developed an interest in computers at a young age, started writing his first BASIC programs at around the age of nine, and remembers the Internet when it was all still text. He eventually turned his hobby into a career, working in various capacities as a network engineer, security engineer, and systems administrator, and in many roles within the world of IT support. After moving to New York, he joined Admeld as a Technical Operations Engineer, only to find himself employed by Google a few months later when Google acquired the company.

At Google, Alex was first introduced to the discipline of Site Reliability Engineering, which resonated with him so strongly that he wonders how he ever did anything else. Eventually, he found his other calling as an educator, writer, and speaker: he has traveled all over the world training other Site Reliability Engineers, became one of the primary developers of the Coursera Google IT Professional Certification, and contributed to multiple chapters of The Site Reliability Workbook, most notably "Implementing SLOs" and "SLO Engineering Case Studies."

He recently joined Squarespace, where his focus is now on spreading SLO-based approaches to service reliability, both internally and across the industry. When he isn't sharing his passion for error budgets with others, you can find him scuba diving or watching college basketball. He lives in Park Slope, Brooklyn, with his partner Jen and a rescue dog named Taco. He thinks about SLOs so much that he once dreamed about defining some for Taco.

Twitter handle: @ahidalgosre