Shipping systems powered by LLMs would be hard enough if the models stayed the same, but they don’t. Models get updated and deprecated at a pace traditional software rarely sees, while teams are still expected to hit reliability targets that look a lot like traditional SLAs. In this episode, Tomás Hernando Koffman, Co-founder of Not Diamond, breaks down what it really takes to reach 99%+ accuracy when the underlying model is a moving target. He explains why non-determinism, model...
All content for Humans of Reliability is the property of Rootly and is served directly from their servers
with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
The Reliability Diagnosis: Google’s Steve McGhee on Debugging and Incident Response
Humans of Reliability
15 minutes
11 months ago
In this episode of Humans of Reliability, we sit down with Steve McGhee, Reliability Advocate at Google, to discuss his journey from early SRE work to advocating for reliability best practices. Steve shares fascinating stories from his time at Google, the challenges of implementing SRE in enterprises, and what people often misunderstand about the discipline. He also offers valuable insights on incident response, distributed systems, and the underrated skill every reliability engin...