Humans of Reliability
Rootly
23 episodes
2 days ago
Shipping systems powered by LLMs would be hard enough if the models stayed the same. In reality, they don’t: models get updated and deprecated at a pace traditional software rarely sees, all while teams are still expected to hit reliability targets that look a lot like traditional SLAs. In this episode, Tomás Hernando Koffman, Co-founder of Not Diamond, breaks down what it really takes to reach 99%+ accuracy when the underlying model is a moving target. He explains why non-determinism, model...
Technology
All content for Humans of Reliability is the property of Rootly and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
The Reliability Diagnosis: Google’s Steve McGhee on Debugging and Incident Response
Humans of Reliability
15 minutes
11 months ago
In this episode of Humans of Reliability, we sit down with Steve McGhee, Reliability Advocate at Google, to discuss his journey from early SRE work to advocating for reliability best practices. Steve shares fascinating stories from his time at Google, the challenges of implementing SRE in enterprises, and what people often misunderstand about the discipline. He also offers valuable insights on incident response, distributed systems, and the underrated skill every reliability engin...