Deep dive into HolmesGPT, the CNCF Sandbox AI agent for cloud-native troubleshooting and root cause analysis. This episode covers what it is, its 40+ integrations, the project roadmap, and how to set it up today.
News Segment:
Air France-KLM's secure automation platform built on Terraform, Vault, and Ansible
AWS ECS tmpfs mounts on Fargate for secure secrets handling
Qwen 30B running on Raspberry Pi - democratizing edge AI
AWS European Sovereign Cloud with independent EU governance
Main Topic - HolmesGPT:
CNCF Sandbox project (accepted October 2025) with 1,600+ GitHub stars
Agentic architecture: creates investigation task lists, queries systems, synthesizes findings
40+ built-in toolsets: Prometheus, Grafana Loki/Tempo, Kubernetes, ArgoCD, Datadog, and more
Privacy-first: bring your own LLM keys, read-only access, respects RBAC
End-to-end automation through Alertmanager, PagerDuty, and Opsgenie integrations
Installation options: pip, Homebrew, Helm, Web UI, K9s plugin
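The plan, query, and synthesize loop described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not HolmesGPT's code or API: the task list is hard-coded where HolmesGPT would ask an LLM to plan, the "toolsets" are plain read-only functions returning canned observations, and the names `plan`, `investigate`, `synthesize`, and the `toolset:query` task format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical read-only tool: takes a query string, returns an observation.
# Real HolmesGPT toolsets query live systems; these are stand-ins.
Tool = Callable[[str], str]


@dataclass
class Investigation:
    question: str
    tasks: List[str] = field(default_factory=list)
    findings: Dict[str, str] = field(default_factory=dict)


def plan(question: str) -> List[str]:
    # Stand-in for the LLM planning step: a fixed task list of
    # "toolset:query" entries (hypothetical format).
    return ["kubernetes:pod-status", "prometheus:error-rate", "loki:recent-errors"]


def investigate(question: str, tools: Dict[str, Tool]) -> Investigation:
    inv = Investigation(question, tasks=plan(question))
    for task in inv.tasks:
        toolset, _, query = task.partition(":")
        tool = tools.get(toolset)
        if tool is None:
            # Only enabled toolsets are queried; nothing is guessed.
            inv.findings[task] = "skipped: toolset not enabled"
            continue
        inv.findings[task] = tool(query)
    return inv


def synthesize(inv: Investigation) -> str:
    # Stand-in for the LLM synthesis step: list findings per task.
    lines = [f"Question: {inv.question}"]
    lines += [f"- {task}: {result}" for task, result in inv.findings.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    # Canned observations stand in for real Kubernetes and Prometheus queries.
    tools: Dict[str, Tool] = {
        "kubernetes": lambda q: "pod api-7f CrashLoopBackOff",
        "prometheus": lambda q: "5xx rate 12%",
    }
    print(synthesize(investigate("Why is the api service failing?", tools)))
```

Keeping tools as read-only callables mirrors the privacy-first point above: the agent can only observe through what it has been given, and a disabled toolset shows up as a gap in the report rather than an invented answer.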
Resources:
HolmesGPT GitHub
HolmesGPT Documentation
Full Transcript
Episode Type: full
Episode Number: 83
Season: 1
Tags: HolmesGPT, CNCF, Kubernetes, root cause analysis, AI ops, troubleshooting, observability, SRE, platform engineering, Robusta, agentic AI