In this episode, we talk with Abdel Sghiouar and Mofi Rahman, Developer Advocates at Google and (guest) hosts of the Kubernetes Podcast from Google. Together, we dive into one central question: can you truly run LLMs reliably and at scale on Kubernetes? It quickly becomes clear that LLM workloads behave nothing like traditional web applications: GPUs are scarce, expensive, and difficult to schedule. Models are massive — some reaching 700GB — making load times, storage throughput, and caching ...
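For readers who want a concrete picture of what the episode describes: below is a minimal sketch of scheduling an LLM inference server onto a GPU node in Kubernetes. The pod name, container image, and PVC name are illustrative assumptions, not details from the episode; the key ideas are requesting a GPU as an extended resource and mounting a persistent model cache so a huge model does not have to be re-downloaded on every restart.

```yaml
# Illustrative example only — names and image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: llm-server
spec:
  containers:
  - name: inference
    image: vllm/vllm-openai:latest     # example inference server image
    resources:
      limits:
        nvidia.com/gpu: 1              # GPUs are requested as an extended resource
    volumeMounts:
    - name: model-cache
      mountPath: /models               # model weights served from the mounted cache
  volumes:
  - name: model-cache
    persistentVolumeClaim:
      claimName: model-cache-pvc       # hypothetical pre-populated cache of model weights
```

With models in the hundreds of gigabytes, the storage backing that PVC (and its read throughput) largely determines pod startup time — one of the scheduling and caching challenges discussed in the episode.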
All content for De Nederlandse Kubernetes Podcast is the property of Ronald Kers and Jan Stomphorst.
#106 De Game, de Hack en de Infrastructuur die zichzelf uitrolt (The Game, the Hack, and the Infrastructure that Deploys Itself)
De Nederlandse Kubernetes Podcast
36 minutes
3 months ago
Send us a message. ACC ICT — specialist in IT continuity: business-critical applications and data securely available, independent of third parties, anytime and anywhere. Support the show. Like and subscribe! It helps out a lot. You can also find us on: De Nederlandse Kubernetes Podcast - YouTube; Nederlandse Kubernetes Podcast (@k8spodcast.nl) | TikTok; De Nederlandse Kubernetes Podcast. Where can you meet us: Events. This podcast is powered by: ACC ICT - IT-Continuïteit voor Bedrijfskritische Applicat...