In this episode, we talk with Abdel Sghiouar and Mofi Rahman, Developer Advocates at Google and (guest) hosts of the Kubernetes Podcast from Google. Together, we dive into one central question: can you truly run LLMs reliably and at scale on Kubernetes? It quickly becomes clear that LLM workloads behave nothing like traditional web applications: GPUs are scarce, expensive, and difficult to schedule. Models are massive — some reaching 700GB — making load times, storage throughput, and caching ...
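As a rough illustration of why GPU scheduling differs from ordinary web workloads, here is a minimal sketch of a pod that claims a single GPU through Kubernetes' extended-resource mechanism. The pod name, image, and node-selector label are hypothetical; the snippet assumes the NVIDIA device plugin is installed so that nodes advertise nvidia.com/gpu.

```go
// A minimal sketch of a GPU inference pod, assuming the NVIDIA device
// plugin exposes GPUs as the extended resource "nvidia.com/gpu".
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "llm-server"}, // hypothetical name
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "inference",
				Image: "example.com/llm-inference:latest", // hypothetical image
				Resources: corev1.ResourceRequirements{
					// GPUs are requested as extended resources; the scheduler
					// only places the pod on a node with a free device.
					Limits: corev1.ResourceList{
						"nvidia.com/gpu": resource.MustParse("1"),
					},
				},
			}},
			// Pin to a GPU node pool; this label is an assumption about
			// the cluster, shown here with a GKE-style accelerator label.
			NodeSelector: map[string]string{
				"cloud.google.com/gke-accelerator": "nvidia-l4",
			},
		},
	}
	fmt.Printf("%+v\n", pod.Spec.Containers[0].Resources.Limits)
}
```

Unlike CPU, an extended resource such as a GPU cannot be overcommitted, so the pod stays Pending until a node with a free device exists, which is exactly where the scarcity discussed in the episode bites.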
#110 Cluster API: Kubernetes-clusters bouwen met Kubernetes zelf
De Nederlandse Kubernetes Podcast
28 minutes
2 months ago
In this episode, we speak once again with Tim Stoop, Senior Solutions Architect at ACC ICT BV. Tim explains how Cluster API technology greatly simplifies setting up and managing Kubernetes clusters by abstracting and automating everything. From a so-called management cluster, you can easily roll out workload clusters, regardless of whether that is on hardware, VMware, or a cloud provider. We discuss the differences with tools such as kubeadm and Terraform, how auto-scaling ...
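To make the management-cluster model concrete, here is a hedged sketch of the kind of Cluster object you would apply to the management cluster; Cluster API's controllers then reconcile it into a real workload cluster. The names and the VSphereCluster infrastructure reference are illustrative assumptions, not Tim's exact setup.

```go
// A minimal sketch of a Cluster API "Cluster" resource, built as an
// unstructured object and printed as YAML. Applying such an object to
// the management cluster is what triggers workload-cluster creation.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/yaml"
)

func main() {
	cluster := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "cluster.x-k8s.io/v1beta1",
		"kind":       "Cluster",
		"metadata": map[string]interface{}{
			"name":      "workload-1", // placeholder cluster name
			"namespace": "default",
		},
		"spec": map[string]interface{}{
			// infrastructureRef points at a provider-specific object
			// (here a VSphereCluster, as an assumption; on a cloud
			// provider this would be e.g. a GCPCluster instead).
			"infrastructureRef": map[string]interface{}{
				"apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",
				"kind":       "VSphereCluster",
				"name":       "workload-1",
			},
		},
	}}
	out, _ := yaml.Marshal(cluster.Object)
	fmt.Println(string(out))
}
```

Applying a declarative manifest like this to the management cluster, and letting controllers do the rest, is what replaces the imperative kubeadm or Terraform workflow compared in the episode.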