In this episode, we talk with Abdel Sghiouar and Mofi Rahman, Developer Advocates at Google and (guest) hosts of the Kubernetes Podcast from Google. Together, we dive into one central question: can you truly run LLMs reliably and at scale on Kubernetes? It quickly becomes clear that LLM workloads behave nothing like traditional web applications: GPUs are scarce, expensive, and difficult to schedule. Models are massive (some reaching 700 GB), making load times, storage throughput, and caching ...