#119 Your Web App Scaling Tricks Don’t Work for LLMs

EXPLORE

Society & Culture

© 2024 PodJoint

https://is1-ssl.mzstatic.com/image/thumb/Podcasts211/v4/b4/61/ed/b461ed35-bc8f-43ba-d3b7-a1ac63240c5b/mza_16911523502878814154.jpg/600x600bb.jpg

De Nederlandse Kubernetes Podcast

Ronald Kers en Jan Stomphorst

118 episodes

2 days ago

In this episode, we talk with Abdel Sghiouar and Mofi Rahman, Developer Advocates at Google and (guest) hosts of the Kubernetes Podcast from Google. Together, we dive into one central question: can you truly run LLMs reliably and at scale on Kubernetes? It quickly becomes clear that LLM workloads behave nothing like traditional web applications: GPUs are scarce, expensive, and difficult to schedule.Models are massive — some reaching 700GB — making load times, storage throughput, and caching ...

Show more...

All content for De Nederlandse Kubernetes Podcast is the property of Ronald Kers en Jan Stomphorst and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.

In this episode, we talk with Abdel Sghiouar and Mofi Rahman, Developer Advocates at Google and (guest) hosts of the Kubernetes Podcast from Google. Together, we dive into one central question: can you truly run LLMs reliably and at scale on Kubernetes? It quickly becomes clear that LLM workloads behave nothing like traditional web applications: GPUs are scarce, expensive, and difficult to schedule.Models are massive — some reaching 700GB — making load times, storage throughput, and caching ...

Show more...

https://storage.buzzsprout.com/emkozuzl0hayltnufmi3a5t0vun5?.jpg

#119 Your Web App Scaling Tricks Don’t Work for LLMs

De Nederlandse Kubernetes Podcast

36 minutes

2 days ago

#119 Your Web App Scaling Tricks Don’t Work for LLMs

In this episode, we talk with Abdel Sghiouar and Mofi Rahman, Developer Advocates at Google and (guest) hosts of the Kubernetes Podcast from Google. Together, we dive into one central question: can you truly run LLMs reliably and at scale on Kubernetes? It quickly becomes clear that LLM workloads behave nothing like traditional web applications: GPUs are scarce, expensive, and difficult to schedule.Models are massive — some reaching 700GB — making load times, storage throughput, and caching ...