In this episode, we talk with Abdel Sghiouar and Mofi Rahman, Developer Advocates at Google and (guest) hosts of the Kubernetes Podcast from Google. Together, we dive into one central question: can you truly run LLMs reliably and at scale on Kubernetes? It quickly becomes clear that LLM workloads behave nothing like traditional web applications: GPUs are scarce, expensive, and difficult to schedule. Models are massive (some reaching 700GB), making load times, storage throughput, and caching ...
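The scheduling point the episode raises is concrete in Kubernetes: GPUs are exposed as extended resources (for NVIDIA hardware, typically nvidia.com/gpu via the device plugin), and a pod requesting one stays Pending until the scheduler finds a node with a free GPU. Below is a minimal sketch using client-go; the pod name, namespace, and container image are hypothetical placeholders, not anything discussed in the episode.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// A pod that requests one NVIDIA GPU through the device-plugin
	// resource name; the scheduler will only place it on a node that
	// actually has a GPU available.
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "llm-server"}, // hypothetical name
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "inference",
				Image: "example.com/llm-inference:latest", // hypothetical image
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						"nvidia.com/gpu": resource.MustParse("1"),
					},
				},
			}},
			RestartPolicy: corev1.RestartPolicyNever,
		},
	}

	created, err := clientset.CoreV1().Pods("default").Create(
		context.TODO(), pod, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created pod:", created.Name)
}
```

Note that GPU limits cannot be overcommitted the way CPU can, which is part of why the episode calls GPU scheduling difficult: a 700GB model may also need node-local caching and fast storage before the pod is even useful.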
#109 OpenTelemetry: eenvoud in observability met AI
De Nederlandse Kubernetes Podcast
24 minutes
2 months ago
In episode 109 of the Nederlandse Kubernetes Podcast we speak with Miel Donkers, founding engineer at Dash0. Dash0 offers an observability and monitoring platform built entirely on OpenTelemetry. Miel explains how they set themselves apart from other vendors: fully open, user-friendly, and immediately usable with existing standards such as the Prometheus APIs. We discuss how OpenTelemetry works with metrics, logs, and traces, and how Dash0 helps developers keep an overview ...
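For listeners new to OpenTelemetry: it is a vendor-neutral API and SDK for emitting traces, metrics, and logs, which backends such as Dash0 then ingest. A minimal tracing sketch in Go, using the stdout exporter purely for illustration (a real deployment would export over OTLP to a collector or backend; the service and span names here are made up):

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Exporter that prints finished spans to stdout; swap in an OTLP
	// exporter to send data to a collector or an observability backend.
	exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
	if err != nil {
		log.Fatal(err)
	}

	// The tracer provider batches spans and hands them to the exporter.
	tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
	defer func() {
		if err := tp.Shutdown(context.Background()); err != nil {
			log.Println("shutdown:", err)
		}
	}()
	otel.SetTracerProvider(tp)

	// Create and end a span; in a real service this would wrap a
	// request handler or an outbound call.
	tracer := otel.Tracer("example-service") // hypothetical name
	_, span := tracer.Start(context.Background(), "handle-request")
	span.End()
}
```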