AI: post transformers
mcgrof
316 episodes
1 day ago
The transformer architecture revolutionized the world of neural networks and was a springboard for what we know today as modern artificial intelligence. This podcast reviews modern, state-of-the-art research papers, starting from the transformer onward.
Technology
Tempo: SLO-Aware LLM Serving Maximizing Service Gain
AI: post transformers
14 minutes 52 seconds
1 week ago
This April 24, 2025 paper introduces **Tempo**, a scheduling system designed to optimize Large Language Model (LLM) serving under the wide variety of Service Level Objectives (**SLOs**) found in modern LLM applications. The authors categorize requests into three types, **latency-sensitive**, **throughput-intensive**, and **collective requests**, each with distinct performance requirements that existing schedulers fail to manage effectively. Tempo maximizes "service gain" by allocating just enough serving bandwidth to meet each request's SLO, using a **hybrid scheduling strategy** that relies on lightweight prediction models for conservative initial estimates of response length and on **dependency-graph matching** for complex workflows. Evaluations show that Tempo significantly outperforms state-of-the-art systems in both service gain and **SLO goodput** across diverse workloads and models.
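The paper's actual scheduler is not reproduced here, but the core idea of serving each request with "just enough" bandwidth can be sketched as slack-based ordering: rank requests by how much time remains before their SLO deadline after accounting for predicted response length, and serve the tightest first. This is a minimal, hypothetical illustration under stated assumptions; the names `Request` and `service_gain_schedule`, the request `kind` labels, and the single-stream token-rate model are all illustrative, not Tempo's API.

```python
from dataclasses import dataclass, field
import heapq

@dataclass(order=True)
class Request:
    # Lower slack = more urgent; heapq pops the tightest deadline first.
    slack: float
    rid: int = field(compare=False)
    kind: str = field(compare=False)              # "latency", "throughput", or "collective"
    deadline: float = field(compare=False)        # absolute SLO deadline (seconds)
    predicted_tokens: int = field(compare=False)  # conservative response-length estimate

def service_gain_schedule(requests, tokens_per_sec, now=0.0):
    """Greedy sketch: serve requests in order of SLO slack, where slack is
    the time remaining before the deadline after the predicted service time.
    Returns (rid, kind, finish_time, slo_met) tuples in serving order."""
    heap = []
    for r in requests:
        r.slack = r.deadline - now - r.predicted_tokens / tokens_per_sec
        heapq.heappush(heap, r)
    schedule, clock = [], now
    while heap:
        r = heapq.heappop(heap)
        finish = clock + r.predicted_tokens / tokens_per_sec
        schedule.append((r.rid, r.kind, round(finish, 3), finish <= r.deadline))
        clock = finish
    return schedule
```

For example, a short latency-sensitive request with a 1-second deadline is served before a long throughput-intensive one with a 10-second deadline, because its slack is far smaller; the real system additionally uses dependency-graph matching for collective workflows, which this sketch omits.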


Source: Tempo: Application-aware LLM Serving with Mixed SLO Requirements (April 24, 2025)

https://arxiv.org/pdf/2504.20068
