The Gist Talk
kw
258 episodes
4 days ago
Welcome to The Gist Talk, the podcast where we break down the big ideas from the world’s most fascinating business and non-fiction books. Whether you’re a busy professional, a lifelong learner, or just someone curious about the latest insights shaping the world, this show is for you. Each episode, we’ll explore the key takeaways, actionable lessons, and inspiring stories—giving you the ‘gist’ of every book, one conversation at a time. Join us for engaging discussions that make learning effortless and fun.
Business
vAttention: Dynamic LLM Memory Without PagedAttention
The Gist Talk
35 minutes 1 second
1 month ago

This episode covers the paper "vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention," which introduces vAttention, a memory management approach for Large Language Model (LLM) serving systems. The paper critiques PagedAttention, the existing standard for dynamic memory allocation, arguing that it adds performance overhead and complexity by making the Key-Value (KV) cache non-contiguous in virtual memory. vAttention instead decouples virtual and physical memory allocation using the CUDA Virtual Memory Management (VMM) APIs, retaining virtual memory contiguity while mitigating physical memory fragmentation. In their evaluation, the authors show that vAttention is a simpler, more portable, and often more performant alternative, supporting various attention kernels, including FlashAttention-3, out of the box and improving throughput by up to 1.23× over PagedAttention-based systems. The work also details LLM-specific optimizations, such as deferred reclamation and support for smaller 64 KB pages, that hide VMM latency and reduce fragmentation.
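
The core mechanism is easiest to see in code. Below is a minimal sketch using the standard CUDA driver VMM calls (cuMemAddressReserve, cuMemCreate, cuMemMap, cuMemSetAccess): reserve one large contiguous virtual range per request's KV cache up front, then back it with physical pages only as decoding fills it. The range size, loop bounds, and CHECK macro are illustrative assumptions, not the paper's actual implementation, and the sketch uses the driver's default allocation granularity (typically 2 MB; the paper's 64 KB pages required driver modifications).

// Sketch of vAttention's idea: a contiguous virtual KV-cache range,
// backed by physical pages mapped on demand. Compile with nvcc, link -lcuda.
#include <cuda.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

#define CHECK(call)                                              \
  do {                                                           \
    CUresult r_ = (call);                                        \
    if (r_ != CUDA_SUCCESS) {                                    \
      fprintf(stderr, "CUDA error %d at line %d\n",              \
              (int)r_, __LINE__);                                \
      exit(1);                                                   \
    }                                                            \
  } while (0)

int main() {
  CHECK(cuInit(0));
  CUdevice dev;
  CUcontext ctx;
  CHECK(cuDeviceGet(&dev, 0));
  CHECK(cuCtxCreate(&ctx, 0, dev));

  CUmemAllocationProp prop = {};
  prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
  prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
  prop.location.id = dev;

  // Physical pages must be a multiple of the driver's granularity.
  size_t page;
  CHECK(cuMemGetAllocationGranularity(&page, &prop,
        CU_MEM_ALLOC_GRANULARITY_MINIMUM));

  // Reserve virtual space for the request's maximum KV-cache size.
  // This costs no physical memory; the cap of 64 pages is illustrative.
  size_t max_kv_bytes = 64 * page;
  CUdeviceptr kv_base;
  CHECK(cuMemAddressReserve(&kv_base, max_kv_bytes, 0, 0, 0));

  CUmemAccessDesc access = {};
  access.location = prop.location;
  access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;

  // As decode steps fill the cache, back the next virtual page with a
  // fresh physical page; attention kernels still see one flat buffer.
  std::vector<CUmemGenericAllocationHandle> pages;
  for (size_t used = 0; used < 4 * page; used += page) {
    CUmemGenericAllocationHandle h;
    CHECK(cuMemCreate(&h, page, &prop, 0));
    CHECK(cuMemMap(kv_base + used, page, 0, h, 0));
    CHECK(cuMemSetAccess(kv_base + used, page, &access, 1));
    pages.push_back(h);
  }
  printf("KV cache: %zu bytes mapped, contiguous at %#llx\n",
         pages.size() * page, (unsigned long long)kv_base);

  // On request completion, unmap and release the physical pages.
  // (vAttention defers this reclamation to recycle pages for the next
  // request and hide VMM API latency off the critical path.)
  for (size_t i = 0; i < pages.size(); ++i) {
    CHECK(cuMemUnmap(kv_base + i * page, page));
    CHECK(cuMemRelease(pages[i]));
  }
  CHECK(cuMemAddressFree(kv_base, max_kv_bytes));
  CHECK(cuCtxDestroy(ctx));
  return 0;
}

Because the virtual range never moves, any unmodified attention kernel that expects a contiguous KV cache can run over it directly, which is what lets vAttention support kernels like FlashAttention-3 without the paging-aware rewrites PagedAttention requires.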
