Best AI papers explained
Enoch H. Kang
605 episodes
22 hours ago
Cut through the noise. We curate and break down the most important AI papers so you don’t have to.
Technology
All content for Best AI papers explained is the property of Enoch H. Kang and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
Let’s (not) just put things in Context: Test-Time Training for Long-Context LLMs
Best AI papers explained
13 minutes 45 seconds
2 weeks ago

Large language models often struggle with long-context tasks because the attention mechanism suffers from **score dilution**: relevant information is overwhelmed by surrounding "distractor" tokens. The researchers find that common **inference-time scaling strategies**, such as generating additional "thinking tokens," fail to solve this problem as context length grows. To address it, the authors propose **query-only test-time training (qTTT)**, a computationally efficient method that updates only the model's **query projection matrices** for a specific input. By performing a single prefill to cache **keys and values** and then applying targeted gradient updates, the model learns to better distinguish the "needle" of relevant information from the "haystack" of noise. Experiments on the **LongBench-v2** and **ZeroScrolls** benchmarks show that qTTT consistently outperforms both standard long-context prompting and thinking-token baselines. The results suggest that **adapting model parameters** at inference time is a more effective use of compute than simply lengthening the generated output.
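To make the mechanism concrete, here is a minimal PyTorch sketch of the query-only update on a hypothetical single-layer toy model, not the authors' actual implementation: every parameter except the query projection is frozen, keys and values are cached from one prefill (they depend only on frozen weights), and a few gradient steps are taken on an assumed self-supervised next-token objective over the context itself. In a deeper model the per-layer caching would need more care; the single-layer toy keeps it exact.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy single-layer attention LM. Module names, the loss, and all
# hyperparameters below are illustrative assumptions, not the paper's recipe.
class TinyAttentionLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        self.q_proj = nn.Linear(d_model, d_model, bias=False)   # adapted at test time
        self.k_proj = nn.Linear(d_model, d_model, bias=False)   # frozen
        self.v_proj = nn.Linear(d_model, d_model, bias=False)   # frozen
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)  # frozen

def qttt_adapt(model, ids, steps=4, lr=1e-3):
    """Adapt only the query projection to one specific long input."""
    # Freeze every parameter except the query projection.
    for p in model.parameters():
        p.requires_grad_(False)
    model.q_proj.weight.requires_grad_(True)

    # Single prefill: keys and values depend only on frozen weights,
    # so they can be computed once and cached for all update steps.
    with torch.no_grad():
        x = model.emb(ids[:, :-1])
        k, v = model.k_proj(x), model.v_proj(x)

    opt = torch.optim.SGD([model.q_proj.weight], lr=lr)
    for _ in range(steps):
        q = model.q_proj(x)  # only the queries are recomputed each step
        att = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        logits = model.lm_head(att)
        # Assumed objective: next-token prediction over the context itself.
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               ids[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

if __name__ == "__main__":
    context = torch.randint(0, 1000, (1, 256))  # stand-in for a long context
    qttt_adapt(TinyAttentionLM(), context)
```

Because the keys and values never change, the extra compute per update step is just one query projection and one attention pass, which is what makes the query-only variant cheap relative to full test-time fine-tuning.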
