AI: post transformers
mcgrof
340 episodes
2 days ago
The transformer architecture revolutionized the world of neural networks and became a springboard for what we know today as modern artificial intelligence. This podcast reviews modern state-of-the-art research papers, starting from the transformer and moving forward.
Technology
NeurIPS 2025: Thinkless: LLM Learns When to Think
AI: post transformers
13 minutes 48 seconds
1 month ago

The research introduces Thinkless, a framework designed to address the computational inefficiency of Large Language Models (LLMs) that overuse chain-of-thought reasoning on simple queries. The adaptive model decides whether to produce a concise short-form answer or a detailed chain-of-thought response based on the complexity of the input and its own capabilities. Central to this approach is the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which uses reinforcement learning to jointly optimize the selection of the reasoning mode and the accuracy of the final answer. DeGRPO stabilizes training by balancing the gradient signals between the control tokens and the response tokens, preventing the policy collapse observed with standard reinforcement learning methods. Empirically, the model handles varied tasks effectively, reducing reliance on computationally expensive long-form reasoning by 50% to 90% on mathematical benchmarks while maintaining performance.
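As a rough illustration of the decoupling idea only (not the paper's exact objective), the sketch below shows, in PyTorch with assumed names such as degrpo_loss, control_mask, and the balancing weight alpha, how the single mode-selection control token and the many response tokens can be given separately normalized loss terms so that neither gradient signal swamps the other.

import torch

def degrpo_loss(logps, control_mask, advantages, alpha=0.1):
    """Minimal sketch of a decoupled, GRPO-style policy-gradient loss.

    logps:        (B, T) per-token log-probabilities of the sampled outputs
    control_mask: (B, T) 1.0 at the mode-selection control token, 0.0 elsewhere
    advantages:   (B,)   group-relative advantages (reward minus group mean)
    alpha:        assumed balancing weight between the two decoupled terms
    """
    adv = advantages.unsqueeze(1)                  # broadcast advantage over tokens
    pg = -(logps * adv)                            # REINFORCE-style surrogate per token
    ctrl = (pg * control_mask).sum(dim=1)          # loss on the single control token
    n_resp = (1.0 - control_mask).sum(dim=1).clamp(min=1.0)
    resp = (pg * (1.0 - control_mask)).sum(dim=1) / n_resp  # length-normalized response loss
    # Decoupling: each term is normalized on its own and then rebalanced with alpha,
    # instead of averaging the one control token in with hundreds of response tokens.
    return (ctrl + alpha * resp).mean()

# Toy usage: 4 sampled rollouts of length 16, one scalar reward per rollout.
if __name__ == "__main__":
    B, T = 4, 16
    logps = torch.randn(B, T)
    control_mask = torch.zeros(B, T)
    control_mask[:, 0] = 1.0                       # assume the mode token is emitted first
    rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])
    advantages = rewards - rewards.mean()          # group-relative baseline
    print(degrpo_loss(logps, control_mask, advantages))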


Source:

https://openreview.net/pdf?id=ariVQf0KZx
