AI: post transformers
mcgrof
340 episodes
2 days ago
The transformer architecture revolutionized the world of neural networks and became a springboard for what we know today as modern artificial intelligence. This podcast reviews modern state-of-the-art research papers, starting from the transformer and moving forward.
Technology
NeurIPS 2025: Thinkless: LLM Learns When to Think
AI: post transformers
13 minutes 48 seconds
1 month ago

The research introduces Thinkless, a framework designed to address the computational inefficiency of Large Language Models (LLMs) that overuse chain-of-thought reasoning on simple queries. The adaptive model decides whether to produce a concise short-form answer or a detailed chain-of-thought response based on the complexity of the input and its own capabilities. Central to this approach is the Decoupled Group Relative Policy Optimization (DeGRPO) algorithm, which uses reinforcement learning to jointly optimize the selection of the reasoning mode and the accuracy of the final answer. DeGRPO stabilizes training by balancing the gradient signals between the control tokens and the response tokens, preventing the policy collapse observed with standard reinforcement learning methods. Empirically, the model handles varied tasks effectively, reducing reliance on computationally expensive long-form reasoning by 50% to 90% on mathematical benchmarks while maintaining performance.
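As a rough illustration of the decoupling idea only (not the paper's exact objective), the sketch below shows, in PyTorch with assumed names such as degrpo_loss, control_mask, and the balancing weight alpha, how the single mode-selection control token and the many response tokens can be given separately normalized loss terms so that neither gradient signal swamps the other.

import torch

def degrpo_loss(logps, control_mask, advantages, alpha=0.1):
    """Minimal sketch of a decoupled, GRPO-style policy-gradient loss.

    logps:        (B, T) per-token log-probabilities of the sampled outputs
    control_mask: (B, T) 1.0 at the mode-selection control token, 0.0 elsewhere
    advantages:   (B,)   group-relative advantages (reward minus group mean)
    alpha:        assumed balancing weight between the two decoupled terms
    """
    adv = advantages.unsqueeze(1)                  # broadcast advantage over tokens
    pg = -(logps * adv)                            # REINFORCE-style surrogate per token
    ctrl = (pg * control_mask).sum(dim=1)          # loss on the single control token
    n_resp = (1.0 - control_mask).sum(dim=1).clamp(min=1.0)
    resp = (pg * (1.0 - control_mask)).sum(dim=1) / n_resp  # length-normalized response loss
    # Decoupling: each term is normalized on its own and then rebalanced with alpha,
    # instead of averaging the one control token in with hundreds of response tokens.
    return (ctrl + alpha * resp).mean()

# Toy usage: 4 sampled rollouts of length 16, one scalar reward per rollout.
if __name__ == "__main__":
    B, T = 4, 16
    logps = torch.randn(B, T)
    control_mask = torch.zeros(B, T)
    control_mask[:, 0] = 1.0                       # assume the mode token is emitted first
    rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])
    advantages = rewards - rewards.mean()          # group-relative baseline
    print(degrpo_loss(logps, control_mask, advantages))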


Source:

https://openreview.net/pdf?id=ariVQf0KZx
