YAAP (Yet Another AI Podcast)
AI21
11 episodes
2 weeks ago
Technology
All content for YAAP (Yet Another AI Podcast) is the property of AI21 and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
RLVR Lets Models Fail Their Way to the Top
YAAP (Yet Another AI Podcast)
49 minutes
3 months ago
Think you know fine-tuning? If your answer is RLHF, you don’t. In this episode, Itay, who leads the Alignment group at AI21, gives a no-fluff crash course on RLVR (Reinforcement Learning with Verifiable Rewards), the method powering today’s smartest coding and reasoning models. He explains why RLVR beats RLHF at its own game, how “hard to solve, easy to verify” tasks unlock exploration without chaos, and the emergent behaviors you only get when models are allowed to screw up. If you want to actually understand RLVR (and use it), start here.
Key topics:
- How RLVR outsmarts RLHF in real-world training
- The “verified rewards” trick that kills reward hacking
- Emergent skills you don’t get with hand-holding: self-verification, backtracking, multi-path reasoning
- Why coding models took a giant leap forward
- Practical steps to train (and actually benefit from) RLVR models
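To make the “hard to solve, easy to verify” idea concrete, here is a minimal sketch of the kind of verifiable reward RLVR trains against, as opposed to RLHF’s learned reward model. This is not from the episode; the answer format and the extract_final_answer helper are illustrative assumptions.

    # Minimal sketch of a verifiable reward: the score comes from a programmatic
    # check, not a learned judge. The "#### <number>" answer format is an
    # assumption made for illustration.
    import re

    def extract_final_answer(completion: str) -> str | None:
        """Pull the model's final answer, e.g. '... #### 42'."""
        match = re.search(r"####\s*(-?\d+)", completion)
        return match.group(1) if match else None

    def verifiable_reward(completion: str, ground_truth: str) -> float:
        """Binary reward: 1.0 if the checkable answer matches, else 0.0.
        Hard to solve (the model must reason its way there),
        easy to verify (a string comparison)."""
        answer = extract_final_answer(completion)
        return 1.0 if answer == ground_truth else 0.0

    # During RL training, many sampled completions fail (reward 0.0); only the
    # verifiably correct ones get reinforced.
    sampled = "Try 6*7 = 41... no, recheck: 6*7 = 42. #### 42"
    print(verifiable_reward(sampled, "42"))  # 1.0

Because the reward is a programmatic check rather than a learned judge, there is little for the policy to reward-hack: a completion either verifies or it does not, and failed attempts simply earn zero, which is what lets the model fail its way to the top.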