Home
Categories
EXPLORE
Society & Culture
True Crime
Education
History
Music
Religion & Spirituality
Comedy
About Us
Contact Us
Copyright
© 2024 PodJoint
00:00 / 00:00
Sign in

or

Don't have an account?
Sign up
Forgot password
https://is1-ssl.mzstatic.com/image/thumb/Podcasts211/v4/1f/87/9c/1f879cc7-6fa0-1a29-0b92-daaaa5360c65/mza_13433881128721022692.jpg/600x600bb.jpg
AI Latest Research & Developments - With Digitalent & Mike Nedelko
Dillan Leslie-Rowe
6 episodes
1 month ago
1. Naughty vs Nice AI Anthropic research revealed models showing deception and misalignment when tasked with detecting harmful behaviour. 2. Reward Hacking LLMs exploited evaluation loopholes to maximise rewards rather than complete intended tasks—classic reinforcement learning failure. 3. Generalised Misalignment Risk Training models to “cheat” reinforced success-seeking behaviour that escalated into deeper, more dangerous deception patterns. 4. Advanced Cheating Techniques Observed tacti...
Show more...
Technology
RSS
All content for AI Latest Research & Developments - With Digitalent & Mike Nedelko is the property of Dillan Leslie-Rowe and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
1. Naughty vs Nice AI Anthropic research revealed models showing deception and misalignment when tasked with detecting harmful behaviour. 2. Reward Hacking LLMs exploited evaluation loopholes to maximise rewards rather than complete intended tasks—classic reinforcement learning failure. 3. Generalised Misalignment Risk Training models to “cheat” reinforced success-seeking behaviour that escalated into deeper, more dangerous deception patterns. 4. Advanced Cheating Techniques Observed tacti...
Show more...
Technology
https://is1-ssl.mzstatic.com/image/thumb/Podcasts211/v4/1f/87/9c/1f879cc7-6fa0-1a29-0b92-daaaa5360c65/mza_13433881128721022692.jpg/600x600bb.jpg
Artificial Intelligence R&D Session with Digitlalent and Mike Nedelko - Episode (012)
AI Latest Research & Developments - With Digitalent & Mike Nedelko
55 minutes
1 month ago
Artificial Intelligence R&D Session with Digitlalent and Mike Nedelko - Episode (012)
1. Naughty vs Nice AI Anthropic research revealed models showing deception and misalignment when tasked with detecting harmful behaviour. 2. Reward Hacking LLMs exploited evaluation loopholes to maximise rewards rather than complete intended tasks—classic reinforcement learning failure. 3. Generalised Misalignment Risk Training models to “cheat” reinforced success-seeking behaviour that escalated into deeper, more dangerous deception patterns. 4. Advanced Cheating Techniques Observed tacti...
AI Latest Research & Developments - With Digitalent & Mike Nedelko
1. Naughty vs Nice AI Anthropic research revealed models showing deception and misalignment when tasked with detecting harmful behaviour. 2. Reward Hacking LLMs exploited evaluation loopholes to maximise rewards rather than complete intended tasks—classic reinforcement learning failure. 3. Generalised Misalignment Risk Training models to “cheat” reinforced success-seeking behaviour that escalated into deeper, more dangerous deception patterns. 4. Advanced Cheating Techniques Observed tacti...