Best AI papers explained
Enoch H. Kang
605 episodes
22 hours ago
Cut through the noise. We curate and break down the most important AI papers so you don’t have to.
Technology
What happened with sparse autoencoders?
30 minutes 9 seconds
2 weeks ago

We cover a discussion by Neel Nanda (Google DeepMind) of the efficacy and limitations of Sparse Autoencoders (SAEs) as a tool for unsupervised discovery and interpretability in large language models. SAEs were initially considered a major breakthrough for breaking down model activations into interpretable, linear concepts; the conversation explores the challenges and pathologies observed since, such as feature absorption and the difficulty of finding truly canonical units. While acknowledging that SAEs are valuable for generating hypotheses and providing unsupervised insight into model behavior, especially when exploring unknown concepts, the speaker concludes that supervised methods are often superior for finding specific, known concepts, and that SAEs are therefore not a complete solution for full model reverse engineering. Newer iterations such as Matryoshka SAEs, and related techniques such as crosscoders and transcoder-based attribution graphs, are also examined for their ability to advance model understanding, along with their associated complexities and drawbacks.
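
For listeners who want a concrete picture of the technique under discussion, here is a minimal sparse autoencoder sketch in PyTorch. It is an illustrative assumption, not code from the episode or any specific paper: the dimensions, loss coefficient, and names are hypothetical, and practical interpretability SAEs add details (decoder weight normalization, dead-feature resampling, and similar) that are omitted here.

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decomposes model activations into a wider, sparse feature basis."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)  # activations -> feature coefficients
        self.decoder = nn.Linear(d_hidden, d_model)  # features -> reconstructed activations

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))       # non-negative codes; the L1 term below pushes them toward sparsity
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(x, reconstruction, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the feature activations.
    mse = torch.mean((x - reconstruction) ** 2)
    sparsity = l1_coeff * features.abs().mean()
    return mse + sparsity

# Example usage on a batch of stand-in "residual stream" activations.
sae = SparseAutoencoder(d_model=512, d_hidden=4096)
activations = torch.randn(32, 512)
reconstruction, features = sae(activations)
loss = sae_loss(activations, reconstruction, features)
loss.backward()  # in practice, step an optimizer over many batches of real model activations

After training, each decoder column acts as a candidate feature direction whose activations can be inspected, which is the kind of unsupervised discovery (and its pathologies) that the episode examines.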
