Theo Jaffee Podcast
Theo Jaffee
21 episodes
1 week ago
Deep conversations with brilliant people.
Technology
#5: Quintin Pope - AI alignment, machine learning, failure modes, and reasons for optimism
2 hours 36 minutes 28 seconds
2 years ago

Quintin Pope is a machine learning researcher focusing on natural language modeling and AI alignment. Among alignment researchers, Quintin stands out for his optimism: he believes that AI alignment is far more tractable than it seems, and that we appear to be on a good path to making the future great. On LessWrong, he wrote one of the most popular posts of the past year, “My Objections to ‘We're All Gonna Die with Eliezer Yudkowsky’”, as well as many other highly upvoted posts on various alignment papers and on his own theory of alignment, shard theory.

  • Quintin’s Twitter: https://twitter.com/QuintinPope5

  • Quintin’s LessWrong profile: https://www.lesswrong.com/users/quintin-pope

  • My Objections to “We’re All Gonna Die with Eliezer Yudkowsky”: https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky

  • The Shard Theory Sequence: https://www.lesswrong.com/s/nyEFg3AuJpdAozmoX

  • Quintin’s Alignment Papers Roundup: https://www.lesswrong.com/s/5omSW4wNKbEvYsyje

  • Evolution provides no evidence for the sharp left turn: https://www.lesswrong.com/posts/hvz9qjWyv8cLX9JJR/evolution-provides-no-evidence-for-the-sharp-left-turn

  • Deep Differentiable Logic Gate Networks: https://arxiv.org/abs/2210.08277

  • The Hydra Effect: Emergent Self-repair in Language Model Computations: https://arxiv.org/abs/2307.15771

  • Deep learning generalizes because the parameter-function map is biased towards simple functions: https://arxiv.org/abs/1805.08522

  • Bridging RL Theory and Practice with the Effective Horizon: https://arxiv.org/abs/2304.09853

PODCAST LINKS:

  • Video Transcript: https://www.theojaffee.com/p/5-quintin-pope

  • Spotify: https://open.spotify.com/show/1IJRtB8FP4Cnq8lWuuCdvW?si=eba62a72e6234efb

  • Apple Podcasts: https://podcasts.apple.com/us/podcast/theo-jaffee-podcast/id1699912677

  • RSS: https://api.substack.com/feed/podcast/989123/s/75569/private/129f6344-c459-4581-a9da-dc331677c2f6.rss

  • Playlist of all episodes: https://www.youtube.com/playlist?list=PLVN8-zhbMh9YnOGVRT9m0xzqTNGD_sujj

  • My Twitter: https://x.com/theojaffee

  • My Substack: https://www.theojaffee.com

CHAPTERS:

Introduction (0:00)

What Is AGI? (1:03)

What Can AGI Do? (12:49)

Orthogonality (23:14)

Mind Space (42:50)

Quintin’s Background and Optimism (55:06)

Mesa-Optimization and Reward Hacking (1:02:48)

Deceptive Alignment (1:11:52)

Shard Theory (1:24:10)

What Is Alignment? (1:30:05)

Misalignment and Evolution (1:37:21)

Mesa-Optimization and Reward Hacking, Part 2 (1:46:56)

RL Agents (1:55:02)

Monitoring AIs (2:09:29)

Mechanistic Interpretability (2:14:00)

AI Disempowering Humanity (2:28:13)
