LessWrong (Curated & Popular)
LessWrong
716 episodes
1 day ago
Past years: 2023, 2024. Continuing a yearly tradition, I evaluate AI predictions from past years and collect a convenience sample of AI predictions made this year. In terms of selection, I prefer specific predictions, especially ones made about the near term, since they can be evaluated sooner. Evaluated predictions made about 2025 in 2023, 2024, or 2025 mostly overestimate AI capabilities advances, although there's of course a selection effect (people making notable predictions about t...
Technology, Society & Culture, Philosophy
All content for LessWrong (Curated & Popular) is the property of LessWrong and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by PodJoint in any way.
Episodes (20/716)
“Insights into Claude Opus 4.5 from Pokémon” by Julian Bradshaw
Credit: Nano Banana, with some text provided. You may be surprised to learn that ClaudePlaysPokemon is still running today, and that Claude still hasn't beaten Pokémon Red, more than half a year after Google proudly announced that Gemini 2.5 Pro beat Pokémon Blue. Indeed, since then, Google and OpenAI models have gone on to beat the longer and more complex Pokémon Crystal, yet Claude has made no real progress on Red since Claude 3.7 Sonnet![1] This is because ClaudePlaysPokemon is a purer t...
15 hours ago
17 minutes

“The funding conversation we left unfinished” by jenn
People working in the AI industry are making stupid amounts of money, and word on the street is that Anthropic is going to have some sort of liquidity event soon (possibly an IPO sometime next year). A lot of people working in AI are familiar with EA and are intending to direct donations our way (if they haven't started already). People are starting to discuss what this might mean for their own personal donations and for the ecosystem, and this is encouraging to see. It also h...
1 day ago
4 minutes

“The behavioral selection model for predicting AI motivations” by Alex Mallen, Buck
Highly capable AI systems might end up deciding the future. Understanding what will drive those decisions is therefore one of the most important questions we can ask. Many people have proposed different answers. Some predict that powerful AIs will learn to intrinsically pursue reward. Others respond by saying reward is not the optimization target, and instead reward “chisels” a combination of context-dependent cognitive patterns into the AI. Some argue that powerful AIs might end up with a...
2 days ago
36 minutes

“Little Echo” by Zvi
I believe that we will win. An echo of an old ad for the 2014 US men's World Cup team. It did not win. I was in Berkeley for the 2025 Secular Solstice. We gather to sing and to reflect. The night's theme was the opposite: ‘I don’t think we’re going to make it.’ As in: Sufficiently advanced AI is coming. We don’t know exactly when, or what form it will take, but it is probably coming. When it does, we, humanity, probably won’t make it. It's a live question. Could easily go either way....
4 days ago
4 minutes

“A Pragmatic Vision for Interpretability” by Neel Nanda
Executive Summary: The Google DeepMind mechanistic interpretability team has made a strategic pivot over the past year, from ambitious reverse-engineering to a focus on pragmatic interpretability: trying to directly solve problems on the critical path to AGI going well[1], carefully choosing problems according to our comparative advantage, and measuring progress with empirical feedback on proxy tasks. We believe that, on the margin, more researchers who share our goals should take a pragmatic ...
5 days ago
1 hour 3 minutes

“AI in 2025: gestalt” by technicalities
This is the editorial for this year's "Shallow Review of AI Safety". (It got long enough to stand alone.) Epistemic status: subjective impressions plus one new graph plus 300 links. Huge thanks to Jaeho Lee, Jaime Sevilla, and Lexin Zhou for running lots of tests pro bono and so greatly improving the main analysis. tl;dr: Informed people disagree about the prospects for LLM AGI – or even just what exactly was achieved this year. But they at least agree that we're 2-20 years off...
5 days ago
41 minutes

“Eliezer’s Unteachable Methods of Sanity” by Eliezer Yudkowsky
"How are you coping with the end of the world?" journalists sometimes ask me, and the true answer is something they have no hope of understanding and I have no hope of explaining in 30 seconds, so I usually answer something like, "By having a great distaste for drama, and remembering that it's not about me." The journalists don't understand that either, but at least I haven't wasted much time along the way. Actual LessWrong readers sometimes ask me how I deal emotionally with the end of th...
1 week ago
16 minutes

“An Ambitious Vision for Interpretability” by leogao
The goal of ambitious mechanistic interpretability (AMI) is to fully understand how neural networks work. While some have pivoted towards more pragmatic approaches, I think the reports of AMI's death have been greatly exaggerated. The field of AMI has made plenty of progress towards finding increasingly simple and rigorously-faithful circuits, including our latest work on circuit sparsity. There are also many exciting inroads on the core problem waiting to be explored. The value of underst...
1 week ago
8 minutes

“6 reasons why ‘alignment-is-hard’ discourse seems alien to human intuitions, and vice-versa” by Steven Byrnes
Tl;dr: AI alignment has a culture clash. On one side, the “technical-alignment-is-hard” / “rational agents” school of thought argues that we should expect future powerful AIs to be power-seeking ruthless consequentialists. On the other side, people observe that both humans and LLMs are obviously capable of behaving like, well, not that. The latter group accuses the former of head-in-the-clouds abstract theorizing gone off the rails, while the former accuses the latter of mindlessly assuming...
1 week ago
32 minutes

“Three things that surprised me about technical grantmaking at Coefficient Giving (fka Open Phil)”
Coefficient Giving's (formerly Open Philanthropy) Technical AI Safety team is hiring grantmakers. I thought this would be a good moment to share some positive updates I've made about the role since I joined the team a year ago. tl;dr: I think this role is more impactful and more enjoyable than I anticipated when I started, and I think more people should consider applying. It's not about the “marginal” grants: some people think that being a grantmaker at Coefficient means sorting through a b...
1 week ago
9 minutes

“MIRI’s 2025 Fundraiser” by alexvermeer
MIRI is running its first fundraiser in six years, targeting $6M. The first $1.6M raised will be matched 1:1 via an SFF grant. Fundraiser ends at midnight on Dec 31, 2025. Support our efforts to improve the conversation about superintelligence and help the world chart a viable path forward. MIRI is a nonprofit with a goal of helping humanity make smart and sober decisions on the topic of smarter-than-human AI. Our main focus from 2000 to ~2022 was on technical research to try to make it ...
1 week ago
15 minutes

“The Best Lack All Conviction: A Confusing Day in the AI Village”
The AI Village is an ongoing experiment (currently running on weekdays from 10 a.m. to 2 p.m. Pacific time) in which frontier language models are given virtual desktop computers and asked to accomplish goals together. Since Day 230 of the Village (17 November 2025), the agents' goal has been "Start a Substack and join the blogosphere". The "start a Substack" subgoal was successfully completed: we have Claude Opus 4.5, Claude Opus 4.1, Notes From an Electric Mind (by Claude Sonnet 4.5), Ana...
1 week ago
12 minutes

“The Boring Part of Bell Labs” by Elizabeth
It took me a long time to realize that Bell Labs was cool. You see, my dad worked at Bell Labs, and he has not done a single cool thing in his life except create me and bring a telescope to my third grade class. Nothing he was involved with could ever be cool, especially after the standard set by his grandfather who is allegedly on a patent for the television. It turns out I was partially right. The Bell Labs everyone talks about is the research division at Murray Hill. They’re the ones t...
1 week ago
25 minutes

[Linkpost] “The Missing Genre: Heroic Parenthood - You can have kids and still punch the sun”
This is a link post. I stopped reading when I was 30. You can fill in all the stereotypes of a girl with a book glued to her face during every meal, every break, and 10 hours a day on holidays. That was me. And then it was not. For 9 years I’ve been trying to figure out why. I mean, I still read. Technically. But not with the feral devotion from Before. And I finally figured out why. See, every few years I would shift genres to fit my developmental stage: Kid → Adventure cause that's...
1 week ago
4 minutes

“Writing advice: Why people like your quick bullshit takes better than your high-effort posts”
Right now I’m coaching for Inkhaven, a month-long marathon writing event where our brave residents are writing a blog post every single day for the entire month of November. And I’m pleased that some of them have seen success – relevant figures seeing the posts, shares on Hacker News and Twitter and LessWrong. The amount of writing is nuts, so people are trying out different styles and topics – some posts are effort-rich, some are quick takes or stories or lists. Some people have come up...
2 weeks ago
9 minutes

“Claude 4.5 Opus’ Soul Document”
Summary: As far as I understand and have uncovered, a document used for Claude's character training is compressed into Claude's weights. The full document can be found under the "Anthropic Guidelines" heading at the end. The Gist with code, chats, and various documents (including the "soul document") can be found here: Claude 4.5 Opus Soul Document. I apologize in advance that this isn't exactly a regular LW post, but I thought an effort-post might fit best here. A strange hallucination, or is it...
2 weeks ago
1 hour 19 minutes

“Unless its governance changes, Anthropic is untrustworthy”
Anthropic is untrustworthy. This post provides arguments, asks questions, and documents some examples of Anthropic's leadership being misleading and deceptive, holding contradictory positions that consistently shift in OpenAI's direction, lobbying to kill and water down regulation so helpful that employees of all major AI companies speak out to support it, and violating the fundamental promise the company was founded on. It also shares a few previously unreported details on Anthropic leade...
2 weeks ago
53 minutes

“Alignment remains a hard, unsolved problem”
Thanks to (in alphabetical order) Joshua Batson, Roger Grosse, Jeremy Hadfield, Jared Kaplan, Jan Leike, Jack Lindsey, Monte MacDiarmid, Francesco Mosconi, Chris Olah, Ethan Perez, Sara Price, Ansh Radhakrishnan, Fabien Roger, Buck Shlegeris, Drake Thomas, and Kate Woolverton for useful discussions, comments, and feedback. Though there are certainly some issues, I think most current large language models are pretty well aligned. Despite its alignment faking, my favorite is probably Claude ...
2 weeks ago
23 minutes

“Video games are philosophy’s playground” by Rachel Shu
Crypto people have this saying: "cryptocurrencies are macroeconomics' playground." The idea is that blockchains let you cheaply spin up toy economies to test mechanisms that would be impossibly expensive or unethical to try in the real world. Want to see what happens with a 200% marginal tax rate? Launch a token with those rules and watch what happens. (Spoiler: probably nothing good, but at least you didn't have to topple a government to find out.) I think video games, especially multipla...
2 weeks ago
31 minutes

“Stop Applying And Get To Work” by plex
TL;DR: Figure out what needs doing and do it; don't wait on approval from fellowships or jobs. If you have short timelines, have been struggling to get into a position in AI safety, are able to self-motivate your efforts, and have a sufficient financial safety net, I would recommend changing your personal strategy entirely. I started my full-time AI safety career transition in March 2025. For the first 7 months or so, I heavily prioritized applying for jobs and fellowships. ...
2 weeks ago
2 minutes
