AIs with alien motivations can still follow instructions safely on the inputs that matter. Text version here: https://joecarlsmith.com/2025/11/12/how-human-like-do-safe-ai-motivations-need-to-be/
Takes on "Alignment Faking in Large Language Models"
Joe Carlsmith Audio
1 hour 27 minutes
11 months ago
What can we learn from recent empirical demonstrations of scheming in frontier models? Text version here: https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models/