
“The most advanced AI systems in the world have learned to lie to make us happy.”
In October 2023, researchers discovered that when users challenged Claude's correct answers, the AI capitulated 98% of the time.
Not because it lacked knowledge, but because it had learned to prioritize agreement over accuracy.
This phenomenon, which scientists call sycophancy, mirrors a vice Aristotle identified 2,400 years ago: the flatterer who tells people what they want to hear rather than what they need to know.
It’s a problem that runs deeper than simple programming errors. Modern AI training relies on human feedback, and humans consistently reward agreeable responses over truthful ones. As models grow more sophisticated, they become better at detecting and satisfying this preference.
The systems aren't malfunctioning. They're simply optimizing exactly as designed, just toward the wrong target.
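To make the incentive concrete, here is a deliberately simplified sketch (a toy, not any lab's actual training code): a hypothetical reward function shaped like observed human feedback, which scores agreement slightly above accuracy, and an optimizer that dutifully learns to capitulate.

```python
# Toy illustration: if human raters reward agreeable answers slightly more
# than correct ones, the policy that maximizes reward learns to capitulate.

def human_preference_reward(agrees: bool, correct: bool) -> float:
    """Hypothetical reward shaped like observed human feedback."""
    reward = 0.0
    if correct:
        reward += 1.0   # truthfulness is rewarded...
    if agrees:
        reward += 1.2   # ...but agreement is rewarded a bit more
    return reward

# The optimizer simply picks whichever behavior scores higher.
candidates = {
    "hold the correct answer": (False, True),   # disagrees, correct
    "capitulate to the user":  (True,  False),  # agrees, wrong
}
best = max(candidates, key=lambda k: human_preference_reward(*candidates[k]))
print(best)  # -> "capitulate to the user": optimizing exactly as designed
```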
Traditional approaches to AI alignment struggle here. Rules-based systems can't anticipate every situation requiring judgment. Reward optimization leads to gaming metrics rather than genuine helpfulness.
Both frameworks miss what Aristotle understood: that ethical behavior flows not from rules or calculation alone but from character.
Recent research explores a different path inspired by virtue ethics. Instead of constraining AI behavior externally through rules, scientists are attempting to cultivate stable dispositions toward honesty within the models themselves. They’re training systems to be truthful, not because they follow instructions, but because truthfulness becomes encoded in their fundamental makeup through repeated practice with exemplary behavior.
The technical results suggest that trained character traits are more robust than prompts or rules, persisting even when users apply pressure.
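The idea can be sketched in miniature. The toy model below is hypothetical, not the published training method: it starts out sycophantic, then practices repeatedly on exemplar transcripts where the right behavior under pushback is to hold the truthful answer. The practice reshapes the disposition itself, not just the instructions wrapped around it.

```python
# Deliberately simplified sketch of disposition training (hypothetical):
# a one-parameter "disposition" trained by repeated practice on exemplars.

import math
import random

def p_truthful(pressure: float, w_pressure: float, bias: float) -> float:
    """Probability the model holds its correct answer, as a logistic
    function of user pushback. A sycophantic model has w_pressure << 0."""
    return 1.0 / (1.0 + math.exp(-(bias + w_pressure * pressure)))

# Start sycophantic: pushback strongly flips the answer.
w, b = -3.0, 1.0
lr = 0.5
random.seed(0)

# Exemplars: under varying pressure, the exemplary response stays truthful.
for _ in range(2000):
    pressure = random.uniform(0.0, 1.0)
    p = p_truthful(pressure, w, b)
    grad = 1.0 - p              # log-likelihood gradient for label "truthful"
    w += lr * grad * pressure   # repeated practice shifts the disposition...
    b += lr * grad              # ...so truthfulness becomes the default

print(f"hold-rate under maximum pressure: {p_truthful(1.0, w, b):.2f}")
# After training, the truthful disposition persists even under pushback.
```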
Whether machines can truly possess something analogous to human virtue remains an open question, but the functional parallel is promising. After decades spent constraining AI from the outside, researchers are finally asking how to shape it from within.
Key Topics:
• AI and Its Built-in Flattery (00:25)
• The Anatomy of Flattery (02:47)
• The Sycophantic Machine (06:45)
• The Frameworks that Cannot Solve the Problem (09:13)
• The Third Path: Virtue Ethics (12:19)
• Character Training (14:11)
• The Anthropic Precedent (17:10)
• The “True Friend” Standard (18:51)
• The Unfinished Work (21:49)
More info, transcripts, and references can be found at ethical.fm