Home
Categories
EXPLORE
True Crime
Comedy
Business
Society & Culture
History
Sports
Technology
About Us
Contact Us
Copyright
© 2024 PodJoint
00:00 / 00:00
Sign in

or

Don't have an account?
Sign up
Forgot password
https://is1-ssl.mzstatic.com/image/thumb/Podcasts211/v4/01/1c/4f/011c4f19-1f8b-29e3-6acf-78be44b020ba/mza_15450455317821352510.jpg/600x600bb.jpg
Ethical Bytes | Ethics, Philosophy, AI, Technology
Carter Considine
35 episodes
2 weeks ago
Ethical Bytes explores the combination of ethics, philosophy, AI, and technology. More info: ethical.fm
Show more...
Society & Culture
RSS
All content for Ethical Bytes | Ethics, Philosophy, AI, Technology is the property of Carter Considine and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
Ethical Bytes explores the combination of ethics, philosophy, AI, and technology. More info: ethical.fm
Show more...
Society & Culture
https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_nologo/42178869/42178869-1730013614624-c83a0b4b66f1e.jpg
How Hackers Keep AI Safe: Inside the World of AI Red Teaming
Ethical Bytes | Ethics, Philosophy, AI, Technology
27 minutes 6 seconds
2 months ago
How Hackers Keep AI Safe: Inside the World of AI Red Teaming

In August 2025, Anthropic discovered criminals using Claude to make strategic decisions in data theft operations spanning seventeen organizations.

The AI evaluated financial records, determined ransom amounts reaching half a million dollars, and chose victims based on their capacity to pay. Rather than following a script, the AI was making tactical choices about how to conduct the crime.

Unlike conventional software with predictable failure modes, large language models respond to conversational manipulation. An eleven-year-old at a Las Vegas hacking conference successfully compromised seven AI systems, which shows that technical expertise isn't required.

That accessibility transforms AI security into a challenge unlike anything cybersecurity has faced before. This makes red teaming essential. Organizations hire people to probe their systems for weaknesses before criminals find them.

These models process everything as undifferentiated text streams. You could say it’s an architectural issue. System instructions and user input flow together without clear boundaries.

Security researcher Simon Willison, who named this "prompt injection," confesses he sees no reliable solution. Many experts believe the problem may be inherent to how these systems work.

Real-world testing exposes severe vulnerabilities. Third-party auditors found that more than half their attempts to coax weapons information from Google's systems succeeded in certain setups. Researchers pulled megabytes of training data from ChatGPT for around two hundred dollars. A 2025 study showed GPT-4 could be jailbroken 87.2 percent of the time.

Today's protections focus on reducing rather than eliminating risk.

Tools like Lakera Guard detect attacks in real-time, while guidance from NIST, OWASP, and MITRE provides strategic frameworks. Meanwhile, underground markets price AI exploits between fifty and five hundred dollars, and criminal operations build malicious tools despite safeguards.

When all’s said and done, red teaming offers our strongest defense against threats that may prove impossible to completely resolve.


Key Topics:

  • Criminal Use of AI (00:00)
  • The Origins: Breaking Things in the Cold War (02:57)
  • When a Bug is a Core Functionality (05:40)
  • Testing at Scale (10:30)
  • When Attacks Succeed (12:55)
  • What Works (17:06)
  • The Democratization of Hacking (19:09)
  • What Two Years of Red Teaming Tells Us (21:01)
  • The Arms Race Ahead (23:58)


More info, transcripts, and references can be found at ethical.fm


Ethical Bytes | Ethics, Philosophy, AI, Technology
Ethical Bytes explores the combination of ethics, philosophy, AI, and technology. More info: ethical.fm