AI Summer
Podcast Description
Tim Lee and Dean Ball interview leading experts about the future of AI technology and policy. www.aisummer.org
Podcast Insights
Content Themes
Explores a range of AI topics, including advances in autonomous vehicles with Sophia Tung, AI scaling trends with Nathan Labenz, and AI governance frameworks with Lennart Heim, with a focus on the implications for public policy and society.

Last week Anthropic stunned the AI world by announcing Claude Mythos Preview—and then refusing to release it. Princeton’s Sayash Kapoor, co-author of the newsletter AI as Normal Technology, joins Tim and Kai Williams to make sense of the moment.
Kapoor argues that Mythos’ vulnerability-finding prowess, including unearthing a 27-year-old OpenBSD bug, fits a familiar pattern: fuzzing tools triggered similar alarms decades ago but ultimately strengthened defenders more than attackers. His “normal technology” thesis holds that AI’s impact is shaped less by capability jumps than by downstream adoption: how industries, legal systems, and institutions absorb the technology.
The conversation turns to whether alignment or control is the more promising safety strategy. Kapoor contends that the Mythos system card’s examples of the model bypassing access controls reveal shortcomings in control mechanisms, not alignment failures, and calls for ecosystem-level hardening—formal verification, sandboxing, network security—rather than relying on any single model behaving well.
Kapoor then shares his latest research finding that AI agent reliability is improving four to ten times more slowly than average-case accuracy, and that current frontier models, including GPT-5.2, haven’t cleared even “one nine” (90%) of reliability. On Sierra’s TauBench, agents confidently book wrong flights and issue erroneous refunds worth thousands of dollars, with Gemini 2.5 claiming 100% confidence even when it fails. If each additional nine of reliability is harder to reach than the last, the real timeline for autonomous AI may be set not by when models get smart enough, but by when the surrounding infrastructure catches up.
This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.aisummer.org

Disclaimer
This podcast’s information is provided for general reference and was obtained from publicly accessible sources. The Podcast Collaborative neither produces nor verifies the content, accuracy, or suitability of this podcast. Views and opinions belong solely to the podcast creators and guests.
For a complete disclaimer, please see our Full Disclaimer on the archive page. The Podcast Collaborative bears no responsibility for the podcast’s themes, language, or overall content. Listener discretion is advised. Read our Terms of Use and Privacy Policy for more details.