We should really be on Polymarket. Pierce and Richard make their bets on GPT-6, competition between the different letters of FAANG, dynamic websites calibrated to user preferences, and the increasing quality of OSS models.
It's been a long year in the world of AI. Benchmarks are now almost totally saturated; the financial bubble keeps growing; labs are spending ever more on inference compute; competition from open source models keeps intensifying; agents have finally reached the mainstream; the job market for people out of school is frankly horrible; multi-modal models are back and increasingly converging on transformer architectures. We cover it all in our anything-goes holiday recap show. Plus - all the things that didn't happen this year.
Pierce reflects on his own 2025. Thoughts on choosing the right buyer persona, scaling an AI business from zero lines in a GitHub repo, the feeling of finally reaching product-market fit, boring versus interesting businesses, and more.
Richard takes the hot seat for the first episode of our 2025 recap series, where we spend the rest of December looking back on what this year meant to us personally and in the world of AI/ML. We cover what it's like to defend a thesis in the UK, the difficulty of training meta-learning models, choosing a well-scoped research topic, how to define small models, and what's needed to make them better.
OpenAI is rolling out shopping support to their users and plotting an ads rollout to challenge Google's ad business, we get a peek behind the curtain on SOTA image generation models with the release of Alibaba's Z-Image (and speculate this might be how Nano Banana achieves its great text performance), and OpenReview exposes the identities behind double-blind reviews.
Pierce and Richard cover OpenAI's new long-range model compression in Codex, initial takeaways from Gemini 3.0 and Nano Banana Pro, Nvidia chip exports to the UAE and Saudi Arabia, and Cloudflare's global outage. Plus - why Pierce prefers chicken to turkey.
Richard and Pierce break down the 5 levels of driving autonomy, whether Elon has a point about camera-only (RGB) vs. lidar systems, sensor fusion algorithms, end-to-end learning in driving simulations, and more.
Pierce and Richard cover the news that Yann LeCun is planning to depart Meta to focus on world models, Cursor 2.0 and their new in-house Composer coding model, Kimi K2 showing great generalization performance for an open model while lagging on code, Microsoft stitching together a super data center spanning 700 miles, and Anthropic reporting the first hacking campaign orchestrated by AI.
Further reading:
https://arxiv.org/abs/2509.14252
https://www.digit.in/features/general/meta-chief-ai-scientist-yann-lecun-thinks-llms-are-a-waste-of-time.html
https://cursor.com/blog/composer
https://arxiv.org/abs/2507.20534
https://www.anthropic.com/news/disrupting-AI-espionage
Richard and Pierce talk about the major AI conferences, walk through the history of NeurIPS/ICML/ICLR, and discuss how to retrofit the peer review system.
Richard and Pierce break down all the new AI web browser entrants with a particular focus on OpenAI's new Atlas, the tradeoffs between vision models and text-based DOM parsing, potential security vulnerabilities, and more.
OpenAI releases their long-awaited browser Atlas, PyTorch releases their distributed computation framework Monarch, the SALT reinforcement learning addition to GRPO, the HAL benchmark for agent evaluation, and attempts to adapt the KV cache for text diffusion models.
Further reading:
https://openai.com/index/introducing-chatgpt-atlas/
https://pytorch.org/blog/introducing-pytorch-monarch/
https://arxiv.org/pdf/2510.20022
https://arxiv.org/abs/2510.11977
https://arxiv.org/abs/2510.14973
Richard and Pierce take the bull case on whether we're in an AI bubble. They cover circular financing deals, energy build-outs, AI representing 92% of GDP growth in H1 2025, and a comparison with the early-2000s hype around meaningless dot-com companies.
Articles written by LLMs appear to have stabilized at roughly 50% of the internet (at least, so far as classifiers can discriminate), why embedding models are so cheap, OpenAI announces a new jobs platform and certification programs for applied AI, Amazon makes Bedrock AgentCore generally available, and how pre-training on low-quality data affects post-training capability.
Further reading:
https://arxiv.org/abs/2510.13928
https://openai.com/index/expanding-economic-opportunity-with-ai/
https://www.tensoreconomics.com/p/why-are-embeddings-so-cheap
https://graphite.io/five-percent/more-articles-are-now-created-by-ai-than-humans
OpenAI diversifies their chip suppliers through partnerships with AMD and Broadcom, Google launches a new AI bug bounty program that covers concrete security vulnerabilities but not LLM hallucinations, Nvidia ships their first prosumer computer, DeepMind has a new complexity theory proof solver, and Anthropic writes their own gibberish poison pill that works across model sizes.
Further reading:
https://openai.com/index/openai-amd-strategic-partnership/
https://investor.nvidia.com/news/press-release-details/2024/NVIDIA-Announces-Financial-Results-for-Second-Quarter-Fiscal-2025/default.aspx
https://bughunters.google.com/blog/6116887259840512/announcing-google-s-new-ai-vulnerability-reward-program
https://marketplace.nvidia.com/en-us/developer/dgx-spark/
https://arxiv.org/abs/2509.18057
https://www.anthropic.com/research/small-samples-poison
You asked, we answered! Rich and Pierce do their first listener mailbag: explaining RLHF, our current development stack, whether model competition is making things better for the people using the models, and more.
Breaking down California's recently passed SB 53 regulating frontier model development, ISO standards in startups, and why this bill passed where the older SB 1047 failed.
Building a modern AI app and architecting Sora 2, first impressions of Sonnet 4.5, and the frontier labs going after n8n and Zapier.
Further reading:
https://openai.com/index/sora-2/
https://openai.com/index/sora-is-here/
https://www.lesswrong.com/posts/4yn8B8p2YiouxLABy/claude-sonnet-4-5-system-card-and-alignment
https://www-cdn.anthropic.com/872c653b2d0501d6ab44cf87f43e1dc4853e4d37.pdf
https://www.testingcatalog.com/openai-prepares-to-release-agent-builder-during-devday-on-october-6/
Richard and Pierce respond to The New York Times podcast episode about the scarcity of junior engineering jobs. They talk through the academic distinction between computer science and engineering, AI as a new engineering primitive, talent arbitrage through intern programs, and more.
Further reading:
https://www.nytimes.com/2025/09/29/podcasts/the-daily/big-tech-told-kids-to-code-the-jobs-didnt-follow.html
OpenAI & NVIDIA's 10GW partnership, GDPval as a new human-curated benchmark dataset, Gemini Robotics-ER 1.5, and Apple's distillation of AlphaFold.
Further reading:
https://nvidianews.nvidia.com/news/openai-and-nvidia-announce-strategic-partnership-to-deploy-10gw-of-nvidia-systems
https://openai.com/index/gdpval/
https://deepmind.google/discover/blog/gemini-robotics-15-brings-ai-agents-into-the-physical-world/
https://arxiv.org/pdf/2509.18480
Pierce and Richard recap Anthropic's Economic Index. They cover differences in how countries use AI, autonomy versus augmentation, and the real business use cases that Anthropic is seeing so far.
Further reading:
https://www.anthropic.com/research/anthropic-economic-index-september-2025-report