Jacob and Igor argue that AI safety is hurting users, not helping them. The techniques used to make chatbots "safe" and "aligned," such as instruction tuning and RLHF, anthropomorphize AI systems such they take advantage of our instincts as social beings. At the same time, Big Tech companies push these systems for "wellness" while dodging healthcare liability, causing real harms today We discuss what actual safety would look like, drawing on self-driving car regulations.
Chapters
AI Ethics & Philosophy
AI Model Bias, Failures, and Impacts
AI Mental Health & Safety Concerns
Guidelines, Governance, and Censorship
We dig into how the concept of AI "safety" has been co-opted and weaponized by tech companies. Starting with examples like Mecha-Hitler Grok, we explore how real safety engineering differs from AI "alignment," the myth of the alignment tax, and why this semantic confusion matters for actual safety.
Additional Referenced Papers
Inciting Examples
Other Sources
In this episode, we redefine AI's "reasoning" as mere rambling, exposing the "illusion of thinking" and "Potemkin understanding" in current models. We contrast the classical definition of reasoning (requiring logic and consistency) with Big Tech's new version, which is a generic statement about information processing. We explain how Large Rambling Models generate extensive, often irrelevant, rambling traces that appear to improve benchmarks, largely due to best-of-N sampling and benchmark gaming.
Words and definitions actually matter! Carelessness leads to misplaced investments and an overestimation of systems that are currently just surprisingly useful autocorrects.
Theoretical understanding
Empirical explanations
Other sources
In this episode, we break down Trump's "One Big Beautiful Bill" and its dystopian AI provisions: automated fraud detection systems, centralized citizen databases, military AI integration, and a 10-year moratorium blocking all state AI regulation. We explore the historical parallels with authoritarian data consolidation and why this represents a fundamental shift away from limited government principles once held by US conservatives.
Authoritarianism
Military
Moratorium on State AI Regulation
Other Sources
Jacob and Igor tackle the wild claims about AI's economic impact by examining three main clusters of arguments: automating expensive tasks like programming, removing "cost centers" like call centers and corporate art, and claims of explosive growth. They dig into the actual data, debunk the hype, and explain why most productivity claims don't hold up in practice. Plus: MIT denounces a paper with fabricated data, and Grok randomly promotes white genocide myths.
Automating profit centers
Removing cost centers
AlphaEvolve
Off Topic
DeepSeek has been out for over 2 months now, and things have begun to settle down. We take this opportunity to contextualize the developments that have occurred in its wake, both within the AI industry and the world economy. As systems get more "agentic" and users are willing to spend increasing amounts of time waiting for their outputs, the value of supposed "reasoning" models continues to be peddled by AI system developers, but does the data really back these claims?
Check out our DeepSeek minisode for a snappier overview!
EPISODE RECORDED 2025.03.30
Understanding DeepSeek/DeepResearch
Fallout coverage
US governance
Leaderboards
Other References
DeepSeek R1 has taken the world by storm, causing a stock market crash and prompting further calls for export controls within the US. Since this story is still very much in development, with follow-up investigations and calls for governance being released almost daily, we thought it best to hold of for a little while longer to be able to tell the whole story. Nonetheless, it's a big story, so we provide a brief overview of all that's out there so far.
Fallout coverage
Investigations into "reasoning"
Chris Canal, co-founder of EquiStamp, joins muckrAIkers as our first ever podcast guest! In this ~3.5 hour interview, we discuss intelligence vs. competencies, the importance of test-time compute, moving goalposts, the orthogonality thesis, and much more.
A seasoned software developer, Chris started EquiStamp as a way to improve our current understanding of model failure modes and capabilities in late 2023. Now a key contractor for METR, EquiStamp evaluates the next generation of LLMs from frontier model developers like OpenAI and Anthropic.
EquiStamp is hiring, so if you're a software developer interested in a fully remote opportunity with flexible working hours, join the EquiStamp Discord server and message Chris directly; oh, and let him know muckrAIkers sent you!
Links
Superintelligence & Commentary
Referenced Sources
LeCun on AGI
Other Sources
What happens when you bring over 15,000 machine learning nerds to one city? If your guess didn't include racism, sabotage and scandal, belated epiphanies, a spicy SoLaR panel, and many fantastic research papers, you wouldn't have captured my experience. In this episode we discuss the drama and takeaways from NeurIPS 2024.
Posters available at time of episode preparation can be found on the episode webpage.
EPISODE RECORDED 2024.12.22
Note: many workshop papers had not yet been published to arXiv as of preparing this episode, the OpenReview submission page is provided in these cases.
Referenced Sources
Improving Datasets, Benchmarks, and Measurements
Governance Related
Certified Bangers + Useful Tools
Fun Benchmarks/Datasets
Other Sources
The idea of model cards, which was introduced as a measure to increase transparency and understanding of LLMs, has been perverted into the marketing gimmick characterized by OpenAI's o1 system card. To demonstrate the adversarial stance we believe is necessary to draw meaning from these press-releases-in-disguise, we conduct a close read of the system card. Be warned, there's a lot of muck in this one.
Note: All figures/tables discussed in the podcast can be found on the podcast website at https://kairos.fm/muckraikers/e009/
Additional o1 Coverage
On Data Labelers
Chain-of-Thought Papers Cited
Other Mentioned/Relevant Sources
Unrelated Developments
While on the campaign trail, Trump made claims about repealing Biden's Executive Order on AI, but what will actually be changed when he gets into office? We take this opportunity to examine policies being discussed or implemented by leading governments around the world.
Non-USA
Taxes
Perplexity Pages
Other Sources
Multiple news outlets, including The Information, Bloomberg, and Reuters [see sources] are reporting an "end of scaling" for the current AI paradigm. In this episode we look into these articles, as well as a wide variety of economic forecasting, empirical analysis, and technical papers to understand the validity, and impact of these reports. We also use this as an opportunity to contextualize the realized versus promised fruits of "AI".
Empirical Analysis
Forecasting
Externalities and the Bursting Bubble
On Productization
Other Sources
October 2024 saw a National Security Memorandum and US framework for using AI in national security contexts. We go through the content so you don't have to, pull out the important bits, and summarize our main takeaways.
Related Media
Other Sources
Frontier developers continue their war on sane versioning schema to bring us Claude 3.5 Sonnet (New), along with "computer use" capabilities. We discuss not only the new model, but also why Anthropic may have released this model and tool combination now.
Links
Other Anthropic Relevant Media
Other Sources
Brace yourselves, winter is coming for OpenAI - atleast, that's what we think. In this episode we look at OpenAI's recent massive funding round and ask "why would anyone want to fund a company that is set to lose net 5 billion USD for 2024?" We scrape through a whole lot of muck to find the meaningful signals in all this news, and there is a lot of it, so get ready!
Links
More on AI Hype (Dying)
Other Sources
The Open Source AI Definition is out after years of drafting, will it reestablish brand meaning for the “Open Source” term? Also, the 2024 Nobel Prizes in Physics and Chemistry are heavily tied to AI; we scrutinize not only this year's prizes, but also Nobel Prizes as a concept.
Links
More Reading on Open Source AI
On Nobel Prizes
Other Sources
Why is Mark Ruffalo talking about SB1047, and what is it anyway? Tune in for our thoughts on the now vetoed California legislation that had Big Tech scared.
Additional SB1047 Related Coverage
Other Sources
OpenAI's new model is out, and we are going to have to rake through a lot of muck to get the value out of this one!
⚠ Opt out of LinkedIn's GenAI scraping ➡️ https://lnkd.in/epziUeTi
Links relating to o1
Other stuff we mention