Home
Categories
EXPLORE
True Crime
Comedy
Business
Society & Culture
Sports
TV & Film
History
About Us
Contact Us
Copyright
© 2024 PodJoint
00:00 / 00:00
Sign in

or

Don't have an account?
Sign up
Forgot password
https://is1-ssl.mzstatic.com/image/thumb/Podcasts221/v4/41/5f/f4/415ff42b-f3e4-62d7-e017-6ccd1fe8b935/mza_9584984547342647834.jpg/600x600bb.jpg
Deep Papers
Arize AI
59 episodes
2 weeks ago
We dive into the latest paper from Google and a team of academic researchers: "TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture." Hear from one of the paper's authors — Yongchao Chen, Research Scientist — walks through the research and its implications. The paper proposes Tool-Use Mixture (TUMIX), an ensemble framework that runs multiple agents in parallel, each employing distinct tool-use strategies and answer paths. Agents in TUMIX iteratively share and refine responses ba...
Show more...
Mathematics
Technology,
Business,
Science
RSS
All content for Deep Papers is the property of Arize AI and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.
We dive into the latest paper from Google and a team of academic researchers: "TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture." Hear from one of the paper's authors — Yongchao Chen, Research Scientist — walks through the research and its implications. The paper proposes Tool-Use Mixture (TUMIX), an ensemble framework that runs multiple agents in parallel, each employing distinct tool-use strategies and answer paths. Agents in TUMIX iteratively share and refine responses ba...
Show more...
Mathematics
Technology,
Business,
Science
https://is1-ssl.mzstatic.com/image/thumb/Podcasts221/v4/41/5f/f4/415ff42b-f3e4-62d7-e017-6ccd1fe8b935/mza_9584984547342647834.jpg/600x600bb.jpg
LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection
Deep Papers
27 minutes
7 months ago
LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection
For this week's paper read, we actually dive into our own research. We wanted to create a replicable, evolving dataset that can keep pace with model training so that you always know you're testing with data your model has never seen before. We also saw the prohibitively high cost of running LLM evals at scale, and have used our data to fine-tune a series of SLMs that perform just as well as their base LLM counterparts, but at 1/10 the cost. So, over the past few weeks, the Arize team ...
Deep Papers
We dive into the latest paper from Google and a team of academic researchers: "TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture." Hear from one of the paper's authors — Yongchao Chen, Research Scientist — walks through the research and its implications. The paper proposes Tool-Use Mixture (TUMIX), an ensemble framework that runs multiple agents in parallel, each employing distinct tool-use strategies and answer paths. Agents in TUMIX iteratively share and refine responses ba...