Ep.10 Are benchmarks broken?


Medical Attention

18 episodes

Health & Fitness

56 minutes 53 seconds

6 months ago

In this episode, we’re lucky to be joined by Alexandre Sallinen and Tony O’Halloran from the Laboratory for Intelligent Global Health & Humanitarian Response Technologies to discuss how large language models are assessed, including their Massive Open Online Validation & Evaluation (MOOVE) initiative.

0:25 - Technical wrap: what are agents?
13:20 - What are benchmarks?
18:20 - Automated evaluation
20:10 - Benchmarks
37:45 - Human feedback
44:50 - LLM as judge

Read more about the projects we discuss: Meditron.
Learn about MOOVE, or contact the team if you'd like to be involved.
Listen to the LiGHTCAST, including their recent excellent outline of the HealthBench paper.

More details are in the show notes on our website. Contact: info@medicalattention.ai