Deep Dive in Research
NotebookLM
17 episodes
2 weeks ago
Discussion about interesting research papers
Technology
The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix
Deep Dive in Research
7 minutes 1 second
1 month ago

Today's podcast is based on an article from Hugging Face detailing an extensive research project that tackles the high cost and scale of training modern large language models. Through more than 50 systematic experiments, the authors sought an optimal data-mixing strategy that would let a GPT-2 model match the performance of models trained on ten times as much data. Their central finding is that a static dataset mix of 50% finePDFs, 30% DCLM-baseline, and 20% FineWeb-Edu significantly outperforms more complex curriculum-learning approaches, which often led to catastrophic forgetting or overfitting. This 50-30-20 mixture trained a GPT-2-70M model to over 90% of the original GPT-2's benchmark performance while using substantially fewer resources. The key takeaway is that dataset quality and intelligent composition matter more than sheer quantity when training efficient language models.
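
For readers who want to try the static mix themselves, here is a minimal sketch using the Hugging Face `datasets` library. The dataset repo IDs and the "text" column name are assumptions rather than details from the article, so substitute the exact dataset versions the authors used.

```python
# Minimal sketch of a static 50/30/20 dataset mixture with the Hugging Face
# `datasets` library. Repo IDs and the "text" column are assumptions.
from itertools import islice

from datasets import interleave_datasets, load_dataset

# Stream each source so nothing is downloaded up front.
finepdfs = load_dataset("HuggingFaceFW/finepdfs", split="train", streaming=True)
dclm = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)
fineweb_edu = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)

# Static mix: examples are drawn with the same fixed probabilities for the
# whole run, as opposed to a curriculum that changes the ratios over time.
mixed = interleave_datasets(
    [finepdfs, dclm, fineweb_edu],
    probabilities=[0.5, 0.3, 0.2],
    seed=42,
)

# Peek at a few interleaved examples.
for example in islice(mixed, 3):
    print(example["text"][:80])
```

Because the probabilities never change, the sampled stream can feed a standard pre-training loop directly; the curriculum variants the article compares against would instead reweight or reorder the sources as training progresses.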


Read the full article at https://huggingface.co/blog/codelion/optimal-dataset-mixing
