
We discuss Bolmo, a family of byte-level language models from the Allen Institute for AI (AI2) and collaborating universities that offers a practical alternative to traditional subword tokenization. These models achieve state-of-the-art performance by "byteifying" existing subword models such as OLMo: a two-stage distillation procedure converts a subword model into a byte-level one using less than 1% of the original pretraining budget. Architecturally, Bolmo pairs a non-causal boundary predictor with local mLSTM layers to address the efficiency and character-understanding limitations of previous byte-level systems. The research demonstrates that Bolmo matches or exceeds the performance of its source models on coding and character-based tasks, and the authors show that it can be further optimized for speed and easily post-trained using existing subword ecosystems via task arithmetic.
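Task arithmetic here refers to the general idea of adding a parameter-space "task vector" (the difference between a post-trained and a base checkpoint) onto another model's weights. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch, not the paper's actual procedure: the function name, the `alpha` scaling factor, and the assumption that shared transformer weights line up one-to-one between a subword OLMo checkpoint and a byte-level Bolmo checkpoint are all assumptions made for illustration.

```python
import torch


def apply_task_vector(base_subword_sd: dict, tuned_subword_sd: dict,
                      bolmo_sd: dict, alpha: float = 1.0) -> dict:
    """Add the post-training "task vector" (tuned - base) from a subword
    checkpoint onto the matching parameters of a byte-level checkpoint.

    Parameters that exist on only one side (e.g. the subword embedding
    table vs. byte-level modules) or whose shapes differ are left untouched.
    """
    merged = dict(bolmo_sd)
    for name, base_param in base_subword_sd.items():
        if (name in tuned_subword_sd and name in bolmo_sd
                and bolmo_sd[name].shape == base_param.shape):
            # Task vector for this tensor: what post-training changed.
            delta = tuned_subword_sd[name] - base_param
            merged[name] = bolmo_sd[name] + alpha * delta
    return merged


# Hypothetical usage: state dicts loaded from base, instruction-tuned,
# and byteified checkpoints, then merged and saved.
# merged_sd = apply_task_vector(torch.load("olmo_base.pt"),
#                               torch.load("olmo_instruct.pt"),
#                               torch.load("bolmo_base.pt"),
#                               alpha=1.0)
# torch.save(merged_sd, "bolmo_instruct_merged.pt")
```

The appeal of this kind of transfer, as the summary above suggests, is that post-training effort already invested in the subword ecosystem does not have to be repeated from scratch for the byte-level model.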