AI: post transformers
mcgrof
316 episodes
1 day ago
The transformer architecture revolutionized the world of Neural Networks. It was a springboard for what we know today as modern artificial intelligence. This podcast focuses on modern state of the art research paper reviews starting from the transformer and on.
Technology
AMD: Instella: Fully Open Language Models with Stellar Performance
AI: post transformers
10 minutes 15 seconds
4 days ago

The November 13, 2025 paper introduces **Instella**, a family of **fully open-source** three-billion-parameter large language models (LLMs) developed by AMD and trained on its Instinct MI300X GPUs. The central focus is advancing transparency and reproducibility in LLMs by releasing not only the model weights but also the **complete training pipeline, datasets, and optimization details**. Instella achieves **state-of-the-art performance** among fully open models of its size and remains competitive with leading open-weight counterparts despite using fewer pre-training tokens. The family includes specialized variants: **Instella-Long**, which supports a 128K-token context length, and **Instella-Math**, a reasoning-centric model enhanced through specialized supervised fine-tuning and reinforcement learning. The paper details the two-stage pre-training, the post-training, and the specific methods used to create the Long and Math variants, demonstrating that **openness does not compromise performance**.
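
One practical consequence of the paper's openness claim is that the released checkpoints should be loadable with standard tooling. Below is a minimal sketch using the Hugging Face `transformers` library; the model id `amd/Instella-3B` and the `trust_remote_code` flag are illustrative assumptions, not details taken from this summary.

```python
# Minimal sketch: loading and sampling from an Instella checkpoint,
# assuming the weights are hosted on the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/Instella-3B"  # hypothetical model id, for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Why does releasing the full training pipeline aid reproducibility?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same pattern would presumably apply to the Instella-Long and Instella-Math variants, assuming each is published under its own model id.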


Source: https://arxiv.org/pdf/2511.10628
