AI: post transformers
mcgrof
340 episodes
1 day ago
The transformer architecture revolutionized the world of neural networks and became a springboard for what we know today as modern artificial intelligence. This podcast reviews modern state-of-the-art research papers, starting from the transformer and moving forward.
Technology
DeepSeek-OCR: Contexts Optical Compression
AI: post transformers
15 minutes 8 seconds
1 month ago

The October 21, 2025 DeepSeek paper introduces **DeepSeek-OCR**, a Vision-Language Model (VLM) designed to investigate the feasibility of **contexts optical compression** for managing long contexts in Large Language Models (LLMs). The model has two components: **DeepEncoder**, which efficiently converts high-resolution text images into a manageable number of **vision tokens**, and a DeepSeek3B-MoE decoder that reconstructs the text (Optical Character Recognition, or OCR). Experiments on the Fox benchmark show that DeepSeek-OCR achieves approximately **97% decoding precision** at a **10× text compression ratio**, indicating that the visual modality offers a promising avenue for efficiently compressing large amounts of text. Beyond serving as a research tool for exploring vision-text compression and memory-forgetting mechanisms, the model also performs strongly in practice, achieving state-of-the-art results on OmniDocBench while requiring **fewer vision tokens** than comparable models. The paper details the architecture and training methodology, highlighting the model's potential for applications such as high-throughput training-data generation for LLMs and VLMs.
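
To make the headline numbers concrete, here is a minimal Python sketch of the compression-ratio arithmetic behind the 10× claim; the function name and example figures are illustrative, not taken from the paper's code:

```python
def compression_ratio(n_text_tokens: int, n_vision_tokens: int) -> float:
    """Ratio of original text tokens to the vision tokens that encode them.

    In optical compression, a page of text is rendered as an image and
    encoded into vision tokens; the fewer vision tokens needed to recover
    the text, the higher the compression.
    """
    return n_text_tokens / n_vision_tokens


# Hypothetical example in the spirit of the Fox-benchmark result quoted
# above: a ~1000-token passage rendered as an image and encoded into
# ~100 vision tokens gives a 10x ratio, at which the paper reports
# roughly 97% decoding precision.
n_text, n_vision = 1000, 100
print(f"compression: {compression_ratio(n_text, n_vision):.0f}x")  # -> 10x
```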


Source:

https://arxiv.org/pdf/2510.18234
