AI: post transformers
mcgrof
340 episodes
1 day ago
The transformer architecture revolutionized the world of neural networks and became a springboard for what we know today as modern artificial intelligence. This podcast reviews modern state-of-the-art research papers, starting from the transformer and moving forward.
Technology
DeepSeek-OCR: Contexts Optical Compression
AI: post transformers
15 minutes 8 seconds
1 month ago

The October 21, 2025 DeepSeek paper introduces **DeepSeek-OCR**, a Vision-Language Model (VLM) designed to investigate the feasibility of **contexts optical compression** for managing long contexts in Large Language Models (LLMs). The model has two components: **DeepEncoder**, which efficiently converts high-resolution text images into a manageable number of **vision tokens**, and a DeepSeek3B-MoE decoder that reconstructs the text (Optical Character Recognition, or OCR). Experiments on the Fox benchmark show that DeepSeek-OCR achieves approximately **97% decoding precision** at a **10× text compression ratio**, indicating that the visual modality offers a promising avenue for efficiently compressing large amounts of text. Beyond serving as a research tool for exploring vision-text compression and memory-forgetting mechanisms, the model also performs strongly in practice, achieving state-of-the-art results on OmniDocBench while requiring **fewer vision tokens** than comparable models. The paper details the architecture and training methodology, highlighting the model's potential for applications such as high-throughput training-data generation for LLMs and VLMs.
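
To make the headline numbers concrete, here is a minimal Python sketch of the compression-ratio arithmetic behind the 10× claim; the function name and example figures are illustrative, not taken from the paper's code:

```python
def compression_ratio(n_text_tokens: int, n_vision_tokens: int) -> float:
    """Ratio of original text tokens to the vision tokens that encode them.

    In optical compression, a page of text is rendered as an image and
    encoded into vision tokens; the fewer vision tokens needed to recover
    the text, the higher the compression.
    """
    return n_text_tokens / n_vision_tokens


# Hypothetical example in the spirit of the Fox-benchmark result quoted
# above: a ~1000-token passage rendered as an image and encoded into
# ~100 vision tokens gives a 10x ratio, at which the paper reports
# roughly 97% decoding precision.
n_text, n_vision = 1000, 100
print(f"compression: {compression_ratio(n_text, n_vision):.0f}x")  # -> 10x
```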


Source:

https://arxiv.org/pdf/2510.18234
