Introducing the Techsplainers by IBM podcast, your new podcast for quick, powerful takes on today’s most important AI and tech topics. Each episode brings you bite-sized learning designed to fit your day, whether you’re driving, exercising, or just curious for something new.
This is just the beginning. Tune in every weekday at 6 AM ET for fresh insights, new voices, and smarter learning.
Introducing the Techsplainers by IBM podcast, your new podcast for quick, powerful takes on today’s most important AI and tech topics. Each episode brings you bite-sized learning designed to fit your day, whether you’re driving, exercising, or just curious for something new.
This is just the beginning. Tune in every weekday at 6 AM ET for fresh insights, new voices, and smarter learning.

This episode of Techsplainers explores vision language models (VLMs), the sophisticated AI systems that bridge computer vision and natural language processing. We examine how these multimodal models understand relationships between images and text, allowing them to generate image descriptions, answer visual questions, and even create images from text prompts. The podcast dissects the architecture of VLMs, explaining the critical components of vision encoders (which process visual information into vector embeddings) and language encoders (which interpret textual data). We delve into training strategies, including contrastive learning methods like CLIP, masking techniques, generative approaches, and transfer learning from pretrained models. The discussion highlights real-world applications—from image captioning and generation to visual search, image segmentation, and object detection—while showcasing leading models like DeepSeek-VL2, Google's Gemini 2.0, OpenAI's GPT-4o, Meta's Llama 3.2, and NVIDIA's NVLM. Finally, we address implementation challenges similar to traditional LLMs, including data bias, computational complexity, and the risk of hallucinations.
Find more information at https://www.ibm.com/think/podcasts/techsplainers
Narrated by Amanda Downie