Vision Language Models (Better, Faster, Stronger) (via) Extremely useful review of the last year in vision and multi-modal LLMs.
So much has happened! I'm particularly excited about the range of small open weight vision models that are now available. Models like gemma3-4b-it and Qwen2.5-VL-3B-Instruct produce very impressive results and run happily on mid-range consumer hardware.
Recent articles
- Nano Banana Pro aka gemini-3-pro-image-preview is the best available image generation model - 20th November 2025
- How I automate my Substack newsletter with content from my blog - 19th November 2025
- Trying out Gemini 3 Pro with audio transcription and a new pelican benchmark - 18th November 2025