LLaVA: Large Language and Vision Assistant (via) Yet another multi-modal model combining a vision model (pre-trained CLIP ViT-L/14) and a LLaMA derivative model (Vicuna). The results I get from their demo are even more impressive than MiniGPT-4. Also includes a new training dataset, LLaVA-Instruct-150K, derived from GPT-4 and subject to the same warnings about the OpenAI terms of service.
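
For context on how the two models are glued together: LLaVA encodes an image with the frozen CLIP ViT-L/14 vision encoder, then maps those patch features into the language model's embedding space with a learned projection so the resulting "image tokens" can sit alongside the text prompt fed to Vicuna. Here's a minimal sketch of that connector pattern in PyTorch; the class name and the specific dimensions (1024 for CLIP ViT-L/14 features, 4096 for a 7B-scale LLM) are illustrative assumptions, not LLaVA's actual code.

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Illustrative sketch: project frozen CLIP vision features into the
    language model's token embedding space (dimensions are assumptions)."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # LLaVA's first release trains a simple linear projection for this step.
        self.projection = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim) from the CLIP encoder
        # returns:        (batch, num_patches, llm_dim) pseudo "image tokens"
        return self.projection(image_features)

# Usage: the projected image tokens are concatenated with the text prompt's
# embeddings before the combined sequence is passed to the language model.
connector = VisionLanguageConnector()
fake_clip_features = torch.randn(1, 256, 1024)  # stand-in for real CLIP output
image_tokens = connector(fake_clip_features)
print(image_tokens.shape)  # torch.Size([1, 256, 4096])
```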