Simon Willison’s Weblog

Subscribe

Items tagged computervision in Apr, 2023

Filters: Year: 2023 × Month: Apr × computervision × Sorted by date


Trainbot (via) “Trainbot watches a piece of train track, detects passing trains, and stitches together images of them”—check out the site itself too, which shows beautifully stitched panoramas of trains that have recently passed near Jo M’s apartment. Found via the best Hacker News thread I’ve seen in years, “Ask HN: Most interesting tech you built for just yourself?”. # 28th April 2023, 2:24 pm

LLaVA: Large Language and Vision Assistant (via) Yet another multi-modal model combining a vision model (pre-trained CLIP ViT-L/14) and a LLaMA derivative model (Vicuna). The results I get from their demo are even more impressive than MiniGPT-4. Also includes a new training dataset, LLaVA-Instruct-150K, derived from GPT-4 and subject to the same warnings about the OpenAI terms of service. # 19th April 2023, 1:14 am

MiniGPT-4 (via) An incredible project with a poorly chosen name. A team from King Abdullah University of Science and Technology in Saudi Arabia combined Vicuna-13B (a model fine-tuned on top of Facebook’s LLaMA) with the BLIP-2 vision-language model to create a model that can conduct ChatGPT-style conversations around an uploaded image. The demo is very impressive, and the weights are available to download—45MB for MiniGPT-4, but you’ll need the much larger Vicuna and LLaMA weights as well. # 17th April 2023, 2:21 pm

Types

Years

Months

Tags