Simon Willison’s Weblog


Blogmarks tagged googleio, homebrewllms in 2024



PaliGemma model README. One of the more overlooked announcements from Google I/O yesterday was PaliGemma, an openly licensed VLM (Vision Language Model) in the Gemma family of models.

The model accepts an image and a text prompt. It outputs text, but that text can include special tokens representing regions on the image. This means it can return both bounding boxes and fuzzier segment outlines of detected objects, behavior that can be triggered using a prompt such as "segment puffins".
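
To make the "special tokens representing regions" idea concrete, here is a minimal sketch of turning such output into pixel-space bounding boxes. It rests on assumptions not stated in this post: that each detection is four `<locNNNN>` tokens in `y_min, x_min, y_max, x_max` order, quantized into 1024 bins, followed by a label, with detections separated by `;`. The function name `parse_boxes` is hypothetical, and any `<segNNN>` mask tokens are skipped rather than decoded.

```python
import re

# Assumed format: four <locNNNN> tokens (y_min, x_min, y_max, x_max),
# optionally followed by <segNNN> mask tokens, then a text label.
DETECTION = re.compile(r"((?:<loc\d{4}>){4})(?:\s*<seg\d{3}>)*\s*([^;<]+)")


def parse_boxes(text, width, height, bins=1024):
    """Convert location tokens in model output into pixel bounding boxes."""
    boxes = []
    for locs, label in DETECTION.findall(text):
        # Each token value is treated as a fraction of the image size.
        y0, x0, y1, x1 = (int(v) / bins for v in re.findall(r"<loc(\d{4})>", locs))
        boxes.append({
            "label": label.strip(),
            # Reordered to (left, top, right, bottom) in pixels.
            "box": (x0 * width, y0 * height, x1 * width, y1 * height),
        })
    return boxes


if __name__ == "__main__":
    sample = "<loc0256><loc0128><loc0768><loc0512> puffin"
    print(parse_boxes(sample, width=1024, height=512))
```

Decoding the segment outlines themselves needs a separate mask decoder, which this sketch does not attempt.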

You can try it out on Hugging Face.

It's a 3B model, making it feasible to run on consumer hardware.

15th May 2024, 9:16 pm
