Simon Willison’s Weblog

Subscribe

Items tagged homebrewllms in May, 2024

Filters: Year: 2024 × Month: May × homebrewllms × Sorted by date


PaliGemma model README (via) One of the more over-looked announcements from Google I/O yesterday was PaliGemma, an openly licensed VLM (Vision Language Model) in the Gemma family of models.

The model accepts an image and a text prompt. It outputs text, but that text can include special tokens representing regions on the image. This means it can return both bounding boxes and fuzzier segment outlines of detected objects, behavior that can be triggered using a prompt such as "segment puffins".

You can try it out on Hugging Face.

It's a 3B model, making it feasible to run on consumer hardware. # 15th May 2024, 9:16 pm

experimental-phi3-webgpu (via) Run Microsoft’s excellent Phi-3 model directly in your browser, using WebGPU so didn’t work in Firefox for me, just in Chrome.

It fetches around 2.1GB of data into the browser cache on first run, but then gave me decent quality responses to my prompts running at an impressive 21 tokens a second (M2, 64GB).

I think Phi-3 is the highest quality model of this size, so it’s a really good fit for running in a browser like this. # 9th May 2024, 10:21 pm

Types

Years

Months

Tags