Simon Willison’s Weblog


Friday, 7th June 2024

A Picture is Worth 170 Tokens: How Does GPT-4o Encode Images? (via) Oran Looney dives into the question of how GPT-4o tokenizes images - an image "costs" just 170 tokens, even though it can contain more text than the standard tokenizer could encode in that many tokens.

There are some really neat tricks in here. I particularly like the experimental validation section where Oran creates 5x5 (and larger) grids of coloured icons and asks GPT-4o to return a JSON matrix of icon descriptions. This works perfectly at 5x5, gets 38/49 for 7x7 and completely fails at 13x13.
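Here's a rough sketch of how a grid result like that 38/49 could be scored. This is not Oran's actual code: `score_grid` and the sample icon descriptions are hypothetical, and I'm assuming the model replies with a plain JSON array of arrays of strings.

```python
import json


def score_grid(expected, response_text):
    """Count cells where the model's JSON matrix matches the expected grid.

    `expected` is a list of rows of icon descriptions. `response_text` is the
    raw JSON the model returned (assumed here to be an array of arrays).
    """
    returned = json.loads(response_text)
    correct = 0
    total = 0
    for r, expected_row in enumerate(expected):
        for c, expected_cell in enumerate(expected_row):
            total += 1
            try:
                got = returned[r][c]
            except (IndexError, KeyError, TypeError):
                # A missing row or cell counts as a wrong answer.
                continue
            # Compare case-insensitively, ignoring surrounding whitespace.
            if isinstance(got, str) and got.strip().lower() == expected_cell.strip().lower():
                correct += 1
    return correct, total


# Hypothetical 2x2 example: one cell has the wrong colour.
expected = [["red circle", "blue square"], ["green star", "yellow heart"]]
response = '[["red circle", "blue square"], ["green star", "purple heart"]]'
print(score_grid(expected, response))
```

Exact string matching is a simplification. Free-form descriptions would likely need fuzzier comparison, or a fixed vocabulary given to the model up front.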

I'm not convinced by the idea that GPT-4o runs standard OCR such as Tesseract to enhance its ability to interpret text, but I would love to understand more about how this all works. I imagine a lot can be learned from looking at how openly licensed vision models such as LLaVA work, but I've not tried to understand that myself yet. # 11:30 pm

LLM bullshit knife, to cut through bs

RAG -> Provide relevant context
Agentic -> Function calls that work
CoT -> Prompt model to think/plan
FewShot -> Add examples
PromptEng -> Someone w/good written comm skills.
Prompt Optimizer -> For loop to find best examples.

Hamel Husain # 6:02 pm
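To make that last line concrete, here's a minimal sketch of a "prompt optimizer" as a for loop over few-shot examples. Everything here is hypothetical: `find_best_examples` is an illustrative name, and `score_prompt` stands in for whatever evaluation you would actually run (for example, building a prompt from the examples and measuring accuracy on a held-out set).

```python
from itertools import combinations


def find_best_examples(candidates, k, score_prompt):
    """Try every k-sized subset of few-shot examples and keep the best one.

    `score_prompt` is a caller-supplied function (an assumption of this
    sketch) that takes a list of examples and returns a number, higher
    being better.
    """
    best_subset, best_score = None, float("-inf")
    for subset in combinations(candidates, k):
        score = score_prompt(list(subset))
        if score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score


# Toy stand-in for a real evaluation: reward covering more distinct labels.
def toy_score(examples):
    return len({label for _, label in examples})


candidates = [
    ("great film", "positive"),
    ("loved it", "positive"),
    ("terrible", "negative"),
]
best, score = find_best_examples(candidates, 2, toy_score)
print(best, score)
```

Exhaustive search only works for a small pool of candidates, since the number of subsets grows quickly. Sampling random subsets would be the obvious next step.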

Update on the Recall preview feature for Copilot+ PCs (via) This feels like a very good call to me: in response to widespread criticism, Microsoft are making Recall an opt-in feature (during system onboarding), adding encryption to the database and search index beyond just disk encryption, and requiring Windows Hello face scanning to access the search feature. # 5:30 pm

In fact, Microsoft goes so far as to promise that it cannot see the data collected by Windows Recall, that it can’t train any of its AI models on your data, and that it definitely can’t sell that data to advertisers. All of this is true, but that doesn’t mean people believe Microsoft when it says these things. In fact, many have jumped to the conclusion that even if it’s true today, it won’t be true in the future.

Zac Bowden # 5:23 pm
