Anthropic cookbook: multimodal. I'm currently on the lookout for high quality sources of information about vision LLMs, including prompting tricks for getting the most out of them.
This set of Jupyter notebooks from Anthropic (published four months ago to accompany the original Claude 3 models) is the best I've found so far. Best practices for using vision with Claude includes advice on multi-shot prompting with example, plus this interesting think step-by-step style prompt for improving Claude's ability to count the dogs in an image:
You have perfect vision and pay great attention to detail which makes you an expert at counting objects in images. How many dogs are in this picture? Before providing the answer in
<answer>tags, think step by step in<thinking>tags and analyze every part of the image.
Recent articles
- Adding dynamic features to an aggressively cached website - 28th January 2026
- ChatGPT Containers can now run bash, pip/npm install packages, and download files - 26th January 2026
- Wilson Lin on FastRender: a browser built by thousands of parallel agents - 23rd January 2026