Anthropic cookbook: multimodal. I'm currently on the lookout for high quality sources of information about vision LLMs, including prompting tricks for getting the most out of them.
This set of Jupyter notebooks from Anthropic (published four months ago to accompany the original Claude 3 models) is the best I've found so far. Best practices for using vision with Claude includes advice on multi-shot prompting with example, plus this interesting think step-by-step style prompt for improving Claude's ability to count the dogs in an image:
You have perfect vision and pay great attention to detail which makes you an expert at counting objects in images. How many dogs are in this picture? Before providing the answer in
<answer>
tags, think step by step in<thinking>
tags and analyze every part of the image.
Recent articles
- My AI/LLM predictions for the next 1, 3 and 6 years, for Oxide and Friends - 10th January 2025
- Weeknotes: Starting 2025 a little slow - 4th January 2025
- I still don't think companies serve you ads based on spying through your microphone - 2nd January 2025