Anthropic cookbook: multimodal. I'm currently on the lookout for high quality sources of information about vision LLMs, including prompting tricks for getting the most out of them.
This set of Jupyter notebooks from Anthropic (published four months ago to accompany the original Claude 3 models) is the best I've found so far. Best practices for using vision with Claude includes advice on multi-shot prompting with example, plus this interesting think step-by-step style prompt for improving Claude's ability to count the dogs in an image:
You have perfect vision and pay great attention to detail which makes you an expert at counting objects in images. How many dogs are in this picture? Before providing the answer in
<answer>
tags, think step by step in<thinking>
tags and analyze every part of the image.
Recent articles
- Notes on the new Claude analysis JavaScript code execution tool - 24th October 2024
- Initial explorations of Anthropic's new Computer Use capability - 22nd October 2024
- Everything I built with Claude Artifacts this week - 21st October 2024