Yeah, unfortunately vision prompting has been a tough nut to crack. We've found it's very challenging to improve Claude's actual "vision" through just text prompts, but we can of course improve its reasoning and thought process once it extracts info from an image.
In general, I think vision is still in its early days, although 3.5 Sonnet is noticeably better than older models.
— Alex Albert, Anthropic
Recent articles
- Notes on the new Claude analysis JavaScript code execution tool - 24th October 2024
- Initial explorations of Anthropic's new Computer Use capability - 22nd October 2024
- Everything I built with Claude Artifacts this week - 21st October 2024