Yeah, unfortunately vision prompting has been a tough nut to crack. We've found it's very challenging to improve Claude's actual "vision" through just text prompts, but we can of course improve its reasoning and thought process once it extracts info from an image.
In general, I think vision is still in its early days, although 3.5 Sonnet is noticeably better than older models.
— Alex Albert, Anthropic
Recent articles
- Grok: searching X for "from:elonmusk (Israel OR Palestine OR Hamas OR Gaza)" - 11th July 2025
- Phoenix.new is Fly's entry into the prompt-driven app development space - 23rd June 2025
- Trying out the new Gemini 2.5 model family - 17th June 2025