Running prompts against images and PDFs with Google Gemini. New TIL. I've been experimenting with the Google Gemini APIs for running prompts against images and PDFs (in preparation for finally adding multi-modal support to LLM) - here are my notes on how to send images or PDF files to their API using curl and the base64 -i macOS command.
I figured out the curl incantation first and then got Claude to build me a Bash script that I can execute like this:
prompt-gemini 'extract text' example-handwriting.jpg
Playing with this is really fun. The Gemini models charge less than 1/10th of a cent per image, so it's really inexpensive to try them out.
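For readers who want a feel for the shape of such a request, here is a minimal sketch. It is not the script from the TIL. The function names build_payload and prompt_gemini_sketch are hypothetical, and the GOOGLE_API_KEY and GEMINI_MODEL environment variables and the default model name are assumptions. The prompt escaping only handles backslashes and double quotes, so multi-line prompts would need a real JSON encoder such as jq.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the author's prompt-gemini script.

# build_payload PROMPT FILE MIME_TYPE
# Prints a JSON request body with the prompt text and the file as inline base64 data.
build_payload() {
  local prompt="$1" file="$2" mime="$3"
  local data escaped
  # Reading from stdin is portable; on macOS this matches `base64 -i FILE`.
  # GNU base64 wraps its output, so newlines are stripped.
  data="$(base64 < "$file" | tr -d '\n')"
  # Minimal escaping: backslashes and double quotes only (assumption: single-line prompt).
  escaped="$(printf '%s' "$prompt" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '{"contents": [{"parts": [{"text": "%s"}, {"inline_data": {"mime_type": "%s", "data": "%s"}}]}]}\n' \
    "$escaped" "$mime" "$data"
}

# prompt_gemini_sketch PROMPT FILE [MIME_TYPE]
# Sends the payload with curl. The model name and env var names are assumptions.
prompt_gemini_sketch() {
  local prompt="$1" file="$2" mime="${3:-image/jpeg}"
  local model="${GEMINI_MODEL:-gemini-1.5-flash-latest}"
  build_payload "$prompt" "$file" "$mime" |
    curl -s -X POST \
      -H 'Content-Type: application/json' \
      -d @- \
      "https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent?key=${GOOGLE_API_KEY}"
}
```

With an API key exported as GOOGLE_API_KEY, a call might look like prompt_gemini_sketch 'extract text' example-handwriting.jpg, or with application/pdf as the third argument for a PDF.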