Jina AI Reader. Jina AI provide a number of different AI-related platform products, including an excellent family of embedding models, but one of their most instantly useful is Jina Reader, an API for turning any URL into Markdown content suitable for piping into an LLM.
Add r.jina.ai
to the front of a URL to get back Markdown of that page, for example https://r.jina.ai/https://simonwillison.net/2024/Jun/16/jina-ai-reader/ - in addition to converting the content to Markdown it also does a decent job of extracting just the content and ignoring the surrounding navigation.
The API is free but rate-limited (presumably by IP) to 20 requests per minute without an API key or 200 request per minute with a free API key, and you can pay to increase your allowance beyond that.
The Apache 2 licensed source code for the hosted service is on GitHub - it's written in TypeScript and uses Puppeteer to run Readabiliy.js and Turndown against the scraped page.
It can also handle PDFs, which have their contents extracted using PDF.js.
There's also a search feature, s.jina.ai/search+term+goes+here
, which uses the Brave Search API.
Recent articles
- Open challenges for AI engineering - 27th June 2024
- Building search-based RAG using Claude, Datasette and Val Town - 21st June 2024
- Weeknotes: Datasette Studio and a whole lot of blogging - 19th June 2024
- Language models on the command-line - 17th June 2024
- A homepage redesign for my blog's 22nd birthday - 12th June 2024
- Thoughts on the WWDC 2024 keynote on Apple Intelligence - 10th June 2024