Simon Willison’s Weblog

Subscribe

6 items tagged “puppeteer”

2024

Jina AI Reader. Jina AI provide a number of different AI-related platform products, including an excellent family of embedding models, but one of their most instantly useful is Jina Reader, an API for turning any URL into Markdown content suitable for piping into an LLM.

Add r.jina.ai to the front of a URL to get back Markdown of that page, for example https://r.jina.ai/https://simonwillison.net/2024/Jun/16/jina-ai-reader/ - in addition to converting the content to Markdown it also does a decent job of extracting just the content and ignoring the surrounding navigation.

The API is free but rate-limited (presumably by IP) to 20 requests per minute without an API key or 200 request per minute with a free API key, and you can pay to increase your allowance beyond that.

The Apache 2 licensed source code for the hosted service is on GitHub - it's written in TypeScript and uses Puppeteer to run Readabiliy.js and Turndown against the scraped page.

It can also handle PDFs, which have their contents extracted using PDF.js.

There's also a search feature, s.jina.ai/search+term+goes+here, which uses the Brave Search API.

# 16th June 2024, 7:33 pm / apis, markdown, ai, puppeteer, llms

2022

shot-scraper: automated screenshots for documentation, built on Playwright

Visit shot-scraper: automated screenshots for documentation, built on Playwright

shot-scraper is a new tool that I’ve built to help automate the process of keeping screenshots up-to-date in my documentation. It also doubles as a scraping tool—hence the name—which I picked as a complement to my git scraping and help scraping techniques.

[... 1,802 words]

2021

A framework for building Open Graph images. GitHub’s new social preview images are generated by a Node.js script that fetches data from their GraphQL API, generates an HTML version of the card and then grabs a PNG snapshot of it using Puppeteer. It takes an average of 280ms to serve an image and generates around 2 million unique images a day. Interestingly, they found that bumping the available RAM from 512MB up to 513MB had a big effect on performance, because Chromium detects devices on 512MB or less and switches some processes from parallel to sequential.

# 22nd June 2021, 9:25 pm / github, nodejs, puppeteer

2020

Animating a commit based Sudoku game using Puppeteer (via) This is really clever. There’s a GitHub repo that tracks progress in a game of Sudoku: Anish Karandikar wrote code which iterates through the game board state commit by commit, uses that state to generate an HTML table, passes that table to Puppeteer using a data: URI, renders a PNG of each stage and then concatenates those PNGs together into an animated GIF using the gifencoder Node.js library.

# 9th October 2020, 10:28 pm / datauri, gifs, puppeteer

Weeknotes: airtable-export, generating screenshots in GitHub Actions, Dogsheep!

This week I figured out how to populate Datasette from Airtable, wrote code to generate social media preview card page screenshots using Puppeteer, and made a big breakthrough with my Dogsheep project.

[... 1,461 words]

html-to-svg (via) ‪This is absolutely ingenious: 50 lines of JavaScript which uses Puppeteer to get headless Chrome to grab a PDF screenshot of a page, then shells out to Inkscape to convert the PDF to SVG. Wraps the whole thing up in a Docker container and ships it to Cloud Run as a web service you can call by passing it a URL.

# 7th May 2020, 6:01 am / chrome, svg, cloudrun, puppeteer