Simon Willison’s Weblog

Subscribe

June 2022

June 5, 2022

How to use the GPT-3 language model

Visit How to use the GPT-3 language model

I ran a Twitter poll the other day asking if people had tried GPT-3 and why or why not. The winning option, by quite a long way, was “No, I don’t know how to”. So here’s how to try it out, for free, without needing to write any code.

[... 838 words]

June 7, 2022

Weeknotes: Datasette Cloud ready to preview

I made an absolute ton of progress building Datasette Cloud on Fly this week, and also had a bunch of fun playing with GPT-3.

[... 370 words]

June 8, 2022

Announcing Pyston-lite: our Python JIT as an extension module (via) The Pyston JIT can now be installed in any Python 3.8 virtual environment by running “pip install pyston_lite_autoload”—which includes a hook to automatically inject the JIT. I just tried a very rough benchmark against Datasette (ab -n 1000 -c 10) and got 391.20 requests/second without the JIT compared to 404.10 request/second with it.

# 5:58 pm / performance, jit, python

The End of Localhost. swyx makes the argument for cloud-based development environments, and points out that many large companies—including Google, Facebook, Shopify and GitHub—have made the move already. I was responsible for the team maintaining the local development environment experience at Eventbrite for a while, and my conclusion is that with a large enough engineering team someone will ALWAYS find a new way to break their local environment: the idea of being able to bootstrap a fresh, guaranteed-to-work environment in the cloud at the click of a button could save SO much time and money.

# 6:09 pm / software-engineering, swyx

June 12, 2022

A tiny web app to create images from OpenStreetMap maps

Visit A tiny web app to create images from OpenStreetMap maps

Earlier today I found myself wanting to programmatically generate some images of maps.

[... 1,388 words]

Twenty years of my blog

Visit Twenty years of my blog

I started this blog on June 12th 2002—twenty years ago today! To celebrate two decades of blogging, I decided to pull together some highlights and dive down a self-indulgent nostalgia hole.

[... 4,455 words]

June 13, 2022

Making Code Faster. Tim Bray’s detailed guide to using the Go profiler.

# 7:40 pm / profiling, go, tim-bray

June 18, 2022

Becoming a good engineer is about collecting experience. Each project, even small ones, is a chance to add new techniques and tools to your toolbox. Where this delivers even more value is when you can solve problems by pairing techniques learned on one project with tools learned working on another. It all adds up.

Addy Osmani

# 9:21 pm / careers, addy-osmani

June 19, 2022

Weeknotes: datasette-socrata, and the last 10%...

... takes 90% of the work. I continue to work towards a preview of the new Datasette Cloud, and keep finding new “just one more things” to delay inviting in users.

[... 1,214 words]

WarcDB (via) Florents Tselai built this tool for loading web crawl data stored in WARC (Web ARChive) format into a SQLite database for smaller-scale analysis with SQL, on top of my sqlite-utils Python library.

# 6:08 pm / archiving, sqlite-utils, sqlite

June 20, 2022

The State of WebAssembly 2022. Colin Eberhardt talks through the results of the State of WebAssembly 2022 survey. Rust continues to dominate as the most popular language for working to WebAssembly, but Python has a notable increase of interest.

# 6:07 pm / rust, webassembly

Joining CSV files in your browser using Datasette Lite

Visit Joining CSV files in your browser using Datasette Lite

I added a new feature to Datasette Lite—my version of Datasette that runs entirely in your browser using WebAssembly (previously): you can now use it to load one or more CSV files by URL, and then run SQL queries against them—including joins across data from multiple files.

[... 546 words]

June 23, 2022

How Imagen Actually Works. Imagen is Google’s new text-to-image model, similar to (but possibly even more effective than) DALL-E. This article is the clearest explanation I’ve seen of how Imagen works: it uses Google’s existing T5 text encoder to convert the input sentence into an encoding that captures the semantic meaning of the sentence (including things like items being described as being on top of other items), then uses a trained diffusion model to generate a 64x64 image. That image is passed through two super-res models to increase the resolution to the final 1024x1024 output.

# 6:05 pm / machine-learning, ai, google, dalle, generative-ai

First impressions of DALL-E, generating images from text

Visit First impressions of DALL-E, generating images from text

I made it off the DALL-E waiting list a few days ago and I’ve been having an enormous amount of fun experimenting with it. Here are some notes on what I’ve learned so far (and a bunch of example images too).

[... 2,102 words]

June 28, 2022

The Magic Interview Question (via) Jeff Gothelf explains why “Tell me about the last time you [did something]” is the most valuable question you can ask when interviewing a user or potential user.

# 2:26 pm / usability

The general idea of an “Islands” architecture is deceptively simple: render HTML pages on the server, and inject placeholders or slots around highly dynamic regions. These placeholders/slots contain the server-rendered HTML output from their corresponding widget. They denote regions that can then be "hydrated" on the client into small self-contained widgets, reusing their server-rendered initial HTML.

Jason Miller

# 3:01 pm / javascript

June 30, 2022

s3-ocr: Extract text from PDF files stored in an S3 bucket

Visit s3-ocr: Extract text from PDF files stored in an S3 bucket

I’ve released s3-ocr, a new tool that runs Amazon’s Textract OCR text extraction against PDF files in an S3 bucket, then writes the resulting text out to a SQLite database with full-text search configured so you can run searches against the extracted data.

[... 1,493 words]

2022 » June

MTWTFSS
  12345
6789101112
13141516171819
20212223242526
27282930