Simon Willison’s Weblog

Blogmarks

Chris Amico’s Python setup for 2022 (via) Homebrew to install pyenv, then pyenv to install specific Python versions. pipx and pipenv for package management. I need to habitually start using pyenv for everything.

# 18th October 2022, 2:04 pm / python, chris-amico

“You are GPT-3”. Genius piece of prompt design by Riley Goodside. “A long-form GPT-3 prompt for assisted question-answering with accurate arithmetic, string operations, and Wikipedia lookup. Generated IPython commands (in green) are pasted into IPython and output is pasted back into the prompt (no green).” Uses “Out[” as a stop sequence to ensure GPT-3 stops at each generated IPython prompt rather than inventing the output itself.
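
A rough sketch of why the stop sequence matters, not Riley's actual prompt or code. The `fake_model` function is a hypothetical stand-in for a GPT-3 call that deliberately over-generates, and `truncate_at_stop` and `run_step` are names made up for illustration. A hosted API would normally apply the stop sequence server-side; here it is applied by hand to make the effect visible.

```python
def truncate_at_stop(completion: str, stop: str = "Out[") -> str:
    """Cut a completion at the first occurrence of the stop sequence."""
    index = completion.find(stop)
    return completion if index == -1 else completion[:index]


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-3 call; it over-generates on purpose."""
    return " 1234 * 5678\nOut[1]: 7000000 (invented by the model)\n"


def run_step(prompt: str, step: int) -> str:
    """One round trip: generate a command, run it, paste the real output back."""
    command = truncate_at_stop(fake_model(prompt)).strip()
    # eval() is only tolerable here because the command comes from fake_model;
    # real model output would need a proper sandbox.
    result = eval(command, {"__builtins__": {}})
    return f"{prompt} {command}\nOut[{step}]: {result}\n"


prompt = "Question: What is 1234 times 5678?\nIn [1]:"
updated_prompt = run_step(prompt, 1)
print(updated_prompt)
```

The invented output never reaches the prompt: everything from “Out[” onwards is discarded and replaced with the value that was actually computed.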

# 17th October 2022, 4:35 am / gpt-3, prompt-engineering, generative-ai, riley-goodside, llms

Half Moon Bay Pumpkin Festival traffic on Saturday 15th October 2022 (via) It’s the Half Moon Bay Pumpkin Festival this weekend... and its impact on the traffic between our little town of El Granada and Half Moon Bay—8 minutes drive away—is notorious. So I built a git scraper that archives estimated driving times from the Google Maps Navigation API, and used git-history to turn that scraped data into a SQLite database and visualize it on a chart.

# 16th October 2022, 3:56 am / projects, git-scraping, git-history, half-moon-bay

How to create a Python package in 2022 (via) Fantastic tutorial on modern Python packaging by Rodrigo Girão Serrão. I’ve been meaning to figure out Poetry for a while now and this gave me exactly the information I needed to start figuring it out. Great coverage of GitHub Actions, Tox and pre-commit as well.

# 15th October 2022, 10:10 pm / packaging, python, github-actions

Dumping the HTML of a page using shot-scraper. New in 1.0 is the “shot-scraper html URL” command, which outputs the HTML of a page once JavaScript has finished executing there. You can pass in additional custom JavaScript to run before the snapshot is taken, and you can also specify a CSS selector on the page to return just that fragment of HTML.

# 15th October 2022, 9:30 pm / shot-scraper

shot-scraper 1.0 (via) Only a minor release in terms of features, but I decided that I'm comfortable enough with the CLI design at this point that I'm ready to stamp a 1.0 on it and commit to not making backwards-incompatible changes (at least without shipping a 2.0 release, which I'd like to avoid if possible).

Full release notes:

# 15th October 2022, 9:28 pm / cli, projects, shot-scraper

How to implement a “dry run mode” for data imports in Django (via) Adam Johnson describes in detail a beautiful pattern for implementing a dry-run mode for a Django management command, by executing ORM calls inside an atomic() transaction block, showing a summary of changes that are made and then rolling the transaction back at the end.
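
The same idea can be sketched without Django. This is not Adam's code: it swaps the ORM and atomic() for the standard library sqlite3 module, and the `people` table and `import_rows` function are made up for illustration. The shape is the same though: do the writes inside a transaction, summarize what changed, then roll back if this is a dry run.

```python
import sqlite3


def import_rows(conn: sqlite3.Connection, rows, dry_run: bool = False) -> int:
    """Insert rows, count how many were added, and undo them on a dry run."""
    before = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
    # The INSERT implicitly opens a transaction on this connection.
    conn.executemany("INSERT INTO people (name) VALUES (?)", rows)
    after = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
    if dry_run:
        conn.rollback()  # discard everything written in this transaction
    else:
        conn.commit()
    return after - before


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.commit()

added = import_rows(conn, [("Ada",), ("Grace",)], dry_run=True)
remaining = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(f"would add {added} rows; table still has {remaining}")
```

Because the summary is computed from real writes rather than a separate code path, the dry run exercises exactly the same logic as the real import.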

# 13th October 2022, 4:22 pm / django, transactions, adam-johnson

The AI that creates any picture you want, explained. Vox made this explainer video about text-to-image generative AI models back in June, months before Stable Diffusion was released and shortly before the DALL-E preview started rolling out to a wider audience. It’s a really good video—in particular the animation that explains at a high level how diffusion models work, which starts about 5m30s in.

# 10th October 2022, 3:28 am / ai, dalle, stable-diffusion, generative-ai, text-to-image

Reasons Why I Think 50% Coding 50% Marketing is the Best Framework for Solo Tech Founders (via) Jon Yongfook offers a deliciously simple recipe for splitting up the work of both developing and marketing a product: one week of development, then one week of marketing, then repeat. I really like this concept: I mix the two activities randomly at the moment and constantly find myself feeling guilty that I’m not spending enough focused time on either of them!

# 8th October 2022, 3:43 pm / entrepreneurship, marketing, startup

Can :has Connect 4? (via) Spectacular CSS demo by Jhey Tompkins, implementing a working 3D Connect 4 game using just CSS (brilliant trickery with the new :has() selector) and not a single line of JavaScript.

# 7th October 2022, 5:49 pm / css

Stringing together several free tiers to host an application with zero cost using fly.io, Litestream and Cloudflare. Alexander Dahl provides a detailed description (and code) for his current preferred free hosting solution for small sites: SQLite (and a Go application) running on Fly’s free tier, with the database replicated up to Cloudflare’s R2 object storage (again on a free tier) by Litestream.

# 7th October 2022, 5:47 pm / hosting, sqlite, cloudflare, fly, litestream

py2rs. Extremely useful document providing resources for learning Rust followed by an extensive collection of common Python tasks (building a list, opening a file, spawning a thread, running a simple web server) and their Rust equivalents.

# 7th October 2022, 5:44 pm / python, rust

Getting tabular data from unstructured text with GPT-3: an ongoing experiment (via) Roberto Rocha shows how to use a carefully designed prompt (with plenty of examples) to get GPT-3 to convert unstructured textual data into a structured table.

# 5th October 2022, 3:03 am / data-journalism, ai, gpt-3, openai, prompt-engineering, generative-ai, llms

The Illustrated Stable Diffusion (via) Jay Alammar provides a detailed, clearly explained description of how the Stable Diffusion image generation model actually works under the hood.

# 5th October 2022, 2:58 am / ai, stable-diffusion, generative-ai, text-to-image

libsql (via) A brand new Apache 2 licensed fork of SQLite. The README explains the rationale behind the project: SQLite itself is open source but not open contribution, and this fork aims to try out new ideas. The most interesting to me so far is a plan to support user defined functions implemented in WebAssembly. The project also intends to use Rust for new feature development.

# 4th October 2022, 4:13 pm / open-source, sqlite, rust, webassembly

mod_wasm: run WebAssembly with Apache (via) Brand new Apache module from a team at VMware: mod_wasm builds on top of wasmtime to let you write WebAssembly programs that are exposed to the world by Apache, using a mechanism that looks similar to old CGI scripts (headers passed in environment variables, request body sent to standard input). Includes a demo Docker image that runs using Python-compiled-to-WebAssembly.

# 4th October 2022, 12:53 am / apache, webassembly

You should have lots of AWS accounts (via) Richard Crowley makes the case for maintaining multiple AWS accounts within a single company, because “AWS accounts are the most complete form of isolation on offer”.

# 3rd October 2022, 6:36 pm / aws, richard-crowley, security

Querying Postgres Tables Directly From DuckDB (via) I learned a lot of interesting PostgreSQL tricks from this write-up of the new DuckDB feature that allows it to run queries against PostgreSQL servers directly. It works using COPY (SELECT ...) TO STDOUT (FORMAT binary) which writes rows to the protocol stream in efficient binary format, but splits the table being read into parallel fetches against page ranges and uses SET TRANSACTION SNAPSHOT ... in those parallel queries to ensure they see the same transactional snapshot of the database.

# 3rd October 2022, 2:27 pm / databases, postgresql, duckdb

Figure out how to serve an AWS Lambda function with a Function URL from a custom subdomain (via) This took me five hours and 77 issue comments to figure out, but I finally managed to serve an AWS Lambda function running Datasette on a custom subdomain with an HTTPS certificate. I was going to write this up as a TIL but I’m exhausted so I decided to share my private notes thread instead.

# 3rd October 2022, 12:29 am / aws, lambda, datasette

konstantint/SKompiler (via) A tool for compiling trained SKLearn models into other representations, including SQL queries and Excel formulas. I’ve been pondering the most light-weight way to package a simple machine learning model as part of a larger application without needing to bundle heavy dependencies, and this set of techniques looks ideal!
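
To illustrate the general idea (this is a hand-rolled sketch, not SKompiler's API), a linear model is just an intercept plus weighted columns, so it can be rendered as a SQL expression and evaluated by the database with no machine learning dependency at all. The weights, the `houses` table and the `linear_model_to_sql` function below are all invented for the example; nothing here was actually trained.

```python
import sqlite3


def linear_model_to_sql(intercept: float, coefficients: dict) -> str:
    """Render a linear model as a SQL expression over same-named columns."""
    terms = [repr(float(intercept))]
    terms += [f"{float(weight)!r} * {column}" for column, weight in coefficients.items()]
    return " + ".join(terms)


# Made-up weights standing in for a trained model's intercept and coefficients.
expression = linear_model_to_sql(1.5, {"rooms": 2.0, "age": -0.5})

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (rooms REAL, age REAL)")
conn.execute("INSERT INTO houses VALUES (3, 4)")
prediction = conn.execute(f"SELECT {expression} FROM houses").fetchone()[0]
print(expression, "->", prediction)
```

Column names are interpolated straight into the query, so this only makes sense when they come from the model definition rather than from user input.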

# 2nd October 2022, 11:56 pm / machine-learning, sql

Does Company ‘X’ have an Azure Active Directory Tenant? (via) Neat write-up from Shawn Tabrizi about looking up if a company has Active Directory single-sign-on configured (which is based on OpenID) by checking for an OpenID configuration endpoint. I particularly enjoyed this new-to-me trick: Google’s “I’m Feeling Lucky” search button redirects to the first result, which means it can double as an unofficial API endpoint for returning the URL of the first matching search result.
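
A minimal sketch of the endpoint check, under some assumptions: the login.microsoftonline.com URL pattern below is my understanding of where Microsoft publishes per-tenant OpenID configuration and may differ from the one used in Shawn's write-up, and treating any HTTP error as “no tenant” is a simplification. The function names are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def openid_config_url(domain: str) -> str:
    """Build the (assumed) OpenID configuration URL for a company domain."""
    return f"https://login.microsoftonline.com/{domain}/.well-known/openid-configuration"


def has_tenant(domain: str, timeout: float = 10.0) -> bool:
    """Return True if the endpoint serves an OpenID configuration document."""
    try:
        with urllib.request.urlopen(openid_config_url(domain), timeout=timeout) as response:
            return "issuer" in json.load(response)
    except urllib.error.HTTPError:
        # Assumption: an HTTP error response means no tenant for this domain.
        return False


url = openid_config_url("example.com")
print(url)
```

Calling `has_tenant("example.com")` makes a real network request, which is why the example above only prints the URL.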

# 1st October 2022, 8:15 pm / google, microsoft, openid

Supporting logical properties. A frustrating reminder from Jeremy Keith that Safari is not an evergreen browser: older iOS devices (1st gen iPad Air for example) get stuck on the last iOS version that supports them, which also sticks them with an old version of Safari, which means they will never get support for newer CSS properties such as inline-start and block-end. Jeremy shows how to use the @supports rule to hide this new syntax from those older browsers.

# 1st October 2022, 1:03 am / css, jeremy-keith, safari, web-standards, ios

nat/natbot (via) Extremely devious hack by Nat Friedman: opens a browser using Playwright and then passes a DOM representation to GPT-3 in order to power a chat-style interface for driving the browser. Worth diving into the code to look at the prompt it uses: it’s fascinating.

# 30th September 2022, 1:01 am / playwright, gpt-3, openai

Introducing LiteFS (via) LiteFS is the new SQLite replication solution from Fly, now ready for beta testing. It’s from the same author as Litestream but has a very different architecture; LiteFS works by implementing a custom FUSE filesystem which spies on SQLite transactions being written to the journal file and forwards them on to other nodes in the cluster, providing full read-replication. The signature Litestream feature of streaming a backup to S3 should be coming within the next few months.

# 21st September 2022, 6:56 pm / databases, replication, sqlite, fly, litestream, ben-johnson

Fastly Compute@Edge JS Runtime (via) Fastly’s JavaScript runtime, designed to run at the edge of their CDN, uses the Mozilla SpiderMonkey JavaScript engine compiled to WebAssembly.

# 20th September 2022, 10:20 pm / javascript, mozilla, spidermonkey, webassembly, fastly

Wasmtime Reaches 1.0: Fast, Safe and Production Ready! The Bytecode Alliance are making some confident promises in this post about the performance and stability of their Wasmtime WebAssembly runtime. They also highlight some exciting use-cases for WebAssembly on the server, including safe 3rd party plugin execution and User Defined Functions running inside databases.

# 20th September 2022, 10:11 pm / webassembly

I Resurrected “Ugly Sonic” with Stable Diffusion Textual Inversion (via) “I trained an Ugly Sonic object concept on 5 image crops from the movie trailer, with 6,000 steps [...] (on a T4 GPU, this took about 1.5 hours and cost about $0.21 on a GCP Spot instance)”

# 20th September 2022, 3:35 am / machine-learning, ai, max-woolf, stable-diffusion, generative-ai, text-to-image

PEP 554 – Multiple Interpreters in the Stdlib: Shared data (via) Python 3.12 hopes to introduce multiple interpreters as part of the Python standard library, so Python code will be able to launch subinterpreters, each with their own independent GIL. This will allow Python code to execute on multiple CPU cores at the same time while ensuring existing code (and C modules) that rely on the GIL continue to work.

The obvious question here is how data will be shared between those interpreters. This PEP proposes a channels mechanism, where channels can be used to send just basic Python types between interpreters: None, bytes, str, int and channels themselves (I wonder why not floats?)

# 20th September 2022, 1:25 am / concurrency, pep, python

How I’m a Productive Programmer With a Memory of a Fruit Fly (via) Hynek Schlawack describes the value he gets from searchable offline developer documentation, and advocates for the Documentation Sets format which bundles docs, metadata and a SQLite search index. Hynek’s doc2dash command can convert documentation generated by tools like Sphinx into a docset that’s compatible with several offline documentation browser applications.
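
For a feel of what that SQLite search index looks like, here is a small sketch. As I understand the docset format, the index is a single `searchIndex` table with name, type and path columns; the entries and paths below are invented, the `search` helper is hypothetical, and an in-memory database stands in for the index file a real docset would ship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real docset stores this index in a file
conn.execute(
    "CREATE TABLE searchIndex (id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT)"
)

# Invented entries: each row maps a symbol name to a page inside the bundled docs.
entries = [
    ("json.dumps", "Function", "library/json.html#json.dumps"),
    ("json.loads", "Function", "library/json.html#json.loads"),
    ("pathlib.Path", "Class", "library/pathlib.html#pathlib.Path"),
]
conn.executemany("INSERT INTO searchIndex (name, type, path) VALUES (?, ?, ?)", entries)


def search(term: str):
    """Return (name, type, path) rows whose name contains the search term."""
    rows = conn.execute(
        "SELECT name, type, path FROM searchIndex WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    )
    return rows.fetchall()


matches = search("json")
for name, kind, path in matches:
    print(f"{name} ({kind}) -> {path}")
```

Since the index is plain SQLite, any tool that can open a database file can search the documentation offline.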

# 19th September 2022, 4:19 pm / documentation, sqlite, sphinx-docs, hynek-schlawack

Deploying Python web apps as AWS Lambda functions. After literally years of failed half-hearted attempts, I finally managed to deploy an ASGI Python web application (Datasette) to an AWS Lambda function! Here are my extensive notes.

# 19th September 2022, 4:05 am / aws, lambda, python, serverless, datasette, asgi
