Simon Willison’s Weblog

Blogmarks

How I’m a Productive Programmer With a Memory of a Fruit Fly (via) Hynek Schlawack describes the value he gets from searchable offline developer documentation, and advocates for the Documentation Sets format which bundles docs, metadata and a SQLite search index. Hynek’s doc2dash command can convert documentation generated by tools like Sphinx into a docset that’s compatible with several offline documentation browser applications.
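
To make the "SQLite search index" part concrete, here is a minimal sketch using Python's standard sqlite3 module. It assumes an index table shaped like the `searchIndex` table used by Dash-style docsets (name, type, path); the entries are hypothetical and are not taken from any real docset or from doc2dash output.

```python
import sqlite3


def build_index(entries):
    """Create an in-memory index modeled on a docset-style searchIndex table."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: one row per documented symbol, pointing at an HTML path.
    conn.execute(
        "CREATE TABLE searchIndex("
        "id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT)"
    )
    conn.executemany(
        "INSERT INTO searchIndex(name, type, path) VALUES (?, ?, ?)", entries
    )
    return conn


def search(conn, term):
    """Return (name, type, path) rows whose name contains the search term."""
    return conn.execute(
        "SELECT name, type, path FROM searchIndex "
        "WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    ).fetchall()


# Hypothetical entries for illustration only.
conn = build_index([
    ("example.connect", "Function", "api.html#example.connect"),
    ("example.Cursor", "Class", "api.html#example.Cursor"),
    ("example.close", "Function", "api.html#example.close"),
])
results = search(conn, "cursor")
```

An offline documentation browser can do the same kind of lookup against the bundled index file and then open the matching path from the bundled docs.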

# 19th September 2022, 4:19 pm / documentation, sqlite, sphinx-docs, hynek-schlawack

Deploying Python web apps as AWS Lambda functions. After literally years of failed half-hearted attempts, I finally managed to deploy an ASGI Python web application (Datasette) to an AWS Lambda function! Here are my extensive notes.

# 19th September 2022, 4:05 am / aws, lambda, python, serverless, datasette, asgi

An introduction to XGBoost regression. I hadn’t realized what a wealth of high quality tutorial material could be found in Kaggle notebooks. Here Carl McBride Ellis provides a very approachable and practical introduction to XGBoost, one of the leading techniques for building machine learning models against tabular data.

# 18th September 2022, 1:42 pm / machine-learning, ai

The Changelog: Stable Diffusion breaks the internet. I’m on this week’s episode of The Changelog podcast, talking about Stable Diffusion, AI ethics and a little bit about prompt injection attacks too.

# 17th September 2022, 2:14 am / podcasts, ai, stable-diffusion, prompt-engineering, prompt-injection, generative-ai, llms, text-to-image, podcast-appearances

Retrospection and Learnings from Dgraph Labs (via) I was excited about Dgraph as an interesting option in the graph database space. It didn’t work out, and founder Manish Rai Jain provides a thoughtful retrospective as to why, full of useful insights for other startup founders considering projects in a similar space.

# 16th September 2022, 6:43 pm / entrepreneurship, startups, graphql

APSW is now available on PyPI. News I missed from June: the venerable (17+ years old) APSW SQLite library for Python is now officially available on PyPI as a set of wheels, built using cibuildwheel. This is a really big deal: APSW is an extremely well maintained library which exposes way more low-level SQLite functionality than the standard library’s sqlite3 module, and until now one of the only disadvantages of using it was the need to install it independently of PyPI. Now you can just run “pip install apsw”.

# 15th September 2022, 10:18 pm / pypi, python, sqlite, apsw

Ladybird: A new cross-platform browser project (via) Conventional wisdom is that building a new browser engine from scratch is impossible without enormous capital outlay and many people working together for many years. Andreas Kling has been disproving that for a while now with his SerenityOS from-scratch operating system project, which includes a brand new browser implemented in C++. Now Andreas is announcing his plans to extract that browser as Ladybird and make it run across multiple platforms. Andreas is a former WebKit engineer (at Nokia and then Apple) and really knows his stuff: Ladybird already passes the Acid3 test!

# 12th September 2022, 7:34 pm / acid3, browsers, webkit, andreas-kling, ladybird

TIL: You Can Build Portable Binaries of Python Applications (via) Hynek Schlawack on the brilliant PyOxidizer by Gregory Szorc.

# 7th September 2022, 9:41 pm / python, hynek-schlawack

How the SQLite Virtual Machine Works. The latest entry in Ben Johnson’s series about SQLite internals.

# 7th September 2022, 8:49 pm / databases, sqlite, ben-johnson

CROSS JOIN and virtual tables in SQLite. Learned today on the SQLite forums that CROSS JOIN in SQLite is a special case of join where the provided table order is preserved when executing the join. This is useful for advanced cases where you might want to use a SQLite virtual table to perform some kind of custom operation (searching against an external search engine, for example) and then join the results back against other tables in a predictable way.
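
A small sketch of how to observe this from Python, using EXPLAIN QUERY PLAN to see which table SQLite visits first. The two ordinary tables here are stand-ins I made up for illustration; in the scenario described above, `search_hits` would be a virtual table backed by an external search engine.

```python
import sqlite3


def plan_steps(conn, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row, in reported order."""
    # The detail text is the fourth column of each row.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE articles(id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE search_hits(article_id INTEGER, score REAL);
    """
)

# With CROSS JOIN the left-hand table is used as the outer loop,
# so the first step of the plan should name the table written first.
hits_first = plan_steps(
    conn,
    "SELECT articles.title, search_hits.score "
    "FROM search_hits CROSS JOIN articles "
    "ON articles.id = search_hits.article_id",
)
articles_first = plan_steps(
    conn,
    "SELECT articles.title, search_hits.score "
    "FROM articles CROSS JOIN search_hits "
    "ON articles.id = search_hits.article_id",
)

for step in hits_first:
    print(step)
```

Swapping CROSS JOIN for a plain JOIN leaves the planner free to pick either order, which is exactly what you want to avoid when the first table is an expensive custom operation.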

# 7th September 2022, 12:15 am / sqlite

dolthub/jsplit (via) Neat Go CLI tool for working with truly gigantic JSON files. This assumes files will be an object with one or more keys that are themselves huge lists of objects. It then extracts those lists into one or more newline-delimited JSON files (capping their size at 4GB) which are much easier to work with as streams of data.
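
To illustrate the shape of that transformation (not how jsplit itself works), here is a toy Python sketch. It loads the whole file into memory and ignores the 4GB cap, so it only suits small inputs; the function name and the one-file-per-key output naming are my own assumptions.

```python
import json
from pathlib import Path


def split_lists(source, out_dir):
    """Write each top-level list in a JSON object to its own .jsonl file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(source, encoding="utf-8") as f:
        data = json.load(f)  # Reads everything at once: fine for a demo only.

    written = {}
    for key, value in data.items():
        if not isinstance(value, list):
            continue  # Only list-valued keys are split out.
        target = out_dir / f"{key}.jsonl"
        with open(target, "w", encoding="utf-8") as out:
            for item in value:
                # One JSON document per line, i.e. newline-delimited JSON.
                out.write(json.dumps(item) + "\n")
        written[key] = target
    return written
```

Each output file can then be processed one line at a time, which is the property that makes newline-delimited JSON pleasant for streaming.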

# 6th September 2022, 8:27 pm / cli, go, json

karpathy/minGPT (via) A “minimal PyTorch re-implementation” of the OpenAI GPT training and inference model, by Andrej Karpathy. It’s only a few hundred lines of code and includes extensive comments, plus notebook demos.

# 6th September 2022, 2:52 pm / machine-learning, ai, gpt-3, andrej-karpathy, generative-ai, llms

Spevktator: OSINT analysis tool for VK. This is a really cool project that came out of a recent Bellingcat hackathon. Spevktator takes 67,000 posts from five popular Russian news channels on VK (a popular Russian social media platform) and makes them available in Datasette, along with automated translations to English, post sharing metrics and sentiment analysis scores. This README includes some detailed analysis of the data, plus a link to an Observable notebook that implements custom visualizations against queries run directly against the Datasette instance.

# 5th September 2022, 8:48 pm / political-hacking, datasette, observable, bellingcat

The Amazon Builders’ Library (via) “How Amazon builds and operates software”—an extraordinarily valuable collection of detailed articles about how AWS works and operates under the hood.

# 5th September 2022, 5:50 pm / aws, software-architecture

r/MachineLearning: What is the SOTA explanation for why deep learning works? The thing I find fascinating about this Reddit conversation is that it makes it clear that the machine learning research community has very little agreement on WHY the state of the art techniques that are being used today actually work as well as they do.

# 5th September 2022, 5:46 pm / machine-learning, reddit, ai, generative-ai

Should You Use Upper Bound Version Constraints? (via) Should you pin your library's dependencies using "click>=7,<8" or "click~=7.0"? Henry Schreiner's short answer is no, and his long answer is an exhaustive essay covering every conceivable aspect of this thorny Python packaging problem.

# 5th September 2022, 5:42 pm / packaging, pypi, python

Grokking Stable Diffusion (via) Jonathan Whitaker built this interactive Jupyter notebook that walks through how to use Stable Diffusion from Python step-by-step, and then dives deep into helping understand the different components of the implementation, including how text is encoded, how the diffusion loop works and more. This is by far the most useful tool I’ve seen yet for understanding how this model actually works. You can run Jonathan’s notebook directly on Google Colab, with a GPU.

# 4th September 2022, 6:50 pm / jupyter, stable-diffusion, generative-ai, text-to-image

Discord History Tracker. Very interestingly shaped piece of software. You install and run a localhost web application on your own machine, then paste some JavaScript into the Discord Electron app’s DevTools console (ignoring the prominent messages there warning you not to paste anything into it). The JavaScript scrapes messages you can see in Discord and submits them back to that localhost application, which writes them to a SQLite database for you. It’s written in C# with ASP.NET Core, but compiled executables are provided for Windows, macOS and Linux. I had to allow execution of four different unsigned binaries to get this working on my Mac.

# 2nd September 2022, 9:37 pm / security, sqlite, discord

Open every CSV file in a GitHub repository in Datasette Lite (via) I built an Observable notebook that accepts a GitHub repository as input, scans it for CSV files and generates a link to open all of those CSV files in Datasette Lite.

# 1st September 2022, 7:24 pm / github, projects, observable, datasette-lite

Building Layoffs on a Healthy Foundation (via) Kellan provides some valuable guidance for running layoffs in as humane a way as possible.

# 1st September 2022, 6:11 pm / kellan-elliott-mccrea, management

Run Stable Diffusion on your M1 Mac’s GPU. Ben Firshman provides detailed instructions for getting Stable Diffusion running on an M1 Mac.

# 1st September 2022, 5:41 pm / ben-firshman, machine-learning, macos, ai, stable-diffusion, generative-ai, text-to-image

Farmbound, or how I built an app in 2022. Stuart Langridge describes the architecture and decision process behind his new mobile web game, Farmbound.

# 31st August 2022, 11:23 pm / stuart-langridge, web

Exploring 12 Million of the 2.3 Billion Images Used to Train Stable Diffusion’s Image Generator. Andy Baio and I collaborated on an investigation into the training set used for Stable Diffusion. I built a Datasette instance with 12m image records sourced from the LAION-Aesthetics v2 6+ aesthetic score data used as part of the training process, and built a tool so people could run searches and explore the data. Andy did some extensive analysis of things like the domains scraped for the images and names of celebrities and artists represented in the data. His write-up here explains our project in detail and some of the patterns we’ve uncovered so far.

# 31st August 2022, 2:10 am / machine-learning, ai, stable-diffusion, generative-ai, laion, training-data

How SQLite Scales Read Concurrency (via) Ben Johnson’s series on SQLite internals continues—this time with a detailed explanation of how the SQLite WAL (Write-Ahead Log) is implemented.
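
A quick way to see the read concurrency that WAL mode provides, sketched with Python's standard sqlite3 module. This only demonstrates the observable behaviour (a reader keeps its snapshot while a writer commits); it says nothing about the internals the article covers. The function and table names are made up for the demo.

```python
import sqlite3


def wal_snapshot_demo(db_path):
    """Show a reader keeping its snapshot while a writer commits in WAL mode."""
    # isolation_level=None puts the connection in autocommit mode, so
    # transactions only start when we issue BEGIN ourselves.
    writer = sqlite3.connect(db_path, isolation_level=None)
    mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
    writer.execute("CREATE TABLE log(msg TEXT)")
    writer.execute("INSERT INTO log VALUES ('first')")

    reader = sqlite3.connect(db_path, isolation_level=None)
    reader.execute("BEGIN")
    # The first read inside the transaction fixes the reader's snapshot.
    before = reader.execute("SELECT count(*) FROM log").fetchall()[0][0]

    # The writer commits a new row while the read transaction is still open.
    writer.execute("INSERT INTO log VALUES ('second')")

    # Same transaction, same snapshot: the new row is not visible yet.
    during = reader.execute("SELECT count(*) FROM log").fetchall()[0][0]
    reader.execute("COMMIT")

    # A fresh read after the transaction ends sees the committed row.
    after = reader.execute("SELECT count(*) FROM log").fetchall()[0][0]

    reader.close()
    writer.close()
    return mode, before, during, after
```

Called with a path to a new database file on a local disk, this should report the journal mode followed by the row counts the reader saw before, during and after the concurrent write.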

# 24th August 2022, 4:16 pm / databases, sqlite, ben-johnson

Stable Diffusion Public Release (via) New AI just dropped. Stable Diffusion is similar to DALL-E, but completely open source and with a CC0 license applied to everything it generates. I have a Twitter thread (the via link) of comparisons I’ve made between its output and my previous DALL-E experiments. The announcement buries the lede somewhat: to try it out, visit beta.dreamstudio.ai, which you can use for free at the moment, but it’s unclear to me how billing is supposed to work.

# 22nd August 2022, 7:12 pm / machine-learning, dalle, stable-diffusion, generative-ai, text-to-image

Digitizing 55,000 pages of civic meetings (via) Philip James has been building public, searchable archives of city council meetings for various cities (Oakland and Alameda so far) using my s3-ocr script to run Textract OCR against the PDFs of the minutes, and deploying them to Fly using Datasette. This is a really cool project, and very much the kind of thing I’ve been hoping to support with the tools I’ve been building.

# 22nd August 2022, 4:26 pm / archiving, ocr, political-hacking, datasette, fly

Turning SQLite into a distributed database (via) Heyang Zhou introduces mvSQLite, his brand new open source “SQLite-compatible distributed database” built in Rust on top of Apple’s FoundationDB. This is a very promising looking new entry into the distributed/replicated SQLite space: FoundationDB was designed to provide low-level primitives that tools like this could build on top of.

# 21st August 2022, 5:40 pm / databases, sqlite, rust

Show HN: A new way to use GPT-3 to generate code (and everything else). Riley Goodside is my favourite Twitter follow for GPT-3 tips. Here he describes a powerful prompt pattern he's designed which lets you generate extremely complex code output by asking GPT-3 to fill in $$areas like this$$ with different patterns, then stitch them together into full HTML or other source code files. It's really clever.
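
Here is one possible reading of the "fill in and stitch together" step, sketched in Python. It is not Riley's actual prompt or code: the `fill_template` helper is my own invention, and `fake_complete` is a hypothetical stand-in for a real call to a language model.

```python
import re

# Matches $$some description$$ and captures the description text.
PLACEHOLDER = re.compile(r"\$\$(.+?)\$\$")


def fill_template(template, complete):
    """Replace each $$description$$ with the text returned by complete(description)."""
    return PLACEHOLDER.sub(lambda match: complete(match.group(1)), template)


def fake_complete(description):
    """Hypothetical stand-in for a model call: returns canned text per description."""
    canned = {
        "page title": "Hello",
        "one-sentence greeting": "Welcome to the demo.",
    }
    return canned[description]


template = "<h1>$$page title$$</h1>\n<p>$$one-sentence greeting$$</p>"
html = fill_template(template, fake_complete)
print(html)
```

Because each placeholder is filled independently, the pieces can be generated by separate prompts and still assemble into one source file.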

# 20th August 2022, 9:33 pm / gpt-3, prompt-engineering, generative-ai, riley-goodside, llms

Shoelace (via) Saw this for the first time today: it’s a relatively new library of framework-agnostic Web Components, built on lit-html and covering a huge array of common functionality: buttons and sliders and dialogs and drawer interfaces and dropdown menus and so on. The design is very clean, the documentation is superb—and it looks like you can cherry pick just the components you are using for a pretty lean addition to your page weight. So refreshing to see libraries like this that really take advantage of modern web standards.

# 20th August 2022, 8:57 pm / css, javascript, web-standards, web-components, lit-html
