Simon Willison’s Weblog

Subscribe

Blogmarks

Filters: Sorted by date

You should have lots of AWS accounts (via) Richard Crowley makes the case for maintaining multiple AWS accounts within a single company, because “AWS accounts are the most complete form of isolation on offer”.

# 3rd October 2022, 6:36 pm / aws, richard-crowley, security

Querying Postgres Tables Directly From DuckDB (via) I learned a lot of interesting PostgreSQL tricks from this write-up of the new DuckDB feature that allows it to run queries against PostgreSQL servers directly. It works using COPY (SELECT ...) TO STDOUT (FORMAT binary) which writes rows to the protocol stream in efficient binary format, but splits the table being read into parallel fetches against page ranges and uses SET TRANSACTION SNAPSHOT ... in those parallel queries to ensure they see the same transactional snapshot of the database.

# 3rd October 2022, 2:27 pm / databases, postgresql, duckdb

Figure out how to serve an AWS Lambda function with a Function URL from a custom subdomain (via) This took me five hours and 77 issue comments to figure out, but I finally managed to serve an AWS Lambda function running Datasette on a custom subdomain with an HTTPS certificate. I was going to write this up as a TIL but I’m exhausted so I decided to share my private notes thread instead.

# 3rd October 2022, 12:29 am / aws, lambda, datasette

konstantint/SKompiler (via) A tool for compiling trained SKLearn models into other representations —including SQL queries and Excel formulas. I’ve been pondering the most light-weight way to package a simple machine learning model as part of a larger application without needing to bundle heavy dependencies, this set of techniques looks ideal!

# 2nd October 2022, 11:56 pm / machine-learning, sql

Does Company ‘X’ have an Azure Active Directory Tenant? (via) Neat write-up from Shawn Tabrizi about looking up if a company has Active Directory single-sign-on configured (which is based on OpenID) by checking for an OpenID configuration endpoint. I particularly enjoyed this new-to-me trick: Google’s “I’m Feeling Lucky” search button redirects to the first result, which means it can double as an unofficial API endpoint for returning the URL of the first matching search result.

# 1st October 2022, 8:15 pm / google, microsoft, openid

Supporting logical properties. A frustrating reminder from Jeremy Keith that Safari is not an evergreen browser: older iOS devices (1st gen iPad Air for example) get stuck on the last iOS version that supports them, which also sticks them with an old version of Safari, which means they will never get support for newer CSS properties such as inline-start and block-end. Jeremy shows how to use the @supports rule to hide this new syntax from those older browsers.

# 1st October 2022, 1:03 am / css, jeremy-keith, safari, web-standards, ios

nat/natbot (via) Extremely devious hack by Nat Friedman: opens a browser using Playwright and then passes a DOM representation to GPT-3 in order to power a chat-style interface for driving the browser. Worth diving into the code to look at the prompt it uses, it’s fascinating.

# 30th September 2022, 1:01 am / playwright, gpt-3, openai

Introducing LiteFS (via) LiteFS is the new SQLite replication solution from Fly, now ready for beta testing. It’s from the same author as Litestream but has a very different architecture; LiteFS works by implementing a custom FUSE filesystem which spies on SQLite transactions being written to the journal file and forwards them on to other nodes in the cluster, providing full read-replication. The signature Litestream feature of streaming a backup to S3 should be coming within the next few months.

# 21st September 2022, 6:56 pm / databases, replication, sqlite, fly, litestream, ben-johnson

Fastly Compute@Edge JS Runtime (via) Fastly’s JavaScript runtime, designed to run at the edge of their CDN, uses the Mozilla SpiderMonkey JavaScript engine compiled to WebAssembly.

# 20th September 2022, 10:20 pm / javascript, mozilla, spidermonkey, webassembly, fastly

Wasmtime Reaches 1.0: Fast, Safe and Production Ready! The Bytecode Alliance are making some confident promises in this post about the performance and stability of their Wasmtime WebAssembly runtime. They also highlight some exciting use-cases for WebAssembly on the server, including safe 3rd party plugin execution and User Defined Functions running inside databases.

# 20th September 2022, 10:11 pm / webassembly

I Resurrected “Ugly Sonic” with Stable Diffusion Textual Inversion (via) “I trained an Ugly Sonic object concept on 5 image crops from the movie trailer, with 6,000 steps [...] (on a T4 GPU, this took about 1.5 hours and cost about $0.21 on a GCP Spot instance)”

# 20th September 2022, 3:35 am / machine-learning, ai, max-woolf, stable-diffusion, generative-ai, text-to-image

PEP 554 – Multiple Interpreters in the Stdlib: Shared data (via) Python 3.12 hopes to introduce multiple interpreters as part of the Python standard library, so Python code will be able to launch subinterpreters, each with their own independent GIL. This will allow Python code to execute on multiple CPU cores at the same time while ensuring existing code (and C modules) that rely on the GIL continue to work.

The obvious question here is how data will be shared between those interpreters. This PEP proposes a channels mechanism, where channels can be used to send just basic Python types between interpreters: None, bytes, str, int and channels themselves (I wonder why not floats?)

# 20th September 2022, 1:25 am / concurrency, pep, python

How I’m a Productive Programmer With a Memory of a Fruit Fly (via) Hynek Schlawack describes the value he gets from searchable offline developer documentation, and advocates for the Documentation Sets format which bundles docs, metadata and a SQLite search index. Hynek’s doc2dash command can convert documentation generated by tools like Sphinx into a docset that’s compatible with several offline documentation browser applications.

# 19th September 2022, 4:19 pm / documentation, sqlite, sphinx-docs, hynek-schlawack

Deploying Python web apps as AWS Lambda functions. After literally years of failed half-hearted attempts, I finally managed to deploy an ASGI Python web application (Datasette) to an AWS Lambda function! Here are my extensive notes.

# 19th September 2022, 4:05 am / aws, lambda, python, serverless, datasette, asgi

An introduction to XGBoost regression. I hadn’t realized what a wealth of high quality tutorial material could be found in Kaggle notebooks. Here Carl McBride Ellis provides a very approachable and practical introduction to XGBoost, one of the leading techniques for building machine learning models against tabular data.

# 18th September 2022, 1:42 pm / machine-learning, ai

The Changelog: Stable Diffusion breaks the internet. I’m on this week’s episode of The Changelog podcast, talking about Stable Diffusion, AI ethics and a little bit about prompt injection attacks too.

# 17th September 2022, 2:14 am / podcasts, ai, stable-diffusion, prompt-engineering, prompt-injection, generative-ai, llms, text-to-image, podcast-appearances

Retrospection and Learnings from Dgraph Labs (via) I was excited about Dgraph as an interesting option in the graph database space. It didn’t work out, and founder Manish Rai Jain provides a thoughtful retrospective as to why, full of useful insights for other startup founders considering projects in a similar space.

# 16th September 2022, 6:43 pm / entrepreneurship, startups, graphql

APSW is now available on PyPI. News I missed from June: the venerable (17+ years old) APSW SQLite library for Python is now officially available on PyPI as a set of wheels, built using cibuildwheel. This is a really big deal: APSW is an extremely well maintained library which exposes way more low-level SQLite functionality than the standard library’s sqlite3 module, and to-date one of the only disadvantages of using it was the need to install it independently of PyPI. Now you can just run “pip install apsw”.

# 15th September 2022, 10:18 pm / pypi, python, sqlite, apsw

Ladybird: A new cross-platform browser project (via) Conventional wisdom is that building a new browser engine from scratch is impossible without enormous capital outlay and many people working together for many years. Andreas Kling has been disproving that for a while now with his SerenityOS from-scratch operating system project, which includes a brand new browser implemented in C++. Now Andreas is announcing his plans to extract that browser as Ladybird and make it run across multiple platforms. Andreas is a former WebKit engineer (at Nokia and then Apple) and really knows his stuff: Ladybird already passes the Acid3 test!

# 12th September 2022, 7:34 pm / acid3, browsers, webkit, andreas-kling, ladybird

TIL: You Can Build Portable Binaries of Python Applications (via) Hynek Schlawack on the brilliant PyOxidizer by Gregory Szorc.

# 7th September 2022, 9:41 pm / python, hynek-schlawack

How the SQLite Virtual Machine Works. The latest entry in Ben Johnson’s series about SQLite internals.

# 7th September 2022, 8:49 pm / databases, sqlite, ben-johnson

CROSS JOIN and virtual tables in SQLite. Learned today on the SQLite forums that the SQLite CROSS JOIN in SQLite is a special case of join where the provided table order is preserved when executing the join. This is useful for advanced cases where you might want to use a SQLite virtual table to perform some kind of custom operation—searching against an external search engine for example—and then join the results back against other tables in a predictable way.

# 7th September 2022, 12:15 am / sqlite

dolthub/jsplit (via) Neat Go CLI tool for working with truly gigantic JSON files. This assumes files will be an object with one or more keys that are themselves huge lists of objects—it than extracts those lists out into one or more newline-delimited JSON files (capping their size at 4GB) which are much easier to work with as streams of data.

# 6th September 2022, 8:27 pm / cli, go, json

karpathy/minGPT (via) A “minimal PyTorch re-implementation” of the OpenAI GPT training and inference model, by Andrej Karpathy. It’s only a few hundred lines of code and includes extensive comments, plus notebook demos.

# 6th September 2022, 2:52 pm / machine-learning, ai, gpt-3, andrej-karpathy, generative-ai, llms

Spevktator: OSINT analysis tool for VK. This is a really cool project that came out of a recent Bellingcat hackathon. Spevktator takes 67,000 posts from five popular Russian news channels on VK (a popular Russian social media platform) and makes them available in Datasette, along with automated translations to English, post sharing metrics and sentiment analysis scores. This README includes some detailed analysis of the data, plus a link to an Observable notebook that implements custom visualizations against queries run directly against the Datasette instance.

# 5th September 2022, 8:48 pm / political-hacking, datasette, observable, bellingcat

The Amazon Builders’ Library (via) “How Amazon builds and operates software”—an extraordinarily valuable collection of detailed articles about how AWS works and operates under the hood.

# 5th September 2022, 5:50 pm / aws, software-architecture

r/MachineLearning: What is the SOTA explanation for why deep learning works? The thing I find fascinating about this Reddit conversation is that it makes it clear that the machine learning research community has very little agreement on WHY the state of the art techniques that are being used today actually work as well as they do.

# 5th September 2022, 5:46 pm / machine-learning, reddit, ai, generative-ai

Should You Use Upper Bound Version Constraints? (via) Should you pin your library's dependencies using "click>=7,<8" or "click~=7.0"? Henry Schreiner's short answer is no, and his long answer is an exhaustive essay covering every conceivable aspect of this thorny Python packaging problem.

# 5th September 2022, 5:42 pm / packaging, pypi, python

Grokking Stable Diffusion (via) Jonathan Whitaker built this interactive Jupyter notebook that walks through how to use Stable Diffusion from Python step-by-step, and then dives deep into helping understand the different components of the implementation, including how text is encoded, how the diffusion loop works and more. This is by far the most useful tool I’ve seen yet for understanding how this model actually works. You can run Jonathan’s notebook directly on Google Colab, with a GPU.

# 4th September 2022, 6:50 pm / jupyter, stable-diffusion, generative-ai, text-to-image

Years

Tags