Simon Willison’s Weblog

Entries tagged python

Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions

I use cookiecutter to start almost all of my Python projects. It helps me quickly generate a skeleton of a project with my preferred directory structure and configured tools.

[... 686 words]

Things I’ve learned about building CLI tools in Python

I build a lot of command-line tools in Python. It’s become my favorite way of quickly turning a piece of code into something I can use myself and package up for other people to use too.

[... 1235 words]

Catching up on the weird world of LLMs

I gave a talk on Sunday at North Bay Python where I attempted to summarize the last few years of development in the space of LLMs—Large Language Models, the technology behind tools like ChatGPT, Google Bard and Llama 2.

[... 10489 words]

Symbex: search Python code for functions and classes, then pipe them into an LLM

I just released a new Python CLI tool called Symbex. It’s a search tool, loosely inspired by ripgrep, which lets you search Python code for functions and classes by name or wildcard, then see just the source code of those matching entities.

[... 1183 words]

Running Python micro-benchmarks using the ChatGPT Code Interpreter alpha

Today I wanted to understand the performance difference between two Python implementations of a mechanism to detect changes to a SQLite database schema. I rendered the difference between the two as this chart:

[... 2939 words]

Weeknotes: Joining the board of the Python Software Foundation

A few weeks ago I was elected to the board of directors for the Python Software Foundation.

[... 2081 words]

Bundling binary tools in Python wheels

I spotted a new (to me) pattern which I think is pretty interesting: projects are bundling compiled binary applications as part of their Python packaging wheels. I think it’s really neat.

[... 903 words]

Datasette Lite: a server-side Python web application running in a browser

Datasette Lite is a new way to run Datasette: entirely in a browser, taking advantage of the incredible Pyodide project which provides Python compiled to WebAssembly plus a whole suite of useful extras.

[... 4800 words]

Automatically opening issues when tracked file content changes

I figured out a GitHub Actions pattern to keep track of a file published somewhere on the internet and automatically open a new repository issue any time the contents of that file change.

[... 1211 words]

Weeknotes: Parallel SQL queries for Datasette, plus some middleware tricks

A promising new performance optimization for Datasette, plus new datasette-gzip and datasette-total-page-time plugins.

[... 1534 words]

Useful tricks with pip install URL and GitHub

The pip install command can accept a URL to a zip file or tarball. GitHub provides URLs that can create a zip file of any branch, tag or commit in any repository. Combining these is a really useful trick for maintaining Python packages.

[... 929 words]
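
As a rough illustration of the trick described above, the following sketch builds a GitHub zip archive URL for a branch, tag or commit and the matching pip install command. The helper names are hypothetical, and the `simonw/datasette` repository and `main` branch are only example values.

```python
import sys


def github_zip_url(owner, repo, ref):
    """Return GitHub's zip archive URL for a branch, tag or commit (ref)."""
    return f"https://github.com/{owner}/{repo}/archive/{ref}.zip"


def pip_install_command(owner, repo, ref):
    """Build (but do not run) a pip command that installs from that archive."""
    return [sys.executable, "-m", "pip", "install", github_zip_url(owner, repo, ref)]


if __name__ == "__main__":
    # Example values only: any owner, repository and ref can be substituted.
    print(" ".join(pip_install_command("simonw", "datasette", "main")))
```

The command is returned as a list so it can be passed straight to `subprocess.run` if desired.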

Weeknotes: python_requires, documentation SEO

Fixed Datasette on Python 3.6 for the last time. Worked on documentation infrastructure improvements. Spent some time with Fly Volumes.

[... 1497 words]

How to build, test and publish an open source Python library

At PyGotham this year I presented a ten-minute workshop on how to package up a new open source Python library and publish it to the Python Package Index. Here is the video and accompanying notes, which should make sense even without watching the talk.

[... 2055 words]

s3-credentials: a tool for creating credentials for S3 buckets

I’ve built a command-line tool called s3-credentials to solve a problem that’s been frustrating me for ages: how to quickly and easily create AWS credentials (an access key and secret key) that have permission to read or write from just a single S3 bucket.

[... 1618 words]

Finding and reporting an asyncio bug in Python 3.10

I found a bug in Python 3.10 today! Some notes on how I found it and my process for handling it once I figured out what was going on.

[... 1789 words]

Weeknotes: datasette-export-notebook, PyInstaller packaged Datasette, CBSAs

What a terrible week. I’ve found it hard to concentrate on anything substantial. In a mostly futile attempt to distract myself from doomscrolling I’ve mainly been building some experimental output plugins, fiddling with PyInstaller and messing around with shapefiles.

[... 732 words]

datasette-ripgrep: deploy a regular expression search engine for your source code

This week I built datasette-ripgrep—a web application for running regular expression searches against source code, built on top of the amazing ripgrep command-line tool.

[... 1362 words]

The “await me maybe” pattern for Python asyncio

I’ve identified a pattern for handling potentially-asynchronous callback functions in Python which I’m calling the “await me maybe” pattern. It works by letting you return a value, a callable function that returns a value OR an awaitable function that returns that value.

[... 787 words]
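
The pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the post's own code: the helper name `await_me_maybe` is borrowed from the pattern's name, and the use of `inspect.isawaitable` is a choice made for this sketch.

```python
import asyncio
import inspect


async def await_me_maybe(value):
    """Resolve a plain value, a callable, or an awaitable to a final value."""
    # A callable (sync or async function) is called first.
    if callable(value):
        value = value()
    # If that produced an awaitable (for example a coroutine), await it.
    if inspect.isawaitable(value):
        value = await value
    return value


async def demo():
    async def async_callback():
        return 3

    return [
        await await_me_maybe(1),               # plain value
        await await_me_maybe(lambda: 2),       # regular callable
        await await_me_maybe(async_callback),  # async function
    ]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Code that accepts a callback can then call `await await_me_maybe(callback)` without caring which of the three forms it was given.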

Weeknotes: Covid-19, First Python Notebook, more Dogsheep, Tailscale

My covid-19.datasettes.com project publishes information on COVID-19 cases around the world. The project started out using data from Johns Hopkins CSSE, but last week the New York Times started publishing high quality USA county- and state-level daily numbers to their own repository. Here’s the change that added the NY Times data.

[... 993 words]

How to cheat at unit tests with pytest and Black

I’ve been making a lot of progress on Datasette Cloud this week. As an application that provides private hosted Datasette instances (initially targeted at data journalists and newsrooms) the majority of the code I’ve written deals with permissions: allowing people to form teams, invite team members, promote and demote team administrators and suchlike.

[... 933 words]

The subset of reStructuredText worth committing to memory

reStructuredText is the standard for documentation in the Python world.

[... 1186 words]

Datasette Demo (video) from the SF Python Meetup

I gave a short talk about Datasette last month at the SF Python Meetup Holiday Party. They’ve just posted the video, so here it is:

[... 63 words]

Porting my blog to Python 3

This blog is now running on Python 3! Admittedly this is nearly nine years after the first release of Python 3.0, but it’s the first Python 3 project I’ve deployed myself so I’m pretty excited about it.

[... 883 words]

Deploying an asynchronous Python microservice with Sanic and Zeit Now

Back in 2008 Natalie Downe and I deployed what today we would call a microservice: json-head, a tiny Google App Engine app that allowed you to make an HTTP HEAD request against a URL and get back the HTTP headers as JSON. One of our initial use cases for this was Natalie’s addSizes.js, an unobtrusive jQuery script that could annotate links to PDFs and other large files with their corresponding file size pulled from the Content-Length header. Another potential use case is detecting broken links, since the API can be used to spot 404 status codes (as in this example).

[... 1361 words]
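
The core idea of json-head, making a HEAD request and returning the response headers as JSON, can be sketched with the standard library alone. This is not the Sanic implementation the post describes; the function names and the shape of the JSON output are assumptions made for illustration.

```python
import json
import urllib.error
import urllib.request


def headers_to_json(status_code, header_pairs):
    """Serialize a status code and (name, value) header pairs as JSON."""
    # dict() keeps only the last value for a repeated header name.
    return json.dumps({"status_code": status_code, "headers": dict(header_pairs)}, indent=2)


def head_as_json(url, timeout=10):
    """Make an HTTP HEAD request and return status and headers as JSON."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return headers_to_json(response.status, response.headers.items())
    except urllib.error.HTTPError as error:
        # Error statuses such as 404 still carry headers worth reporting,
        # which is what makes broken-link detection possible.
        return headers_to_json(error.code, error.headers.items())
```

A caller could read `Content-Length` from the resulting headers to annotate a link with its file size, or check `status_code` for 404 to flag a broken link.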

Generating interactive HTML charts from Python?

D3 is absolutely amazing but the learning curve is a bit steep. Totally worth the effort to learn it in the long run, but it’s not so useful if you want to get something done quickly.

[... 97 words]

Why is snapEDA.com slow? Is it because it uses Django?

No, it’s not slow because it uses Django.

[... 36 words]

What’s the best way to communicate with Go libraries from within Python programs?

Go speaks HTTP extremely well, so one simple but powerful approach is to hook your Go libraries up as simple HTTP+JSON APIs and have Python call them over HTTP (the Requests: HTTP for Humans library is awesome for this).

[... 59 words]
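
The Python side of that approach might look like the sketch below. It uses the standard library's `urllib` in place of Requests to stay dependency-free, and the function name, URL and payload are hypothetical: they assume a Go service that accepts a JSON POST body and answers with JSON.

```python
import json
import urllib.request


def call_json_api(url, payload, timeout=10):
    """POST a JSON payload to an HTTP+JSON service and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage, assuming a Go service listening locally:
# result = call_json_api("http://localhost:8080/resize", {"width": 200})
```

Keeping the boundary at HTTP+JSON means neither side needs to know what language the other is written in.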

What are some good resources to learn how to cleanse data using Python?

http://gnosis.cx/TPiP/ “Text Processing in Python” is a free online book that covers a bunch of useful topics related to data cleanup. It’s over 10 years old now but is still mostly relevant—the chapter on regular expressions is particularly good.

[... 61 words]

Is it better to create your own framework, or would it be best to just use Django or something like that?

You should absolutely use an existing framework such as Django rather than writing your own.

[... 176 words]

For a Django application, deployed on Heroku, what are my options for storing user-uploaded media files?

S3 is really a no-brainer for this: it’s extremely inexpensive, very easy to integrate with and unbelievably reliable. It’s so cheap that it will be practically free for testing purposes (expect to spend pennies a month on it).

[... 88 words]