Simon Willison’s Weblog

Items tagged python

Filters: python ×


PyPI now supports uploading via API token (via) All of my open source Python libraries are set up to automatically deploy new tagged releases as PyPI packages using Circle CI or Travis, but I’ve always get a bit uncomfortable about sharing my PyPI password with those CI platforms to get this to work. PyPI just added scopes authentication tokens, which means I can issue a token that’s only allowed to upload a specific project and see an audit log of when that token was last used. # 1st August 2019, 4:03 pm

Using memory-profiler to debug excessive memory usage in healthkit-to-sqlite. This morning I figured out how to use the memory-profiler module (and mprof command line tool) to debug memory usage of Python processes. I added the details, including screenshots, to this GitHub issue. It helped me knock down RAM usage for my healthkit-to-sqlite from 2.5GB to just 80MB by making smarter usage of the ElementTree pull parser. # 24th July 2019, 8:25 am

PugSQL. Interesting new twist on a definitely-not-an-ORM library for Python. With PugSQL you define SQL queries in files, give them names and then load them into a module which allows you to execute them as Python methods with keyword arguments. You can mark statements as only returning a single row (or a single scalar value) with a comment at the top of their file. # 3rd July 2019, 6:19 pm

json-flatten. A little Python library I wrote that attempts to flatten a JSON object into a set of key/value pairs suitable for transmitting in a query string or using to construct an HTML form. I first wrote this back in 2015 as a Gist—I’ve reconstructed the Gist commit history in a new repository and shipped it to PyPI. # 22nd June 2019, 4:51 am

Toward a “Kernel Python” (via) Glyph makes a strong case for releasing a slimmed down “kernel” version of Python with the minimal possible standard library, and argues that the current standard library is proving impossible for a single core team to productively maintain. “If I wanted to update the colorsys module to be more modern—perhaps to have a Color object rather than a collection of free functions, perhaps to support integer color models—I’d likely have to wait 500 days, or more, for a review.” # 15th June 2019, 4 pm

quicktype code generator for Python. Really interesting tool: give it an example JSON document and it will code-generate the equivalent set of Python classes (with type annotations) instantly in your browser. It also accepts input in JSON Schema or TypeScript and can generate code in 18 different languages. # 14th May 2019, 11:35 pm

Smaller Python Docker Containers with Multi-Stage Builds and Python Wheels (via) Clear tutorial on how to use Docker’s multi-stage build feature to create smaller final images by taking advantage of Python’s wheel format—so an initial stage can install a full compiler toolchain and compile C dependencies into wheels, then a later stage can install those pre-compiled wheels into a slimmer container without including the C compiler. # 26th April 2019, 2:05 pm

An Intro to Threading in Python (via) Real Python consistently produces really comprehensive, high quality articles and tutorials. This is an excellent introduction to threading in Python, covering threads, locks, queues, ThreadPoolExecutor and more. # 18th April 2019, 5:24 am

Pyodide: Bringing the scientific Python stack to the browser (via) More fun with WebAssembly: Pyodide attempts (and mostly succeeds) to bring the full Python data stack to the browser: CPython, NumPy, Pandas, Scipy, and Matplotlib. Also includes interesting bridge tools for e.g. driving a canvas element from Python. Really interesting project from the Firefox Data Platform team. # 17th April 2019, 4:23 am

Wasmer: a Python library for executing WebAssembly binaries. This is a really interesting new tool: “pip install wasmer” and now you can load code that has been compiled to WebAssembly and call those functions directly from Python. It’s built on top of the wasmer universal WebAssembly runtime, written over just the past year in Rust by a team lead by Syrus Akbary, the author of the Graphene GraphQL library for Python. # 16th April 2019, 6:04 pm

Ministry of Silly Runtimes: Vintage Python on Cloud Run (via) Cloud Run is an exciting new hosting service from Google that lets you define a container using a Dockerfile and then run that container in a “scale to zero” environment, so you only pay for time spent serving traffic. It’s similar to the now-deprecated Zeit Now 1.0 which inspired me to create Datasette. Here Dustin Ingram demonstrates how powerful Docker can be as the underlying abstraction by deploying a web app using a 25 year old version of Python 1.x. # 9th April 2019, 5:33 pm

Generator Tricks for Systems Programmers (via) David Beazley’s definitive generators tutorial from 2008, updated for Python 3.7 in October 2018. # 9th April 2019, 5:13 pm

VisiData (via) Intriguing tool by Saul Pwanson: VisiData is a command-line “textpunk utility” for browsing and manipulating tabular data. “pip3 install visidata” and then “vd myfile.csv” (or .json or .xls or SQLite orothers) and get an interactive terminal UI for quickly searching through the data, conducting frequency analysis of columns, manipulating it and much more besides. Two tips for if you start playing with it: hit “gq” to exit, and hit “Ctrl+H” to view the help screen. # 18th March 2019, 3:45 am

huey. Charles Leifer’s “little task queue for Python”. Similar to Celery, but it’s designed to work with Redis, SQLite or in the parent process using background greenlets. Worth checking out for the really neat design. The project is new to me, but it’s been under active development since 2011 and has a very healthy looking rate of releases. # 25th February 2019, 7:49 pm

parameterized. I love the @parametrize decorator in pytest, which lets you run the same test multiple times against multiple parameters. The only catch is that the decorator in pytest doesn’t work for old-style unittest TestCase tests, which means you can’t easily add it to test suites that were built using the older model. I just found out about parameterized which works with unittest tests whether or not you are running them using the pytest test runner. # 19th February 2019, 9:05 pm

Launching LiteCLI (via) Really neat alternative command-line client for SQLite, written in Python and using the same underlying framework as the similar pgcli (PostgreSQL) and mycli (MySQL) tools. Provides really intuitive autocomplete against table names, columns and other bits and pieces of SQLite syntax. Installation is as easy as “pip install litecli”. # 5th January 2019, 11:16 pm

benfred/py-spy (via) A Python port of Julia Evans’ rbspy profiler, which she describes as “probably better” than the original. I just tried it out and it’s really impressive: it’s written in Rust but has precompiled binaries so you can just run “pip install py-spy” to install it. Shows live output in the terminal while your program is running and also includes the option to generate neat SVG flame graphs. # 29th December 2018, 5:18 am

PEP 8016 -- The Steering Council Model (via) The votes are in and Python has a new governance model, partly inspired by the model used by the Django Software Foundation. A core elected council of five people (with a maximum of two employees from any individual company) will oversee the project. # 17th December 2018, 4:02 pm

Pampy: Pattern Matching for Python (via) Ingenious implementation of Erlang/Rust style pattern matching in just 150 lines of extremely cleanly designed and well-tested Python. # 17th December 2018, 7:14 am

The ASGI specification provides an opportunity for Python to hit a productivity/performance sweet-spot for a wide range of use-cases, from writing high-volume proxy servers through to bringing large-scale web applications to market at speed.

Tom Christie # 8th October 2018, 2:43 pm

The subset of reStructuredText worth committing to memory

reStructuredText is the standard for documentation in the Python world.

[... 1183 words]

Compiling SQLite for use with Python Applications (via) Charles Leifer’s recent tutorial on how to compile and build the latest SQLite (with window function support) for use from Python via his pysqlite3 library. # 15th August 2018, 3:51 pm

coleifer/pysqlite3. Now that the pysqlite package is bundled as part of the Python standard library the original open source project is no longer actively maintained, and has not been upgraded for Python 3. Charles Leifer has been working on pysqlite3, a stand-alone package of the module. Crucially, this should enable compiling the latest version of SQLite (via the amalgamation package) without needing to upgrade the version that ships with the operating system. # 15th August 2018, 3:15 pm

Faust: Python Stream Processing (via) A new open source stream processing system released by Robinhood, created by Vineet Goel and Celery creator Ask Solem. The API looks delightful, making very smart use of Python decorators and async/await. The initial release requires Kafka but they plan to support multiple backends, hopefully including Redis Streams. # 6th August 2018, 10:51 pm

Datasette unit tests: monkeytype_call_traces (via) Faceted browse against every function call that occurs during the execution of Datasette’s test suite. I used Instagram’s MonkeyType tool to generate this, which can run Python code and generates a SQLite database of all of the traced calls. It’s intended to be used to automatically add mypy annotations to your code, but since it produces a SQLite database as a by-product I’ve started exploring the intermediary format using Datasette. Generating this was as easy as running “monkeytype run `which pytest`” in the Datasette root directory. # 2nd August 2018, 9:03 pm

XARs: An efficient system for self-contained executables (via) Really interesting new open source project from Facebook: a XAR is a new way of packaging up a Python executable complete with its dependencies and resources such that it can be distributed and executed elsewhere as a single file. It’s kind of like a Docker container without Docker—it uses the SquashFS compressed read-only filesystem. I can’t wait to try this out with Datasette. # 13th July 2018, 7 pm

future-fstrings (via) Clever module that backports fstrings to versions of Python earlier than 3.6, by registering itself as a codec and abusing Python’s # -*- coding: future_fstrings -*- feature. Via a conversation on Twitter that pointed out that the JavaScript community have been using transpilation to successfully experiment with new language features for years now. # 13th July 2018, 4:39 am

scrapely. Neat twist on a screen scraping library: this one lets you “train it” by feeding it examples of URLs paired with a dictionary of the data you would like to have extracted from that URL, then uses an instance based learning earning algorithm to run against new URLs. Slightly confusing name since it’s maintained by the scrapy team but is a totally independent project from the scrapy web crawling framework. # 10th July 2018, 8:25 pm

mycli. Really neat auto-complete enabled MySQL terminal client, built using the excellent python-prompt-toolkit. Has a sister-project for PostgreSQL called pgcli. # 11th June 2018, 7:08 pm

At Harvard we’ve built out an infrastructure to allow us to deploy JupyterHub to courses with authentication managed by Canvas. It has allowed us to easily deploy complex set-ups to students so they can do really cool stuff without having to spend hours walking them through setup. Instructors are writing their lectures as IPython notebooks, and distributing them to students, who then work through them in their JupyterHub environment. Our most ambitious so far has been setting up each student in the course with a p2.xlarge machine with cuda and TensorFlow so they could do deep learning work for their final projects. We supported 15 courses last year, and got deployment time for an implementation down to only 2-3 hours.

Chris Rogers # 5th June 2018, 7:37 pm

Types

Years

Tags