Simon Willison’s Weblog

Items tagged python

Filters: python ×


Hello, HPy (via) HPy provides a new way to write C extensions for Python in a way that is compatible with multiple Python implementations at once, including PyPy. # 29th March 2021, 2:40 pm

Homebrew Python Is Not For You. If you’ve been running into frustrations with your Homebrew Python environments breaking over the past few months (the dreaded “Reason: image not found” error) Justin Mayer has a good explanation. Python in a Homebrew is designed to work as a dependency for their other packages, and recent policy changes that they made to support smoother upgrades have had catastrophic problems effects on those of us who try to use it for development environments. # 25th March 2021, 3:14 pm

When you have to mock a collaborator, avoid using the Mock object directly. Either use mock.create_autospec() or mock.patch(autospec=True) if at all possible. Autospeccing from the real collaborator means that if the collaborator’s interface changes, your tests will fail. Manually speccing or not speccing at all means that changes in the collaborator’s interface will not break your tests that use the collaborator: you could have 100% test coverage and your library would fall over when used!

Thea Flowers # 17th March 2021, 4:44 pm

sqlite-uuid (via) Another Python package that wraps a SQLite module written in C: this one provides access to UUID functions as SQLite functions. # 15th March 2021, 2:55 am

sqlite-spellfix (via) I really like this pattern: “pip install sqlite-spellfix” gets you a Python module which includes a compiled (on your system when pip install ran) copy of the SQLite spellfix1 module, plus a utility variable containing its path so you can easily load it into a SQLite connection. # 15th March 2021, 2:52 am

unasync (via) Today I started wondering out loud if one could write code that takes an asyncio Python library and transforms it into the synchronous equivalent by using some regular expressions to strip out the “await ...” keywords and suchlike. Turns out that can indeed work, and Ratan Kulshreshtha built it! unasync uses the standard library tokenize module to run some transformations against an async library and spit out the sync version automatically. I’m now considering using this for sqlite-utils. # 27th February 2021, 10:20 pm

Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies (via) Alex Birsan describes a new category of security vulnerability he discovered in the npm, pip and gem packaging ecosystems: if a company uses a private repository with internal package names, uploading a package with the same name to the public repository can often result in an attacker being able to execute their own code inside the networks of their target. Alex scored over $130,000 in bug bounties from this one, from a number of name-brand companies. Of particular note for Python developers: the --extra-index-url argument to pip will consult both public and private registries and install the package with the highest version number! # 10th February 2021, 8:42 pm

Weeknotes: datasette-export-notebook, PyInstaller packaged Datasette, CBSAs

What a terrible week. I’ve found it hard to concentrate on anything substantial. In a mostly futile attempt to distract myself from doomscrolling I’ve mainly been building some experimental output plugins, fiddling with PyInstaller and messing around with shapefiles.

[... 732 words]

datasette-ripgrep: deploy a regular expression search engine for your source code

This week I built datasette-ripgrep—a web application for running regular expression searches against source code, built on top of the amazing ripgrep command-line tool.

[... 1362 words]

Unravelling `not` in Python (via) Part of a series where Brett Cannon looks at how fundamental Python syntactic sugar works, including a clearly explained dive into the underlying op codes and C implementation. # 27th November 2020, 5:59 pm

Hunting for Malicious Packages on PyPI (via) Jordan Wright installed all 268,000 Python packages from PyPI in containers, and ran Sysdig to capture syscalls made during installation to see if any of them were making extra network calls or reading or writing from the filesystem. Absolutely brilliant piece of security engineering and research. # 14th November 2020, 4:48 am

selenium-wire. Really useful scraping tool: enhances the Python Selenium bindings to run against a proxy which then allows Python scraping code to look at captured requests—great for if a site you are working with triggers Ajax requests and you want to extract data from the raw JSON that came back. # 2nd November 2020, 6:58 pm

Inevitably we got round to talking about async. As much of an unneeded complication as it is for so many day-to-day use-cases, it’s important for Python because, if and when you do need the high throughput handling of these io-bound use-cases, you don’t want to have to switch language. The same for Django: most of what you’re doing has no need of async but you don’t want to have to change web framework just because you need a sprinkling of non-blocking IO.

Carlton Gibson # 27th September 2020, 3:09 pm

Array programming with NumPy—the NumPy paper (via) The NumPy paper is out, published in Nature. I found this enlightening: for an academic paper it’s very understandable, and it filled in quite a few gaps in my mental model of what NumPy is and which problems it addresses, as well as its relationship to the many other tools in the scientific Python stack. # 17th September 2020, 4:34 pm

The “await me maybe” pattern for Python asyncio

I’ve identified a pattern for handling potentially-asynchronous callback functions in Python which I’m calling the “await me maybe” pattern. It works by letting you return a value, a callable function that returns a value OR an awaitable function that returns that value.

[... 787 words]

Announcing the Consortium for Python Data API Standards (via) Interesting effort to unify the fragmented DataFrame API ecosystem, where increasing numbers of libraries offer APIs inspired by Pandas that imitate each other but aren’t 100% compatible. The announcement includes some very clever code to support the effort: custom tooling to compare the existing APIs, and an ingenious GitHub Actions setup to run traces (via sys.settrace), derive type signatures and commit those generated signatures back to a repository. # 19th August 2020, 5:48 am

Pysa: An open source static analysis tool to detect and prevent security issues in Python code (via) Interesting new static analysis tool for auditing Python for security vulnerabilities—things like SQL injection and os.execute() calls. Built by Facebook and tested extensively on Instagram, a multi-million line Django application. # 7th August 2020, 8:50 pm

Better Python Decorators with wrapt (via) Adam Johnson explains the intricacies of decorating a Python function without breaking the ability to correctly introspect it, and dicsusses how Scout use the wrapt library by Graham Dumpleton to implement their instrumentation library. # 2nd July 2020, 9:48 pm

click-app. While working on sqlite-generate today I built a cookiecutter template for building the skeleton for Click command-line utilities. It’s based on datasette-plugin so it automatically sets up GitHub Actions for running tests and deploying packages to PyPI. # 23rd June 2020, 2:21 am

Practical Python Programming (via) David Beazley has been developing and presenting this three day Python course (aimed at people with some prior programming experience) for over thirteen years, and he’s just released the course materials under a Creative Commons license for the first time. # 29th May 2020, 1:15 pm

Waiting in asyncio. Handy cheatsheet explaining the differences between asyncio.gather(), asyncio.wait_for(), asyncio.as_completed() and asyncio.wait() by Hynek Schlawack. # 26th May 2020, 3:28 pm

pyp: Easily run Python at the shell (via) Fascinating little CLI utility which uses some deeply clever AST introspection to enable little Python one-liners that act as replacements for all manner of pipe-oriented unix utilities. Took me a while to understand how it works from the README, but then I looked at the code and the entire thing is only 380 lines long. There’s also a useful --explain option which outputs the Python source code that it would execute for a given command. # 9th May 2020, 9:05 pm

A hands-on introduction to static code analysis. Useful tutorial on using the Python standard library tokenize and ast modules to find specific patterns in Python source code, using the visitor pattern. # 5th May 2020, 12:15 am

How to get Rich with Python (a terminal rendering library). Will McGugan introduces Rich, his new Python library for rendering content on the terminal. This is a very cool piece of software—out of the box it supports coloured text, emoji, tables, rendering Markdown, syntax highlighting code, rendering Python tracebacks, progress bars and more. “pip install rich” and then “python -m rich” to render a “test card” demo demonstrating the features of the library. # 4th May 2020, 11:27 pm

How to install and upgrade Datasette using pipx (via) I’ve been using pipx to run Datasette for a while now—it’s a neat Python packaging tool which installs a Python CLI command with all of its dependencies in its own isolated virtual environment. Today, thanks to Twitter, I figured out how to install and upgrade plugins in the same environment—so I added a section to the Datasette installation documentation about it. # 4th May 2020, 7:23 pm

Weeknotes: Covid-19, First Python Notebook, more Dogsheep, Tailscale

My covid-19.datasettes.com project publishes information on COVID-19 cases around the world. The project started out using data from Johns Hopkins CSSE, but last week the New York Times started publishing high quality USA county- and state-level daily numbers to their own repository. Here’s the change that added the NY Times data.

[... 993 words]

How to cheat at unit tests with pytest and Black

I’ve been making a lot of progress on Datasette Cloud this week. As an application that provides private hosted Datasette instances (initially targeted at data journalists and newsrooms) the majority of the code I’ve written deals with permissions: allowing people to form teams, invite team members, promote and demote team administrators and suchlike.

[... 885 words]

2020 Web Milestones (via) A lot of stuff is happening in 2020! Mike Sherov rounds it up—highlights include the release of Chromium Edge (Microsoft’s Chrome-powered browser for Windows 7+), Web Components supported in every major browser, Deno 1.x, SameSite Cookies turned on by default (which should dramatically reduce CSRF exposure) and Python 2 and Flash EOLs. # 24th January 2020, 4:43 am

Portable Cloud Functions with the Python Functions Framework (via) The new functions-framework library on PyPI lets you run Google Cloud Functions written in Python in other environments—on your local developer machine or bundled in a Docker container for example. I have real trouble trusting serverless platforms that lock you into a single provider (AWS Lambda makes me very uncomfortable) so this is a breath of fresh air. # 10th January 2020, 4:58 am

Async Support—HTTPX (via) HTTPX is the new async-friendly HTTP library for Python spearheaded by Tom Christie. It works in both async and non-async mode with an API very similar to requests. The async support is particularly interesting—it’s a really clean API, and now that Jupyter supports top-level await you can run ’(await httpx.AsyncClient().get(url)).text’ directly in a cell and get back the response. Most excitingly the library lets you pass an ASGI app directly to the client and then perform requests against it—ideal for unit tests. # 10th January 2020, 4:49 am