Simon Willison’s Weblog

Items tagged python in 2020

Filters: Year: 2020 × python ×


datasette-ripgrep: deploy a regular expression search engine for your source code

This week I built datasette-ripgrep—a web application for running regular expression searches against source code, built on top of the amazing ripgrep command-line tool.

[... 1362 words]

Unravelling `not` in Python (via) Part of a series where Brett Cannon looks at how fundamental Python syntactic sugar works, including a clearly explained dive into the underlying op codes and C implementation. # 27th November 2020, 5:59 pm

Hunting for Malicious Packages on PyPI (via) Jordan Wright installed all 268,000 Python packages from PyPI in containers, and ran Sysdig to capture syscalls made during installation to see if any of them were making extra network calls or reading or writing from the filesystem. Absolutely brilliant piece of security engineering and research. # 14th November 2020, 4:48 am

selenium-wire. Really useful scraping tool: enhances the Python Selenium bindings to run against a proxy which then allows Python scraping code to look at captured requests—great for if a site you are working with triggers Ajax requests and you want to extract data from the raw JSON that came back. # 2nd November 2020, 6:58 pm

Inevitably we got round to talking about async. As much of an unneeded complication as it is for so many day-to-day use-cases, it’s important for Python because, if and when you do need the high throughput handling of these io-bound use-cases, you don’t want to have to switch language. The same for Django: most of what you’re doing has no need of async but you don’t want to have to change web framework just because you need a sprinkling of non-blocking IO.

Carlton Gibson # 27th September 2020, 3:09 pm

Array programming with NumPy—the NumPy paper (via) The NumPy paper is out, published in Nature. I found this enlightening: for an academic paper it’s very understandable, and it filled in quite a few gaps in my mental model of what NumPy is and which problems it addresses, as well as its relationship to the many other tools in the scientific Python stack. # 17th September 2020, 4:34 pm

The “await me maybe” pattern for Python asyncio

I’ve identified a pattern for handling potentially-asynchronous callback functions in Python which I’m calling the “await me maybe” pattern. It works by letting you return a value, a callable function that returns a value OR an awaitable function that returns that value.

[... 787 words]

Announcing the Consortium for Python Data API Standards (via) Interesting effort to unify the fragmented DataFrame API ecosystem, where increasing numbers of libraries offer APIs inspired by Pandas that imitate each other but aren’t 100% compatible. The announcement includes some very clever code to support the effort: custom tooling to compare the existing APIs, and an ingenious GitHub Actions setup to run traces (via sys.settrace), derive type signatures and commit those generated signatures back to a repository. # 19th August 2020, 5:48 am

Pysa: An open source static analysis tool to detect and prevent security issues in Python code (via) Interesting new static analysis tool for auditing Python for security vulnerabilities—things like SQL injection and os.execute() calls. Built by Facebook and tested extensively on Instagram, a multi-million line Django application. # 7th August 2020, 8:50 pm

Better Python Decorators with wrapt (via) Adam Johnson explains the intricacies of decorating a Python function without breaking the ability to correctly introspect it, and dicsusses how Scout use the wrapt library by Graham Dumpleton to implement their instrumentation library. # 2nd July 2020, 9:48 pm

click-app. While working on sqlite-generate today I built a cookiecutter template for building the skeleton for Click command-line utilities. It’s based on datasette-plugin so it automatically sets up GitHub Actions for running tests and deploying packages to PyPI. # 23rd June 2020, 2:21 am

Practical Python Programming (via) David Beazley has been developing and presenting this three day Python course (aimed at people with some prior programming experience) for over thirteen years, and he’s just released the course materials under a Creative Commons license for the first time. # 29th May 2020, 1:15 pm

Waiting in asyncio. Handy cheatsheet explaining the differences between asyncio.gather(), asyncio.wait_for(), asyncio.as_completed() and asyncio.wait() by Hynek Schlawack. # 26th May 2020, 3:28 pm

pyp: Easily run Python at the shell (via) Fascinating little CLI utility which uses some deeply clever AST introspection to enable little Python one-liners that act as replacements for all manner of pipe-oriented unix utilities. Took me a while to understand how it works from the README, but then I looked at the code and the entire thing is only 380 lines long. There’s also a useful --explain option which outputs the Python source code that it would execute for a given command. # 9th May 2020, 9:05 pm

A hands-on introduction to static code analysis. Useful tutorial on using the Python standard library tokenize and ast modules to find specific patterns in Python source code, using the visitor pattern. # 5th May 2020, 12:15 am

How to get Rich with Python (a terminal rendering library). Will McGugan introduces Rich, his new Python library for rendering content on the terminal. This is a very cool piece of software—out of the box it supports coloured text, emoji, tables, rendering Markdown, syntax highlighting code, rendering Python tracebacks, progress bars and more. “pip install rich” and then “python -m rich” to render a “test card” demo demonstrating the features of the library. # 4th May 2020, 11:27 pm

How to install and upgrade Datasette using pipx (via) I’ve been using pipx to run Datasette for a while now—it’s a neat Python packaging tool which installs a Python CLI command with all of its dependencies in its own isolated virtual environment. Today, thanks to Twitter, I figured out how to install and upgrade plugins in the same environment—so I added a section to the Datasette installation documentation about it. # 4th May 2020, 7:23 pm

Weeknotes: Covid-19, First Python Notebook, more Dogsheep, Tailscale

My covid-19.datasettes.com project publishes information on COVID-19 cases around the world. The project started out using data from Johns Hopkins CSSE, but last week the New York Times started publishing high quality USA county- and state-level daily numbers to their own repository. Here’s the change that added the NY Times data.

[... 993 words]

How to cheat at unit tests with pytest and Black

I’ve been making a lot of progress on Datasette Cloud this week. As an application that provides private hosted Datasette instances (initially targeted at data journalists and newsrooms) the majority of the code I’ve written deals with permissions: allowing people to form teams, invite team members, promote and demote team administrators and suchlike.

[... 885 words]

2020 Web Milestones (via) A lot of stuff is happening in 2020! Mike Sherov rounds it up—highlights include the release of Chromium Edge (Microsoft’s Chrome-powered browser for Windows 7+), Web Components supported in every major browser, Deno 1.x, SameSite Cookies turned on by default (which should dramatically reduce CSRF exposure) and Python 2 and Flash EOLs. # 24th January 2020, 4:43 am

Portable Cloud Functions with the Python Functions Framework (via) The new functions-framework library on PyPI lets you run Google Cloud Functions written in Python in other environments—on your local developer machine or bundled in a Docker container for example. I have real trouble trusting serverless platforms that lock you into a single provider (AWS Lambda makes me very uncomfortable) so this is a breath of fresh air. # 10th January 2020, 4:58 am

Async Support—HTTPX (via) HTTPX is the new async-friendly HTTP library for Python spearheaded by Tom Christie. It works in both async and non-async mode with an API very similar to requests. The async support is particularly interesting—it’s a really clean API, and now that Jupyter supports top-level await you can run ’(await httpx.AsyncClient().get(url)).text’ directly in a cell and get back the response. Most excitingly the library lets you pass an ASGI app directly to the client and then perform requests against it—ideal for unit tests. # 10th January 2020, 4:49 am

Better Python Object Serialization. TIL about functions.singledispatch, a decorator which makes it easy to create Python functions with implementations that vary based on the type of their arguments and which can have additional implementations registered after the fact—great for things like custom JSON serialization. # 7th January 2020, 8:35 pm