Simon Willison’s Weblog

681 items tagged “python”

XARs: An efficient system for self-contained executables (via) Really interesting new open source project from Facebook: a XAR is a new way of packaging up a Python executable complete with its dependencies and resources such that it can be distributed and executed elsewhere as a single file. It’s kind of like a Docker container without Docker—it uses the SquashFS compressed read-only filesystem. I can’t wait to try this out with Datasette. # 13th July 2018, 7 pm

future-fstrings (via) Clever module that backports fstrings to versions of Python earlier than 3.6, by registering itself as a codec and abusing Python’s # -*- coding: future_fstrings -*- feature. Via a conversation on Twitter that pointed out that the JavaScript community have been using transpilation to successfully experiment with new language features for years now. # 13th July 2018, 4:39 am

scrapely. Neat twist on a screen scraping library: this one lets you “train it” by feeding it examples of URLs paired with a dictionary of the data you would like to have extracted from that URL, then uses an instance based learning earning algorithm to run against new URLs. Slightly confusing name since it’s maintained by the scrapy team but is a totally independent project from the scrapy web crawling framework. # 10th July 2018, 8:25 pm

mycli. Really neat auto-complete enabled MySQL terminal client, built using the excellent python-prompt-toolkit. Has a sister-project for PostgreSQL called pgcli. # 11th June 2018, 7:08 pm

At Harvard we’ve built out an infrastructure to allow us to deploy JupyterHub to courses with authentication managed by Canvas. It has allowed us to easily deploy complex set-ups to students so they can do really cool stuff without having to spend hours walking them through setup. Instructors are writing their lectures as IPython notebooks, and distributing them to students, who then work through them in their JupyterHub environment. Our most ambitious so far has been setting up each student in the course with a p2.xlarge machine with cuda and TensorFlow so they could do deep learning work for their final projects. We supported 15 courses last year, and got deployment time for an implementation down to only 2-3 hours.

Chris Rogers # 5th June 2018, 7:37 pm

Hynek Schlawack: Testing & Packaging (via) “How to ensure that your tests run code that you think they are running, and how to measure your coverage over multiple tox runs (in parallel!)”—Hynek makes a convincing argument for putting your packaged Python code in a src/ directory for ease of testing and coverage. # 22nd May 2018, 10:12 pm

Pyre: Fast Type Checking for Python (via) Facebook’s alternative to mypy. “Pyre is designed to be highly parallel, optimizing for near-instant responses so that you get immediate feedback, even in a large codebase”. Like their Hack type checker for PHP, Pyre is implemented in OCaml. # 11th May 2018, 5:47 pm

Black Onlline Demo (via) Black is “the uncompromising Python code formatter” by Łukasz Langa—it reformats Python code to a very carefully thought out styleguide, and provides almost no options for how code should be formatted. It’s reminiscent of gofmt. José Padilla built a handy online tool for trying it out in your browser. # 25th April 2018, 5:17 am

dateparser: python parser for human readable dates (via) I’ve used dateutil.parser for this in the past, but dateparser is a major upgrade: it knows how to parse dates in 200 different language locales, can interpret different timezone representations and handles relative dates (“3 months, 1 week and 1 day ago”) as well. # 24th April 2018, 4:17 pm

How to rewrite your SQL queries in Pandas, and more (via) I still haven’t fully internalized the idioms needed to manipulate DataFrames in pandas. This tutorial helps a great deal—it shows the Pandas equivalents for a host of common SQL queries. # 19th April 2018, 6:34 pm

Intro to Threads and Processes in Python (via) I really like the diagrams in this article which compares the performance of Python threads and processes for different types of task via the excellent concurrent.futures library. # 19th April 2018, 6:32 pm

How to Use Static Type Checking in Python 3.6 (via) Useful introduction to optional static typing in Python 3.6, including how to use mypy, PyCharm and the Atom mypy plugin. # 19th April 2018, 6:30 pm

The best of Python: a collection of my favorite articles from 2017 and 2018 (so far). Gergely Szerovay has brought together an outstandingly interesting selection of Python articles from the last couple of years of activity of the Python community on Medium. A whole load of gems in here that I hadn’t seen before. # 19th April 2018, 6:28 pm

Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter (via) ScriptedForms is a fascinating Jupyter hack that lets you construct dynamic documents defined using markdown that provide form fields and evaluate Python code instantly as you interact with them. # 19th April 2018, 4:05 pm

Parsing CSV using ANTLR and Python 3. I’ve been trying to figure out how to use ANTLR grammars from Python—this is the first example I’ve found that really clicked for me. # 6th April 2018, 2:33 pm

rubber-docker/linux.c. rubber-docker is a workshop that talks through building a simply Docker clone from scratch in Python. I particularly liked this detail: linux.c is a Python extension written in C that exposes a small collection of Linux syscalls that are needed for the project—clone, mount, pivot_root, setns, umount, umount2 and unshare. Just reading through this module gives a really nice overview of how some of Docker’s underlying magic actually work. # 2nd April 2018, 6:18 pm

import-pypi. A devious Python 3 hack which abuses importlib.machinery to add a hook such that any time you type “import modulename” it checks to see if the module is installed and runs “pip install modulename” first if it isn’t. Intended as a joke, but if you habitually fire up temporary virtual environments for exploratory programming like I do this could actually be a neat little time-saver. # 29th March 2018, 10:16 pm

How to use HDF5 files in Python (via) HDF5: the missing manual. A detailed explanation of the HDF5 file format and how to work with it using the h5py module. HDF5 allows you to efficiently store multiple datasets (plus metatdata about them) in a single file and then load data from them without pulling the entire file into memory—kind of like SQLite but without the SQL support and more optimized for working with arrays. # 19th March 2018, 2:55 pm

Trio Tutorial. Trio is a really nice async library for Python—a simpler alternative to asyncio, with some very clean API design. Best of all, the tutorial is fantastic—it provides a very clear explanation of async/await without diving into the intricacies of coroutines. # 17th March 2018, 3:55 pm

r1chardj0n3s/parse: Parse strings using a specification based on the Python format() syntax. (via) Really neat API design: parse() behaves almost exactly in the opposite way to Python’s built-in format(), so you can use format strings as an alternative to regular expressions for extracting specific data from a string. # 25th February 2018, 4:58 pm

kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert that to the equivalent markdown in a one-liner. # 25th February 2018, 4:49 pm

s3monkey: A Python library that allows you to interact with Amazon S3 Buckets as if they are your local filesystem. (via) A particularly devious hack by Kenneth Reitz—provides a context manager within which various Python filesystem APIs such as open() and os.listdir() are monkeypatched to operate against an S3 bucket instead. Kenneth built it to make it easier to work with files from apps running on Heroku. Under the hood it uses pyfakefs, a filesystem mocking library originally released by Google. # 21st February 2018, 5:54 pm

A Promenade of PyTorch. Useful overview of the PyTorch machine learning library from Facebook AI Research described as “a Python library enabling GPU-accelerated tensor computation”. Similar to TensorFlow, but where TensorFlow requires you to explicitly construct an execution graph PyTorch instead lets you write regular Python code (if statements, for loops etc) which PyTorch then uses to construct the execution graph for you. # 21st February 2018, 5:31 am

Moving a large and old codebase to Python3 (via) Really interesting case study full of good ideas. The codebase in this case was 240,000 lines of Python and Django written over the course of 15 years. The team used Python-Modernize to aid their transition to a six-compatible codebase first. # 20th February 2018, 2:39 pm

Python & Async Simplified. Andrew Godwin: “Python’s async framework is actually relatively simple when you treat it at face value, but a lot of tutorials and documentation discuss it in minute implementation detail, so I wanted to make a higher-level overview that deliberately ignores some of the small facts and focuses on the practicalities of writing projects that mix both kinds of code.” ‪This is really useful: clearly explains the two separate worlds of Python (sync and async functions) and describes Andrew’s clever sync_to_async and async_to_sync decorators as well.‬ # 20th February 2018, 12:30 am

Channels 2.0. Andrew just shipped Channels 2.0—a major rewrite and redesign of the Channels project he started back in 2014. Channels brings async to Django, providing a logical, standardized way of supporting things like WebSockets and asynchronous execution on top of a Django application. Previously it required you to run a separate Twisted server and redis/RabbitMQ queue, but thanks to Python 3 async everything can now be deployed as a single process. And the new ASGI spec means its turtles all the way down! Everything from URL routing to view functions to middleware can be composed together using the same ASGI interface. # 2nd February 2018, 6:19 pm

Using in Your (Django) Project. Includes this neat trick: if you list in the setup(scripts=) argument you can call it from e.g. cron using the full path to within your virtual environment and it will execute in the correct context without needing to explicitly activate the environment first. # 2nd February 2018, 12:33 pm

Datasette Demo (video) from the SF Python Meetup

I gave a short talk about Datasette last month at the SF Python Meetup Holiday Party. They’ve just posted the video, so here it is:

[... 63 words]

Generating polygon representing a rough 100km circle around latitude/longitude point using Python. A question I posted to the GIS Stack Exchange—I found my own answer using a Python library called geog, then someone else posted a better solution using pyproj. # 17th January 2018, 8:57 pm

Notes on Kafka in Python. Useful review by Matthew Rocklin of the three main open source Python Kafka client libraries as of October 2017. # 13th January 2018, 7:40 pm