Simon Willison’s Weblog

653 items tagged “python”

Generating polygon representing a rough 100km circle around latitude/longitude point using Python. A question I posted to the GIS Stack Exchange—I found my own answer using a Python library called geog, then someone else posted a better solution using pyproj. # 17th January 2018, 8:57 pm

Notes on Kafka in Python. Useful review by Matthew Rocklin of the three main open source Python Kafka client libraries as of October 2017. # 13th January 2018, 7:40 pm

Let your code type-hint itself: introducing open source MonkeyType. Instagram have open sourced their tool for automatically adding type annotations to your Python 3 code via runtime tracing. By default it logs the types it sees to a SQLite database, which means you can browse them with Datasette! # 15th December 2017, 2:22 am

Python 3 Readiness (via) 345 of the 360 most popular Python packages are now compatible with Python 3. I’d love to see a version of this graph over time. # 2nd December 2017, 11:13 pm

Object models (via) Extremely comprehensive and readable discussion of the object models of Python, JavaScript, Lua and Perl 5. I learned something new about every one of those languages. # 29th November 2017, 2:59 pm

pillow-simd (via) A “friendly fork” of the Python Pillow image library that takes advantage of SIMD operations on certain CPUs to obtain massive speed-ups—they claim 16 to 40 times faster than ImageMagick. # 14th November 2017, 9:42 pm

Exploring Line Lengths in Python Packages. Interesting exploration of the impact if the 79 character length limit rule of thumb on various Python packages—and a thoroughly useful guide to histogram plotting in Jupyter, pandas and matplotlib. # 10th November 2017, 3:34 pm

dhash (via) Python library to calculate the perceptual difference hash for an image. Delightfully simple algorithm that’s fully explained in the README—it works by scaling the image to 8x8 grayscale and then creator a bitmap representing of each pixel is lighter or darker than the previous one. # 9th November 2017, 5:44 pm

Eager Execution: An imperative, define-by-run interface to TensorFlow. Lets you evaluate TensorFlow expressions interactively in Python without needing to constantly run tf.Session().run(variable). # 8th November 2017, 7:32 pm

TensorFlow 101. Concise, readable introduction to TensorFlow, with Python examples you can execute (and visualize) in Jupyter. # 8th November 2017, 5:57 pm

spaCy. “Industrial-strength Natural Language Processing in Python”. Exciting alternative to nltk—spaCy is mostly written in Cython, makes bold performance claims and ships with a range of pre-built statistical models covering multiple different languages. The API design is clean and intuitive and spaCy even includes an SVG visualizer that works with Jupyter. # 8th November 2017, 4:43 pm

Pull request #4120 · python/cpython. I just had my first ever change merged into Python! It was a one sentence documentation improvement (on how to cancel SQLite operations) but it was fascinating seeing how Python’s GitHub flow is set up—clever use of labels, plus a bot that automatically checks that you have signed a copy of their CLA. # 7th November 2017, 2:06 pm

walrus. Fascinating collection of Python utilities for working with Redis, by Charles Leifer. There are a ton of interesting ideas in here. It starts with Python object wrappers for Redis so you can interact with lists, sets, sorted sets and Redis hashes using Python-like objects. Then it gets really interesting: walrus ships with implementations of autocomplete, rate limiting, a graph engine (using a sorted set hexastore) and an ORM-style models mechanism which manages secondary indexes and even implements basic full-text search. # 6th November 2017, 1:14 am

Try hosting on PyPy by simonw. I had a go at hosting my blog on PyPy. Thanks to the combination of Travis CI, Sentry and Heroku it was pretty easy to give it a go—I had to swap psycopg2 for psycopg2cffi and switch to the currently undocumented pypy3-5.8.0 Heroku runtime (pypy3-5.5.0 is only compatible with Python 3.3, which Django 2.0 does not support). I ran it in production for a few minutes and didn’t get any Sentry errors but did end up using more Heroku dyno memory than I’m comfortable with—see the graph I posted in a comment. I’m going to stick with CPython 3.6 for the moment. Amusingly I did almost all of the work on this on my phone! Travis CI means it’s easy to create and test a branch through GitHub’s web UI, and deploying a tested branch to Heroku is then just a button click. # 5th November 2017, 7:17 pm

Super Fast String Matching in Python (via) Interesting technique for calculating string similarity at scale in Python, with much better performance than Levenshtein distances. The trick here uses TF/IDF against N-Grams, plus a CSR (Compressed Sparse Row) scipy matrix to run the calculations. Includes clear explanations of each of these concepts. # 5th November 2017, 3:26 pm

Connecting to Google Sheets with Python. Useful guide to interacting with Google Sheets via the gspread python library, including how to work with Google’s unintuitive “service account keys”. # 3rd November 2017, 4:13 am

How Adversarial Attacks Work. Adversarial attacks against machine learning classifiers involve constructing an input that deliberately produces the wrong classification. This article shows how these can be constructed, and includes examples generated using PyTorch which produce a sports car that gets identified as a toaster and a photo of Sylvester Stallone that gets classified as Keanu Reeves. # 2nd November 2017, 8:25 pm

A Minimalist Guide to SQLite. Pretty comprehensive actually—covers the sqlite3 command line app, importing CSVs, integrating with Python, Pandas and Jupyter notebooks, visualization and more. # 2nd November 2017, 1:23 am

Exploring United States Policing Data Using Python. Outstanding introduction to data analysis with Jupyter and Pandas. # 29th October 2017, 4:58 pm

Fast GeoSpatial Analysis in Python. Some clever advanced performance tricks with Cython and Dask, but it also introduced me to GeoPandas. # 29th October 2017, 4:47 pm

profiling. “An interactive continuous Python profiler”. This is really neat—simply run “profiling myscript.py” to get an interactive, navigable console-based profile inspector at the end of your script... or run “profiling live-profile mywebverver.py” to see a live, updating profile of a long-running process. Has options for statistical profiling as well, which has a much lower overhead in exchange for a less accurate view of what is going on. # 28th October 2017, 1:06 am

Contributors to python/cpython, Aug 5, 1990—Oct 26, 2017. I love how the graphs on this page summarize the history of the last 27 years of Python development, showing exactly when each core contributor was most active. # 26th October 2017, 6:58 pm

hupper (via) Handy Python module for adding “live reload” development support to just about anything. I’m using it with Sanic—I run “hupper -m app” and it starts up my code in app.py and automatically reloads it any time any of the corresponding files changes on disk. # 23rd October 2017, 12:34 am

Parse shell one-liners with pyparsing. Neat introduction to the pyparsing library, both for parsing tokens into labeled sections and constructing an AST from them. # 22nd October 2017, 1:35 pm

Getting the Most out of Sqlite3 with Python. A couple of neat tricks I didn’t know: you can skip cursors entirely by calling .execute and .executemany directly on the connection object, and you can use the connection object as a context manager to execute transactions using a “with” block. # 22nd October 2017, 12:35 pm

Porting my blog to Python 3

This blog is now running on Python 3! Admittedly this is nearly nine years after the first release of Python 3.0, but it’s the first Python 3 project I’ve deployed myself so I’m pretty excited about it.

[... 883 words]

Deploying an asynchronous Python microservice with Sanic and Zeit Now

Back in 2008 Natalie Downe and I deployed what today we would call a microservice: json-head, a tiny Google App Engine app that allowed you to make an HTTP head request against a URL and get back the HTTP headers as JSON. One of our initial use-scase for this was Natalie’s addSizes.js, an unobtrusive jQuery script that could annotate links to PDFs and other large files with their corresponding file size pulled from the Content-Length header. Another potential use-case is detecting broken links, since the API can be used to spot 404 status codes (as in this example).

[... 1361 words]

Sanic. “Sanic is a Flask-like Python 3.5+ web server that’s written to go fast [...] On top of being Flask-like, Sanic supports async request handlers. This means you can use the new shiny async/await syntax from Python 3.5, making your code non-blocking and speedy”. # 7th October 2017, 6:39 pm

uvloop: Blazing fast Python networking. “uvloop makes asyncio fast. In fact, it is at least 2x faster than nodejs, gevent, as well as any other Python asynchronous framework. The performance of uvloop-based asyncio is close to that of Go programs.” # 7th October 2017, 5:53 pm

Building a statistical profiler in python. Generating flame graphs of production Python code is surprisingly straight-forward. Brian Pitts built a statistical flame graph profiler into our tikibar debugging tool at Eventbrite and it’s proved extremely useful. # 5th October 2017, 3:44 pm