Simon Willison’s Weblog

1,070 items tagged “python”

The Python programming language.

2017

walrus. Fascinating collection of Python utilities for working with Redis, by Charles Leifer. There are a ton of interesting ideas in here. It starts with Python object wrappers for Redis so you can interact with lists, sets, sorted sets and Redis hashes using Python-like objects. Then it gets really interesting: walrus ships with implementations of autocomplete, rate limiting, a graph engine (using a sorted set hexastore) and an ORM-style models mechanism which manages secondary indexes and even implements basic full-text search.

# 6th November 2017, 1:14 am / redis, python, charles-leifer

Try hosting on PyPy by simonw. I had a go at hosting my blog on PyPy. Thanks to the combination of Travis CI, Sentry and Heroku it was pretty easy to give it a go—I had to swap psycopg2 for psycopg2cffi and switch to the currently undocumented pypy3-5.8.0 Heroku runtime (pypy3-5.5.0 is only compatible with Python 3.3, which Django 2.0 does not support). I ran it in production for a few minutes and didn’t get any Sentry errors but did end up using more Heroku dyno memory than I’m comfortable with—see the graph I posted in a comment. I’m going to stick with CPython 3.6 for the moment. Amusingly I did almost all of the work on this on my phone! Travis CI means it’s easy to create and test a branch through GitHub’s web UI, and deploying a tested branch to Heroku is then just a button click.

# 5th November 2017, 7:17 pm / python, pypy, heroku, travis, sentry

Super Fast String Matching in Python (via) Interesting technique for calculating string similarity at scale in Python, with much better performance than Levenshtein distances. The trick here uses TF/IDF against N-Grams, plus a CSR (Compressed Sparse Row) scipy matrix to run the calculations. Includes clear explanations of each of these concepts.
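A rough pure-Python sketch of the idea, not the article's code: each name becomes a TF-IDF weighted vector of character trigrams, and similarity is the cosine between vectors. The dicts of non-zero weights stand in for the rows of a scipy CSR matrix, the smoothed IDF formula is one common choice rather than necessarily the article's, and the company names are made up for illustration.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Return the character n-grams of text, lowercased."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(names, n=3):
    """Build one L2-normalized sparse TF-IDF vector (a dict) per name."""
    counts = [Counter(ngrams(name, n)) for name in names]
    # Document frequency: the number of names each n-gram appears in.
    df = Counter(gram for c in counts for gram in c)
    total = len(names)
    vectors = []
    for c in counts:
        # Smoothed IDF (an assumption; other weightings work too).
        weights = {
            gram: tf * (math.log((1 + total) / (1 + df[gram])) + 1)
            for gram, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({gram: w / norm for gram, w in weights.items()})
    return vectors


def cosine(a, b):
    """Dot product of two normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(gram, 0.0) for gram, w in a.items())


# Hypothetical example data.
names = ["McDonalds Corp", "Mc Donalds Corporation", "Burger King"]
vectors = tfidf_vectors(names)
similar = cosine(vectors[0], vectors[1])
different = cosine(vectors[0], vectors[2])
print(round(similar, 3), round(different, 3))
```

Only the n-grams two names share contribute to the score, which is why a sparse representation pays off once there are many names to compare.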

# 5th November 2017, 3:26 pm / scipy, python

Connecting to Google Sheets with Python. Useful guide to interacting with Google Sheets via the gspread Python library, including how to work with Google’s unintuitive “service account keys”.

# 3rd November 2017, 4:13 am / googlespreadsheet, python

How Adversarial Attacks Work. Adversarial attacks against machine learning classifiers involve constructing an input that deliberately produces the wrong classification. This article shows how these can be constructed, and includes examples generated using PyTorch which produce a sports car that gets identified as a toaster and a photo of Sylvester Stallone that gets classified as Keanu Reeves.

# 2nd November 2017, 8:25 pm / machine-learning, python

A Minimalist Guide to SQLite. Pretty comprehensive actually—covers the sqlite3 command line app, importing CSVs, integrating with Python, Pandas and Jupyter notebooks, visualization and more.

# 2nd November 2017, 1:23 am / sqlite, pandas, jupyter, python

Exploring United States Policing Data Using Python. Outstanding introduction to data analysis with Jupyter and Pandas.

# 29th October 2017, 4:58 pm / pandas, open-data, jupyter, python

Fast GeoSpatial Analysis in Python. Some clever advanced performance tricks with Cython and Dask, but it also introduced me to GeoPandas.

# 29th October 2017, 4:47 pm / gis, python

profiling. “An interactive continuous Python profiler”. This is really neat: simply run “profiling myscript.py” to get an interactive, navigable console-based profile inspector at the end of your script... or run “profiling live-profile mywebserver.py” to see a live, updating profile of a long-running process. Has options for statistical profiling as well, which has a much lower overhead in exchange for a less accurate view of what is going on.

# 28th October 2017, 1:06 am / profiling, python

Contributors to python/cpython, Aug 5, 1990—Oct 26, 2017. I love how the graphs on this page summarize the history of the last 27 years of Python development, showing exactly when each core contributor was most active.

# 26th October 2017, 6:58 pm / python

hupper (via) Handy Python module for adding “live reload” development support to just about anything. I’m using it with Sanic—I run “hupper -m app” and it starts up my code in app.py and automatically reloads it any time any of the corresponding files changes on disk.

# 23rd October 2017, 12:34 am / python, sanic

Parse shell one-liners with pyparsing. Neat introduction to the pyparsing library, both for parsing tokens into labeled sections and constructing an AST from them.

# 22nd October 2017, 1:35 pm / parsing, python

Getting the Most out of Sqlite3 with Python. A couple of neat tricks I didn’t know: you can skip cursors entirely by calling .execute and .executemany directly on the connection object, and you can use the connection object as a context manager to execute transactions using a “with” block.
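A minimal sketch of both tricks using the standard library sqlite3 module. The table and rows are invented for illustration, and the failure inside the “with” block is simulated to show the rollback.

```python
import sqlite3

# In-memory database; the table and rows are made up for illustration.
conn = sqlite3.connect(":memory:")

# execute() and executemany() called on the connection create a cursor
# behind the scenes, so no explicit cursor is needed.
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO entries (title) VALUES (?)",
    [("hupper",), ("Sanic",)],
)
conn.commit()

# Used as a context manager, the connection commits if the block succeeds
# and rolls back if it raises. It does not close the connection.
try:
    with conn:
        conn.execute("INSERT INTO entries (title) VALUES (?)", ("uvloop",))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

# The "uvloop" row was rolled back, so only the first two rows remain.
titles = [row[0] for row in conn.execute("SELECT title FROM entries ORDER BY id")]
print(titles)
conn.close()
```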

# 22nd October 2017, 12:35 pm / sqlite, python

Porting my blog to Python 3

This blog is now running on Python 3! Admittedly this is nearly nine years after the first release of Python 3.0, but it’s the first Python 3 project I’ve deployed myself so I’m pretty excited about it.

[... 883 words]

Deploying an asynchronous Python microservice with Sanic and Zeit Now

Back in 2008 Natalie Downe and I deployed what today we would call a microservice: json-head, a tiny Google App Engine app that allowed you to make an HTTP HEAD request against a URL and get back the HTTP headers as JSON. One of our initial use-cases for this was Natalie’s addSizes.js, an unobtrusive jQuery script that could annotate links to PDFs and other large files with their corresponding file size pulled from the Content-Length header. Another potential use-case is detecting broken links, since the API can be used to spot 404 status codes (as in this example).

[... 1,361 words]
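To make the idea concrete, here is a small standard-library sketch of what such a service does at its core. It is not the json-head implementation: the function names and the JSON keys (“status_code”, “headers”) are assumptions chosen for illustration.

```python
import json
import urllib.error
import urllib.request


def headers_as_json(status, headers):
    """Serialize a status code and (name, value) header pairs as JSON.

    Repeated header names collapse to the last value in this sketch.
    """
    return json.dumps(
        {"status_code": status, "headers": dict(headers)}, sort_keys=True
    )


def json_head(url, timeout=5):
    """Make a HEAD request against url; return status and headers as JSON."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return headers_as_json(response.status, response.headers.items())
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses raise, but still carry a status code and
        # headers, which is what makes broken-link detection possible.
        return headers_as_json(error.code, error.headers.items())
```

Calling json_head() with a URL needs network access; headers_as_json() can be tried on its own with a status code and a list of header pairs.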

Sanic. “Sanic is a Flask-like Python 3.5+ web server that’s written to go fast [...] On top of being Flask-like, Sanic supports async request handlers. This means you can use the new shiny async/await syntax from Python 3.5, making your code non-blocking and speedy”.

# 7th October 2017, 6:39 pm / async, sanic, python

uvloop: Blazing fast Python networking. “uvloop makes asyncio fast. In fact, it is at least 2x faster than nodejs, gevent, as well as any other Python asynchronous framework. The performance of uvloop-based asyncio is close to that of Go programs.”

# 7th October 2017, 5:53 pm / async, python

Building a statistical profiler in python. Generating flame graphs of production Python code is surprisingly straightforward. Brian Pitts built a statistical flame graph profiler into our tikibar debugging tool at Eventbrite and it’s proved extremely useful.
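A rough sketch of the general technique, assuming a background thread that samples another thread's call stack at a fixed interval. This is not the tikibar implementation or the linked article's code: StackSampler and busy_loop are hypothetical names, and sys._current_frames() is a CPython implementation detail.

```python
import collections
import sys
import threading
import time


class StackSampler:
    """Periodically record the call stack of one thread."""

    def __init__(self, thread_id, interval=0.005):
        self.thread_id = thread_id
        self.interval = interval
        self.counts = collections.Counter()
        self._stop_event = threading.Event()
        self._sampler = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # wait() returns False on timeout, True once the event is set.
        while not self._stop_event.wait(self.interval):
            frame = sys._current_frames().get(self.thread_id)
            names = []
            while frame is not None:
                names.append(frame.f_code.co_name)
                frame = frame.f_back
            if names:
                # Root first, semicolon-separated: the "collapsed stack"
                # format that flame graph tools consume.
                self.counts[";".join(reversed(names))] += 1

    def __enter__(self):
        self._sampler.start()
        return self

    def __exit__(self, *exc_info):
        self._stop_event.set()
        self._sampler.join()


def busy_loop(seconds):
    """Hypothetical workload: spin the CPU for roughly `seconds`."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        sum(i * i for i in range(1000))


with StackSampler(threading.get_ident()) as sampler:
    busy_loop(0.3)

# Each line is "stack count", ready to feed to a flame graph renderer.
for stack, count in sampler.counts.most_common(3):
    print(stack, count)
```

Sampling trades accuracy for low overhead: the profiled code runs unmodified, and the sample counts only approximate where time was spent.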

# 5th October 2017, 3:44 pm / profiling, tikibar, django, python

2016

Generating interactive HTML charts from Python?

D3 is absolutely amazing but the learning curve is a bit steep. Totally worth the effort to learn it in the long run, but it’s not so useful if you want to get something done quickly.

[... 97 words]

Why is snapEDA.com slow? Is it because it uses Django?

No, it’s not slow because it uses Django.

[... 36 words]

2014

Feature Flags, from PyCon 2014. Slides from a 15 minute talk I gave at PyCon 2014 about feature flags - what they are, how to use them and how we implemented them at both Lanyrd and Eventbrite.

This was part of a longer workshop on Advanced Django Patterns from Eventbrite and Lanyrd, which I co-presented with Andrew Godwin and Nathan Yergler.

# 10th April 2014, 6:27 pm / pycon, python, django, feature-flags, speaking

What’s the best way to communicate with go libraries from within Python programs?

Go speaks HTTP extremely well, so one simple but powerful approach is to hook your Go libraries up as simple HTTP+JSON APIs and have Python call them over HTTP (the Requests: HTTP for Humans library is awesome for this).

[... 59 words]

2013

What are some good resources to learn how to cleanse data using Python?

http://gnosis.cx/TPiP/ “Text Processing in Python” is a free online book that covers a bunch of useful topics related to data cleanup. It’s over 10 years old now but is still mostly relevant—the chapter on regular expressions is particularly good.

Is it better to create your own framework, or would it be best to just use Django or something like that?

You should absolutely use an existing framework such as Django rather than writing your own.

[... 176 words]

For a Django application, deployed on Heroku, what are my options for storing user-uploaded media files?

S3 is really a no-brainer for this: it’s extremely inexpensive, very easy to integrate with and unbelievably reliable. It’s so cheap that it will be practically free for testing purposes (expect to spend pennies a month on it).

[... 88 words]

Web app: what programming knowledge do I need to create a goings on app?

For this kind of application a much more important question than "how can I build it?" is "where will I get the data from?" If you don’t have a good answer for that, building the app is a waste of your time. The world is littered with local events listings apps that no one uses because they don’t have good data.

[... 82 words]

How do I choose between asynchronous web frameworks? My tech group is fairly language agnostic and we’re trying to standardize on some technologies.

Since they are all pretty close to each other and it sounds like your tech group’s skills would support any of them, I would suggest having your team build a simple prototype in all three so you can compare them for your own particular team and situation.

[... 76 words]

Which companies in London are using Python?

We use Python/Django for http://lanyrd.com/—we’re based in London.

[... 39 words]

What advice would Simon Willison give to a beginner Python/Django developer?

Build something and put it on the internet. Make sure you have an easy way to deploy new versions (Heroku is a good bet if you don’t want to figure out Fabric). Pick a project that’s useful to you—a simple blogging engine is often a good bet, or maybe something that aggregates together your posts from Twitter and Instagram and so on. Or come up with something a bit more creative!

[... 109 words]

Why does Python not have any data structures that store data in sorted order?

The bisect module provides functions for achieving this using a Python list as the underlying data structure: http://docs.python.org/2/library...

[... 39 words]
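A short sketch of the bisect approach mentioned in the answer above. The values are arbitrary and contains() is a hypothetical helper, not part of the module.

```python
import bisect

scores = []
for value in [42, 7, 19, 7, 88]:
    # insort finds the position by binary search and inserts there,
    # so the list stays sorted after every insertion.
    bisect.insort(scores, value)

print(scores)  # [7, 7, 19, 42, 88]


def contains(sorted_list, value):
    """Binary-search membership test on an already sorted list."""
    index = bisect.bisect_left(sorted_list, value)
    return index < len(sorted_list) and sorted_list[index] == value


print(contains(scores, 19), contains(scores, 20))  # True False
```

The search is O(log n), but each insertion is still O(n) because the list has to shift the elements after the insertion point.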