Simon Willison’s Weblog

Items tagged python in Nov



Datasette 0.31. Released today: this version adds compatibility with Python 3.8 and breaks compatibility with Python 3.5. Since Glitch supports Python 3.7.3 now, I decided I could finally give up on 3.5. This means Datasette can use f-strings now, but more importantly it opens up the opportunity to start taking advantage of Starlette, which makes all kinds of interesting new ASGI-based plugins much easier to build. # 12th November 2019, 6:11 am

My Python Development Environment, 2020 Edition (via) Jacob Kaplan-Moss shares what works for him as a Python environment coming into 2020: pyenv, poetry, and pipx. I’m not a frequent user of any of those tools—it definitely looks like I should be. # 12th November 2019, 1:30 am

Automate the Boring Stuff with Python: Working with PDF and Word Documents. I stumbled across this while trying to extract some data from a PDF file (the kind of file with actual text in it as opposed to dodgy scanned images) and it worked perfectly: PyPDF2.PdfFileReader(open("file.pdf", "rb")).getPage(0).extractText() # 6th November 2019, 4:17 pm

Why you should use `python -m pip` (via) Brett Cannon explains why he prefers “python -m pip install...” to “pip install...”—it ensures you always know exactly which Python interpreter environment you are installing packages for. He also makes the case for always installing into a virtual environment, created using “python -m venv”. # 2nd November 2019, 4:41 pm
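A minimal sketch of that advice using only the standard library: `venv.create()` does roughly what `python -m venv` does, and `pip_command()` builds a `python -m pip` invocation bound to the environment's own interpreter, so there is no doubt about where packages land. The helper names are hypothetical; the `bin/` versus `Scripts\` layout is the standard venv one.

```python
import os
import subprocess
import venv
from typing import List


def env_python(env_dir: str) -> str:
    """Path to the interpreter inside a virtual environment."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def create_env(env_dir: str, with_pip: bool = True) -> str:
    """Roughly what `python -m venv env_dir` does; returns the env's interpreter."""
    # The CLI symlinks the interpreter on POSIX systems, so do the same here.
    venv.create(env_dir, with_pip=with_pip, symlinks=(os.name != "nt"))
    return env_python(env_dir)


def pip_command(env_dir: str, *args: str) -> List[str]:
    """Build `<env python> -m pip ...` so pip runs under that exact interpreter."""
    return [env_python(env_dir), "-m", "pip", *args]


def interpreter_prefix(python: str) -> str:
    """Ask an interpreter which environment (sys.prefix) it belongs to."""
    result = subprocess.run(
        [python, "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For example, `subprocess.run(pip_command("venv", "install", "datasette"), check=True)` would install into that environment and nowhere else.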

Object models (via) Extremely comprehensive and readable discussion of the object models of Python, JavaScript, Lua and Perl 5. I learned something new about every one of those languages. # 29th November 2017, 2:59 pm

pillow-simd (via) A “friendly fork” of the Python Pillow image library that takes advantage of SIMD operations on certain CPUs to obtain massive speed-ups—they claim 16 to 40 times faster than ImageMagick. # 14th November 2017, 9:42 pm

Exploring Line Lengths in Python Packages. Interesting exploration of the impact of the 79-character line length rule of thumb on various Python packages, plus a thoroughly useful guide to histogram plotting in Jupyter, pandas and matplotlib. # 10th November 2017, 3:34 pm
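The measurement itself is easy to reproduce. This is a rough standard-library sketch rather than the article's pandas/matplotlib code: it counts line lengths across every `.py` file under a directory, reports the share of lines over PEP 8's 79-character limit, and prints a crude text histogram in place of a matplotlib plot. The function names are hypothetical.

```python
import os
from collections import Counter
from typing import Iterable, Iterator


def python_lines(root: str) -> Iterator[str]:
    """Yield every line of every .py file under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    yield from f


def count_lengths(lines: Iterable[str]) -> Counter:
    """Map line length (excluding the newline) to how many lines have it."""
    return Counter(len(line.rstrip("\r\n")) for line in lines)


def share_over(counts: Counter, limit: int = 79) -> float:
    """Fraction of lines longer than `limit` characters."""
    total = sum(counts.values())
    over = sum(n for length, n in counts.items() if length > limit)
    return over / total if total else 0.0


def text_histogram(counts: Counter, bucket: int = 10, width: int = 40) -> str:
    """One row per `bucket`-wide range of lengths, bars scaled to `width`."""
    buckets: Counter = Counter()
    for length, n in counts.items():
        buckets[length // bucket * bucket] += n
    peak = max(buckets.values(), default=1)
    rows = []
    for start in sorted(buckets):
        bar = "#" * max(1, round(buckets[start] / peak * width))
        rows.append(f"{start:>4}-{start + bucket - 1:<4} {bar} {buckets[start]}")
    return "\n".join(rows)
```

A typical call would be `print(text_histogram(count_lengths(python_lines("path/to/package"))))`.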

dhash (via) Python library to calculate the perceptual difference hash for an image. Delightfully simple algorithm that’s fully explained in the README: it works by scaling the image down to 8x8 grayscale and then creating a bitmap recording whether each pixel is lighter or darker than the previous one. # 9th November 2017, 5:44 pm
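To make the algorithm concrete, here is a sketch of the row-wise comparison step. It assumes the image has already been converted to grayscale and scaled down (for example with Pillow), so it works on a plain grid of 0 to 255 values. Comparing neighbours along a row of w pixels yields w - 1 bits, so a grid 9 pixels wide and 8 tall gives a 64-bit hash. This illustrates the idea and is not the dhash library's API; the bit convention (1 when a pixel is brighter than its left neighbour) is an assumption.

```python
from typing import List

# Assumption: pixels are already grayscale (0-255) and downscaled, e.g. 9 wide x 8 tall.
Grid = List[List[int]]


def dhash_rows(pixels: Grid) -> int:
    """Row-wise difference hash: one bit per pair of horizontally adjacent pixels.

    A bit is 1 when a pixel is brighter than the pixel to its left.
    """
    value = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            value = (value << 1) | (1 if right > left else 0)
    return value


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests similar images."""
    return bin(a ^ b).count("1")
```

Because only relative brightness matters, a uniformly brightened copy of an image produces the same hash, and near-duplicates typically differ by only a few bits.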

Eager Execution: An imperative, define-by-run interface to TensorFlow. Lets you evaluate TensorFlow expressions interactively in Python without needing to constantly run tf.Session().run(variable). # 8th November 2017, 7:32 pm

TensorFlow 101. Concise, readable introduction to TensorFlow, with Python examples you can execute (and visualize) in Jupyter. # 8th November 2017, 5:57 pm

spaCy. “Industrial-strength Natural Language Processing in Python”. Exciting alternative to nltk—spaCy is mostly written in Cython, makes bold performance claims and ships with a range of pre-built statistical models covering multiple different languages. The API design is clean and intuitive and spaCy even includes an SVG visualizer that works with Jupyter. # 8th November 2017, 4:43 pm

Pull request #4120 · python/cpython. I just had my first ever change merged into Python! It was a one sentence documentation improvement (on how to cancel SQLite operations) but it was fascinating seeing how Python’s GitHub flow is set up—clever use of labels, plus a bot that automatically checks that you have signed a copy of their CLA. # 7th November 2017, 2:06 pm

walrus. Fascinating collection of Python utilities for working with Redis, by Charles Leifer. There are a ton of interesting ideas in here. It starts with Python object wrappers for Redis so you can interact with lists, sets, sorted sets and Redis hashes using Python-like objects. Then it gets really interesting: walrus ships with implementations of autocomplete, rate limiting, a graph engine (using a sorted set hexastore) and an ORM-style models mechanism which manages secondary indexes and even implements basic full-text search. # 6th November 2017, 1:14 am
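A hexastore stores each (subject, predicate, object) triple under all six orderings, so any query with some parts known becomes a prefix range scan. The toy below is a sketch of that idea, assuming a plain Python list kept sorted with `bisect` as a stand-in for a Redis sorted set's lexicographic range queries. The `Hexastore` class and its `::` key format are hypothetical, not walrus's API.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
SEP = "::"  # assumption: subjects, predicates and objects never contain "::"


class Hexastore:
    """Toy hexastore: every triple is stored six times, once per ordering,
    as keys in one sorted list (standing in for a Redis sorted set)."""

    ORDERS = ("spo", "sop", "pso", "pos", "osp", "ops")

    def __init__(self) -> None:
        self._keys: List[str] = []  # kept in lexicographic order

    @staticmethod
    def _key(order: str, parts: Dict[str, str]) -> str:
        return SEP.join([order] + [parts[c] for c in order]) + SEP

    def add(self, s: str, p: str, o: str) -> None:
        parts = {"s": s, "p": p, "o": o}
        for order in self.ORDERS:
            key = self._key(order, parts)
            i = bisect.bisect_left(self._keys, key)
            if i == len(self._keys) or self._keys[i] != key:
                self._keys.insert(i, key)

    def query(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        given: Dict[str, Optional[str]] = {"s": s, "p": p, "o": o}
        known = [c for c in "spo" if given[c] is not None]
        unknown = [c for c in "spo" if given[c] is None]
        order = "".join(known + unknown)  # always one of the six ORDERS
        prefix = SEP.join([order] + [given[c] for c in known]) + SEP
        results: List[Triple] = []
        # All keys sharing the prefix are contiguous in sorted order: a range scan.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            found = dict(zip(order, self._keys[i].split(SEP)[1:4]))
            results.append((found["s"], found["p"], found["o"]))
            i += 1
        return results
```

For example, "who knows bob?" becomes a scan over keys starting with `pos::knows::bob::`.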

Try hosting on PyPy by simonw. I had a go at hosting my blog on PyPy. Thanks to the combination of Travis CI, Sentry and Heroku it was pretty easy to give it a go—I had to swap psycopg2 for psycopg2cffi and switch to the currently undocumented pypy3-5.8.0 Heroku runtime (pypy3-5.5.0 is only compatible with Python 3.3, which Django 2.0 does not support). I ran it in production for a few minutes and didn’t get any Sentry errors but did end up using more Heroku dyno memory than I’m comfortable with—see the graph I posted in a comment. I’m going to stick with CPython 3.6 for the moment. Amusingly I did almost all of the work on this on my phone! Travis CI means it’s easy to create and test a branch through GitHub’s web UI, and deploying a tested branch to Heroku is then just a button click. # 5th November 2017, 7:17 pm

Super Fast String Matching in Python (via) Interesting technique for calculating string similarity at scale in Python, with much better performance than Levenshtein distances. The trick here uses TF/IDF against N-Grams, plus a CSR (Compressed Sparse Row) scipy matrix to run the calculations. Includes clear explanations of each of these concepts. # 5th November 2017, 3:26 pm
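A minimal standard-library sketch of the same pipeline, under simplifying assumptions: strings are split into character trigrams, weighted with a smoothed TF/IDF, L2-normalised, and compared by cosine similarity. Plain dicts stand in for the rows of a scipy CSR matrix, and the all-pairs loop is O(n²), so this shows the maths rather than the article's performance trick, which comes from doing the comparison as a sparse matrix multiplication. Function names and the 0.5 threshold are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: only non-zero n-gram weights are stored


def ngrams(text: str, n: int = 3) -> List[str]:
    """Character n-grams of a lower-cased, whitespace-normalised string."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs: List[str], n: int = 3) -> List[Vector]:
    """One L2-normalised TF/IDF vector per string, over character n-grams."""
    counts = [Counter(ngrams(d, n)) for d in docs]
    df = Counter(g for c in counts for g in c)  # how many strings contain each n-gram
    total = len(docs)
    vectors: List[Vector] = []
    for c in counts:
        # Smoothed IDF, so n-grams shared by every string still get some weight.
        vec = {g: tf * (math.log((1 + total) / (1 + df[g])) + 1) for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()} if norm else {})
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    """Dot product of two normalised sparse vectors, i.e. cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def similar_pairs(names: List[str], threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """All pairs scoring at least `threshold`, best first (O(n^2) comparisons)."""
    vecs = tfidf_vectors(names)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = cosine(vecs[i], vecs[j])
            if score >= threshold:
                pairs.append((names[i], names[j], score))
    return sorted(pairs, key=lambda t: t[2], reverse=True)
```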

Connecting to Google Sheets with Python. Useful guide to interacting with Google Sheets via the gspread Python library, including how to work with Google’s unintuitive “service account keys”. # 3rd November 2017, 4:13 am

How Adversarial Attacks Work. Adversarial attacks against machine learning classifiers involve constructing an input that deliberately produces the wrong classification. This article shows how these can be constructed, and includes examples generated using PyTorch which produce a sports car that gets identified as a toaster and a photo of Sylvester Stallone that gets classified as Keanu Reeves. # 2nd November 2017, 8:25 pm
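One standard way to construct such an input is the fast gradient sign method (FGSM): take the gradient of the loss with respect to the input and move every feature a small step `eps` in the direction of its sign. The sketch below applies it to a hypothetical three-feature logistic regression, where the gradient can be written by hand instead of computed by PyTorch; it is not necessarily the exact method the article uses.

```python
import math
from typing import List


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Logistic regression: probability that x belongs to class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    """-1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w: List[float], b: float, x: List[float], label: int, eps: float) -> List[float]:
    """Fast gradient sign method: move each feature by `eps` in whichever
    direction increases the classifier's loss on the true `label`."""
    p = predict(w, b, x)
    # For cross-entropy loss, d(loss)/d(x_i) = (p - label) * w_i.
    grad = [(p - label) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]
```

On an image the same small step is applied to every pixel at once, which is roughly why a change too subtle to notice can still swing the classifier.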

A Minimalist Guide to SQLite. Pretty comprehensive actually—covers the sqlite3 command line app, importing CSVs, integrating with Python, Pandas and Jupyter notebooks, visualization and more. # 2nd November 2017, 1:23 am
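As a sketch of the "importing CSVs" and "integrating with Python" parts, the standard library's `csv` and `sqlite3` modules are enough to load a CSV into a table and query it, similar in spirit to the sqlite3 command line app's `.import`. The `load_csv` helper is hypothetical, stores every value as text, and assumes table and column names contain no double quotes.

```python
import csv
import io
import sqlite3


def load_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Create `table` from CSV text (header row = column names); return rows inserted."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)  # untyped columns, values stored as text
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    cur = conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return cur.rowcount  # for executemany, the total number of inserted rows
```

Passing `sqlite3.connect("data.db")` instead of an in-memory connection keeps the result on disk for the command line app or a notebook.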

Generating interactive HTML charts from Python?

D3 is absolutely amazing but the learning curve is a bit steep. Totally worth the effort to learn it in the long run, but it’s not so useful if you want to get something done quickly.

[... 97 words]

Is it better to create your own framework, or would it be best to just use Django or something like that?

You should absolutely use an existing framework such as Django rather than writing your own.

[... 176 words]

What are some apps or problems you would suggest a new Python developer try to solve?

The best way to learn Python, in my opinion, is using the interactive prompt. Install ipython (a massive improvement on the standard Python shell) and use it to interactively solve some simple tasks, such as downloading a CSV file from the web using the urllib library, parsing it with the csv module, then poking around in the data using Python list comprehensions and saving some of the results out to a JSON file.

[... 95 words]
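The exercise described in that answer can be sketched as three small helpers, assuming a UTF-8 CSV with a header row; the function names are illustrative.

```python
import csv
import io
import json
import urllib.request
from typing import Dict, List


def fetch_text(url: str) -> str:
    """Download a URL with urllib and decode it as UTF-8 (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def save_json(rows: List[Dict[str, str]], path: str) -> None:
    """Write rows out as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2)
```

In an IPython session you might then run `rows = parse_csv(fetch_text(url))` for some CSV `url`, explore with comprehensions like `[r["name"] for r in rows if r["language"] == "Python"]` (column names depend on your file), and finish with `save_json(subset, "subset.json")`.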

What are the tradeoffs (e.g. development speed, performance, scalability) between using various php frameworks, ruby/rails, or python/django?  Is there any reason to choose one overwhelmingly over another?

At this point, I’d argue that the decision between them comes down to programming language rather than framework—the frameworks have mostly converged on a very similar set of features.

[... 145 words]

Who are the top Python programmers on Twitter?

Our site Lanyrd catalogs conference speakers by their Twitter ID, so you could use it to find people who speak at Python conferences—for example:

[... 52 words]

Haystack 1.0 Final Released. I’ve used Haystack on a number of projects recently, and it has proved itself as a completely painless way of adding full-text search (using Solr or Whoosh—I haven’t tried the Xapian backend yet) to a Django ORM powered project in just a few minutes. Congratulations, Daniel + contributors. # 30th November 2009, 8:07 am

Djangopeople JSON parser. Awesome—Andy McKay has compensated for the lack of an official DjangoPeople API by creating a JSONP screen scraped API and hosting it on App Engine. As far as I’m concerned this is an officially supported feature—I’ll make sure future site changes don’t break it, and when I do add an API I’ll try to keep it compatible and help Andy set up redirects. # 28th November 2009, 11:29 am
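JSONP works by wrapping a JSON payload in a call to a function named by the client, so a `<script>` tag on another domain can consume it. This is a minimal server-side sketch of that wrapping step, not Andy McKay's implementation; the `jsonp` helper and its callback-name whitelist are assumptions, the whitelist being there because echoing an unvalidated callback parameter into a script is an injection risk.

```python
import json
import re

# Assumed whitelist: dotted JavaScript identifiers such as "handleData" or "ns.cb".
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*")


def jsonp(data, callback=None) -> str:
    """Serialise `data` as JSON, wrapped in `callback(...);` when a callback is given."""
    body = json.dumps(data)
    if callback is None:
        return body  # plain JSON for clients that don't ask for JSONP
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError(f"unsafe callback name: {callback!r}")
    return f"{callback}({body});"
```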

Request Routing With URI Templates in Node.JS. I quite like this approach (though the implementation is a bit “this” heavy for my taste). JavaScript has no equivalent to Python’s raw strings, so regular expression based routing à la Django ends up being a bit uglier in JavaScript. URI template syntax is more appealing. # 24th November 2009, 9:06 am
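To illustrate the approach, here is a Python sketch (rather than the post's JavaScript) that compiles a simple template such as `/users/{username}` into a regular expression and dispatches to a handler with the captured values. It supports only bare `{name}` variables matching a single path segment, a small assumed subset of the RFC 6570 URI Template syntax; `compile_template` and `Router` are hypothetical names.

```python
import re
from typing import Callable, List, Optional, Tuple

_VAR = re.compile(r"\{(\w+)\}")


def compile_template(template: str) -> re.Pattern:
    """Turn /users/{id} into a regex; each {name} matches one path segment."""
    pattern, pos = "", 0
    for m in _VAR.finditer(template):
        pattern += re.escape(template[pos:m.start()])  # literal text between variables
        pattern += f"(?P<{m.group(1)}>[^/]+)"
        pos = m.end()
    pattern += re.escape(template[pos:])
    return re.compile(pattern)


class Router:
    """First-match-wins router over compiled URI templates."""

    def __init__(self) -> None:
        self.routes: List[Tuple[re.Pattern, Callable[..., str]]] = []

    def add(self, template: str, handler: Callable[..., str]) -> None:
        self.routes.append((compile_template(template), handler))

    def dispatch(self, path: str) -> Optional[str]:
        for pattern, handler in self.routes:
            m = pattern.fullmatch(path)
            if m:
                return handler(**m.groupdict())  # template variables become kwargs
        return None
```

Route definitions stay readable (`/users/{username}/posts/{post_id}`) while the regex is generated behind the scenes.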

Woof—simply exchange files (via) Ultra simple file sharing for local networks: run “woof filename” to start a local web server which will serve up that file, just once, and then terminate. Can also serve up an entire directory as a compressed archive. Written in Python, as a single script which you can drop in to your ~/bin. “woof -s” serves the script itself, so you can easily pass it to someone who has a file you want. # 24th November 2009, 8:44 am
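The core trick, serving a file for exactly one request and then exiting, maps neatly onto the standard library's `http.server`, whose `handle_request()` processes a single request and returns. This is a sketch of that idea, not Woof's actual code; the helper names and the default port 8080 are assumptions, and it skips Woof's directory-archive mode.

```python
import http.server
import os
import shutil


def make_handler(path: str):
    """Return a handler class that answers any GET with the bytes of `path`."""
    filename = os.path.basename(path)

    class OneFileHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(os.path.getsize(path)))
            self.send_header(
                "Content-Disposition", f'attachment; filename="{filename}"'
            )
            self.end_headers()
            with open(path, "rb") as f:
                shutil.copyfileobj(f, self.wfile)

    return OneFileHandler


def make_server(path: str, host: str = "", port: int = 8080) -> http.server.HTTPServer:
    """Bind a server for `path` (port 8080 is an arbitrary, assumed default)."""
    return http.server.HTTPServer((host, port), make_handler(path))


def serve_once(path: str, host: str = "", port: int = 8080) -> None:
    """Serve `path` to exactly one request, then close the socket and return."""
    with make_server(path, host, port) as server:
        print(f"Serving {path} once on port {server.server_address[1]}")
        server.handle_request()  # blocks until a single request has been handled
```

Because `handle_request()` counts any request, a stray probe that reaches the server first would use up the one download.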

django-batch-select (via) A smart attempt at solving select_related for many-to-many relationships in Django. Add a custom manager to your model and call e.g. Entry.objects.all()[:10].batch_select("tags") to execute two queries: one pulling back the first ten entries and another using an “IN” query against the tags table to pull back all of the tags for those entries in one go. # 23rd November 2009, 4:19 pm
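The two-query pattern is easy to see in plain SQL. This sketch uses the standard `sqlite3` module and a hypothetical schema shaped like Django's many-to-many tables (`entry`, `tag`, and an `entry_tags` join table); it illustrates the technique, not django-batch-select's code.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple


def entries_with_tags(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple[str, List[str]]]:
    """Fetch the first `limit` entries plus all their tags in exactly two queries."""
    # Query 1: the entries themselves (hypothetical `entry` table).
    entries = conn.execute(
        "SELECT id, title FROM entry ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    ids = [entry_id for entry_id, _ in entries]
    tags: Dict[int, List[str]] = defaultdict(list)
    if ids:
        # Query 2: every tag for those entries at once, via an IN clause.
        placeholders = ", ".join("?" for _ in ids)
        rows = conn.execute(
            "SELECT et.entry_id, t.name FROM entry_tags et "
            "JOIN tag t ON t.id = et.tag_id "
            f"WHERE et.entry_id IN ({placeholders}) ORDER BY t.name",
            ids,
        )
        for entry_id, name in rows:
            tags[entry_id].append(name)  # group tags by entry in Python
    return [(title, tags[entry_id]) for entry_id, title in entries]
```

However many entries are fetched, the query count stays at two, instead of one extra tags query per entry.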

Debugging in Python. The missing manual for Python’s powerful pdb debugger. # 18th November 2009, 12:34 pm

Drupal or Django? A Guide for Decision Makers. A surprisingly interesting comparison—the author describes Django as “a framework with CMS-like tendencies” and Drupal as “a CMS with framework-like tendencies”, then explores the benefits of those two different approaches. # 15th November 2009, 10:14 pm