Simon Willison’s Weblog

Items tagged python in Feb

Filters: Month: Feb × python ×


unasync (via) Today I started wondering out loud if one could write code that takes an asyncio Python library and transforms it into the synchronous equivalent by using some regular expressions to strip out the “await ...” keywords and suchlike. Turns out that can indeed work, and Ratan Kulshreshtha built it! unasync uses the standard library tokenize module to run some transformations against an async library and spit out the sync version automatically. I’m now considering using this for sqlite-utils. # 27th February 2021, 10:20 pm

Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies (via) Alex Birsan describes a new category of security vulnerability he discovered in the npm, pip and gem packaging ecosystems: if a company uses a private repository with internal package names, uploading a package with the same name to the public repository can often result in an attacker being able to execute their own code inside the networks of their target. Alex scored over $130,000 in bug bounties from this one, from a number of name-brand companies. Of particular note for Python developers: the --extra-index-url argument to pip will consult both public and private registries and install the package with the highest version number! # 10th February 2021, 8:42 pm

How to cheat at unit tests with pytest and Black

I’ve been making a lot of progress on Datasette Cloud this week. As an application that provides private hosted Datasette instances (initially targeted at data journalists and newsrooms) the majority of the code I’ve written deals with permissions: allowing people to form teams, invite team members, promote and demote team administrators and suchlike.

[... 885 words]

huey. Charles Leifer’s “little task queue for Python”. Similar to Celery, but it’s designed to work with Redis, SQLite or in the parent process using background greenlets. Worth checking out for the really neat design. The project is new to me, but it’s been under active development since 2011 and has a very healthy looking rate of releases. # 25th February 2019, 7:49 pm

parameterized. I love the @parametrize decorator in pytest, which lets you run the same test multiple times against multiple parameters. The only catch is that the decorator in pytest doesn’t work for old-style unittest TestCase tests, which means you can’t easily add it to test suites that were built using the older model. I just found out about parameterized which works with unittest tests whether or not you are running them using the pytest test runner. # 19th February 2019, 9:05 pm

r1chardj0n3s/parse: Parse strings using a specification based on the Python format() syntax. (via) Really neat API design: parse() behaves almost exactly in the opposite way to Python’s built-in format(), so you can use format strings as an alternative to regular expressions for extracting specific data from a string. # 25th February 2018, 4:58 pm

kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert that to the equivalent markdown in a one-liner. # 25th February 2018, 4:49 pm

s3monkey: A Python library that allows you to interact with Amazon S3 Buckets as if they are your local filesystem. (via) A particularly devious hack by Kenneth Reitz—provides a context manager within which various Python filesystem APIs such as open() and os.listdir() are monkeypatched to operate against an S3 bucket instead. Kenneth built it to make it easier to work with files from apps running on Heroku. Under the hood it uses pyfakefs, a filesystem mocking library originally released by Google. # 21st February 2018, 5:54 pm

A Promenade of PyTorch. Useful overview of the PyTorch machine learning library from Facebook AI Research described as “a Python library enabling GPU-accelerated tensor computation”. Similar to TensorFlow, but where TensorFlow requires you to explicitly construct an execution graph PyTorch instead lets you write regular Python code (if statements, for loops etc) which PyTorch then uses to construct the execution graph for you. # 21st February 2018, 5:31 am

Moving a large and old codebase to Python3 (via) Really interesting case study full of good ideas. The codebase in this case was 240,000 lines of Python and Django written over the course of 15 years. The team used Python-Modernize to aid their transition to a six-compatible codebase first. # 20th February 2018, 2:39 pm

Python & Async Simplified. Andrew Godwin: “Python’s async framework is actually relatively simple when you treat it at face value, but a lot of tutorials and documentation discuss it in minute implementation detail, so I wanted to make a higher-level overview that deliberately ignores some of the small facts and focuses on the practicalities of writing projects that mix both kinds of code.” ‪This is really useful: clearly explains the two separate worlds of Python (sync and async functions) and describes Andrew’s clever sync_to_async and async_to_sync decorators as well.‬ # 20th February 2018, 12:30 am

Channels 2.0. Andrew just shipped Channels 2.0—a major rewrite and redesign of the Channels project he started back in 2014. Channels brings async to Django, providing a logical, standardized way of supporting things like WebSockets and asynchronous execution on top of a Django application. Previously it required you to run a separate Twisted server and redis/RabbitMQ queue, but thanks to Python 3 async everything can now be deployed as a single process. And the new ASGI spec means its turtles all the way down! Everything from URL routing to view functions to middleware can be composed together using the same ASGI interface. # 2nd February 2018, 6:19 pm

Using setup.py in Your (Django) Project. Includes this neat trick: if you list manage.py in the setup(scripts=) argument you can call it from e.g. cron using the full path to manage.py within your virtual environment and it will execute in the correct context without needing to explicitly activate the environment first. # 2nd February 2018, 12:33 pm

What’s the best way to communicate with go libraries from within Python programs?

Go speaks HTTP extremely well, so one simple but powerful approach is to hook your Go libraries up as simple HTTP+JSON APIs and have Python call them over HTTP (the Requests: HTTP for Humans library is awesome for this).

[... 59 words]

What are the best resources for learning regular expressions?

The O’Reilly book on Regular Expressions is absolutely superb. It will help you build a much deeper understanding if how they actually work than any online tutorial I’ve seen.

[... 62 words]

What web programming framework best supports ’drag and drop’ actions?  Please give examples of sites and/or plug-ins that support the interaction.

Drag and drop is a client-side thing—it has nothing to do with the server-side technology being used.

[... 72 words]

Python Django load MySQL database from csv files performance issue?

Don’t use the Django ORM for bulk imports—the performance overhead is pretty small for regular web page stuff, but it adds up if you are running millions of inserts.

[... 63 words]

How can I look up Django functions?

You can use the ./manage.py shell command to get a shell which will import any Django modules (or any of your own code) without complaining about the location of the settings.py module. Install IPython first to get a much more useful interactive shell when you run that command.

[... 190 words]

If python dictionaries are inherently orderless, why were they given the name if a real dictionary is sorted by letter?

The metaphor here is that paper dictionaries make it easy to look stuff up by letter or word—just like Python dictionaries make looking something up by key an instant operation.

[... 114 words]

How can I install Django in a server without shell access?

I don’t think you can.

[... 42 words]

Is South the best tool to use when doing database migrations in Django?

Yes. And I say that as an author of another Django migrations tool (dmigrations) which offered a small subset of South’s current functionality.

[... 42 words]

Johnny Cache. Clever twist on ORM-level caching for Django. Johnny Cache (great name) monkey-patches Django’s QuerySet classes and caches the result of every single SELECT query in memcached with an infinite expiry time. The cache key includes a “generation” ID for each dependent database table, and the generation is changed every single time a table is updated. For apps with infrequent writes, this strategy should work really well—but if a popular table is being updated constantly the cache will be all but useless. Impressively, the system is transaction-aware—cache entries created during a transaction are held in local memory and only pushed to memcached should the transaction complete successfully. # 28th February 2010, 10:55 pm

Unit Testing Achievements. A plugin for Python’s nose test runner that adds achievements—“Night Shift: Make a failing suite pass between 12am and 5am.” # 28th February 2010, 3:56 pm

PiCloud. An interesting twist on cloud computing for Python. “import cloud; cloud.call(my_function, arguments)” serialises my_function and its arguments, pushes it up to one of their EC2 servers and hands you back a job ID which you can poll (or block on) for a response. They suggest using it for long running tasks such as web crawling or image processing. # 26th February 2010, 6:25 pm

jacobian’s django-deployment-workshop. Notes and resources from Jacob’s 3 hour Django deployment workshop at PyCon, including example configuration files for Apache2 + mod_wsgi, nginx, PostgreSQL and pgpool. # 19th February 2010, 2:28 pm

Why toppcloud will not be agnostic. Ian Bicking’s toppcloud aims to offer deployment with the ease of use of AppEngine against a standard, open source Ubuntu + Python 2.6 + mod_wsgi + Varnish stack. Here he explains why he’s not going to vary the required components: keeping everything completely standardised means everyone gets the same bugs (and the same fixes). # 12th February 2010, 9:21 am

Presenting django-devserver, a better runserver. I really like this—it’s a Django management command (./manage.py rundevserver) which adds SQL logging and cache access summaries to the console output of Django’s local development server. It solves a similar set of problems to the debug toolbar, but requires slightly less setup and doesn’t inject additional HTML in to your pages. You can add your own custom modules to it as well. # 10th February 2010, 11:33 am

Integrate Tornado in Django. A handy ./manage.py runtornado management command for firing up a Tornado server that serves your Django application. # 8th February 2010, 11:12 am

What’s hot? Introducing Zeitgeist. Dan Catt’s first project at the Guardian. “When something appears on the Zeitgeist page, it’s because it performed better (got more attention) than the norm for that content type/section/day”. The application itself is written in Python and runs on Google App Engine. # 5th February 2010, 12:17 pm

Distributed lock on top of memcached. A simple Python context manager (taking advantage of the with statement) that implements a distributed lock using memcached to store lock state: “memcached_lock can be used to ensure that some global data is only updated by one server”. Redis would work well for this kind of thing as well. # 1st February 2010, 10:15 am