Simon Willison’s Weblog

Subscribe

Items tagged python in Aug

Filters: Month: Aug × python × Sorted by date


Understanding Immortal Objects in Python 3.12. Abhinav Upadhyay provides a clear and detailed explanation of immortal objects coming in Python 3.12, which ensure Python no longer updates reference counts for immutable objects such as True, False, None and low-values integers. The trick (which maintains ABI compatibility) is pretty simple: a reference count value of 4294967295 now means an object is immortal, and the existing Py_INCREF and Py_DECREF macros have been updated to take that into account. # 26th August 2023, 12:08 pm

Lark parsing library JSON tutorial (via) A very convincing tutorial for a new-to-me parsing library for Python called Lark.

The tutorial covers building a full JSON parser from scratch, which ends up being just 19 lines of grammar definition code and 15 lines for the transformer to turn that tree into the final JSON.

It then gets into the details of optimization—the default Earley algorithm is quite slow, but swapping that out for a LALR parser (a one-line change) provides a 5x speedup for this particular example. # 13th August 2023, 9:50 pm

deno_python (via) A wildly impressive hack: deno_python uses Deno’s FFI interface to load your system’s Python framework (.dll/.dylib/.so) and sets up JavaScript proxy objects for imported Python objects—so you can run JavaScript code that instantiates objects from Python libraries and uses them to process data in different ways.

The latest release added pip support, so things like ’const np = await pip.import(“numpy”)’ now work. # 12th August 2023, 10:14 pm

Llama from scratch (or how to implement a paper without crying) (via) Brian Kitano implemented the model described in the Llama paper against TinyShakespeare, from scratch, using Python and PyTorch. This write-up is fantastic—meticulous, detailed and deeply informative. It would take several hours to fully absorb and follow everything Brian does here but it would provide multiple valuable lessons in understanding how all of this stuff fits together. # 9th August 2023, 7:21 pm

Python cocktail: mix a context manager and an iterator in equal parts (via) Explanation of a neat trick used by the Tenacity Python library, which provides a mechanism for retrying a chunk of code automatically on errors up to three times using a mixture of an iterator and a context manager to work around Python’s lack of multi-line lambda functions. # 6th August 2023, 5:44 pm

Catching up on the weird world of LLMs

I gave a talk on Sunday at North Bay Python where I attempted to summarize the last few years of development in the space of LLMs—Large Language Models, the technology behind tools like ChatGPT, Google Bard and Llama 2.

[... 10489 words]

storysniffer (via) Ben Welsh built a small Python library that guesses if a URL points to an article on a news website, or if it’s more likely to be a category page or /about page or similar. I really like this as an example of what you can do with a tiny machine learning model: the model is bundled as a ~3MB pickle file as part of the package, and the repository includes the Jupyter notebook that was used to train it. # 1st August 2022, 11:40 pm

SQLModel. A new project by FastAPI creator Sebastián Ramírez: SQLModel builds on top of both SQLAlchemy and Sebastián’s Pydantic validation library to provide a new ORM that’s designed around Python 3’s optional typing. The real brilliance here is that a SQLModel subclass is simultaneously a valid SQLAlchemy ORM model AND a valid Pydantic validation model, saving on duplicate code by allowing the same class to be used both for form/API validation and for interacting with the database. # 24th August 2021, 11:16 pm

sqlite-utils API reference (via) I released sqlite-utils 3.15.1 today with just one change, but it’s a big one: I’ve added docstrings and type annotations to nearly every method in the library, and I’ve started using sphinx-autodoc to generate an API reference page in the documentation directly from those docstrings. I’ve deliberately avoided building this kind of documentation in the past because I so often see projects where the class reference is the ONLY documentation, which I find makes it really hard to figure out how to actually use it. sqlite-utils already has extensive narrative prose documentation so in this case I think it’s a useful enhancement—especially since the docstrings and type hints can help improve the usability of the library in IDEs and Jupyter notebooks. # 11th August 2021, 1:03 am

Announcing the Consortium for Python Data API Standards (via) Interesting effort to unify the fragmented DataFrame API ecosystem, where increasing numbers of libraries offer APIs inspired by Pandas that imitate each other but aren’t 100% compatible. The announcement includes some very clever code to support the effort: custom tooling to compare the existing APIs, and an ingenious GitHub Actions setup to run traces (via sys.settrace), derive type signatures and commit those generated signatures back to a repository. # 19th August 2020, 5:48 am

Pysa: An open source static analysis tool to detect and prevent security issues in Python code (via) Interesting new static analysis tool for auditing Python for security vulnerabilities—things like SQL injection and os.execute() calls. Built by Facebook and tested extensively on Instagram, a multi-million line Django application. # 7th August 2020, 8:50 pm

PyPI now supports uploading via API token (via) All of my open source Python libraries are set up to automatically deploy new tagged releases as PyPI packages using Circle CI or Travis, but I’ve always get a bit uncomfortable about sharing my PyPI password with those CI platforms to get this to work. PyPI just added scopes authentication tokens, which means I can issue a token that’s only allowed to upload a specific project and see an audit log of when that token was last used. # 1st August 2019, 4:03 pm

The subset of reStructuredText worth committing to memory

reStructuredText is the standard for documentation in the Python world.

[... 1186 words]

Compiling SQLite for use with Python Applications (via) Charles Leifer’s recent tutorial on how to compile and build the latest SQLite (with window function support) for use from Python via his pysqlite3 library. # 15th August 2018, 3:51 pm

coleifer/pysqlite3. Now that the pysqlite package is bundled as part of the Python standard library the original open source project is no longer actively maintained, and has not been upgraded for Python 3. Charles Leifer has been working on pysqlite3, a stand-alone package of the module. Crucially, this should enable compiling the latest version of SQLite (via the amalgamation package) without needing to upgrade the version that ships with the operating system. # 15th August 2018, 3:15 pm

Faust: Python Stream Processing (via) A new open source stream processing system released by Robinhood, created by Vineet Goel and Celery creator Ask Solem. The API looks delightful, making very smart use of Python decorators and async/await. The initial release requires Kafka but they plan to support multiple backends, hopefully including Redis Streams. # 6th August 2018, 10:51 pm

Datasette unit tests: monkeytype_call_traces (via) Faceted browse against every function call that occurs during the execution of Datasette’s test suite. I used Instagram’s MonkeyType tool to generate this, which can run Python code and generates a SQLite database of all of the traced calls. It’s intended to be used to automatically add mypy annotations to your code, but since it produces a SQLite database as a by-product I’ve started exploring the intermediary format using Datasette. Generating this was as easy as running “monkeytype run `which pytest`” in the Datasette root directory. # 2nd August 2018, 9:03 pm

Web app: what programming knowledge do I need to create a goings on app?

For this kind of application a much more important question than "how can I build it? is "where will I get the data from?"—If you don’t have a good answer for that building the app is a waste of your time. The world is littered with local events listings apps that no one uses because they don’t have good data.

[... 82 words]

Does SQLAlchemy depend on MySQLdb?

No. SQLAlchemy can talk to all sorts of different DB-API compliant backends, including MySQL Connector/J (Jython only), MySQL Connector/Python, mysql-python (the MySQLdb module) and OurSQL—plus backends for many other databases. See the full list here:

[... 50 words]

What is the best way to learn about setting up server software for Python based web apps?

I’m a big fan of Fabric for automated deployment scripts. Start by reading this tutorial: http://morethanseven.net/2009/07...

[... 40 words]

What is the highest traffic website built on top of Django?

My best guess would be Disqus. Instagram are pretty enormous these days as well.

[... 31 words]

Writing your own traceroute in Python. How to implement traceroute in Python, using the low-level socket module. # 9th August 2010, 12:58 pm

Exploring OAuth-Protected APIs. One of the downsides of OAuth is that it makes debugging APIs in your browser much harder. Seth Fitzsimmons’ oauth-proxy solves this by running a Twisted-powered proxy on your local machine which OAuth-signs every request going through it using your consumer key, secret and tokens for that API. Using it with a browsers risks exposing your key and token (but not secret) to sites you accidentally browse to—it would be useful if you could pass a whitelist of API domains as a command line option to the proxy. # 23rd August 2009, 11:06 am

easy_install no longer working with SourceForge-hosted projects? Unsurprising, since installation software (which is often run as root) that crawls the web and scrapes HTML pages for download links is a horrible, horrible idea. # 19th August 2009, 11:38 am

Kung Fu People (via) The first site to launch based on the open source Django code from djangopeople.net! # 19th August 2009, 11:37 am

How do you install lxml on OS X Leopard without using MacPorts or Fink? I’ve asked on Stack Overflow... hope I get a good answer. # 14th August 2009, 1:04 pm

Python logging from multiple processes. Use Python’s socket log handler to send all log messages to a single server—the python-loggingserver project implements such a server as a Twisted application with a handy web interface for viewing the aggregated logs. # 13th August 2009, 11:55 pm

Scriptlets—Quick web scripts (via) From the prolific Jeff Lindsay, a pastebin-style tool for short server-side scripts written in Python, JavaScript or PHP that executes them within a Google App Engine powered sandbox. The Java code that implements the service is available on GitHub. # 13th August 2009, 1:51 pm

minixsv (via) As far as I can tell, this is the only library that can validate XML using pure Python (no C extension required). I’d be extremely happy if someone would write a pure Python library (or one that only depends on ElementTree, which is included in the standard library) for validating XML against a Relax NG Compact syntax schema. Even DTD validation would be better than nothing! # 12th August 2009, 4:59 pm

topia.termextract. Impressive Python term extraction library (similar to the various term extraction web APIs but you can run it on your own hardware), incorporating a Parts-Of-Speech tagging algorithm. # 10th August 2009, 9:26 pm