Simon Willison’s Weblog

Subscribe

Items tagged opensource, python

Filters: opensource × python × Sorted by date


Reviving PyMiniRacer (via) PyMiniRacer is “a V8 bridge in Python”—it’s a library that lets Python code execute JavaScript code in a V8 isolate and pass values back and forth (provided they serialize to JSON) between the two environments.

It was originally released in 2016 by Sqreen, a web app security startup startup. They were acquired by Datadog in 2021 and the project lost its corporate sponsor, but in this post Ben Creech announces that he is revitalizing the project, with the approval of the original maintainers.

I’m always interested in new options for running untrusted code in a safe sandbox. PyMiniRacer has the three features I care most about: code can’t access the filesystem or network by default, you can limit the RAM available to it and you can have it raise an error if code execution exceeds a time limit.

The documentation includes a newly written architecture overview which is well worth a read. Rather than embed V8 directly in Python the authors chose to use ctypes—they build their own V8 with a thin additional C++ layer to expose a ctypes-friendly API, then the Python library code uses ctypes to call that.

I really like this. V8 is a notoriously fast moving and complex dependency, so reducing the interface to just a thin C++ wrapper via ctypes feels very sensible to me.

This blog post is fun too: it’s a good, detailed description of the process to update something like this to use modern Python and modern CI practices. The steps taken to build V8 (6.6 GB of miscellaneous source and assets!) across multiple architectures in order to create binary wheels are particularly impressive—the Linux aarch64 build takes several days to run on GitHub Actions runners (via emulation), so they use Mozilla’s Sccache to cache compilation steps so they can retry until it finally finishes.

On macOS (Apple Silicon) installing the package with “pip install mini-racer” got me a 37MB dylib and a 17KB ctypes wrapper module. # 24th March 2024, 5 pm

Marimo (via) This is a really interesting new twist on Python notebooks.

The most powerful feature is that these notebooks are reactive: if you change the value or code in a cell (or change the value in an input widget) every other cell that depends on that value will update automatically. It’s the same pattern implemented by Observable JavaScript notebooks, but now it works for Python.

There are a bunch of other nice touches too. The notebook file format is a regular Python file, and those files can be run as “applications” in addition to being edited in the notebook interface. The interface is very nicely built, especially for such a young project—they even have GitHub Copilot integration for their CodeMirror cell editors. # 12th January 2024, 9:17 pm

Introducing PyPI Organizations. Launched at PyCon US today: Organizations allow packages on the Python Package Index to be owned by a group, not an individual user account. “We’re making organizations available to community projects for free, forever, and to corporate projects for a small fee.”—this is the first revenue generating PyPI feature. # 23rd April 2023, 8:29 pm

How to build, test and publish an open source Python library

At PyGotham this year I presented a ten minute workshop on how to package up a new open source Python library and publish it to the Python Package Index. Here is the video and accompanying notes, which should make sense even without watching the talk.

[... 2055 words]

cinder: Instagram’s performance oriented fork of CPython (via) Instagram forked CPython to add some performance-oriented features they wanted, including a method-at-a-time JIT compiler and a mechanism for eagerly evaluating coroutines (avoiding the overhead of creating a coroutine if an awaited function returns a value without itself needing to await). They’re open sourcing the code to help start conversations about implementing some of these features in CPython itself. I particularly enjoyed the warning that accompanies the repo: this is not intended to be a supported release, and if you decide to run it in production you are on your own! # 4th May 2021, 10:13 pm

PEP 8016 -- The Steering Council Model (via) The votes are in and Python has a new governance model, partly inspired by the model used by the Django Software Foundation. A core elected council of five people (with a maximum of two employees from any individual company) will oversee the project. # 17th December 2018, 4:02 pm

Pull request #4120 · python/cpython. I just had my first ever change merged into Python! It was a one sentence documentation improvement (on how to cancel SQLite operations) but it was fascinating seeing how Python’s GitHub flow is set up—clever use of labels, plus a bot that automatically checks that you have signed a copy of their CLA. # 7th November 2017, 2:06 pm

Slide, Inc.—open source. slide.com have open sourced a whole bunch of interesting Python libraries, most of them involving C extensions or greenlet non-blocking I/O. wirebin (fast binary serialization of native Python types) and meminfo (an extension for finding precise in-memory sizes of Python objects) look particularly interesting. No documentation yet—not even a readme. # 17th June 2010, 8:05 pm

Django 1.2 release notes (via) Released today, this is a terrific upgrade. Multiple database connections, model validation, improved CSRF protection, a messages framework, the new smart if template tag and lots, lots more. I’ve been using the 1.2 betas for a major new project over the past few months and it’s been smooth sailing all the way. # 17th May 2010, 9:11 pm

Why I like Redis

I’ve been getting a lot of useful work done with Redis recently.

[... 900 words]

Introducing Cloudera Desktop. It’s a GUI for Hadoop, and under the hood is a whole stack of open source software, including Python, Django, MooTools, Twisted, lxml, CherryPy, Mako, Java and AspectJ. # 21st October 2009, 6:48 pm

Kung Fu People (via) The first site to launch based on the open source Django code from djangopeople.net! # 19th August 2009, 11:37 am

Scriptlets—Quick web scripts (via) From the prolific Jeff Lindsay, a pastebin-style tool for short server-side scripts written in Python, JavaScript or PHP that executes them within a Google App Engine powered sandbox. The Java code that implements the service is available on GitHub. # 13th August 2009, 1:51 pm

Django 1.1 release notes (via) Django 1.1 is out! Congratulations everyone who worked on this, it’s a fantastic release. New features include aggregate support in the ORM, proxy models, deferred fields and some really nice admin improvements. Oh, and the testing framework is now up to 10 times thanks to smart use of transactions. # 29th July 2009, 9:34 am

NASA NEBULA Services (via) NASA’s new NEBULA cloud computing platform appears to be built entirely on open source infrastructure, including Python, Django, Fabric, Eucalyptus, RabbitMQ, Trac and Solr. # 28th July 2009, 12:10 pm

EveryBlock source code released. EveryBlock’s Knight Foundation grant required them to release the source code after two years, under the GPL. Lots of neat Django / PostgreSQL / GIS tricks to be found within. # 1st July 2009, 8:01 pm

djangopeople.net on GitHub. I’ve released the source code for Django People, the geographical community site developed last year by myself and Natalie Downe (it hasn’t otherwise been touched since April last year, so it needs porting to Django 1.1). If you want a new feature on the site, implement it and I’ll see about merging it in. # 4th May 2009, 6:12 pm

DB2 support for Django is coming. From IBM, under the Apache 2.0 License. I’m not sure if this makes it hard to bundle it with the rest of Django, which uses the BSD license. # 18th February 2009, 10:58 pm

Whoosh. A brand new, pure-python full text indexing engine (think Lucene). Claims to offer performance in the same league as wrappers to C or Java libraries. If this works as well as it claims it will be an excellent tool for adding search to projects that wish to avoid a dependency on an external engine. # 12th February 2009, 12:49 pm

google-mobwrite. Neal Fraser’s terrifyingly clever differential synchronization algorithm (for SubEthaEdit-style collaboration over the web) is now available as an open source Python and JavaScript library. # 24th January 2009, 11:55 pm

Announcing dmigrations

The team at Global Radio (formerly GCap Media) is the largest group of Django developers I’ve personally worked with, consisting of 14 developers split into two scrum teams, all contributing to the same overall codebase.

[... 625 words]

FLOSS Weekly 34: Django. Randal Schwartz interviewed Jacob Kaplan-Moss at OSCON for the consistently excellent FLOSS Weekly podcast. # 27th July 2008, 9:47 am

Protocol Buffers: Google’s Data Interchange Format. Open sourced today. Highly efficient binary protocol for storing and transmitting structured data between C++, Java and Python. Uses a .proto file describing the data structure which is compiled to classes in those languages for serializing and deserializing. 3-10 times smaller and 20-100 times faster than XML. # 8th July 2008, 8:20 am

Reddit release their codebase. Under the same Common Public Attribution License used by Facebook for their recent source release. # 18th June 2008, 2:32 pm

Good architectural layering, and Bzr 1.1. Mark Shuttleworth on the growing importance of plug-in architectures as an open source project evolves, as they allow new developers to release their own components without needing commit access to the project. Django is pretty good for this, but more hooks (and a faster event dispatch system) would be useful. # 9th January 2008, 2:06 pm

The future of web standards. Nice analysis from James Bennett, who suggests that successful open source projects (Linux, Python, Perl etc) could be used as the model for a more effective standards process, and points out that Ian Hickson is something of a BDFL for the WHAT-WG. # 17th December 2007, 1:16 pm

/trunk/jl/scraper. journa-list.com is open source, and the screen scrapers are written in Python. # 11th October 2007, 4:10 pm

The Rubinius Sprint. Sun are throwing a ton of resources at Ruby, because as Tim Bray says, “it’s not fast enough”. Imagine where they’d be if they’d invested this kind of support in Jython five years ago... # 21st September 2007, 11:32 pm

gSculpt. Powerful open source modelling software, written in Python and demonstrated (to much applause) as the last lightning talk of EuroPython 2007. # 11th July 2007, 11:48 pm

google-diff-match-patch (via) Robust algorithms to perform the operations required for synchronizing plain text, in Java, JavaScript and Python. # 9th June 2007, 6:15 pm