Simon Willison’s Weblog

Subscribe

Items tagged opensource in 2024

Filters: Year: 2024 × opensource × Sorted by date


in July 2023, we [Hugging Face] wanted to experiment with a custom license for this specific project [text-generation-inference] in order to protect our commercial solutions from companies with bigger means than we do, who would just host an exact copy of our cloud services.

The experiment however wasn’t successful.

It did not lead to licensing-specific incremental business opportunities by itself, while it did hamper or at least complicate the community contributions, given the legal uncertainty that arises as soon as you deviate from the standard licenses.

Julien Chaumond # 8th April 2024, 6:35 pm

Cally: Accessibility statement (via) Cally is a neat new open source date (and date range) picker Web Component by Nick Williams.

It’s framework agnostic and weighs less than 9KB grilled, but the best feature is this detailed page of documentation covering its accessibility story, including how it was tested—in JAWS, NVDA and VoiceOver.

I’d love to see other open source JavaScript libraries follow this example. # 2nd April 2024, 7:38 pm

Merge pull request #1757 from simonw/heic-heif. I got a PR into GCHQ’s CyberChef this morning! I added support for detecting heic/heif files to the Forensics -> Detect File Type tool.

The change was landed by the delightfully mysterious a3957273. # 28th March 2024, 5:37 am

gchq.github.io/CyberChef (via) CyberChef is “the Cyber Swiss Army Knife—a web app for encryption, encoding, compression and data analysis”—entirely client-side JavaScript with dozens of useful tools for working with different formats and encodings.

It’s maintained and released by GCHQ—the UK government’s signals intelligence security agency.

I didn’t know GCHQ had a presence on GitHub, and I find the URL to this tool absolutely delightful. They first released it back in 2016 and it has over 3,700 commits.

The top maintainers also have suitably anonymous usernames—great work, n1474335, j433866, d98762625 and n1073645. # 26th March 2024, 5:08 pm

Reviving PyMiniRacer (via) PyMiniRacer is “a V8 bridge in Python”—it’s a library that lets Python code execute JavaScript code in a V8 isolate and pass values back and forth (provided they serialize to JSON) between the two environments.

It was originally released in 2016 by Sqreen, a web app security startup startup. They were acquired by Datadog in 2021 and the project lost its corporate sponsor, but in this post Ben Creech announces that he is revitalizing the project, with the approval of the original maintainers.

I’m always interested in new options for running untrusted code in a safe sandbox. PyMiniRacer has the three features I care most about: code can’t access the filesystem or network by default, you can limit the RAM available to it and you can have it raise an error if code execution exceeds a time limit.

The documentation includes a newly written architecture overview which is well worth a read. Rather than embed V8 directly in Python the authors chose to use ctypes—they build their own V8 with a thin additional C++ layer to expose a ctypes-friendly API, then the Python library code uses ctypes to call that.

I really like this. V8 is a notoriously fast moving and complex dependency, so reducing the interface to just a thin C++ wrapper via ctypes feels very sensible to me.

This blog post is fun too: it’s a good, detailed description of the process to update something like this to use modern Python and modern CI practices. The steps taken to build V8 (6.6 GB of miscellaneous source and assets!) across multiple architectures in order to create binary wheels are particularly impressive—the Linux aarch64 build takes several days to run on GitHub Actions runners (via emulation), so they use Mozilla’s Sccache to cache compilation steps so they can retry until it finally finishes.

On macOS (Apple Silicon) installing the package with “pip install mini-racer” got me a 37MB dylib and a 17KB ctypes wrapper module. # 24th March 2024, 5 pm

Redis Adopts Dual Source-Available Licensing (via) Well this sucks: after fifteen years (and contributions from more than 700 people), Redis is dropping the 3-clause BSD license going forward, instead being “dual-licensed under the Redis Source Available License (RSALv2) and Server Side Public License (SSPLv1)” from Redis 7.4 onwards. # 21st March 2024, 2:24 am

Paying people to work on open source is good actually. In which Jacob expands his widely quoted (including here) pithy toot about how quick people are to pick holes in paid open source contributor situations into a satisfyingly comprehensive rant. This is absolutely worth your time—there’s so much I could quote from here, but I’m going to go with this:

“Many, many more people should be getting paid to write free software, but for that to happen we’re going to have to be okay accepting impure or imperfect mechanisms.” # 17th February 2024, 1:42 am

Aya (via) “A global initiative led by Cohere For AI involving over 3,000 independent researchers across 119 countries. Aya is a state-of-art model and dataset, pushing the boundaries of multilingual AI for 101 languages through open science.”

Both the model and the training data are released under Apache 2. The training data looks particularly interesting: “513 million instances through templating and translating existing datasets across 114 languages”—suggesting the data is mostly automatically generated. # 13th February 2024, 5:14 pm

“We believe that open source should be sustainable and open source maintainers should get paid!”

Maintainer: *introduces commercial features*
“Not like that”

Maintainer: *works for a large tech co*
“Not like that”

Maintainer: *takes investment*
“Not like that”

Jacob Kaplan-Moss # 12th February 2024, 5:18 am

The Open Source Sustainability Crisis (via) Chad Whitacre: “What is Open Source sustainability? Why do I say it is in crisis? My answers are that sustainability is when people are getting paid without jumping through hoops, and we’re in a crisis because people aren’t and they’re burning out.”

I really like Chad’s focus on “jumping through hoops” in this piece. It’s possible to build a financially sustainable project today, but it requires picking one or more activities that aren’t directly aligned with working on the core project: raising VC and starting a company, building a hosted SaaS platform and becoming a sysadmin, publishing books and courses and becoming a content author.

The dream is that open source maintainers can invest all of their effort in their projects and make a good living from that work. # 23rd January 2024, 4:48 pm

We estimate the supply-side value of widely-used OSS is $4.15 billion, but that the demand-side value is much larger at $8.8 trillion. We find that firms would need to spend 3.5 times more on software than they currently do if OSS did not exist. [...] Further, 96% of the demand-side value is created by only 5% of OSS developers.

The Value of Open Source Software, Harvard Business School Strategy Unit # 22nd January 2024, 4:35 pm

DSF calls for applicants for a Django Fellow. The Django Software Foundation employs contractors to manage code reviews and releases, responsibly handle security issues, coach new contributors, triage tickets and more.

This is the Django Fellows program, which is now ten years old and has proven enormously impactful.

Mariusz Felisiak is moving on after five years and the DSF are calling for new applicants, open to anywhere in the world. # 20th January 2024, 8:35 am

Talking about Open Source LLMs on Oxide and Friends

I recorded an episode of the Oxide and Friends podcast on Monday, talking with Bryan Cantrill and Adam Leventhal about Open Source LLMs.

[... 1995 words]

Open Source LLMs with Simon Willison. I was invited to the Oxide and Friends weekly audio show (previously on Twitter Spaces, now using broadcast using Discord) to talk about open source LLMs, and to respond to a very poorly considered op-ed calling for them to be regulated as “uniquely dangerous”. It was a really fun conversation, now available to listen to as a podcast or YouTube audio-only video. # 17th January 2024, 8:53 pm

Marimo (via) This is a really interesting new twist on Python notebooks.

The most powerful feature is that these notebooks are reactive: if you change the value or code in a cell (or change the value in an input widget) every other cell that depends on that value will update automatically. It’s the same pattern implemented by Observable JavaScript notebooks, but now it works for Python.

There are a bunch of other nice touches too. The notebook file format is a regular Python file, and those files can be run as “applications” in addition to being edited in the notebook interface. The interface is very nicely built, especially for such a young project—they even have GitHub Copilot integration for their CodeMirror cell editors. # 12th January 2024, 9:17 pm

Microsoft Research relicense Phi-2 as MIT (via) Phi-2 was already an interesting model—really strong results for its size—made available under a non-commercial research license. It just got significantly more interesting: Microsoft relicensed it as MIT open source. # 6th January 2024, 6:06 am

NPM: modele-social (via) This is a fascinating open source package: it’s an NPM module containing an implementation of the rules for calculating social security contributions in France, maintained by a team at Urssaf, the not-quite-government organization in France that manages the collection of social security contributions there.

The rules themselves can be found in the associated GitHub repository, encoded in a YAML-like declarative language called Publicodes that was developed by the French government for this and similar purposes. # 2nd January 2024, 5:55 pm