Simon Willison’s Weblog

Items in Nov

Filters: Month: Nov ×


In general, reviewers should favor approving a CL [code review] once it is in a state where it definitely improves the overall code health of the system being worked on, even if the CL isn’t perfect.

Google Standard of Code Review # 28th November 2019, 5:40 am

niche-museums.com, powered by Datasette

I just released a major upgrade to my www.niche-museums.com website (launched last month).

[... 1154 words]

With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.

Hyrum's Law # 21st November 2019, 10:45 pm

How Do You Remove Unused CSS From a Site? (via) Chris Coyier takes an exhaustive look at the current set of tools for automatically removing unused CSS, and finds that there’s no magic bullet but you can get OK results if you use them carefully. # 21st November 2019, 4:41 am

Weeknotes: datasette-template-sql

Last week I talked about wanting to take ona a larger Datasette project, and listed some candidates. I ended up pushing a big project that I hadn’t listed there: the upgrade of Datasette to Python 3.8, which meant dropping support for Python 3.5 (thanks to incompatible dependencies).

[... 521 words]

datasette-template-sql (via) New Datasette plugin, celebrating the new ability in Datasette 0.32 to have asynchronous custom template functions in Jinja (which was previously blocked by the need to support Python 3.5). The plugin adds a sql() function which can be used to execute SQL queries that are embedded directly in custom templates. # 15th November 2019, 12:59 am

I have sometimes wondered how I would fare with a problem where the solution really isn’t in sight. I decided that I should give it a try before I get too old. I’m going to work on artificial general intelligence (AGI). I think it is possible, enormously valuable, and that I have a non-negligible chance of making a difference there, so by a Pascal’s Mugging sort of logic, I should be working on it.

John Carmack # 14th November 2019, 1:18 am

Datasette 0.31. Released today: this version adds compatibility with Python 3.8 and breaks compatibility with Python 3.5. Since Glitch support Python 3.7.3 now I decided I could finally give up on 3.5. This means Datasette can use f-strings now, but more importantly it opens up the opportunity to start taking advantage of Starlette, which makes all kinds of interesting new ASGI-based plugins much easier to build. # 12th November 2019, 6:11 am

My Python Development Environment, 2020 Edition (via) Jacob Kaplan-Moss shares what works for him as a Python environment coming into 2020: pyenv, poetry, and pipx. I’m not a frequent user of any of those tools—it definitely looks like I should be. # 12th November 2019, 1:30 am

Weeknotes: Python 3.7 on Glitch, datasette-render-markdown

Streaks is really working well for me. I’m at 12 days of commits to Datasette, 16 posting a daily Niche Museum, 19 of actually reviewing my email inbox and 14 of guitar practice. I rewarded myself for that last one by purchasing an actual classical (as opposed to acoustic) guitar.

[... 1141 words]

pinboard-to-sqlite (via) Jacob Kaplan-Moss just released the second Dogsheep tool that wasn’t written by me (after goodreads-to-sqlite by Tobias Kunze)—this one imports your Pinterest bookmarks. The repo includes a really clean minimal example of how to use GitHub actions to run tests and release packages to PyPI. # 7th November 2019, 8:46 pm

The first ever commit to Sentry (via) This is fascinating: the first 70 lines of code that started the Sentry error tracking project. It’s a straight-forward Django process_exception() middleware method that collects the traceback and the exception class and saves them to a database. The trick of using the md5 hash of the traceback message to de-dupe errors has been there from the start, and remains one of my favourite things about the design of Sentry. # 6th November 2019, 11:08 pm

Automate the Boring Stuff with Python: Working with PDF and Word Documents. I stumbled across this while trying to extract some data from a PDF file (the kind of file with actual text in it as opposed to dodgy scanned images) and it worked perfectly: PyPDF2.PdfFileReader(open(“file.pdf”, “rb”)).getPage(0).extractText() # 6th November 2019, 4:17 pm

Weeknotes: More releases, more museums

Lots of small releases this week.

[... 538 words]

selenium-demoscraper (via) Really useful minimal example of a Binder project. Click the button to launch a Jupyter notebook in Binder that can take screenshots of URLs using Selenium-controlled headless Firefox. The binder/ folder uses an apt.txt file to install Firefox, requirements.txt to get some Python dependencies and a postBuild Python script to download the Gecko Selenium driver. # 4th November 2019, 3:05 pm

Cloud Run Button: Click-to-deploy your git repos to Google Cloud (via) Google Cloud Run now has its own version of the Heroku deploy button: you can add a button to a GitHub repository which, when clicked, will provide an interface for deploying your repo to the user’s own Google Cloud account using Cloud Run. # 4th November 2019, 4:57 am

sqlite-transform. I released a new CLI tool today: sqlite-transform, which lets you run “transformations” against a SQLite database. I built it out of frustration of constantly running into CSV files that use horrible American date formatting—the “sqlite-transform parsedatetime my.db mytable col1” command runs dateutil’s parser against those columns and replaces them with a nice, sortable ISO formatted timestamp. I’ve also added a “sqlite-transform lambda” command that lets you specify Python code directly on the command-line that should be used to transform every value in a specified column. # 4th November 2019, 2:41 am

Why you should use `python -m pip` (via) Brett Cannon explains why he prefers “python -m pip install...” to “pip install...”—it ensures you always know exactly which Python interpreter environment you are installing packages for. He also makes the case for always installing into a virtual environment, created using “python -m venv”. # 2nd November 2019, 4:41 pm

repo2docker (via) Neat tool from the Jupyter project team: run “jupyter-repo2docker https://github.com/norvig/pytudes” and it will pull a GitHub repository, create a new Docker container for it, install Jupyter and launch a Jupyter instance for you to start trying out the library. I’ve been doing this by hand using virtual environments, but using Docker for even cleaner isolation seems like a smart improvement. # 28th November 2018, 10:06 pm

BigInt: arbitrary-precision integers in JavaScript (via) The BigInt specification is now supported in Chrome—but it hasn’t yet made it to other browsers. The Chrome team have a really interesting solution: they’ve released a JSBI library which you can use to do BigInt calculations in any browser today, and an accompanying Babel plugin which can rewrite calls to that library into BigInt syntax once browser support catches up. I’ve never seen a library that includes a tool for refactoring itself into oblivion before. # 28th November 2018, 2:22 am

AWS Ground Station – Ingest and Process Data from Orbiting Satellites. OK this is cool. “Instead of building your own ground station or entering in to a long-term contract, you can make use of AWS Ground Station on an as-needed, pay-as-you-go basis. [...] You don’t need to build or maintain antennas, and can focus on your work or research. We’re starting out with a pair of ground stations today, and will have 12 in operation by mid-2019. Each ground station is associated with a particular AWS Region; the raw analog data from the satellite is processed by our modem digitizer into a data stream (in what is formally known as VITA 49 baseband or VITA 49 RF over IP data streams) and routed to an EC2 instance that is responsible for doing the signal processing to turn it into a byte stream.” # 28th November 2018, 1:04 am

Whether you like it or not, whether you approve it or not, people outside of your design team are making significant design choices that affect your customers in important ways. They are designing your product. They are designers.

Daniel Burka # 25th November 2018, 7:03 pm

React is “value UI”. Its core principle is that UI is a value, just like a string or an array. You can keep it in a variable, pass it around, use JavaScript control flow with it, and so on. That expressiveness is the point — not some diffing to avoid applying changes to the DOM.

Dan Abramov # 24th November 2018, 5:58 pm

Helicopter accident analysis notebook (via) Ben Welsh worked on an article for the LA Times about helicopter accident rates, and has published the underlying analysis as an extremely detailed Jupyter notebook. Lots of neat new (to me) notebook tricks in here as well. # 19th November 2018, 6:25 pm

dive (via) Handy command-line tool (as with so much of the Docker ecosystem it’s written in Go, which means you can download a Darwin binary directly from the GitHub releases page and run it directly on your Mac) for visually exploring the different layers of a given Docker image. # 19th November 2018, 4:41 am

Zeit 2.0, and building smaller Python Docker images

Changes are afoot at Zeit Now, my preferred hosting provider for the past year (see previous posts). They have announced Now 2.0, an intriguing new approach to providing auto-scaling immutable deployments. It’s built on top of lambdas, and comes with a whole host of new constraints: code needs to fit into a 5MB bundle for example (though it looks like this restriction will soon be relaxed a littleupdate November 19th you can now bump this up to 50MB).

[... 1872 words]

Squoosh. This is by far the most useful example of web assembly I’ve seen so far: Squoosh is a progressive web app for image optimization (JPEG, PNG, GIF, SVG and more) which uses emscripten-compiled versions of best in breed image codec implementations to provide a browser interface for applying and previewing those optimizations. # 12th November 2018, 11:15 pm

The premise of “The Good Place” is absurdly high concept. It sounds less like the basis of a prime-time sitcom than an experimental puppet show conducted, without a permit, on the woodsy edge of a large public park.

Sam Anderson # 10th November 2018, 9:48 pm

Tracking Jupyter: Newsletter, the Third... (via) Tony Hirst’s tracking Jupyter newsletter is fantastic. The Jupyter ecosystem is incredibly exciting and fast moving at the moment as more and more groups discover how productive it is, and Tony’s newsletter is a wealth of information on what’s going on out there. # 9th November 2018, 5:42 pm