Simon Willison’s Weblog

Items in Nov


Datasette 0.31. Released today: this version adds compatibility with Python 3.8 and breaks compatibility with Python 3.5. Since Glitch supports Python 3.7.3 now I decided I could finally give up on 3.5. This means Datasette can use f-strings now, but more importantly it opens up the opportunity to start taking advantage of Starlette, which makes all kinds of interesting new ASGI-based plugins much easier to build. # 12th November 2019, 6:11 am

My Python Development Environment, 2020 Edition (via) Jacob Kaplan-Moss shares what works for him as a Python environment coming into 2020: pyenv, poetry, and pipx. I’m not a frequent user of any of those tools—it definitely looks like I should be. # 12th November 2019, 1:30 am

Weeknotes: Python 3.7 on Glitch, datasette-render-markdown

Streaks is really working well for me. I’m at 12 days of commits to Datasette, 16 posting a daily Niche Museum, 19 of actually reviewing my email inbox and 14 of guitar practice. I rewarded myself for that last one by purchasing an actual classical (as opposed to acoustic) guitar.

[... 1141 words]

pinboard-to-sqlite (via) Jacob Kaplan-Moss just released the second Dogsheep tool that wasn’t written by me (after goodreads-to-sqlite by Tobias Kunze)—this one imports your Pinboard bookmarks. The repo includes a really clean minimal example of how to use GitHub Actions to run tests and release packages to PyPI. # 7th November 2019, 8:46 pm

The first ever commit to Sentry (via) This is fascinating: the first 70 lines of code that started the Sentry error tracking project. It’s a straightforward Django process_exception() middleware method that collects the traceback and the exception class and saves them to a database. The trick of using the md5 hash of the traceback message to de-dupe errors has been there from the start, and remains one of my favourite things about the design of Sentry. # 6th November 2019, 11:08 pm
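The de-dupe trick is simple enough to sketch in a few lines. This is a minimal illustration and not Sentry's actual code: the ErrorStore class and its in-memory dict are hypothetical stand-ins for the Django model and database table, and the function names are made up for this example.

```python
import hashlib


def traceback_checksum(traceback_text: str) -> str:
    """Return the md5 hex digest used as the grouping key for an error."""
    return hashlib.md5(traceback_text.encode("utf-8")).hexdigest()


class ErrorStore:
    """Hypothetical in-memory stand-in for a database table of errors."""

    def __init__(self):
        # Maps checksum -> details of the first occurrence plus a counter.
        self.groups = {}

    def record(self, class_name: str, traceback_text: str) -> str:
        """Store an error, collapsing repeats that share the same checksum."""
        checksum = traceback_checksum(traceback_text)
        group = self.groups.get(checksum)
        if group is None:
            # First time this traceback has been seen: create a new group.
            self.groups[checksum] = {
                "class_name": class_name,
                "traceback": traceback_text,
                "times_seen": 1,
            }
        else:
            # Identical traceback text: bump the counter instead of
            # saving a duplicate record.
            group["times_seen"] += 1
        return checksum
```

Recording the same traceback text twice leaves a single group with times_seen set to 2, so a noisy error shows up as one entry and not as thousands of rows.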

Automate the Boring Stuff with Python: Working with PDF and Word Documents. I stumbled across this while trying to extract some data from a PDF file (the kind of file with actual text in it as opposed to dodgy scanned images) and it worked perfectly: PyPDF2.PdfFileReader(open("file.pdf", "rb")).getPage(0).extractText() # 6th November 2019, 4:17 pm

Weeknotes: More releases, more museums

Lots of small releases this week.

[... 538 words]

selenium-demoscraper (via) Really useful minimal example of a Binder project. Click the button to launch a Jupyter notebook in Binder that can take screenshots of URLs using Selenium-controlled headless Firefox. The binder/ folder uses an apt.txt file to install Firefox, requirements.txt to get some Python dependencies and a postBuild Python script to download the Gecko Selenium driver. # 4th November 2019, 3:05 pm

Cloud Run Button: Click-to-deploy your git repos to Google Cloud (via) Google Cloud Run now has its own version of the Heroku deploy button: you can add a button to a GitHub repository which, when clicked, will provide an interface for deploying your repo to the user’s own Google Cloud account using Cloud Run. # 4th November 2019, 4:57 am

sqlite-transform. I released a new CLI tool today: sqlite-transform, which lets you run “transformations” against a SQLite database. I built it out of frustration at constantly running into CSV files that use horrible American date formatting—the “sqlite-transform parsedatetime my.db mytable col1” command runs dateutil’s parser against those columns and replaces them with a nice, sortable ISO formatted timestamp. I’ve also added a “sqlite-transform lambda” command that lets you specify Python code directly on the command-line that should be used to transform every value in a specified column. # 4th November 2019, 2:41 am
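To give a feel for what a transformation like this involves, here is a rough sketch using only the standard library. It is not how sqlite-transform is implemented: the tool uses dateutil's parser, while this sketch assumes a single MM/DD/YYYY format and swaps in datetime.strptime. The function names us_date_to_iso and transform_column are made up for illustration.

```python
import sqlite3
from datetime import datetime


def us_date_to_iso(value):
    """Convert 'MM/DD/YYYY' text to 'YYYY-MM-DD'; pass other values through."""
    try:
        return datetime.strptime(value, "%m/%d/%Y").date().isoformat()
    except (ValueError, TypeError):
        # Not a string, or not in the assumed format: leave it unchanged.
        return value


def transform_column(conn, table, column, fn):
    """Replace every value in table.column with fn(value)."""
    # Expose the Python function to SQL under the name transform().
    conn.create_function("transform", 1, fn)
    with conn:  # Commits on success, rolls back on error.
        conn.execute(
            f'UPDATE "{table}" SET "{column}" = transform("{column}")'
        )
```

Calling transform_column(conn, "mytable", "col1", us_date_to_iso) rewrites the column in place. Passing a different one-argument function as fn gives roughly the shape of the idea behind the lambda command.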

Why you should use `python -m pip` (via) Brett Cannon explains why he prefers “python -m pip install...” to “pip install...”—it ensures you always know exactly which Python interpreter environment you are installing packages for. He also makes the case for always installing into a virtual environment, created using “python -m venv”. # 2nd November 2019, 4:41 pm
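The same idea carries over to scripts that install packages: building the command from sys.executable ties it to the interpreter that is actually running the script. A small sketch, with hypothetical helper names pip_command and venv_command:

```python
import sys


def pip_command(*args):
    """Build a pip invocation bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


def venv_command(path):
    """Build the command that creates a virtual environment at path."""
    return [sys.executable, "-m", "venv", path]


# Example usage (not executed here):
#   import subprocess
#   subprocess.run(venv_command(".venv"), check=True)
#   subprocess.run(pip_command("install", "datasette"), check=True)
```

Note that pip_command targets the current interpreter. To install into a freshly created virtual environment you would run that environment's own python with -m pip instead.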

repo2docker (via) Neat tool from the Jupyter project team: run “jupyter-repo2docker https://github.com/norvig/pytudes” and it will pull a GitHub repository, create a new Docker container for it, install Jupyter and launch a Jupyter instance for you to start trying out the library. I’ve been doing this by hand using virtual environments, but using Docker for even cleaner isolation seems like a smart improvement. # 28th November 2018, 10:06 pm

BigInt: arbitrary-precision integers in JavaScript (via) The BigInt specification is now supported in Chrome—but it hasn’t yet made it to other browsers. The Chrome team have a really interesting solution: they’ve released a JSBI library which you can use to do BigInt calculations in any browser today, and an accompanying Babel plugin which can rewrite calls to that library into BigInt syntax once browser support catches up. I’ve never seen a library that includes a tool for refactoring itself into oblivion before. # 28th November 2018, 2:22 am

AWS Ground Station – Ingest and Process Data from Orbiting Satellites. OK this is cool. “Instead of building your own ground station or entering in to a long-term contract, you can make use of AWS Ground Station on an as-needed, pay-as-you-go basis. [...] You don’t need to build or maintain antennas, and can focus on your work or research. We’re starting out with a pair of ground stations today, and will have 12 in operation by mid-2019. Each ground station is associated with a particular AWS Region; the raw analog data from the satellite is processed by our modem digitizer into a data stream (in what is formally known as VITA 49 baseband or VITA 49 RF over IP data streams) and routed to an EC2 instance that is responsible for doing the signal processing to turn it into a byte stream.” # 28th November 2018, 1:04 am

Whether you like it or not, whether you approve it or not, people outside of your design team are making significant design choices that affect your customers in important ways. They are designing your product. They are designers.

Daniel Burka # 25th November 2018, 7:03 pm

React is “value UI”. Its core principle is that UI is a value, just like a string or an array. You can keep it in a variable, pass it around, use JavaScript control flow with it, and so on. That expressiveness is the point — not some diffing to avoid applying changes to the DOM.

Dan Abramov # 24th November 2018, 5:58 pm

Helicopter accident analysis notebook (via) Ben Welsh worked on an article for the LA Times about helicopter accident rates, and has published the underlying analysis as an extremely detailed Jupyter notebook. Lots of neat new (to me) notebook tricks in here as well. # 19th November 2018, 6:25 pm

dive (via) Handy command-line tool (as with so much of the Docker ecosystem it’s written in Go, which means you can download a Darwin binary directly from the GitHub releases page and run it directly on your Mac) for visually exploring the different layers of a given Docker image. # 19th November 2018, 4:41 am

Zeit 2.0, and building smaller Python Docker images

Changes are afoot at Zeit Now, my preferred hosting provider for the past year (see previous posts). They have announced Now 2.0, an intriguing new approach to providing auto-scaling immutable deployments. It’s built on top of lambdas, and comes with a whole host of new constraints: code needs to fit into a 5MB bundle for example (though it looks like this restriction will soon be relaxed a little. Update November 19th: you can now bump this up to 50MB).

[... 1872 words]

Squoosh. This is by far the most useful example of WebAssembly I’ve seen so far: Squoosh is a progressive web app for image optimization (JPEG, PNG, GIF, SVG and more) which uses emscripten-compiled versions of best-in-breed image codec implementations to provide a browser interface for applying and previewing those optimizations. # 12th November 2018, 11:15 pm

The premise of “The Good Place” is absurdly high concept. It sounds less like the basis of a prime-time sitcom than an experimental puppet show conducted, without a permit, on the woodsy edge of a large public park.

Sam Anderson # 10th November 2018, 9:48 pm

Tracking Jupyter: Newsletter, the Third... (via) Tony Hirst’s tracking Jupyter newsletter is fantastic. The Jupyter ecosystem is incredibly exciting and fast moving at the moment as more and more groups discover how productive it is, and Tony’s newsletter is a wealth of information on what’s going on out there. # 9th November 2018, 5:42 pm

A Netflix Web Performance Case Study (via) Fascinating description of how Netflix knocked the 3G loading times of their homepage in half for logged-out users by rendering the React templates on the server-side and using the bare minimum of vanilla JavaScript necessary to get the homepage interactive—then XHR prefetching the full React code needed to power the subsequent signup flow. Via Alex Russell, who tweets “I’m increasingly optimistic that we can cap JS emissions by quarantining legacy frameworks to the server side.” # 6th November 2018, 8:54 pm

Optimizing Django Admin Paginator. The Django admin paginator uses a count(*) to calculate the total number of rows, so it knows how many pages to display. This makes it unpleasantly slow over large datasets. Haki Benita has an ingenious solution: drop in a custom paginator which uses the PostgreSQL “SET LOCAL statement_timeout TO 200” statement first, then if a timeout error is raised returns 9999999999 as the count instead. This means small tables get accurate page counts and giant tables still load in the admin within a reasonable time period. # 6th November 2018, 6:17 pm

11 barriers to coding in the open and how to overcome them (via) “Terence Eden, open standards lead at GDS, also gave a talk about overcoming barriers to coding in the open”—an intriguing recap of that talk revealing exactly how the UK government have been encouraging a culture of coding in the open and going open source first. # 5th November 2018, 8:53 pm

Twitter conversation about long-term pre-paid archival storage. I kicked off a conversation on Twitter yesterday about long-term archival storage of web content: “Anyone know of a web hosting provider where I can pay a lump sum of money to host a file at a reliable URL essentially forever? Is this even remotely feasible?”. The thread is really interesting—this is definitely an unsolved problem, and it’s clear that the challenge is more organizational (how do you create an entity that can keep this kind of promise—does it need to be some kind of foundation or trust?) than technical. # 5th November 2018, 6:50 pm

If you stop thinking in terms of MVC you might notice that at its core, React is a runtime for effectful functions that don’t execute “once”, but run continuously while being anchored to a call tree.

Dan Abramov # 3rd November 2018, 9:51 pm

Apple’s New Map (via) Map nerds rejoice! Justin O’Beirne has written another spectacularly illustrated essay about web cartography, this time examining the iOS 12 upgrade to Apple Maps in most of California and a little bit of Nevada. # 3rd November 2018, 9:28 pm

Every bitcoin proof-of-work mined is an incremental addition to a vast distributed summoning ritual powering the demon-soul at the heart of the maze, the computational equivalent of a Buddhist prayer wheel spinning in a Himalayan breeze.

Charles Stross # 3rd November 2018, 5:47 pm