Simon Willison’s Weblog

Blogmarks in Nov

Filters: Type: blogmark × Month: Nov ×

datasette-template-sql (via) New Datasette plugin, celebrating the new ability in Datasette 0.32 to have asynchronous custom template functions in Jinja (which was previously blocked by the need to support Python 3.5). The plugin adds a sql() function which can be used to execute SQL queries that are embedded directly in custom templates. # 15th November 2019, 12:59 am

Datasette 0.31. Released today: this version adds compatibility with Python 3.8 and breaks compatibility with Python 3.5. Since Glitch support Python 3.7.3 now I decided I could finally give up on 3.5. This means Datasette can use f-strings now, but more importantly it opens up the opportunity to start taking advantage of Starlette, which makes all kinds of interesting new ASGI-based plugins much easier to build. # 12th November 2019, 6:11 am

My Python Development Environment, 2020 Edition (via) Jacob Kaplan-Moss shares what works for him as a Python environment coming into 2020: pyenv, poetry, and pipx. I’m not a frequent user of any of those tools—it definitely looks like I should be. # 12th November 2019, 1:30 am

pinboard-to-sqlite (via) Jacob Kaplan-Moss just released the second Dogsheep tool that wasn’t written by me (after goodreads-to-sqlite by Tobias Kunze)—this one imports your Pinterest bookmarks. The repo includes a really clean minimal example of how to use GitHub actions to run tests and release packages to PyPI. # 7th November 2019, 8:46 pm

The first ever commit to Sentry (via) This is fascinating: the first 70 lines of code that started the Sentry error tracking project. It’s a straight-forward Django process_exception() middleware method that collects the traceback and the exception class and saves them to a database. The trick of using the md5 hash of the traceback message to de-dupe errors has been there from the start, and remains one of my favourite things about the design of Sentry. # 6th November 2019, 11:08 pm

Automate the Boring Stuff with Python: Working with PDF and Word Documents. I stumbled across this while trying to extract some data from a PDF file (the kind of file with actual text in it as opposed to dodgy scanned images) and it worked perfectly: PyPDF2.PdfFileReader(open(“file.pdf”, “rb”)).getPage(0).extractText() # 6th November 2019, 4:17 pm

selenium-demoscraper (via) Really useful minimal example of a Binder project. Click the button to launch a Jupyter notebook in Binder that can take screenshots of URLs using Selenium-controlled headless Firefox. The binder/ folder uses an apt.txt file to install Firefox, requirements.txt to get some Python dependencies and a postBuild Python script to download the Gecko Selenium driver. # 4th November 2019, 3:05 pm

Cloud Run Button: Click-to-deploy your git repos to Google Cloud (via) Google Cloud Run now has its own version of the Heroku deploy button: you can add a button to a GitHub repository which, when clicked, will provide an interface for deploying your repo to the user’s own Google Cloud account using Cloud Run. # 4th November 2019, 4:57 am

sqlite-transform. I released a new CLI tool today: sqlite-transform, which lets you run “transformations” against a SQLite database. I built it out of frustration of constantly running into CSV files that use horrible American date formatting—the “sqlite-transform parsedatetime my.db mytable col1” command runs dateutil’s parser against those columns and replaces them with a nice, sortable ISO formatted timestamp. I’ve also added a “sqlite-transform lambda” command that lets you specify Python code directly on the command-line that should be used to transform every value in a specified column. # 4th November 2019, 2:41 am

Why you should use `python -m pip` (via) Brett Cannon explains why he prefers “python -m pip install...” to “pip install...”—it ensures you always know exactly which Python interpreter environment you are installing packages for. He also makes the case for always installing into a virtual environment, created using “python -m venv”. # 2nd November 2019, 4:41 pm

repo2docker (via) Neat tool from the Jupyter project team: run “jupyter-repo2docker” and it will pull a GitHub repository, create a new Docker container for it, install Jupyter and launch a Jupyter instance for you to start trying out the library. I’ve been doing this by hand using virtual environments, but using Docker for even cleaner isolation seems like a smart improvement. # 28th November 2018, 10:06 pm

BigInt: arbitrary-precision integers in JavaScript (via) The BigInt specification is now supported in Chrome—but it hasn’t yet made it to other browsers. The Chrome team have a really interesting solution: they’ve released a JSBI library which you can use to do BigInt calculations in any browser today, and an accompanying Babel plugin which can rewrite calls to that library into BigInt syntax once browser support catches up. I’ve never seen a library that includes a tool for refactoring itself into oblivion before. # 28th November 2018, 2:22 am

AWS Ground Station – Ingest and Process Data from Orbiting Satellites. OK this is cool. “Instead of building your own ground station or entering in to a long-term contract, you can make use of AWS Ground Station on an as-needed, pay-as-you-go basis. [...] You don’t need to build or maintain antennas, and can focus on your work or research. We’re starting out with a pair of ground stations today, and will have 12 in operation by mid-2019. Each ground station is associated with a particular AWS Region; the raw analog data from the satellite is processed by our modem digitizer into a data stream (in what is formally known as VITA 49 baseband or VITA 49 RF over IP data streams) and routed to an EC2 instance that is responsible for doing the signal processing to turn it into a byte stream.” # 28th November 2018, 1:04 am

Helicopter accident analysis notebook (via) Ben Welsh worked on an article for the LA Times about helicopter accident rates, and has published the underlying analysis as an extremely detailed Jupyter notebook. Lots of neat new (to me) notebook tricks in here as well. # 19th November 2018, 6:25 pm

dive (via) Handy command-line tool (as with so much of the Docker ecosystem it’s written in Go, which means you can download a Darwin binary directly from the GitHub releases page and run it directly on your Mac) for visually exploring the different layers of a given Docker image. # 19th November 2018, 4:41 am

Squoosh. This is by far the most useful example of web assembly I’ve seen so far: Squoosh is a progressive web app for image optimization (JPEG, PNG, GIF, SVG and more) which uses emscripten-compiled versions of best in breed image codec implementations to provide a browser interface for applying and previewing those optimizations. # 12th November 2018, 11:15 pm

Tracking Jupyter: Newsletter, the Third... (via) Tony Hirst’s tracking Jupyter newsletter is fantastic. The Jupyter ecosystem is incredibly exciting and fast moving at the moment as more and more groups discover how productive it is, and Tony’s newsletter is a wealth of information on what’s going on out there. # 9th November 2018, 5:42 pm

A Netflix Web Performance Case Study (via) Fascinating description of how Netflix knocked the 3G loading times of their homepage in half for logged-out users by rendering the React templates on the server-side and using the bare amount of vanilla JavaScript necessary to get the homepage interactive—then XHR prefetching the full React code needed to power the subsequent signup flow. Via Alex Russell, who tweets “I’m increasingly optimistic that we can cap JS emissions by quarantining legacy frameworks to the server side.” # 6th November 2018, 8:54 pm

Optimizing Django Admin Paginator. The Django admin paginator uses a count(*) to calculate the total number of rows, so it knows how many pages to display. This makes it unpleasantly slow over large datasets. Haki Benita has an ingenious solution: drop in a custom paginator which uses the PostgreSQL “SET LOCAL statement_timeout TO 200” statement first, then if a timeout error is raised returns 9999999999 as the count instead. This means small tables get accurate page counts and giant tables load display in the admin within a reasonable time period. # 6th November 2018, 6:17 pm

11 barriers to coding in the open and how to overcome them (via) “Terence Eden, open standards lead at GDS, also gave a talk about overcoming barriers to coding in the open”—an intriguing recap of that talk revealing exactly how the UK government have been encouraging a culture of coding in the open and going open source first. # 5th November 2018, 8:53 pm

Twitter conversation about long-term pre-paid archival storage. I kicked off a conversation on Twitter yesterday about long-time archival storage of web content: “Anyone know of a web hosting provider where I can pay a lump sum of money to host a file at a reliable URL essentially forever? Is this even remotely feasible?”. The thread is really interesting—this is definitely an unsolved problem, and it’s clear that the challenge is more organizational (how do you create an entity that can keep this kind of promise—does it need to be some kind of foundation or trust?) than technical. # 5th November 2018, 6:50 pm

Apple’s New Map (via) Map nerds rejoice! Justin O’Beirne had written another spectacularly illustrated essay about web cartography, this time examining the iOS 12 upgrade to Apple Maps in most of California and a little bit of Nevada. # 3rd November 2018, 9:28 pm

Svelte RFC 0001: Reactive assignments (via) Svelte is a really interesting JavaScript framework: it offers a similar component-based developer experience to React but does it while delivering a tiny amount of code-to-the-browser thanks to being built entirely around a compiler that generates the minimum code necessary. In this RFC the Svelte team propose taking this approach even further, by generating code to accompany every relevant variable assignment that can trigger the corresponding view update. The document also has a very clear explanation of how React, Vue and current Svelte differ in their solutions to the challenge of updating the visible HTML view when the corresponding state changes. # 3rd November 2018, 4:58 pm

jantic/DeOldify (via) “A Deep Learning based project for colorizing and restoring old images”. Delightful (and well documented) project that uses a Self-Attention Generative Adversarial Network to colorize old black and white photos, with extremely impressive results. Built on an older version of the fastai library, and trained by running for several days on a 1080TI graphics card. # 2nd November 2018, 11:13 am

Datasette: Ability to customize presentation of specific columns in HTML view. Still a work in progress, but Datasette master now allows you to inject links to one or more additional CSS and JavaScript resources (optionally with SRI hashes) which will be included on every page. Each template also now provides CSS classes on the body element derived from the current database and table names to provide hooks for custom styling. Next up: custom template support. # 30th November 2017, 7:27 am

Can I use... input type=color. TIL <input type=“color”> has reached 78.83% support globally already—biggest gap right now is Mobile Safari. # 29th November 2017, 9:56 pm

Interactive Workflows for C++ with Jupyter. Whoa, this really works... not just an interactive C++ REPL in a Jupyter notebook, but inline graph plotting support and interactive widgets as well. Scroll to the bottom of the article for Binder links which let you fire up an interactive C++ REPL in your browser and start interacting with it instantly. # 29th November 2017, 9:51 pm

Object models (via) Extremely comprehensive and readable discussion of the object models of Python, JavaScript, Lua and Perl 5. I learned something new about every one of those languages. # 29th November 2017, 2:59 pm

Big Data Workflow with Pandas and SQLite (via) Handy tutorial on dealing with larger data (in this case a 3.9GB CSV file) by incrementally loading it into pandas and writing it out to SQLite. # 28th November 2017, 11:02 pm