Simon Willison’s Weblog

Blogmarks in Nov


Big Data Workflow with Pandas and SQLite (via) Handy tutorial on dealing with larger data (in this case a 3.9GB CSV file) by incrementally loading it into pandas and writing it out to SQLite. # 28th November 2017, 11:02 pm
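
The incremental idea can be sketched with nothing but Python's standard library. This is not the tutorial's code: it swaps pandas for the csv and sqlite3 modules to show the same batching pattern, and the function name, table name and batch size are all hypothetical.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv_in_batches(csv_file, conn, table, batch_size=1000):
    """Copy rows from an open CSV file into SQLite one batch at a time.

    Hypothetical helper: only batch_size rows are held in memory at once.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    total = 0
    while True:
        # islice pulls at most batch_size rows from the reader.
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', batch
        )
        conn.commit()
        total += len(batch)
    return total


# Tiny in-memory stand-in for a multi-gigabyte CSV file.
sample = io.StringIO("id,name\n1,alpha\n2,beta\n3,gamma\n")
conn = sqlite3.connect(":memory:")
rows_loaded = load_csv_in_batches(sample, conn, "items", batch_size=2)
row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

With a real file you would pass an open file handle and a much larger batch size; the memory used stays proportional to the batch, not the file.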

Boiling the Ocean, Incrementally—How Stylo Brought Rust and Servo to Firefox. Firefox Quantum is the product of an impressive, highly risky chain of software engineering—Rust, Servo, then Stylo. # 28th November 2017, 8:34 pm

Firefox Debugger Playground. Excellent hands-on tutorial on using the Firefox JavaScript debugger. I learned a bunch of neat tricks from this, including using conditional breakpoints to add temporary console.log statements: since console.log returns undefined the breakpoint never pauses your code, and you don’t have to remember to remove the logging lines from your source code later. I also didn’t know that the Firefox debugger can show the value of variables in paused code if you hover over them in the source pane. [UPDATE: Turns out Chrome DevTools do this as well. TIL] # 28th November 2017, 4:01 pm

The Best Request Is No Request, Revisited · An A List Apart Article. In HTTP/2 the rules have changed: serving unnecessary code as part of a larger bundle to avoid extra request overhead no longer makes sense. Splitting your code into many files and loading just the ones needed by the current page can knock seconds off your load time. # 28th November 2017, 3:50 pm

Inside Docker’s “FROM scratch” (via) I’m a big fan of understanding your abstractions. Here’s a neat tutorial that dives deep into Docker’s “scratch” image which offers the smallest possible Docker image, and hence provides a great opportunity to understand what a Docker container at its most minimal does for you. # 27th November 2017, 4:33 pm

A Complete CMS with No Server and 18 Lines of Code | Netlify. Slightly hyperbolic title, but there’s something really interesting going on here. Netlify is a CDN/hosting provider optimized for static site builders—it can hook up to a GitHub repository and build and deploy your site on every commit. Netlify CMS is their open-source CMS tool which works in a fascinating way: it’s a single page React app which stores structured content (as Markdown files with embedded key/value pairs) directly to your GitHub repository. Fire up Chrome DevTools and you can watch it using the GitHub API to construct new commits every time you hit “save”. # 26th November 2017, 5:53 pm

Many Small Queries Are Efficient In SQLite. Since SQLite runs in-process rather than being accessed over a network it avoids the per-query overhead of network round trips. This means that while MySQL or PostgreSQL applications need to avoid N+1 query patterns that create 100s of queries per request, SQLite apps can be designed differently: provided you hit indexes or small tables, 200 queries just means 200 extra cheap function calls. # 26th November 2017, 4:24 pm
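
Here is a rough sketch of what that looks like in practice, using Python's sqlite3 module. The schema and data are made up for illustration; the point is just that the loop issues one indexed lookup per author, which in SQLite is an in-process function call rather than a network round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: 200 authors, one post each, indexed by author.
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE INDEX posts_author ON posts (author_id);
""")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [(i, f"author {i}") for i in range(1, 201)],
)
conn.executemany(
    "INSERT INTO posts (author_id, title) VALUES (?, ?)",
    [(i, f"post by {i}") for i in range(1, 201)],
)

# The classic N+1 pattern: one query for the list, then one per row.
titles = {}
for author_id, name in conn.execute("SELECT id, name FROM authors").fetchall():
    rows = conn.execute(
        "SELECT title FROM posts WHERE author_id = ?", (author_id,)
    ).fetchall()
    titles[name] = [title for (title,) in rows]
```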

SQLite Query Language: WITH clause. SQLite’s documentation on recursive CTEs starts out with some nice clear examples of tree traversal using a WITH statement, then gets into graphs, then goes way off the deep end with a Mandelbrot Set query and a query that can solve Sudoku puzzles (“in less than 300 milliseconds on a modern workstation”). # 26th November 2017, 7:23 am
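
A minimal tree traversal in that style, run through Python's sqlite3 module. The org table and its rows are invented for this sketch and are not copied from the SQLite documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical reporting hierarchy; Eve reports to someone outside the tree.
conn.executescript("""
    CREATE TABLE org (name TEXT PRIMARY KEY, boss TEXT);
    INSERT INTO org VALUES
        ('Alice', NULL),
        ('Bob', 'Alice'),
        ('Cindy', 'Alice'),
        ('Dave', 'Bob'),
        ('Eve', 'Zed');
""")

query = """
WITH RECURSIVE under_alice(name, depth) AS (
    SELECT 'Alice', 0                       -- start at the root
    UNION ALL
    SELECT org.name, under_alice.depth + 1  -- then add direct reports
    FROM org JOIN under_alice ON org.boss = under_alice.name
)
SELECT name, depth FROM under_alice ORDER BY depth, name
"""
tree = conn.execute(query).fetchall()
```

The first SELECT seeds the result with the root, and the second keeps joining back against the rows found so far until no new rows appear.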

Added TSV example to the README · simonw/csvs-to-sqlite@957d4f5. Thanks to a pull request from Jani Monoses, csvs-to-sqlite can now handle TSV (or any other separator) as well as regular CSVs. # 26th November 2017, 7:02 am

harelba/q (via) q is a neat command-line utility that lets you run SQL queries directly against CSV and TSV files. Internally it works by firing up an in-memory SQLite database, and as of the latest release (1.7.1) you can use the new --save-db-to-disk option to save that in-memory database to disk. # 25th November 2017, 5:49 pm

What is the plural of blitz? Wow, WordHippo is a straight up masterclass in keyword SEO tactics. Everything from the page URL to the keyword-crammed content to the enormous quantity of related links. # 25th November 2017, 5:42 pm

VoxelSpace (via) Lovely clear explanation of the voxel space landscape rendering technique used by NovaLogic for Comanche back in 1992, including a working JavaScript demo plus pseudo-code in Python. # 24th November 2017, 7:30 pm

TLDR pages. This is an absurdly good idea: a community maintained set of alternative man pages for common commands with a focus on usage examples, plus a “tldr netstat” command to see them. The man pages themselves are maintained on GitHub. # 24th November 2017, 5:38 am

Return of the Obra Dinn: Dithering Process (via) Lucas Pope (creator of “Papers, Please”) has a new game under development: “Return of the Obra Dinn”, a first-person adventure mystery game set in 1807 that is spectacularly rendered in a 1-bit art style. He has a development diary on tigsource.com, and in this entry he describes the extreme lengths he has gone to in order to develop the best possible dithering implementation for rendering his 3D world in 1-bit colour. “It feels a little weird to put 100 hours into something that won’t be noticed by its absence.” # 23rd November 2017, 9:21 pm

How a single PostgreSQL config change improved slow query performance by 50x. “If you are using SSDs and running PostgreSQL with default configuration, I encourage you to try tuning random_page_cost & seq_page_cost. You might be surprised by some huge performance improvements.” # 23rd November 2017, 8:11 pm

From Markdown to RCE in Atom (via) Lukas Reschke found a remote code execution vulnerability in the Atom editor by taking advantage of a combination of Markdown’s ability to embed HTML, Atom’s Content-Security-Policy allowing JavaScript from the local filesystem to be executed, and a test suite HTML file hidden away in the Atom application package that executes code passed to it via query string. # 23rd November 2017, 4:13 pm

Run the First Edition of Unix (1972) with Docker (via) This is so cool... just run “docker run --rm -it bahamat/unix-1st-ed” to drop into a simulation of a PDP-11 running genuine 1972 era Unix! If you haven’t got into Docker yet, Docker for Mac is a single click install these days and works incredibly well. # 22nd November 2017, 3:36 pm

gzthermal-web (via) I built a quick web application wrapping the gzthermal gzip visualization tool and deployed it to Zeit Now wrapped up in a Docker container. Give it a URL and it shows you a PNG visualization of how gzip encodes that page. # 21st November 2017, 6:24 pm

simonepri/geo-maps. Neat project which publishes GeoJSON maps of the world automatically derived from OpenStreetMap. Three variants are available: country political maritime boundaries, country political coastline boundaries and a general outline of the world’s land territories. # 21st November 2017, 4:06 pm

gzthermal (via) “pseudo thermal view of Gzip/Deflate compression efficiency”—neat tool for visualizing gzip compressed data and understanding exactly how run-length encoding and back references apply to a gzipped file. # 21st November 2017, 2:56 pm
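
To get a feel for how much back references matter, here is a small sketch using Python's zlib module (which implements the same DEFLATE algorithm that gzip uses). The input strings are arbitrary examples, and this only compares output sizes; it does not reproduce gzthermal's visualization.

```python
import os
import zlib

# Highly repetitive input: every repeat can be encoded as a back reference.
repetitive = b"<rect width='10' height='10'/>" * 100
# Random input of the same length: nothing to refer back to.
random_bytes = os.urandom(len(repetitive))

compressed_repetitive = zlib.compress(repetitive)
compressed_random = zlib.compress(random_bytes)

repetitive_ratio = len(compressed_repetitive) / len(repetitive)
random_ratio = len(compressed_random) / len(random_bytes)
```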

Of SVG, Minification and Gzip. Delightfully nerdy exploration of tricks you can use to hand-optimize your SVG in order to maximize gzip compression. Premature optimization may be the root of all evil but this is still a great way to learn about how gzip actually works. # 21st November 2017, 2:54 pm

JOY.js (via) Delightful new exploratory programming environment from Nicky Case. Try the Turtle demo, which takes Logo and updates it for the modern web. # 21st November 2017, 2:50 pm

SSL Issuer Popularity. The impressive growth of Let’s Encrypt in one graph: from 4.87% of TLS-enabled domains in May 2016 to 36.68% in November 2017. # 21st November 2017, 2:44 pm

Publishing the Lahman Baseball Database with Datasette (via) Ted Lawless used Datasette to publish an API for the Lahman Baseball Database—a dataset of players and teams spanning 1871 to 2016, available under a Creative Commons License. # 20th November 2017, 9:15 pm

Medicaid Eligibility | plainlanguage.gov (via) Useful resource for helping government writers use clear language. I love that this is an official US government website written using Jekyll and developed entirely in the open on GitHub: the commit history is fascinating. # 20th November 2017, 1:59 am

csvs-to-sqlite: Automatically deploy tags as PyPI releases. I learned how to set up Travis CI to automatically deploy new package versions to PyPI when I push a version tag to GitHub (and the tests pass). # 19th November 2017, 6:35 am

pythondotorg/admin.py (via) There are some neat tricks in the Django application that powers Python.org (built a few years ago by RevSys). Here’s how their admin app handles creator/last_modified_by user relationships. # 19th November 2017, 6:28 am

Monodraw for macOS (via) Spectacular OS X application for drawing ASCII art diagrams—it’s basically OmniGraffle for ASCII art. The output is beautiful, and it even lets you do things like draw connecting arrows between boxes and then re-arrange the boxes such that the arrows keep up with your changes. # 18th November 2017, 6:58 pm

Scaling Postgres with Read Replicas & Using WAL to Counter Stale Reads (via) The problem with sending writes to the primary and balancing reads across replicas is dealing with replica lag: what if you write to the primary and then read from a replica that hasn’t had the new state applied to it yet? Brandur Leach dives deep into an elegant solution using PostgreSQL’s LSNs (log sequence numbers), accessed using pg_last_wal_replay_lsn(). An observer process continuously polls the replicas for their most recently applied LSN and stores them in a table. A column in the Users table then records the min_lsn valid for that user, updating it to the pg_current_wal_lsn() of the primary whenever that user makes a write. Combining the two allows the application to randomly select a replica that is up-to-date for the purposes of a specific user any time it needs to make a read. # 18th November 2017, 6:42 pm
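
The selection step can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the article: the function names, replica names and LSN values are hypothetical, and falling back to the primary when no replica has caught up is my own choice. The one fixed fact is the LSN text format, two hexadecimal numbers separated by a slash.

```python
import random


def parse_lsn(text):
    """Convert a PostgreSQL LSN such as '16/B374D848' to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_replica(replica_lsns, user_min_lsn, rng=random):
    """Pick a random replica that has replayed at least user_min_lsn.

    replica_lsns maps replica name -> last replayed LSN, as the observer
    process would record it. Returns "primary" if none qualify (an
    assumption made for this sketch).
    """
    needed = parse_lsn(user_min_lsn)
    fresh = [
        name for name, lsn in replica_lsns.items()
        if parse_lsn(lsn) >= needed
    ]
    return rng.choice(fresh) if fresh else "primary"


# Hypothetical observer snapshot.
replica_lsns = {
    "replica-a": "0/3000060",
    "replica-b": "0/2FFFF00",  # lagging behind the user's last write
    "replica-c": "1/0",
}
choice = pick_replica(replica_lsns, "0/3000000")
stale_choice = pick_replica(replica_lsns, "2/0")
```

Comparing the parsed integers rather than the raw strings matters, because the two hex parts are not zero-padded and so do not sort correctly as text.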

csvs-to-sqlite: Refactoring columns into separate lookup tables. I just shipped a new version of csvs-to-sqlite with the ability to extract specified columns into a separate SQLite lookup table by passing additional command-line arguments. # 17th November 2017, 6:41 am