Simon Willison’s Weblog

Blogmarks

Filters: Type: blogmark ×


New developer features in Firefox 75 (via) Firefox 75 just came out with a bunch of new developer features. My favourite is instant evaluation in the JavaScript console: any statement without side effects now shows a preview of its results as you type. # 7th April 2020, 7:23 pm

Zeit Now v1 to sunset soon: no new deployments from 1st May, total shutdown 7th August. I posted a thread on Twitter with some thoughts. Zeit Now v1 remains the best hosting platform I’ve ever used given my particular tastes. They’ve handled the shutdown very responsibly, but I’m sad to see it go. # 4th April 2020, 5:32 am

Django Release Cycle (via) Really nice visual representation of Django’s release cycle, built by Jeff Triplett as a remix of the Python release cycle by Dustin Ingram. # 3rd April 2020, 4:56 pm

Several grumpy opinions about remote work at Tailscale. Really useful in-depth reviews of the tools Tailscale are using to build their remote company. “We decided early on—about the time we realized all three cofounders live in different cities—that we were going to go all-in on remote work, at least for engineering, which for now is almost all our work. As several people have pointed out before, fully remote is generally more stable than partly remote.” # 2nd April 2020, 2:48 pm

gifcap (via) This is really neat: a purely client-side implementation of animated gif screen capture, using navigator.mediaDevices.getDisplayMedia for the screen capturing, mithril for the UI and the gif.js pure JavaScript GIF encoding library to render the output. # 30th March 2020, 9:29 pm

PostGraphile: Production Considerations. PostGraphile is a tool for building a GraphQL API on top of an existing PostgreSQL schema. Their “production considerations” documentation is particularly interesting because it directly addresses some of my biggest worries about GraphQL: the potential for someone to craft an expensive query that ties up server resources. PostGraphile suggests a number of techniques for avoiding this, including a statement timeout, a query whitelist, pagination caps and (in their “pro” version) a cost limit that uses a calculated cost score for the query. # 27th March 2020, 1:22 am

Making Datasets Fly with Datasette and Fly (via) It’s always exciting to see a Datasette tutorial that wasn’t written by me! This one is great—it shows how to load Central Park Squirrel Census data into a SQLite database, explore it with Datasette and then publish it to the Fly hosting platform using datasette-publish-fly and datasette-cluster-map. # 26th March 2020, 11:56 pm

hacker-news-to-sqlite (via) The latest in my Dogsheep series of tools: hacker-news-to-sqlite uses the Hacker News API to fetch your comments and submissions from Hacker News and save them to a SQLite database. # 21st March 2020, 4:27 am

Django: Added support for asynchronous views and middleware (via) An enormously consequential feature just landed in Django, and is set to ship as part of Django 3.1 in August. Asynchronous views will allow Django applications to define views using “async def myview(request)”—taking full advantage of Python’s growing asyncio ecosystem and providing enormous performance improvements for Django sites that do things like hitting APIs over HTTP. Andrew has been puzzling over this for ages and it’s really exciting to see it land in a form that should be usable in a stable Django release in just a few months. # 19th March 2020, 3:43 am

datasette-publish-fly (via) Fly is a neat new Docker hosting provider with a very tempting pricing model: Just $2.67/month for their smallest always-on instance, and they give each user $10/month in free credit. datasette-publish-fly is the first plugin I’ve written using the publish_subcommand plugin hook, which allows extra hosting providers to be added as publish targets. Install the plugin and you can run “datasette publish fly data.db” to deploy SQLite databases to your Fly account. # 19th March 2020, 3:40 am

New governance model for the Django project. This has been under discussion for a long time: I’m really excited to see it put into action. It’s difficult to summarize, but they key effect should be a much more vibrant, active set of people involved in making decisions about the framework. # 12th March 2020, 5:27 pm

Announcing Daylight Map Distribution. Mike Migurski announces a new distribution of OpenStreetMap: a 42GB dump of the version of the data used by Facebook, carefully moderated to minimize the chance of incorrect or maliciously offensive edits. Lots of constructive conversation in the comments about the best way for Facebook to make their moderation decisions more available to the OSM community. # 12th March 2020, 11:44 am

The unexpected Google wide domain check bypass (via) Fantastic story of discovering a devious security vulnerability in a bunch of Google products stemming from a single exploitable regular expression in the Google closure JavaScript library. # 9th March 2020, 11:27 pm

Millions of tiny databases. Fascinating, detailed review of a paper that describes Amazon’s Physalia, a distributed configuration store designed to provide extremely high availability coordination for Elastic Block Store replication. My eyebrows raised at “Physalia is designed to offer consistency and high-availability, even under network partitions.” since that’s such a blatant violation of CAP theorem, but it later justifies it like so: “One desirable property therefore, is that in the event of a partition, a client’s Physalia database will be on the same side of the partition as the client. Clever placement of cells across nodes can maximise the chances of this.” # 5th March 2020, 4:37 am

Wildcard: Spreadsheet-Driven Customization of Web Applications (via) What a fascinating collection of ideas. Wildcard is a browser extension (currently using Tampermonkey and sadly not yet available to try out) which lets you add “spreadsheet-driven customization” to any web application. Watching the animated screenshots in the videos helps explain what this mean—essentially it’s a two-way scraping trick, where content on the page (e.g. Airbnb listings) are extracted into a spreadsheet-like table interface using JavaScript—but then interactions you make in that spreadsheet like filtering and sorting are reflected back on the original page. It even has the ability to serve editable cells by mapping them to form inputs on the page. Lots to think about here. # 28th February 2020, 7:39 pm

Why Google invested in providing Google Fonts for free. Fascinating comment from former Google Fonts team member Raph Levien. In short: text rendered as PNGs hurt a Google Search, fonts were a delay in the transition from Flash, Google Docs needed them to better compete with Office and anything that helps create better ads is easy to find funding for. # 23rd February 2020, 2:13 pm

pup. This is a great idea: a command-line tool for parsing HTML on stdin using CSS selectors. It’s like jq but for HTML. Supports a sensible collection of selectors and has a number of output options for the selected nodes, including plain text and JSON. It also works as a simple pretty-printer for HTML. # 14th February 2020, 4:25 pm

Deep learning isn’t hard anymore. This article does a great job of explaining how transfer learning is unlocking a new wave of innovation around deep learning. Previously if you wanted to train a model you needed vast amounts if data and thousands of dollars of compute time. Thanks to transfer learning you can now take an existing model (such as GPT2) and train something useful on top of it that’s specific to a new domain in just minutes it hours, with only a few hundred or a few thousand new labeled samples. # 7th February 2020, 8:47 am

geojson-to-sqlite (via) I just put out the first release of geojson-to-sqlite—a CLI tool that can convert GeoJSON files (consisting of a Feature or a set of features in a FeatureCollection) into a table in a SQLite database. If you use the --spatialite option it will initalize the table with SpatiaLite and store the geometries in a spacially indexed geometry field—without that option it stores them as GeoJSON. # 31st January 2020, 6:40 am

2020 Web Milestones (via) A lot of stuff is happening in 2020! Mike Sherov rounds it up—highlights include the release of Chromium Edge (Microsoft’s Chrome-powered browser for Windows 7+), Web Components supported in every major browser, Deno 1.x, SameSite Cookies turned on by default (which should dramatically reduce CSRF exposure) and Python 2 and Flash EOLs. # 24th January 2020, 4:43 am

Generated Columns in SQLite (via) SQLite 3.31.0 released today, and generated columns are the single most notable new feature. PostgreSQL 12 added these in October 2019, and MySQL has had them since 5.7 in October 2015. MySQL and SQLite both offer either “stored” or “virtual” generated columns, with virtual columns being calculated at runtime. PostgreSQL currently only supports stored columns. # 24th January 2020, 4:20 am

Ray Bandar’s Bone Palace (via) The hundredth museum I’ve posted to Niche Museums, and this one is my absolute favourite. Ray Bandar spent sixty years collecting over 7,000 animal skulls and putting them on display in his house and basement. Natalie managed to score us a tour a few weeks before the entire collection was donated to the California Academy of Sciences. It was the most amazing room I’ve ever been in. # 18th January 2020, 7:05 am

Your own hosted blog, the easy, free, open way (even if you’re not a computer expert) (via) Jeremy Howard and the fast.ai team have released fast_template—a GitHub repository designed to be used as a template to create new repositories with a complete Jekyll blog configured for use with GitHub pages. GitHub’s official document recommends you install Ruby on your machine to do this, but Jeremy points out that with the right repository setup you can run a blog entirely by editing files through the GitHub web interface. # 17th January 2020, 1:12 am

How to do Zero Downtime Deployments of Docker Containers. I’m determined to get reliable zero-downtime deploys working for a new project, because I know from experience that even a few seconds of downtime during a deploy changes the project mentality from “deploy any time you want” to “don’t deploy too often”. I’m using Docker containers behind Traefik, which means new containers should have traffic automatically balanced to them by Traefik based on their labels. After much fiddling around the pattern described by this article worked best for me: it lets me start a new container, then stop the old one and have Traefik’s “retry” mechanism send any requests to the stopped container over to the new one instead. # 16th January 2020, 11:12 pm

Demos, Prototypes, and MVPs (via) I really like how Jacob describes the difference between a demo and a prototype: a demo is externally facing and helps explain a concept to a customer; a prototype is internally facing and helps prove that something can be built. # 16th January 2020, 8:24 pm

Serving 100µs reads with 100% availability (via) Fascinating use-case for SQLite from Segment: they needed a massively replicated configuration database across all of their instances that process streaming data. They chose to make the configuration available as a ~50GB SQLite database file mirrored to every instance, meaning lookups against that data could complete in microseconds. Changes to the central MySQL configuration store are pulled every 2-3 seconds, resulting in a trade-off of consistency for availability which fits their use-case just fine. # 10th January 2020, 5:15 am

A visual introduction to machine learning. Beautiful interactive essay explaining how a decision tree machine learning module is constructed, and using that to illustrate the concept of overfitting. This is one of the best explanations of machine learning fundamentals I’ve seen anywhere. # 10th January 2020, 5:12 am

Snowpack (via) Really interesting new twist on build systems for JavaScript. Modern browsers (everything since IE11) support JavaScript modules, but actually working with them is tricky since so much of the JavaScript ecosystem expects you to be using a bundler like Webpack. Snowpack is a tool for converting npm dependencies into JavaScript modules which can then be loaded directly by the browser, taking advantage of HTTP/2 to efficiently load the resulting larger number of files. # 10th January 2020, 5:06 am

Portable Cloud Functions with the Python Functions Framework (via) The new functions-framework library on PyPI lets you run Google Cloud Functions written in Python in other environments—on your local developer machine or bundled in a Docker container for example. I have real trouble trusting serverless platforms that lock you into a single provider (AWS Lambda makes me very uncomfortable) so this is a breath of fresh air. # 10th January 2020, 4:58 am

Async Support—HTTPX (via) HTTPX is the new async-friendly HTTP library for Python spearheaded by Tom Christie. It works in both async and non-async mode with an API very similar to requests. The async support is particularly interesting—it’s a really clean API, and now that Jupyter supports top-level await you can run ’(await httpx.AsyncClient().get(url)).text’ directly in a cell and get back the response. Most excitingly the library lets you pass an ASGI app directly to the client and then perform requests against it—ideal for unit tests. # 10th January 2020, 4:49 am

Types

Years

Tags