Simon Willison’s Weblog

Blogmarks in Oct

Filters: Type: blogmark × Month: Oct ×

Calling C functions from BigQuery with web assembly (via) Google BigQuery lets you define custom SQL functions in JavaScript, and it turns out they expose the WebAssembly.instantiate family of APIs. Which means you can write your UDD in C or Rust, compile it to WebAssembly and run it as part of your query! # 27th October 2019, 5:55 am

Azure Readiness Checklist (via) I love a good comprehensive checklist. This one is focused on large projects running on Azure but it’s still fun to browse through if you are hosting elsewhere, mainly as a reminder of quite how much still goes into deploying large web services into production. # 26th October 2019, 8:32 pm Uber built this open source geospatial analysis tool for large-scale data sets, and they offer it as a free hosted online tool—just click Get Started on the site. I uploaded two CSV files with 30,000+ latitude/longitude points in them just now and used Kepler to render them as images. # 25th October 2019, 4:16 am

Thematic map—GIS Wiki. This is a really useful wiki full of GIS information, and the coverage of different types of thematic maps is particularly thorough. # 21st October 2019, 2:25 am

Setting up Datasette, step by step (via) Tobias describes how he runs Datasette on his own server/VPS, using nginx and systemd. I’m doing something similar for some projects and systemd really does feel like the solution to the “ensure a Python process keeps running” problem I’ve been fighting for over a decade. I really like how Tobias creates a dedicated Linux user for each of his deployed Python projects. # 21st October 2019, 2:20 am

2018 Central Park Squirrel Census in Datasette (via) The Squirrel Census project released their data! 3,000 squirrel observations in Central Park, each with fur color and latitude and longitude and behavioral observations. I love this data so much. I’ve loaded it into a Datasette running on Glitch. # 16th October 2019, 6:01 pm

μPlot (via) “An exceptionally fast, tiny time series chart. [...] from a cold start it can create an interactive chart containing 150,000 data points in 40ms. [...] at < 10 KB, it’s likely the smallest and fastest time series plotter that doesn’t make use of WebGL shaders or WASM” # 14th October 2019, 11:03 pm

goodreads-to-sqlite (via) This is so cool! Tobias Kunze built a Python CLI tool to import your Goodreads data into a SQLite database, inspired by github-to-sqlite and my various other Dogsheep tools. It’s the first Dogsheep style tool I’ve seen that wasn’t built by me—and Tobias’ write-up includes some neat examples of queries you can run against your Goodreads data. I’ve now started using Goodreads and I’m importing my books into my own private Dogsheep Datasette instance. # 14th October 2019, 4:07 am

SQL Murder Mystery in Datasette (via) “A crime has taken place and the detective needs your help. The detective gave you the  crime scene report, but you somehow lost it. You vaguely remember that the crime  was a murder that occurred sometime on ​Jan.15, 2018 and that it took place in SQL  City. Start by retrieving the corresponding crime scene report from the police  department’s database.”—Really fun game to help exercise your skills with SQL by the NU Knight Lab. I loaded their SQLite database into Datasette so you can play in your browser. # 7th October 2019, 11:37 pm

twitter-to-sqlite 0.6, with track and follow. I shipped a new release of my twitter-to-sqlite command-line tool this evening. It now includes experimental features for subscribing to the Twitter streaming API: you can track keywords or follow users and matching Tweets will be written to a SQLite database in real-time as they come in through the API. Since Datasette supports mutable databases now you can run Datasette against the database and run queries against the tweets as they are inserted into the tables. # 6th October 2019, 4:54 am

Streamlit: Turn Python Scripts into Beautiful ML Tools (via) A really interesting new tool / application development framework. Streamlit is designed to help machine learning engineers build usable web frontends for their work. It does this by providing a simple, productive Python environment which lets you declaratively build up a sort-of Notebook style interface for your code. It includes the ability to insert a DataFrame, geospatial map rendering, chart or image into the application with a single Python function call. It’s hard to describe how it works, but the tutorial and demo worked really well for me: “pip install streamlit” and then “streamlit hello” to get a full-featured demo in a browser, then you can run through the tutorial to start building a real interactive application in a few dozen lines of code. # 6th October 2019, 3:52 am

Get your own Pocket OAuth token (via) I hate it when APIs make you jump through extensive hoops just to get an access token for pulling data directly from your own personal account. I’ve been playing with the Pocket API today and it has a pretty complex OAuth flow, so I built a tiny Flask app on Glitch which helps go through the steps to get an API token for your own personal Pocket account. # 5th October 2019, 9:56 pm

Client-Side Certificate Authentication with nginx. I’m intrigued by client-side browser certificates, which allow you to lock down a website such that only browsers with a specific certificate installed can access them. They work on both laptops and mobile phones. I followed the steps in this tutorial and managed to get an nginx instance running which only allows connections from my personal laptop and iPhone. # 5th October 2019, 5:26 pm

NGINX: Authentication Based on Subrequest Result (via) TIL about this neat feature of NGINX: you can use the auth_request directive to cause NGINX to make an HTTP subrequest to a separate authentication server for each incoming HTTP request. The authentication server can see the cookies on the incoming request and tell NGINX if it should fulfill the parent request (via a 2xx status code) or if it should be denied (by returning a 401 or 403). This means you can run NGINX as an authenticating proxy in front of any HTTP application and roll your own custom authentication code as a simple webhook-recieving endpoint. # 4th October 2019, 3:36 pm

SQL queries don’t start with SELECT. This is really useful. Understanding that SELECT (and associated window functions) happen after the WHERE, GROUP BY and HAVING helps explain why you can’t filter a query based on the results of a window function for example. # 3rd October 2019, 8:56 pm

Reinforcement Learning with Prediction-Based Rewards (via) Fascinating result: by teaching a reinforcement learning agent that plays video games to optimize for “unfamiliar states”—states where it cannot predict what will happen next—the agent does a much better job of playing some games. “... for the first time exceeds average human performance on Montezuma’s Revenge. RND achieves state-of-the-art performance, periodically finds all 24 rooms and solves the first level without using demonstrations or having access to the underlying state of the game.” # 31st October 2018, 11:51 pm

October 21 post-incident analysis (via) Legitimately fascinating post-mortem by GitHub. They run database masters in multiple data centers with raft for leader election... but when they had an unexpected network split between east and west coast they ended up with several seconds of write that had not been correctly replicated. Cleaning up the resulting mess took the best part of 24 hours! Distributed systems are hard. # 31st October 2018, 8:50 pm

Making Sense of React Hooks (via) Dan Abramov provides the most comprehensive justification I’ve seen so far for the new React hooks API. # 31st October 2018, 4:26 am

This Is How We Radicalized The World (via) Don’t be put off by the click-baity title: this article by Ryan Broderick is absolutely worth your time. Ryan has been traveling the world covering the global rise of populism, which has been driven in a great part by new patterns of social media usage and distrust of the news media. Ryan ties together stories from a bunch of different countries over the last few years and make a compelling case that we need to come to terms with social media radicalization as a global problem and figure out how to respond to it and deal with the fallout. # 31st October 2018, 1:11 am

matthewp/haunted: React’s Hooks API implemented for web components (via) It’s been fascinating over the past few days watching various frontend web stacks start playing with the new ideas introduced by the proposed React hooks API. lit-html is one of my favourite React alternatives—it’s built on web components and makes really clever use of ES6 template literals (in place of React’s JSX, which requires an additional compilation step). With Haunted Matthew Phillips explores the combination of lit-html, web components and hooks-style state management. # 31st October 2018, 1:04 am

python-twitter/ Creating an OAuth token for accessing a specific Twitter account is way harder than it needs to be. I was about to write my own command-line script for doing this using PIN-based authentication (where you pop open a browser showing the Twitter login flow, then get a PIN number at the end which you paste back into your script) when I discovered that the python-twitter library already ships with a script to do exactly that. Just run “python”, paste in your app’s consumer key and secret, follow a link, enter the resulting PIN and the script will spit out the consumer_key / consumer_secret / access_token_key / access_token_secret combo you need to start using the Twitter API. # 28th October 2018, 5:25 pm

Introduction to Redis Streams. Redis 5.0 is out, introducing the first new Redis data type in several years: streams, a Kafka-like mechanism for implementing a replayable event stream that can be read by many different subscribers. # 18th October 2018, 8:35 am (via) This is a neat trick: GitHub have a dedicated site for their support engineers to send you to if you can’t connect to them. The site tests download speeds from their various domains and then lets you click a button to have GitHub run a traceroute/ping from their servers to your detected IP address and output the results (use devtools to spy on their API method for doing this). Then you can paste the results into a message to their support team. Turns out and implement a similar pattern for those services as well. # 10th October 2018, 7:32 pm

Oxford Deep NLP 2017 course (via) Slides, course description and links to lecture videos for the 2017 Deep Natural Language Processing course at the University of Oxford presented by a team from Google DeepMind. # 31st October 2017, 8:39 pm

Squeezing every drop of performance out of a Django app on Heroku. Ben Firshman describes some lesser known tricks for scaling Django on Heroku—in particular, using gunicorn gevent asynchronous workers and setting up PostgreSQL connection pooling using django-db-geventpool. # 31st October 2017, 2:08 pm

Scaling the GitLab database. Lots of interesting details on how GitLab have worked to scale their PostgreSQL setup. They’ve avoided sharding so far, instead opting for database pooling with pgbouncer and read-only replicas using hot standbys. I like the way they deal with replica lag—they store the current WAL position in a redis key for the user every time there’s a write, then use pg_last_xlog_replay_location() on the various replicas to check and see if they have caught up next time the user makes a request that needs to read some data. # 30th October 2017, 8:53 pm

Fun facts about Rust’s growing popularity. TIL Rust is used “on hundreds of millions of desktops via Dropbox”. And Rust developers are called Rustaceans. # 30th October 2017, 4:45 am

Exploring United States Policing Data Using Python. Outstanding introduction to data analysis with Jupyter and Pandas. # 29th October 2017, 4:58 pm

Fast GeoSpatial Analysis in Python. Some clever advanced performance tricks with Cython and Dask, but it also introduced me to GeoPandas. # 29th October 2017, 4:47 pm

How to teach technical concepts with cartoons. Julia Evans: “This post is about a few patterns I use when illustrating ideas about computers. If you are interested in using drawings to teach people about your very favorite computer topics, hopefully this will help you!” # 28th October 2017, 2:55 pm