Simon Willison’s Weblog

Subscribe

Items in Apr, 2018

Filters: Year: 2018 × Month: Apr × Sorted by date


Datasette—a talk at Zeit Day SF 2018 (via) Slides from the talk I gave today about Datasette and Datasette Publish at the Zeit Day SF conference. # 28th April 2018, 9:31 pm

Make Near Me (via) The natural evolution of owlsnearme.com—Make Near Me uses the Zeit Now API to allow anyone to deploy their own version of Owls Near Me for any species! I announced this on stage at Zeit Day SF 2018 as part of my talk on Datasette and Datasette Publish. # 28th April 2018, 9:28 pm

Scaling a High-traffic Rate Limiting Stack With Redis Cluster. Brandur Leach describes the simple, elegant and performant design of Redis Cluster, and talks about how Stripe used it to scaled their rate-limiting from one to ten nodes. # 26th April 2018, 6:34 pm

Exploring the UK Register of Members Interests with SQL and Datasette

Ever wondered which UK Members of Parliament get gifted the most helicopter rides? How about which MPs have been given Christmas hampers by the Sultan of Brunei? (David Cameron, William Hague and Michael Howard apparently). Here’s how to dig through the Register of Members Interests using SQL and Datasette.

[... 1167 words]

System-Versioned Tables in MariaDB (via) Fascinating new feature from the SQL:2011 standard that’s getting its first working implementation in MariaDB 10.3.4. “ALTER TABLE products ADD SYSTEM VERSIONING;” causes the database to store every change made to that table—then you can run “SELECT * FROM products FOR SYSTEM_TIME AS OF TIMESTAMP @t1;” to query the data as of a specific point in time. I’ve tried all manner of horrible mechanisms for achieving this in the past, having it baked into the database itself would be fantastic. # 25th April 2018, 2:34 pm

China had about 99 percent of the 385,000 electric buses on the roads worldwide in 2017, accounting for 17 percent of the country’s entire fleet. Every five weeks, Chinese cities add 9,500 of the zero-emissions transporters—the equivalent of London’s entire working fleet

Jeremy Hodges # 25th April 2018, 7:19 am

Black Onlline Demo (via) Black is “the uncompromising Python code formatter” by Łukasz Langa—it reformats Python code to a very carefully thought out styleguide, and provides almost no options for how code should be formatted. It’s reminiscent of gofmt. José Padilla built a handy online tool for trying it out in your browser. # 25th April 2018, 5:17 am

JSON Escape Text. I built a tiny tool for turning text into an escaped JSON string—I needed it to help create descriptions and canned SQL queries for adding to Datasette’s metadata.json files. # 25th April 2018, 4:13 am

The current linkedin.com homepage clocks in at 1.9MB of CSS (156KB compressed). After re-building a fully-functional version of the homepage with CSS Blocks, we were able to serve the same page with just 38KB of CSS. To be clear: that’s the uncompressed size. After compression, that CSS file weighed in at less than 9KB!

Chris Eppstein # 24th April 2018, 8:40 pm

dateparser: python parser for human readable dates (via) I’ve used dateutil.parser for this in the past, but dateparser is a major upgrade: it knows how to parse dates in 200 different language locales, can interpret different timezone representations and handles relative dates (“3 months, 1 week and 1 day ago”) as well. # 24th April 2018, 4:17 pm

csvs-to-sqlite 0.8. I released a new version of my csvs-to-sqlite tool this morning with a bunch of handy new features. It can now rename columns and define their types, add the CSV filenames as an additional column, add create indexes on columns and parse dates and datetimes into SQLite-friendly ISO formatted values. # 24th April 2018, 4:11 pm

react-jsonschema-form. Exciting library from the Mozilla Services team: given a JSON Schema definition, react-jsonschema-form can produce a cleanly designed React-powered form for adding and editing data that matches that schema. Includes support for adding multiple items in a nested array, re-ordering them, custom form widgets and more. # 23rd April 2018, 9:38 pm

Why it took a long time to build that tiny link preview on Wikipedia (via) Wikipedia now shows a little preview card on internal links with an image and summary paragraph of the linked page. As a Wikpedia user I absolutely love this feature—and as an engineer and product designer, it’s fascinating to hear the challenges they overcame to ship it. Of particular interest: actually generating a useful summary of a page, while stripping out the cruft that often accumulates at the beginning of their text. It’s also an impressive scaling challenge: the API they use for this feature is now handling more than 500,000 requests per minute. # 23rd April 2018, 9:07 pm

Migrations are both essential and frustratingly frequent as your codebase ages and your business grows: most tools and processes only support about one order of magnitude of growth before becoming ineffective, so rapid growth makes them a way of life. [...] As a result you switch tools a lot, and your ability to migrate to new software can easily become the defining constraint for your overall velocity. [...] Migrations matter because they are usually the only available avenue to make meaningful progress on technical debt.

Will Larson # 23rd April 2018, 3:03 pm

Datasette ClusterMap Plugin – Querying UK Food Standards Agency (FSA) Food Hygiene Ratings Open Data (via) Tony Hirst wrote a tutorial on using datasette-cluster-map to analyze food hygiene ratings data from the FSA # 20th April 2018, 8:50 pm

Datasette plugins, and building a clustered map visualization

Datasette now supports plugins!

[... 751 words]

I submitted a PWA to 3 app stores. Here’s what I learned (via) Useful real-world experience shipping a progressive web app to the iOS, Android and Windows app stores. # 19th April 2018, 9:06 pm

How to rewrite your SQL queries in Pandas, and more (via) I still haven’t fully internalized the idioms needed to manipulate DataFrames in pandas. This tutorial helps a great deal—it shows the Pandas equivalents for a host of common SQL queries. # 19th April 2018, 6:34 pm

Intro to Threads and Processes in Python (via) I really like the diagrams in this article which compares the performance of Python threads and processes for different types of task via the excellent concurrent.futures library. # 19th April 2018, 6:32 pm

How to Use Static Type Checking in Python 3.6 (via) Useful introduction to optional static typing in Python 3.6, including how to use mypy, PyCharm and the Atom mypy plugin. # 19th April 2018, 6:30 pm

The best of Python: a collection of my favorite articles from 2017 and 2018 (so far). Gergely Szerovay has brought together an outstandingly interesting selection of Python articles from the last couple of years of activity of the Python community on Medium. A whole load of gems in here that I hadn’t seen before. # 19th April 2018, 6:28 pm

Creating Simple Interactive Forms Using Python + Markdown Using ScriptedForms + Jupyter (via) ScriptedForms is a fascinating Jupyter hack that lets you construct dynamic documents defined using markdown that provide form fields and evaluate Python code instantly as you interact with them. # 19th April 2018, 4:05 pm

What’s New in MySQL 8.0. MySQL 8 has lots of exciting improvements: Window functions, SRS aware spatial types for GIS, utf8mb4 by default, a ton of JSON improvements and atomic DDL. I no longer feel at a significant disadvantage when I have to use MySQL in place of PostgreSQL. # 19th April 2018, 4:03 pm

Text Embedding Models Contain Bias. Here’s Why That Matters (via) Excellent discussion from the Google AI team of the enormous challenge of building machine learning models without accidentally encoding harmful bias in a way that cannot be easily detected. # 17th April 2018, 8:54 pm

Suppose a runaway success novel/tv/film franchise has “Bob” as the evil bad guy. Reams of fanfictions are written with “Bob” doing horrible things. People endlessly talk about how bad “Bob” is on twitter. Even the New York times writes about Bob latest depredations, when he plays off current events.

Your name is Bob. Suddenly all the AIs in the world associate your name with evil, death, killing, lying, stealing, fraud, and incest. AIs silently, slightly ding your essays, loan applications, uber driver applications, and everything you write online. And no one believes it’s really happening. Or the powers that be think it’s just a little accidental damage because the AI overall is still, overall doing a great job of sentiment analysis and fraud detection.

Daniel Von Fange # 17th April 2018, 8:51 pm

A rating system for open data proposed by Tim Berners-Lee, founder of the World Wide Web. To score the maximum five stars, data must (1) be available on the Web under an open licence, (2) be in the form of structured data, (3) be in a non-proprietary file format, (4) use URIs as its identifiers (see also RDF), (5) include links to other data sources (see linked data). To score 3 stars, it must satisfy all of (1)-(3), etc.

Five stars of open data # 17th April 2018, 4:20 am

Datasette 0.19: Plugins Documentation (via) I’ve released the first preview of Datasette’s new plugin support, which uses the pluggy package originally developed for py.test. So far the only two plugin hooks are for SQLite connection creation (allowing custom SQL functions to be registered) and Jinja2 template environment initialization (for custom template tags), but this release is mainly about exercising the plugin registration mechanism and starting to gather feedback. Lots more to come. # 17th April 2018, 3:59 am

The way I would talk about myself as a senior engineer is that I’d say “I know how I would solve the problem” and because I know how I would solve it I could also teach someone else to do it. And my theory is that the next level is that I can say about myself “I know how others would solve the problem”. Let’s make that a bit more concrete. You make that sentence: “I can anticipate how the API choices that I’m making, or the abstractions that I’m introducing into a project, how they impact how other people would solve a problem.”

Malte Ubl # 15th April 2018, 5:23 pm

Datasette 0.18: units (via) This release features the first Datasette feature that was entirely designed and implemented by someone else (yay open source)—Russ Garrett wanted unit support (Hz, ft etc) for his Wireless Telegraphy Register project. It’s a really neat implementation: you can tell Datasette what units are in use for a particular database column and it will display the correct SI symbols on the page. Specifying units also enables unit-aware filtering: if Datasette knows that a column is measured in meters you can now query it for all rows that are less than 50 feet for example. # 14th April 2018, 3:56 pm

What do you mean “average”? (via) Lovely example of an interactive explorable demonstrating mode/mean/median, built as an Observable notebook using D3. # 12th April 2018, 4:41 pm