Simon Willison’s Weblog

Blogmarks

Compiling SQLite for use with Python Applications (via) Charles Leifer’s recent tutorial on how to compile and build the latest SQLite (with window function support) for use from Python via his pysqlite3 library.

# 15th August 2018, 3:51 pm / pysqlite, python, sqlite, charles-leifer

coleifer/pysqlite3. Now that the pysqlite package is bundled as part of the Python standard library the original open source project is no longer actively maintained, and has not been upgraded for Python 3. Charles Leifer has been working on pysqlite3, a stand-alone package of the module. Crucially, this should enable compiling the latest version of SQLite (via the amalgamation package) without needing to upgrade the version that ships with the operating system.
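
A minimal sketch of how an application might prefer pysqlite3 when it is installed and fall back to the standard library module otherwise. The helper name `supports_window_functions` is made up for illustration, and this assumes the stand-alone package exposes the same `sqlite_version_info` attribute as the standard library `sqlite3` module.

```python
try:
    # Assumption: the stand-alone package is installed and importable as pysqlite3.
    import pysqlite3 as sqlite3
except ImportError:
    # Fall back to the module bundled with Python.
    import sqlite3


def supports_window_functions() -> bool:
    """Hypothetical helper: True if the linked SQLite is 3.25.0 or newer."""
    return sqlite3.sqlite_version_info >= (3, 25, 0)


print("SQLite version:", sqlite3.sqlite_version)
print("Window functions available:", supports_window_functions())
```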

# 15th August 2018, 3:15 pm / pysqlite, python, sqlite, charles-leifer

Window Functions in SQLite 3.25.0. The next release of SQLite (apparently due for release in September) will add window functions, as specified in various SQL standards and already available in PostgreSQL. This is going to dramatically improve SQLite as an engine for performing analytical queries, especially across time series data. It’s also going to further emphasize the need for people to be able to upgrade their SQLite versions beyond those provided by the operating system—the default Ubuntu run by Travis CI still only ships with SQLite 3.8 for example.
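
To give a feel for what this enables with time series data, here is a small sketch of a running total computed with `SUM(...) OVER (ORDER BY ...)`. The `readings` table and its rows are invented for illustration, and the query needs the Python process to be linked against SQLite 3.25.0 or later.

```python
import sqlite3


def running_totals():
    """Return (day, value, running_total) rows from a made-up readings table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (day TEXT, value INTEGER)")
    conn.executemany(
        "INSERT INTO readings (day, value) VALUES (?, ?)",
        [("2018-08-01", 10), ("2018-08-02", 20), ("2018-08-03", 60)],
    )
    # The window function accumulates value across rows ordered by day.
    rows = conn.execute(
        """
        SELECT day, value, SUM(value) OVER (ORDER BY day) AS running_total
        FROM readings
        ORDER BY day
        """
    ).fetchall()
    conn.close()
    return rows


for row in running_totals():
    print(row)
```

On an older SQLite the query fails with an `OperationalError` instead, which is one quick way to find out which version you are actually running against.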

# 15th August 2018, 3:12 pm / sqlite

Experiences with running PostgreSQL on Kubernetes (via) Fascinating interview that makes a solid argument for the idea that running stateful data stores like PostgreSQL or Cassandra is made harder, not easier, when you add an orchestration tool like Kubernetes into the mix.

# 13th August 2018, 2:30 pm / postgresql, kubernetes

Using achievement stats to estimate sales on steam (via) Really interesting data leak exploit here: Valve’s Steam API was showing the percentage of users that gained a specific achievement up to 16 decimal places—which inadvertently leaked their exact usage statistics, since if 0.012782207690179348 percent of players get an achievement the only possible input is 8 players out of 62,587.
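
The arithmetic behind this kind of leak can be sketched as a brute-force search over possible player counts. This is my own illustration, not the method from the linked article: the function names are invented, and it assumes the percentage was computed at 32-bit float precision before being printed with extra digits, which may not match what the API actually did.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def candidate_counts(percent: float, max_players: int):
    """Return (achievers, players) pairs whose percentage matches `percent`
    once both sides are rounded to 32-bit precision (an assumption)."""
    target = to_float32(percent)
    matches = []
    for players in range(1, max_players + 1):
        # The nearest whole number of achievers for this player count.
        achievers = round(percent / 100 * players)
        if achievers == 0:
            continue
        if to_float32(100 * achievers / players) == target:
            matches.append((achievers, players))
    return matches


# The figure quoted above; the upper bound of 100,000 players is arbitrary.
print(candidate_counts(0.012782207690179348, 100_000))
```

The more digits that are published, the fewer (achievers, players) pairs survive the filter, which is why rounding the published percentage to a couple of decimal places closes the leak.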

# 9th August 2018, 9:03 am / privacy, security

Securing Web Sites Made Them Less Accessible (via) This is fascinating: the move to HTTPS everywhere breaks local HTTP caching servers (like Squid) which are still used in remote areas that get their internet over a high-latency satellite connection.

# 7th August 2018, 5:52 pm / eric-meyer, https

Faust: Python Stream Processing (via) A new open source stream processing system released by Robinhood, created by Vineet Goel and Celery creator Ask Solem. The API looks delightful, making very smart use of Python decorators and async/await. The initial release requires Kafka but they plan to support multiple backends, hopefully including Redis Streams.

# 6th August 2018, 10:51 pm / python, streaming

How to Read an RFC. An extremely useful guide to reading RFCs by Mark Nottingham. I didn’t know most of the stuff in here.

# 6th August 2018, 10:38 pm / mark-nottingham, rfc

OWASP Top 10 2007-2017: The Fall of CSRF. I was surprised to learn recently that CSRF didn’t make it into the 2017 OWASP Top 10 security vulnerabilities (after featuring almost every year since the list started). The credited reason is that web frameworks do a good enough job protecting against CSRF by default that it’s no longer a top-ten problem. Defaults really do matter.

# 6th August 2018, 10:02 pm / csrf, owasp, security

Datasette unit tests: monkeytype_call_traces (via) Faceted browse against every function call that occurs during the execution of Datasette’s test suite. I used Instagram’s MonkeyType tool to generate this, which can run Python code and generates a SQLite database of all of the traced calls. It’s intended to be used to automatically add mypy annotations to your code, but since it produces a SQLite database as a by-product I’ve started exploring the intermediary format using Datasette. Generating this was as easy as running “monkeytype run `which pytest`” in the Datasette root directory.

# 2nd August 2018, 9:03 pm / python, sqlite, static-typing, testing, datasette, mypy

The death of a TLD. Sony have terminated their .xperia TLD. Ben Cox used Certificate Transparency logs to evaluate the 11 total TLDs that have been abandoned since the gTLD gold rush started—since HTTPS is becoming the default now these logs of issued certificates are a great indicator of which domains (or TLDs) are being actively used. The only deleted TLD with legitimate looking certificates (apparently for a mail server) was .mcdonalds

# 28th July 2018, 8:07 pm / certificates, dns, domains, tls

Big tech warns of ’Japan’s millennium bug’ ahead of Akihito’s abdication (via) Emperor Akihito’s abdication in April 2019 triggers a new era, and the Japanese calendar counts years from the coronation of the current emperor. The era hasn’t changed since 1989 and a great deal of software is unable to handle a change. To make things more complicated... the name of the new era will be announced in late February, but it needs to be represented in Unicode as a single new character... and the next version of Unicode (v12) is due out in early March. There may have to be a Unicode 12.1 released shortly afterwards that includes the new codepoint.

# 28th July 2018, 2:04 pm / timezones, unicode, japan

Datasette: publish_subcommand hook + default plugins mechanism, used for publish heroku/now (via) I just landed a new plugin hook to Datasette master: publish_subcommand, which lets you define new publisher subcommands for the “datasette publish” CLI tool in addition to Heroku and Zeit Now. As part of this I’ve refactored the heroku/now publisher implementations into two default plugins that ship as part of Datasette—I hope to use this pattern for other core functionality in the future.

# 26th July 2018, 5:27 am / cli, plugins, datasette

The Free Stack—Running your application for free on AWS (via) Parikshit Agnihotry provides a useful rundown of quite how much you can get done using the first 12 month free tier of AWS API Gateway, Lambda, DynamoDB, S3 and CloudFront.

# 25th July 2018, 6:33 pm / aws

Teaching Tech Together (via) A new free book compiled by Greg Wilson about teaching programming, based on the instructor training program developed for the Software Carpentry course over the past six years. This is fantastic—it’s filling in a whole load of holes in my understanding of effective teaching theory.

# 17th July 2018, 6:18 pm / greg-wilson, teaching

ES6 In Depth: Template strings. Useful guide to the advanced features of backtick template strings in ES6.

# 17th July 2018, 4:08 pm / javascript

Feature comparison between hyperHTML and lit-html. Compiled by hyperHTML author Andrea Giammarchi. lit-html is a similar project maintained by Google’s Polymer team.

# 17th July 2018, 3:58 pm / javascript, hyperhtml, lit-html

hyperHTML (via) I’ve been playing around with multiline backtick template literals in JavaScript recently, and Claudio Cicali on Twitter pointed me to hyperHTML. It’s a fascinating project: it implements an alternative to React’s virtual DOM in just 4.6KB of code, taking advantage of the way tagged template literals can call a custom function which gets to affect how the interpolated values from the template are treated. The documentation does a great job of describing the various clever tricks that are used under the hood, in particular how it exploits the fact that tagged literals “freeze” the incoming array of literal chunks, allowing hyperHTML to compile each template once and reuse the compiled version for different interpolations.

# 17th July 2018, 3:39 pm / javascript, react, hyperhtml

Bowiebranchia. I spent the weekend learning about Nudibranchs, which are beautiful sea slugs (common on the coast of California) which are definitely best explained by their resemblance to different eras of David Bowie.

# 16th July 2018, 1:35 am / nudibranchs, david-bowie

XARs: An efficient system for self-contained executables (via) Really interesting new open source project from Facebook: a XAR is a new way of packaging up a Python executable complete with its dependencies and resources such that it can be distributed and executed elsewhere as a single file. It’s kind of like a Docker container without Docker—it uses the SquashFS compressed read-only filesystem. I can’t wait to try this out with Datasette.

# 13th July 2018, 7 pm / facebook, python, datasette

future-fstrings (via) Clever module that backports f-strings to versions of Python earlier than 3.6, by registering itself as a codec and abusing Python’s # -*- coding: future_fstrings -*- feature. Via a conversation on Twitter that pointed out that the JavaScript community have been using transpilation to successfully experiment with new language features for years now.
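
The underlying mechanism is Python's codec registry: a search function can hand back a codec whose decoder rewrites text on the way in, and source coding declarations go through that same registry. Here is a toy sketch of the registration step only. The `shout` codec is invented for illustration and simply upper-cases whatever it decodes; it is not how future-fstrings itself is implemented.

```python
import codecs


def _search(name):
    """Codec search function: only answers for the made-up 'shout' codec."""
    if name != "shout":
        return None
    utf8 = codecs.lookup("utf-8")

    def decode(data, errors="strict"):
        # Decode as UTF-8 first, then transform the resulting text.
        text, consumed = utf8.decode(data, errors)
        return text.upper(), consumed

    return codecs.CodecInfo(name="shout", encode=utf8.encode, decode=decode)


codecs.register(_search)

print(b"hello world".decode("shout"))
```

A real source-rewriting codec would replace the `upper()` call with a transformation that turns new syntax into something older interpreters can parse.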

# 13th July 2018, 4:39 am / javascript, python

The Now CDN (via) Huge announcement from Zeit Now today: all .now.sh deployments are now served through the Cloudflare CDN, which means they benefit from 150 worldwide CDN locations that obey HTTP caching headers. This is particularly relevant for Datasette, since it serves far-future cache headers by default and uses Cloudflare-compatible HTTP/2 push hints to accelerate 302 redirects. This means that both the “datasette publish now” CLI command and the Datasette Publish web app will now result in Cloudflare-accelerated deployments.

# 12th July 2018, 3:34 am / cdn, performance, zeit-now, datasette, cloudflare

Usage of ARIA attributes via HTTP Archive. A neat example of a Google BigQuery query you can run against the HTTP Archive public dataset (a crawl of the “top” websites run periodically by the Internet Archive, which captures the full details of every resource fetched) to see which ARIA attributes are used the most often. Linking to this because I used it successfully today as the basis for my own custom query—I love that it’s possible to analyze a huge representative sample of the modern web in this way.

# 12th July 2018, 3:16 am / aria, http, internet-archive, big-data

scrapely. Neat twist on a screen scraping library: this one lets you “train it” by feeding it examples of URLs paired with a dictionary of the data you would like to have extracted from that URL, then uses an instance-based learning algorithm to run against new URLs. Slightly confusing name since it’s maintained by the scrapy team but is a totally independent project from the scrapy web crawling framework.

# 10th July 2018, 8:25 pm / python, scraping

react-from-zero (via) Interesting approach to teaching and understanding React: unlike most other tutorials this skips Webpack and ES6 entirely and focuses on things you can get running just using a browser and loading code via script tags. It does eventually load Babel to enable client-side JSX transforms, but before that it shows how React can be used by loading react.js and react-dom.js and then calling React.createElement() manually (or by using the 0xeac7 magic symbol and constructing JavaScript objects manually with $$typeof: magicValue).

# 3rd July 2018, 5:27 pm / javascript, react

Digg’s v4 launch: an optimism born of necessity. Riveting behind-the-scenes story of the disastrous Digg V4 launch by former Digg engineer Will Larson.

# 2nd July 2018, 5:25 pm / digg, will-larson

datasette-vega (via) I wrote a visualization plugin for Datasette that uses the excellent Vega “visualization grammar” library to provide bar, line and scatter charts configurable against any Datasette table or SQL query.

# 29th June 2018, 3 pm / plugins, projects, visualization, datasette

Migrating Messenger storage to optimize performance (via) Fascinating case-study of a truly gargantuan migration. Messenger has over a billion users, and Facebook successfully migrated its backend storage from HBase to their MyRocks database (a fork of MySQL with a storage engine built on their SSD-optimized RocksDB key/value library) without any user-visible downtime. They ended up using two migration paths: one for the 99.9% of regular accounts, and a separate path for extremely high volume accounts (businesses with very active chat bots or support systems).

# 27th June 2018, 3:05 pm / facebook, migration, mysql, scaling, zero-downtime

mkcert (via) Handy new tool from Filippo Valsorda (a cryptographer at Google) for easily generating TLS certificates for your local development environment. You can use this to get a certificate pair for a localhost web server created with a couple of simple commands.

# 26th June 2018, 6:55 pm / certificates, go, https, filippo-valsorda

ActorDB. Distributed SQL database written in Erlang built on top of SQLite (on top of LMDB), adding replication using the Raft consensus algorithm (so sharded with no single points of failure) and a MySQL protocol interface. Interesting combination of technologies.

# 24th June 2018, 9:48 pm / erlang, scaling, sqlite, big-data
