Blogmarks
Filters: Sorted by date
Targeted diagnostic logging in production (via) Will Sargent defines diagnostic logging as “debug logging statements with an audience”, and proposes controlling this style if logging via a feature flat system to allow detailed logging to be turned on in production against a selected subset if users in order to help debug difficult problems. Lots of great background material in the topic of observability here too.
When a rewrite isn’t: rebuilding Slack on the desktop. Slack appear to have pulled off the almost impossible: finishing a complete, incremental rewrite of their core product. They moved from jQuery to React over the course of two years, constantly shipping new features as they went along. The biggest gain was in rewriting their code to support multiple workspaces, which means desktop client users no longer have to run a separate copy of Electron for every workspace they are signed into.
healthkit-to-sqlite. Ever since I got an Apple Watch I’ve been itching to get my hands on the step tracking and health data that it’s been collecting for me. I know it’s there in a SQLite database on my wrist, but I couldn’t figure out how to get it! A few days ago I stumbled across the “Export Health Data” button in the iOS Health app, and it turns out it creates a zip file containing XML with a full dump of the data collected by Apple Health. healthkit-to-sqlite is the tool I’ve built that can read that export and use it to create a SQLite database ready to be queried and explored with Datasette. It’s a pretty basic implementation but it’s already giving me access to over 3 million rows of data. Lots of potential here for interesting work with personal analytics.
Unlocking the Department of State’s foreign military training data for good this time (via) I’m so excited about this: Security Force Monitor used Datasette to publish a 200,000 row database of training engagements between the US military and foreign military units, based on their own massive efforts to clean up the official data (from thousands of PDF files). This is pretty much my dream use-case for Datasette, and their future goals are inspiring: “Our hope is that when the next report arrives in a short few months, we will be able to turn it into machine readable data and pass it around the sector in minutes, rather than months.”
Details of the Cloudflare outage on July 2, 2019 (via) Best retrospective I’ve read in a long time. The outage was caused by a backtracking regex rule that was added to the Web Application Firewall project, which rolls out globally and skips most of Cloudflare’s regular graduar rollout process (delightfully animal themed, named DOG for the dogfooding PoP that their employees use, PIG for the Guinea Pig PoPs reserved for free customers, then Canary for the final step) so that they can deploy counter-measures to newly discovered vulnerabilities as quickly as possible—but the real value in the retro is that it provides an extremely deep insight into how Cloudflare organize, test and manage their changes. Really interesting stuff.
datasette-cors (via) My other Datasette ASGI plugin: this one wraps my asgi-cors project and lets you configure CORS access from a list of domains (or a set of domain wildcards) so you can make JavaScript calls to a Datasette instance from a specific set of other hosts.
datasette-auth-github (via) My first big ASGI plugin for Datasette: datasette-auth-github adds the ability to require users to authenticate against the GitHub OAuth API. You can whitelist specific users, or you can restrict access to members of specific GitHub organizations or teams. While it’s structured as a Datasette plugin it also includes ASGI middleware which can be applied to any ASGI application.
Datasette 0.29 (via) I shipped Datasette 0.29! • ASGI all the way down! Plus a new asgi_wrapper plugin hook letting plugins do all kinds of powerful new things • New mechanism for secret plugin configuration options • Facet by date • ?_through= for joins through m2m tables. Much more.
How FZF and ripgrep improved my workflow (via) I’m already a keen user of ripgrep (a crazy-fast grep alternative) but fzf was new to me: it’s a CLI utility that lets you pipe in a list of strings, then gives you a typeahead search interface to search and select a string before returning the selected string to stdout when you hit enter. This means you can pipe it together with other tools to add a dynamic selection step, which has all kinds of delightful combinations. “vi $(find . | fzf)” for example opens vi against the file you selected.
PugSQL. Interesting new twist on a definitely-not-an-ORM library for Python. With PugSQL you define SQL queries in files, give them names and then load them into a module which allows you to execute them as Python methods with keyword arguments. You can mark statements as only returning a single row (or a single scalar value) with a comment at the top of their file.
Choose Boring Technology (via) The definitive write-up of Dan McKinley’s presentation on why you should mostly use “boring” technology rather than going after the latest shiniest stack components. There’s so much accumulated wisdom in here. I particularly like how Dan owns up to having introduced Scala and MongoDB at Etsy before eventually helping remove them and go back to something less exciting and far more predictable. Also neat: the site is generated using Dan’s better-keynote-export tool which helps turn Keynote presentations into a flat web page with notes and images.
db-to-sqlite 1.0 release. I’ve released version 1.0 of my db-to-sqlite tool, which lets you create a SQLite database copy of any database supported by SQLAlchemy (I’ve tested it against MySQL and PostgreSQL). The tool has a bunch of new features: you can use --redact to redact specific columns, specify --table multiple times to copy a subset of tables, and the --all option now efficiently adds all foreign keys at the end of the import. The project now has unit tests which run against MySQL and PostgreSQL in Travis CI. Also included in the README: a shell one-liner for creating a local SQLite copy of a remote Heroku Postgres database based on extracting the connection string from a Heroku config environment variable.
Feature Toggles (aka Feature Flags). I’m a huge fan of feature flags as a way of managing feature releases and keeping incomplete code in master as opposed to maintaining long-running branches. Recently I’ve found myself pointing people to this essay by Pete Hodgson—it’s a great overview of feature flags (here called toggles) and I particularly like how it splits them into four categories: Release Toggles, Experiment Toggles, Ops Toggles and Permissioning Toggles.
json-flatten. A little Python library I wrote that attempts to flatten a JSON object into a set of key/value pairs suitable for transmitting in a query string or using to construct an HTML form. I first wrote this back in 2015 as a Gist—I’ve reconstructed the Gist commit history in a new repository and shipped it to PyPI.
Announcing Envoy Mobile. This is a fascinating development: Lyft’s Envoy proxy / service mesh has been widely adopted across the industry as a server-side component for adding smart routing and observability to the network calls made between services in microservice architectures. “The reality is that three 9s at the server-side edge is meaningless if the user of a mobile application is only able to complete the desired product flows a fraction of the time”—so Lyft are building a C++ embedded library companion to Envoy which is designed to be shipped as part of iOS and Android client applications. “Envoy Mobile in conjunction with Envoy in the data center will provide the ability to reason about the entire distributed system network, not just the server-side portion.” Their decision to release an early working prototype and then conduct ongoing development entirely in the open is interesting too.
Toward a “Kernel Python” (via) Glyph makes a strong case for releasing a slimmed down “kernel” version of Python with the minimal possible standard library, and argues that the current standard library is proving impossible for a single core team to productively maintain. “If I wanted to update the colorsys module to be more modern—perhaps to have a Color object rather than a collection of free functions, perhaps to support integer color models—I’d likely have to wait 500 days, or more, for a review.”
When should you be using Web Workers? 85% of worldwide mobile devices are massively less performant than high end iPhones. Surma argues that we should be making aggressive use of Web Workers to keep as much of our JavaScript as possible off the main UI thread, to avoid freezing up the entire interface.
Convert Locations.kml (pulled from an iPhone backup) to SQLite. I’ve been playing around with data from my iPhone using the iPhone Backup Extractor app and one of the things it exports for you is a Locations.kml file full of location history data. I wrote a tiny script using Python’s ElementTree XMLPullParser to efficiently iterate through the Placemarks and yield them as dictionaries, which I then batch-inserted into sqlite-utils to create a SQLite database.
paginate-json (via) I released a fun tiny utility: paginate-json, which knows how to paginate through JSON APIs that use the HTTP Link header for pagination. I built it so I could pull data from the GitHub API and pipe it directly into SQLite via sqlite-utils.
Serverless Microservice Patterns for AWS (via) A handy collection of 19 architectural patterns for AWS Lambda collected by Jeremy Daly.
datasette-render-binary (via) Yet another tiny Datasette plugin. This one attempts to render binary data in a slightly more readable fashion—it shows ASCII characters as they are, and shows all other data as monospace octets. Useful as a tool for exploring new unfamiliar databases as it makes it easier to spot if a binary column may contain a decipherable binary format.
datasette-bplist (via) It turns out an OS X laptop is positively crammed with SQLite databases, and many of them contain values that are data structures encoded using Apple’s binary plist format. datasette-bplist is my new plugin to help explore those files: it provides a display hook for rendering their contents, and a custom bplist_to_json() SQL function which can be used to extract and query information that is embedded in those values. The README includes tips on how to pull interesting EXIF data out of the SQLite database that sits behind Apple Photos.
Friday wins and a case study in ritual design. “Culture is what you celebrate. Rituals are the tools you use to shape culture.”
Los Angeles Weedmaps analysis (via) Ben Welsh at the LA Times published this Jupyter notebook showing the full working behind a story they published about LA’s black market weed dispensaries. I picked up several useful tricks from it—including how to load points into a geopandas GeoDataFrame (in epsg:4326 aka WGS 84) and how to then join that against the LA Times neighborhoods GeoJSON boundaries file.
Building a stateless API proxy (via) This is a really clever idea. The GitHub API is infuriatingly coarsely grained with its permissions: you often end up having to create a token with way more permissions than you actually need for your project. Thea Flowers proposes running your own proxy in front of their API that adds more finely grained permissions, based on custom encrypted proxy API tokens that use JWT to encode the original API key along with the permissions you want to grant to that particular token (as a list of regular expressions matching paths on the underlying API).
datasette-jq (via) I released another tiny Datasette plugin: datasette-jq registers a single custom SQL function, jq(), which lets you execute the jq expression language against a JSON column (or literal value) to filter and transform the JSON data. The README includes a link to a live demo—it’s a neat way to play with the jq micro-language.
Falsehoods Programmers Believe About Search (via) These are great. “When you find the boolean operator ‘OR’, you always know it doesn’t mean Oregon”.
gls: Goroutine local storage (via) Go doesn't provide a mechanism for having "goroutine local" variables (like threadlocals in Python but for goroutines), and the structure of the language makes it really hard to get something working. JT Olio figured out a truly legendary hack: Go's introspection lets you see the current stack, so he figured out a way to encode a base-16 identifer tag into the call order of 16 special nested functions.
I particularly like the "What are people saying?" section of the README:
"Wow, that's horrifying."
"This is the most terrible thing I have seen in a very long time."
"Where is it getting a context from? Is this serializing all the requests? What the heck is the client being bound to? What are these tags? Why does he need callers? Oh god no. No no no."
Zdog (via) Well this is absolutely delightful: Zdog is a pseudo-3D engine for canvas and SVG that outputs 3D models rendered as super-stylish flat shapes. It’s hard to describe with words—go play with the demos!
Using dependabot to bump Django on my blog from 2.2 to 2.2.1 (via) GitHub recently acquired dependabot and made it free, and I decided to try it out on my blog. It’s a really neat piece of automation: it scans your requirements.txt (plus a number of other packaging definitions across several different languages), checks for updates to your dependencies and opens pull requests against any that it finds. Combine it with a CI service such as Circle CI and your tests will run automatically against the pull request, letting you know if it’s safe to merge. dependabot constantly rebases other changes against the pull request to try and ensure it will merge as cleanly as possible.