Blogmarks
Filters: Sorted by date
Query Engines: Push vs. Pull (via) Justin Jaffray (who has worked on Materialize) explains the difference between push and pull query execution engines using some really clear examples built around JavaScript generators.
country-coder (via) Given a latitude and longitude, how can you tell what country that point sits within? One way is to do a point-in-polygon lookup against a set of country polygons, but this can be tricky: some countries such as New Zealand have extremely complex outlines, even though for this use-case you don’t need the exact shape of the coastline. country-coder solves this with a custom designed 595KB GeoJSON file with detailed land borders but loosely defined ocean borders. It also comes with a wrapper JavaScript library that provides an API for resolving points, plus useful properties on each country with details like telepohen calling codes and emoji flags.
Why you shouldn’t use ENV variables for secret data (via) I do this all the time, but this article provides a good set of reasons that secrets in environment variables are a bad pattern—even when you know there’s no multi-user access to the host you are deploying to. The biggest problem is that they often get captured by error handling scripts, which may not have the right code in place to redact them. This article suggests using Docker secrets instead, but I’d love to see a comprehensive write-up of other recommended patterns for this that go beyond applications running in Docker.
Behind GitHub’s new authentication token formats (via) This is a really smart design. GitHub’s new tokens use a type prefix of “ghp_” or “gho_” or a few others depending on the type of token, to help support mechanisms that scan for accidental token publication. A further twist is that the last six characters of the tokens are a checksum, which means token scanners can reliably distinguish a real token from a coincidental string without needing to check back with the GitHub database. “One other neat thing about _ is it will reliably select the whole token when you double click on it”—what a useful detail!
Render single selected county on a map (via) Another experiment at the intersection of Datasette and Observable notebooks. This one imports a full Datasette table (3,200 US counties) using streaming CSV and loads that into Observable’s new Search and Table filter widgets. Once you select a single county a second Datasette SQL query (this time retuning JSON) fetches a GeoJSON representation of that county which is then rendered as SVG using D3.
Spatialite Speed Test. Part of an excellent series of posts about SpatiaLite from 2012—here John C. Zastrow reports on running polygon intersection queries against a 1.9GB database file in 40 seconds without an index and 0.186 seconds using the SpatialIndex virtual table mechanism.
ifconfig.co (via) I really like this: “curl ifconfig.co” gives you your IP address as plain text, “curl ifconfig.co/city” tells you your city according to MaxMind GeoLite2, “curl ifconfig.co/json” gives you all sorts of useful extra data. Suggested rate limit is one per minute, but the code is open source Go that you can run yourself.
Hello, HPy (via) HPy provides a new way to write C extensions for Python in a way that is compatible with multiple Python implementations at once, including PyPy.
sqlite-plus (via) Anton Zhiyanov bundled together a bunch of useful SQLite C extensions for things like statistical functions, unicode string normalization and handling CSV files as virtual tables. The GitHub Actions workflow here is a particularly useful example of compiling SQLite extensions for three different platforms.
Why all my servers have an 8GB empty file (via) This is such a good idea: keep a 8GB empty spacer.img file on any servers you operate, purely so that if the server runs out of space you can delete the file and get some breathing room for getting everything else working again. I’ve had servers run out of space in the past and it’s an absolute pain to sort out—this trick would have really helped.
Homebrew Python Is Not For You. If you’ve been running into frustrations with your Homebrew Python environments breaking over the past few months (the dreaded “Reason: image not found” error) Justin Mayer has a good explanation. Python in a Homebrew is designed to work as a dependency for their other packages, and recent policy changes that they made to support smoother upgrades have had catastrophic problems effects on those of us who try to use it for development environments.
Understanding JSON Schema (via) Useful, comprehensive short book guide to JSON Schema, which finally helped me feel like I fully understand the specification.
A Complete Guide To Accessible Front-End Components. I’m so excited about this article: it brings together an absolute wealth of resources on accessible front-end components, including many existing component implementations that are accessible out of the box. Date pickers, autocomplete widgets, modals, menus—all sorts of things that I’ve been dragging my heels on implementing because I didn’t fully understand their accessibility implications.
The Accountability Project Datasettes. The Accountability Project “curates, standardizes and indexes public data to give journalists, researchers and others a simple way to search across otherwise siloed records”—they have a wide range of useful data, and they’ve started experimenting with Datasette to provide SQL access to a subset of the information that they have collected.
How we found and fixed a rare race condition in our session handling. GitHub had a terrifying bug this month where a user reported suddenly being signed in as another user. This is a particularly great example of a security incident report, explaining how GitHub identified the underlying bug, what caused it and the steps they are taking to ensure bugs like that never happen in the future. The root cause was a convoluted sequence of events which could cause a Ruby Hash to be accidentally shared between two requests, caused as a result of a new background thread that was introduced as a performance optimization.
logpaste (via) Useful example of how to use the Litestream SQLite replication tool in a Dockerized application: S3 credentials are passed to the container on startup, it then attempts to restore the SQLite database from S3 and starts a Litestream process in the same container to periodically synchronize changes back up to the S3 bucket.
sqlite-uuid (via) Another Python package that wraps a SQLite module written in C: this one provides access to UUID functions as SQLite functions.
sqlite-spellfix (via) I really like this pattern: “pip install sqlite-spellfix” gets you a Python module which includes a compiled (on your system when pip install ran) copy of the SQLite spellfix1 module, plus a utility variable containing its path so you can easily load it into a SQLite connection.
The SOC2 Starting Seven (via) "So, you plan to sell your startup’s product to big companies one day. Congratu-dolences! [...] Here’s how we’ll try to help: with Seven Things you can do now that will simplify SOC2 for you down the road while making your life, or at least your security posture, materially better in the immediacy.
google-cloud-4-words. This is really useful: every Google Cloud service (all 250 of them) with a four word description explaining what it does. I'd love to see the same thing for AWS.
Update: Turns out I had.
How I cut GTA Online loading times by 70% (via) Incredible debugging war story: t0st was fed up of waiting six minutes (!) for GTA Online to load on their PC, so they used a host of devious debugging tricks to try and figure out what was going on. It turned out the game was loading a 10MB JSON file detailing all of the available in-game purchases, but inefficient JSON parsing meant it was pegging an entire CPU for 4 minutes mainly running the strlen() C function. Despite not having access to source code or debugging symbols t0st figured out the problem and managed to inject a custom DLL that hooked some internal functions and dropped load times down from 6m down to to 1m50s!
unasync (via) Today I started wondering out loud if one could write code that takes an asyncio Python library and transforms it into the synchronous equivalent by using some regular expressions to strip out the “await ...” keywords and suchlike. Turns out that can indeed work, and Ratan Kulshreshtha built it! unasync uses the standard library tokenize module to run some transformations against an async library and spit out the sync version automatically. I’m now considering using this for sqlite-utils.
cosmopolitan libc (via) “Cosmopolitan makes C a build-once run-anywhere language, similar to Java, except it doesn’t require interpreters or virtual machines be installed beforehand. [...] Instead, it reconfigures stock GCC to output a POSIX-approved polyglot format that runs natively on Linux + Mac + Windows + FreeBSD + OpenBSD + BIOS with the best possible performance and the tiniest footprint imaginable.” This is a spectacular piece of engineering.
How to secure an Ubuntu server using Tailscale and UFW. This is the Tailscale tutorial I’ve always wanted: it explains in detail how you can run an Ubuntu server (from any cloud provider) such that only devices on your personal Tailscale network can access it.
Fuzzy Name Matching in Postgres. Paul Ramsey describes how to implement fuzzy name matching in PostgreSQL using the fuzzystrmatch extension and its levenshtein() and soundex() functions, plus functional indexes to query against indexed soundex first and then apply slower Levenshtein. The same tricks should also work against SQLite using the datasette-jellyfish plugin.
Blazing fast CI with pytest-split and GitHub Actions (via) pytest-split is a neat looking variant on the pattern of splitting up a test suite to run different parts of it in parallel on different machines. It involves maintaining a periodically updated JSON file in the repo recording the average runtime of different tests, to enable them to be more fairly divided among test runners. Includes a recipe for running as a matrix in GitHub Actions.
People, processes, priorities. Twitter thread from Adrienne Porter Felt outlining her model for thinking about engineering management. I like this trifecta of “people, processes, priorities” a lot.
trustme
(via)
This looks incredibly useful. Run python -m trustme and it will create three files for you: server.pem, server.key and a client.pem client certificate, providing a certificate for "localhost" (or another host you spefict) using a fake certificate authority. Looks like it should be the easiest way to test TLS locally.
Why I Built Litestream. Litestream is a really exciting new piece of technology by Ben Johnson, who previously built BoltDB, the key-value store written in Go that is used by etcd. It adds replication to SQLite by running a process that converts the SQLite WAL log into a stream that can be saved to another folder or pushed to S3. The S3 option is particularly exciting—Ben estimates that keeping a full point-in-time recovery log of a high write SQLite database should cost in the order of a few dollars a month. I think this could greatly expand the set of use-cases for which SQLite is sensible choice.
Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies (via) Alex Birsan describes a new category of security vulnerability he discovered in the npm, pip and gem packaging ecosystems: if a company uses a private repository with internal package names, uploading a package with the same name to the public repository can often result in an attacker being able to execute their own code inside the networks of their target. Alex scored over $130,000 in bug bounties from this one, from a number of name-brand companies. Of particular note for Python developers: the --extra-index-url argument to pip will consult both public and private registries and install the package with the highest version number!