Simon Willison’s Weblog

Subscribe
Atom feed

Blogmarks

Filters: Sorted by date

Baltimore Sun Public Salary Records (via) The Baltimore Sun have published an interactive search engine for public salaries of Maryland state employees, and it’s powered by Datasette! Since data journalism is one of my key use-cases for Datasette I’m incredibly excited to see this in the wild. They’ve also published the underlying source code (see the via link) which is a really nice example of how to use Datasette’s custom templates and canned query functionality.

# 28th March 2018, 5:12 pm / data-journalism, datasette

Charles Proxy now available on iOS (via) I didn’t think this was possible, but the Charles debugging proxy is now available for iOS. It works by setting itself up as a VPN such that all app traffic runs through it. You can also optionally turn on SSL decryption for specific hosts by installing a special certificate (which involves jumping through several hoops). It won’t work for apps that implement SSL certificate pinning but from playing with it for a few minutes it looks like most apps haven’t done that, even apps from Google. Well worth $8.99.

# 28th March 2018, 3:57 pm / charles, proxies, ios

Cloud-first: Rapid webapp deployment using containers (via) The Research Software Engineering group at ICL have written a tutorial on deploying web apps as Docker containers using Azure and they use Datasette as the example application.

# 28th March 2018, 3:50 pm / azure, docker, datasette

Touring a Fast, Safe, and Complete(ish) Web Service in Rust. Brandur’s notes from building a high performance web service in Rust, using PostgreSQL via the Diesel ORM and the Rust actix-web framework which provides Erlang-style actors and promise-based async concurrency.

# 28th March 2018, 3:47 pm / async, postgresql, rust, brandur-leach

Describing events in code (via) Phil Gyford built an online directory of every play, movie, gig and exhibition he has been to in the past 38 years using a combination of digital archaeology and saved ticket stubs. He built it using Django and published this piece extensively describing the process he went through to design the data model.

# 28th March 2018, 3:41 pm / django, orm, phil-gyford

Using flamegraphs. I really like flamegraphs as a profiling tool—we have support for them baked into our Tikibar debugging toolbar at Eventbrite—but interpreting them isn’t particularly intuitive on first glance. Julia Evans has put together a great explanation of how to read them as part of the documentation for her rbspy Ruby profiler.

# 21st March 2018, 8:56 pm / profiling, julia-evans

User-defined Order in SQL (via) This is a fun intellectual exercise: how can one efficiently implement a user-defined order in a SQL table? The obvious initial approach is to have an integer position column, but this means every subsequent row must be updated when an item changes position. Joe “begriffs” Nelson explores some clever alternatives, including floating point or decimal positions (allowing new items to be inserted at a midpoint between existing positions) and a new custom rational number type he buiIt as a PostgreSQL extension.

# 21st March 2018, 2:07 pm / postgresql, sql

Protecting Against HSTS Abuse (via) Any web feature that can be used to persist information will eventually be used to build super-cookies. In this case it’s HSTS—a web feature that allows sites to tell browsers “in the future always load this domain over HTTPS even if the request specified HTTP”. The WebKit team caught this being exploited in the wild, by encoding a user identifier in binary across 32 separate sub domains. They have a couple of mitigations in place now—I expect other browser vendors will follow suit.

# 19th March 2018, 10:21 pm / privacy, security, ssl, webkit

How to use HDF5 files in Python (via) HDF5: the missing manual. A detailed explanation of the HDF5 file format and how to work with it using the h5py module. HDF5 allows you to efficiently store multiple datasets (plus metatdata about them) in a single file and then load data from them without pulling the entire file into memory—kind of like SQLite but without the SQL support and more optimized for working with arrays.

# 19th March 2018, 2:55 pm / python

Trio Tutorial. Trio is a really nice async library for Python—a simpler alternative to asyncio, with some very clean API design. Best of all, the tutorial is fantastic—it provides a very clear explanation of async/await without diving into the intricacies of coroutines.

# 17th March 2018, 3:55 pm / async, python

Everyone can now run JavaScript on Cloudflare with Workers. This is such a brilliant piece of software design: Cloudflare took the service workers spec and used it as the basis for their edge-executed JacaScript feature. This means you can run server-side JavaScript in hundreds of edge locations worldwide, applying custom dynamic logic (including additional async cached fetch() calls) with only around 1ms if additional overhead. The pricing model is a steal: $0.50 per million requests with a $5/month minimum.

# 13th March 2018, 4:36 pm / cdn, javascript, cloudflare, serviceworkers

Being fast and light: Using binary data to optimise libraries on the client and the server. (via) Ada Rose Cannon provides a detailed introduction to ArrayBuffers in JavaScript and describes how she used them for a custom binary protocol to sync the state of 170 Virtual Reality users in the same venue without bringing down the network.

# 13th March 2018, 2:34 pm / javascript, websockets

BAD TRAFFIC: Sandvine’s PacketLogic Devices Used to Deploy Government Spyware in Turkey and Redirect Egyptian Users to Affiliate Ads? “Targeted users in Turkey and Syria who downloaded Windows applications from official vendor websites including Avast Antivirus, CCleaner, Opera, and 7-Zip were silently redirected to malicious versions by way of injected HTTP redirects. This redirection was possible because official websites for these programs, even though they might have supported HTTPS, directed users to non-HTTPS downloads by default.”

# 10th March 2018, 10:40 am / https, security

Real-time photogrammetry with #ARKit. Astonishing photogrammetry demo by Tim Field using ARKit in iOS 11.3.

# 10th March 2018, 10:32 am / augmented-reality, ios

Upgrades to Facebook’s link security (via) Facebook have started scanning links shared on the site for HSTS headers, which are used to indicate that an HTTP page is also available over HTTPS and are intended to be cached by browsers such that future HTTP access is automatically retrieved over HTTPS instead. Facebook will now obey those headers itself and link directly to the HTTPS version. What a great idea: all sites with sophisticated link sharing (where links are fetched to retrieve extracts and images for example) should do this as well.

# 5th March 2018, 3:32 pm / facebook, https, security

BearID: Bear Face Detector. Comprehensive tutorial on building a computer vision system to identify faces of bears, using dlib and the Histogram of Oriented Gradients (HOG) technique. Bears!

# 1st March 2018, 5:31 pm / computer-vision, machine-learning

Wagtail 2.0. The leading Django content management system just released it’s 2.0 version—now Django 2.0 and Python 3 only, and with a new rich text editor based on Draft.js. I really like Wagtail—it’s full-feature,d mature, well-documented and philosophically aligned with how I think a CMS should work. I also like this line about the new Python 3 requirement: “Call us reckless neophiles, but we think that, nine years in, Python 3 is looking pretty solid.”

# 1st March 2018, 4:06 pm / cms, django, python3, wagtail

Responsive Components: a Solution to the Container Queries Problem (via) Philip Walton uses Chrome’s new ResizeObserver API (best described as document.onresize for elements, currently a W3C Editor’s Draft, not yet supported by other browsers) to implement a media-query style mechanism for applying CSS based on the size of the parent container. This is really clever. In the absence of ResizeObserver (which can be polyfilled) it can fall back to showing the narrowest design, which is probably best for mobile anyway. Desktop browsers are better equipped to run the polyfill.

# 27th February 2018, 1:21 pm / css, mediaqueries, progressive-enhancement, polyfill

r1chardj0n3s/parse: Parse strings using a specification based on the Python format() syntax. (via) Really neat API design: parse() behaves almost exactly in the opposite way to Python’s built-in format(), so you can use format strings as an alternative to regular expressions for extracting specific data from a string.

# 25th February 2018, 4:58 pm / python, regular-expressions

kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert that to the equivalent markdown in a one-liner.

# 25th February 2018, 4:49 pm / html, python, requests, scraping

github-trending-repos (via) This is a really clever hack: Vitaliy Potapov built a system for subscribing to a weekly digest of trending GitHub repos in your favourite languages entirely on top of the existing GitHub issues notification system. Find the issue for your particular language and hit “subscribe” and you’ll get an email (or push notification depending on how you get your issue notifications) once a week with the latest trends. The implementation is a 220 line Node.js script which runs on a daily and weekly schedule using Circle CI, so Vitaliy doesn’t even have to host or pay for any of the underlying infrastructure. It’s brilliant.

# 23rd February 2018, 5:36 pm / github, nodejs, github-issues

GitHub: Weak cryptographic standards removal notice. GitHub deprecated TLSv1 and TLSv1.1 yesterday. I like how they handled the deprecation: they disabled the protocols for one hour on February 8th in order to (hopefully) warm people by triggering errors in automated processes, then disabled them completely a couple of weeks later.

# 23rd February 2018, 3:41 pm / github, security

I’ve Just Launched “Pwned Passwords” V2 With Half a Billion Passwords for Download (via) Troy Hunt has collected 501,636,842 passwords from a wide collection of major breaches. He suggests using the to build a password strength checker that can say “your password has been used by 53,274 other people”. The full collection is available as a list of SHA1 codes (brute-force reversible but at least slightly obfuscated) in an 8GB file or as an API. Where things get really clever is the API design: you send just the first 5 characters of the SHA1 hash of the user’s password and the API responds with the full list of several hundred hashes that match that prefix. This lets you build a checking feature without sharing full passwords with a remote service, if you don’t want to host the full 8GB of data yourself.

# 22nd February 2018, 7:24 pm / passwords, security, troy-hunt

s3monkey: A Python library that allows you to interact with Amazon S3 Buckets as if they are your local filesystem. (via) A particularly devious hack by Kenneth Reitz—provides a context manager within which various Python filesystem APIs such as open() and os.listdir() are monkeypatched to operate against an S3 bucket instead. Kenneth built it to make it easier to work with files from apps running on Heroku. Under the hood it uses pyfakefs, a filesystem mocking library originally released by Google.

# 21st February 2018, 5:54 pm / python, s3, heroku, monkeypatch

A Promenade of PyTorch. Useful overview of the PyTorch machine learning library from Facebook AI Research described as “a Python library enabling GPU-accelerated tensor computation”. Similar to TensorFlow, but where TensorFlow requires you to explicitly construct an execution graph PyTorch instead lets you write regular Python code (if statements, for loops etc) which PyTorch then uses to construct the execution graph for you.

# 21st February 2018, 5:31 am / machine-learning, python, tensorflow, pytorch

Andrew Godwin’s www-router Docker container (via) Really clever Docker trick: a container that runs Nginx and uses it to route traffic to other containers based on the hostname—but the hostnames to be routed are configured using environment variables which the run-nginx.py CMD script uses to dynamically construct an nginx config when the container starts.

# 21st February 2018, 5:04 am / andrew-godwin, nginx, docker

Photos from our tour of the amazing bone collection of Ray Bandar. Ray Bandar (1927-2017) was an artist, scientist, naturalist and an incredibly prolific collector of bones. His collection is in the process of moving to the California Academy of Sciences but Natalie managed to land us a private tour lead by his great nephew. The collection is truly awe-inspiring, and a testament to an extraordinary life lived following a very particular passion.

# 21st February 2018, 4:58 am / natalie-downe, photos

Moving a large and old codebase to Python3 (via) Really interesting case study full of good ideas. The codebase in this case was 240,000 lines of Python and Django written over the course of 15 years. The team used Python-Modernize to aid their transition to a six-compatible codebase first.

# 20th February 2018, 2:39 pm / python, python3

Python & Async Simplified. Andrew Godwin: “Python’s async framework is actually relatively simple when you treat it at face value, but a lot of tutorials and documentation discuss it in minute implementation detail, so I wanted to make a higher-level overview that deliberately ignores some of the small facts and focuses on the practicalities of writing projects that mix both kinds of code.” ‪This is really useful: clearly explains the two separate worlds of Python (sync and async functions) and describes Andrew’s clever sync_to_async and async_to_sync decorators as well.‬

# 20th February 2018, 12:30 am / andrew-godwin, async, python, python3

Googlebot’s Javascript random() function is deterministic. random() as executed by Googlebot returns the same predicable sequence. More interestingly, Googlebot runs a much faster timer for setTimeout and setInterval—as Tom Anthony points out, “Why actually wait 5 seconds when you are a bot?”

# 7th February 2018, 2:41 am / crawling, google, seo

Years

Tags