Blogmarks
Filters: Sorted by date
ftfy—fix unicode that’s broken in various ways (via) I shipped a small web UI wrapper around the excellent Python FTFY library, which can take broken unicode strings and suggest a sequence of operations that can be applied to get back sensible text.
csvkit. “A suite of command-line tools for converting to and working with CSV”—includes a huge range of utilities for things like converting Excel and JSON to CSV, grepping, sorting and extracting a subset of columns, combining multiple CSV files together and exporting CSV to a relational database. Worth reading through the tutorial which shows how the different commands can be piped together.
Statistical NLP on OpenStreetMap. libpostal is ferociously clever: it’s a library for parsing and understanding worldwide addresses, built on top of a machine learning model trained on millions of addresses from OpenStreetMap. Al Barrentine describes how it works in this fascinating and detailed essay.
Himalayan Database: From Visual FoxPro GUI to JSON API with Datasette (via) The Himalayan Database is a compilation of records for all expeditions that have climbed in the Nepalese Himalaya, originally compiled by journalist Elizabeth Hawley over several decades. The database is published as a Visual FoxPro database—here Raffaele Messuti provides step-by-step instructions for extracting the data from the published archive, converting them to CSV using dbfcsv and then converting the CSVs to SQLite using csvs-to-sqlite so you can browse them using Datasette.
Frontend in 2017: The important parts. Keeping track of developments in the frontend and JavaScript community is pretty much a full time job here days, so I found this summary of trends and developments over 2017 very useful for trying to catch up.
Game developer’s guide to graphical projections (with video game examples), Part 1: Introduction. Absolutely delightful series of illustrated essays by Matej ‘Retro’ Jan explaining how different graphical projections can be used for video game art. Each concept is illustrated by screenshots or gifs from a mixture of games spanning four decades. Reading this was a real treat.
My Internet Mea Culpa. Rick Webb asks “What if we were wrong?” about the internet leading to enormous benefit for humankind. I’ve been worrying about this a lot recently: it turns out the internet provides tools that allow bad people to spread lies, propaganda and discrimination with lethal effectiveness. It’s hard to believe that universal access to the sum of all human knowledge can have negative effects, but there are clearly a whole load of negative effects that us internet utopians failed to predict.
Computer latency: 1977-2017 (via) Dan Luu used a 240 fps camera to investigate the latency between hitting a key and having the character show up on the display across four decades of computing devices... and found 1983’s Apple IIe outperformed everything else. He goes to great lengths to explain why in his accompanying write-up.
Google Maps’s Moat. Gorgeous essay by digital cartographer Justin O’Beirne, exploring how Google Maps has evolved over time and how the fantastically useful “areas of interest” feature (where commercial corridors and business districts are automatically highlighted) uses data derived from a combination of Street View business data and 3D building outlines derived from aerial imagery.
How to compile and run a Pony program using Docker. My notes on using the Docker ponylang/ponyc container to compile and execute a Pony program without needing to install anything (since Docker will download and run the image the first time you run the command).
An Early History of Pony. Pony is an interesting looking new programming language, built around actor-based concurrency on top of a mathematically proved type system. The history of the language makes for interesting reading: it’s based on experience with actor libraries in C at an investment bank, combined with research into type systems at Imperial College London.
How do Ruby & Python profilers work? Julia Evans: “As a precursor to writing a Ruby profiler I wanted to do a survey of how existing Ruby & Python profilers work.”
The Mirai Botnet Was Part of a College Student Minecraft Scheme. Fascinating story about last year’s Mirai botnet, which was originally developed to help corner the Minecraft server market.
Let your code type-hint itself: introducing open source MonkeyType. Instagram have open sourced their tool for automatically adding type annotations to your Python 3 code via runtime tracing. By default it logs the types it sees to a SQLite database, which means you can browse them with Datasette!
The State of JavaScript 2017: Front-end Frameworks. This year’s State of JavaScript survey results are out. As usual, the “I’ve USED it before, and would NOT use it again” answers are some of the most thought-provoking. Angular, Meteor, Backbone, Grunt, Gulp, Bootstrap and Cordova aren’t scoring well in that regard.
Extended Validation is Broken. Ian Carroll spent $100 incorporating a company called “Stripe, Inc” in the state of Kentucky and $77 on an Extended Validation certificate tied to that legal entity. Safari (and Mobile Safari) now hide the URL bar completely, displaying “Stripe, Inc” in its place. “This means the attacker does not even need to register a convincing phishing domain. They can register anything, and Safari will happily cover it with a nice green bar.”
VICE News Police Shootings in Datasette (via) VICE News collected data on both fatal and nonfatal police shootings from the 50 largest local police departments in the United States. They released the data under a CC BY-NC-SA 4.0 license so I’ve converted it to SQLite and loaded it into Datasette.
Datasette 0.14: customization edition. I just released the latest version of Datasette with a strong theme of customization: Datasette now supports custom templates and CSS styling hooks, and the metadata format has been expanded to allow per-database and per-table source/license/description information. You can also now define named canned queries which will be packaged up with your data.
Use a .dev domain? Not anymore. Google bought the .dev gTLD a few years ago for their own internal usage and in a few weeks time Chrome will start shipping a HSTS preload list rule that says that .dev must be served over HTTPS. This means that if you’re using a .dev domain in your /etc/hosts file you’ll need to switch to .test or .localhost (or set up a self-signed certificate) or your development environment will refuse to load.
PostgreSQL Exercises. Excellent set of PostgreSQL exercises by Alisdair Owens, each with an interactive editor that lets you run your queries against a real database. Starts with the basics, but also covers advanced topics like recursive queries and window aggregate functions.
deeplearn.js imagenet webcam demo (via) This is pretty astonishing... deeplearn.js is a Google Brain research tool that implements a GPU-accelerated neural network in browser-friendly JavaScript (using WebGL fragment shaders to run the algorithms). This demo hooks into your webcam and runs the SqueezeNet image recognition model against it, showing classification in real-time and providing a live-updating visualization of the different layers of the network.
A Regular Expression Matcher: Code by Rob Pike, Exegesis by Brian Kernighan (via) Delightfully clear and succinct 30-line C implementation of a regular expression matcher that supports $, ^, . and * operations.
Evolution of <img>: Gif without the GIF
(via)
Safari Technology Preview lets you use <img src="movie.mp4">, for high quality animated gifs in 1/14th of the file size.
Cybersecurity Campaign Playbook (via) “The information assembled here is for any campaign in any party. It was designed to give you simple, actionable information that will make your campaign’s information more secure from adversaries trying to attack your or-ganization—and our democracy.”
Breaking Cliques at Events. Eric proposes a new guideline for long-running conferences, which have a tendency to form somewhat insular cliques of the attendees who have been going the longest: “For every year you have attended the event, you should try to meet that many new people each day.”
Python 3 Readiness (via) 345 of the 360 most popular Python packages are now compatible with Python 3. I’d love to see a version of this graph over time.
Django 2.0 released. The first version of Django to drop support for Python 2. I’ve been running the RC on my blog for the past 5 weeks and greatly enjoying the new mobile-optimized Django admin for posting links and quotations from my phone. The new simplified URL routing syntax (an optional alternative to regular expressions) is a very welcome improvement.
Datasette: Ability to customize presentation of specific columns in HTML view.
Still a work in progress, but Datasette master now allows you to inject links to one or more additional CSS and JavaScript resources (optionally with SRI hashes) which will be included on every page. Each template also
now provides CSS classes on the body element derived from the current database and table names to provide hooks for custom styling. Next up: custom template support.
Can I use... input type=color.
TIL <input type="color"> has reached 78.83% support globally already - biggest gap right now is Mobile Safari.
Interactive Workflows for C++ with Jupyter. Whoa, this really works... not just an interactive C++ REPL in a Jupyter notebook, but inline graph plotting support and interactive widgets as well. Scroll to the bottom of the article for Binder links which let you fire up an interactive C++ REPL in your browser and start interacting with it instantly.