Simon Willison’s Weblog

Blogmarks

owlsnearme source code on GitHub. Here’s the source code for our new owlsnearme.com project. It’s a single-page React application that pulls all of its data from the iNaturalist API. We built it this weekend with the SuperbOwl kick-off as a hard deadline so it’s not the most beautiful React code, but it’s a nice demonstration of how React (and create-react-app in particular) can be used for rapid development.

# 4th February 2018, 10:33 pm / github, javascript, natalie-downe, projects, react, inaturalist

Owls Near Me. Back in 2010 Natalie and I shipped owlsnearyou.com—a website for finding your nearest owls, using data from the sadly deceased WildlifeNearYou (RIP). To celebrate #SuperbOwl Sunday we rebuilt the same concept on top of the excellent iNaturalist API. Search for a place to see which owls have been spotted there, or click the magic button to geolocate your device and see which owls have been spotted in your nearby area!

# 4th February 2018, 10:26 pm / natalie-downe, projects, wildlifenearyou, inaturalist

Conditional aggregation in Django 2.0 (via) I hadn’t realised how clever this new Django ORM feature by Tom Forbes is. It lets you build an aggregation against a subset of rows, e.g. Client.objects.aggregate(regular=Count('pk', filter=Q(account_type=Client.REGULAR))). If you are using PostgreSQL, Django translates this into a fast FILTER (WHERE ...) clause, while other databases emulate the same behaviour using a CASE statement.
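
A rough sketch of what the CASE emulation looks like at the SQL level, using Python's built-in sqlite3 module rather than Django itself. The client table, its columns and the sample rows are made up for illustration, and the SQL is hand-written, not the exact statement Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the Client model.
conn.execute("CREATE TABLE client (id INTEGER PRIMARY KEY, account_type TEXT)")
conn.executemany(
    "INSERT INTO client (account_type) VALUES (?)",
    [("regular",), ("gold",), ("regular",), ("platinum",)],
)

# On PostgreSQL the same idea can be written as:
#   COUNT(id) FILTER (WHERE account_type = 'regular')
# The CASE form below works on databases without FILTER support:
# CASE yields NULL for non-matching rows, and COUNT ignores NULLs.
case_sql = """
    SELECT
        COUNT(CASE WHEN account_type = 'regular' THEN id END) AS regular,
        COUNT(CASE WHEN account_type = 'gold' THEN id END) AS gold
    FROM client
"""
regular, gold = conn.execute(case_sql).fetchone()
print(regular, gold)
```

Both counts come back from a single pass over the table, which is the point of conditional aggregation.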

# 3rd February 2018, 9:38 pm / django, postgresql

How I made a Who’s On First subset database. Inspired by Paul Ford on Twitter, I tried out a new trick with SQLite: connect to a database containing JSON, attach a brand new empty database file using “attach database”, then populate it using INSERT INTO ... SELECT plus the json_extract() function to extract out a subset of the JSON properties into a new table in the new database.
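
The trick can be sketched with Python's sqlite3 module, assuming your SQLite build includes the JSON functions. The file names, the places table and the sample JSON documents are invented for this example and are not the real Who's On First schema.

```python
import json
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "source.db")  # hypothetical file names
subset_path = os.path.join(workdir, "subset.db")

# Build a small stand-in source database with one JSON document per row.
src = sqlite3.connect(source_path)
src.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, body TEXT)")
src.executemany(
    "INSERT INTO places (id, body) VALUES (?, ?)",
    [
        (1, json.dumps({"name": "San Francisco", "placetype": "locality", "extra": [1, 2]})),
        (2, json.dumps({"name": "California", "placetype": "region", "extra": []})),
    ],
)
src.commit()

# Attach a brand new, empty database file under the schema name "subset".
src.execute("ATTACH DATABASE ? AS subset", (subset_path,))
src.execute(
    "CREATE TABLE subset.places (id INTEGER PRIMARY KEY, name TEXT, placetype TEXT)"
)

# Copy only the JSON properties we care about into the new database.
src.execute(
    """
    INSERT INTO subset.places (id, name, placetype)
    SELECT id, json_extract(body, '$.name'), json_extract(body, '$.placetype')
    FROM places
    """
)
src.commit()
src.close()

# The subset file is now a standalone database.
subset = sqlite3.connect(subset_path)
rows = subset.execute("SELECT id, name, placetype FROM places ORDER BY id").fetchall()
subset.close()
print(rows)
```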

# 3rd February 2018, 5:25 am / json, paul-ford, sqlite, whosonfirst

asgiref: AsyncToSync and SyncToAsync (via) Andrew’s classes in asgiref that can turn a synchronous callable into an awaitable (that runs in a thread pool) or an awaitable callable into a synchronous callable, using Python 3 futures and asyncio.
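
To illustrate the two directions, here is a minimal standard-library sketch of the same idea. It is not asgiref's implementation: the sync_to_async and async_to_sync helpers below are hypothetical stand-ins, and they skip details a real library has to handle, such as being called from inside an already running event loop.

```python
import asyncio
import functools
import time


def sync_to_async(func):
    """Wrap a blocking callable so it can be awaited; it runs in the loop's default thread pool."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, functools.partial(func, *args, **kwargs))

    return wrapper


def async_to_sync(coro_func):
    """Wrap a coroutine function so synchronous code can call it and get the result."""

    @functools.wraps(coro_func)
    def wrapper(*args, **kwargs):
        # asyncio.run() starts a fresh event loop, so this must not be
        # called from code that is already running inside a loop.
        return asyncio.run(coro_func(*args, **kwargs))

    return wrapper


def blocking_square(x):
    time.sleep(0.01)  # stands in for blocking I/O
    return x * x


async def add_squares(a, b):
    square = sync_to_async(blocking_square)
    # Both blocking calls run concurrently in worker threads.
    first, second = await asyncio.gather(square(a), square(b))
    return first + second


total = async_to_sync(add_squares)(3, 4)
print(total)
```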

# 2nd February 2018, 7:06 pm / andrew-godwin, async, python3

Channels 2.0. Andrew just shipped Channels 2.0, a major rewrite and redesign of the Channels project he started back in 2014. Channels brings async to Django, providing a logical, standardized way of supporting things like WebSockets and asynchronous execution on top of a Django application. Previously it required you to run a separate Twisted server and redis/RabbitMQ queue, but thanks to Python 3 async everything can now be deployed as a single process. And the new ASGI spec means it’s turtles all the way down! Everything from URL routing to view functions to middleware can be composed together using the same ASGI interface.

# 2nd February 2018, 6:19 pm / andrew-godwin, async, django, python, python3, websockets

Family fun with deepfakes. Or how I got my wife onto the Tonight Show. deepfakes is dystopian nightmare technology: take a few thousand photos of two different people with similar shaped faces and you can produce an extremely realistic video where you swap one person’s face for the other. Unsurprisingly it’s being used for porn. This is a pleasantly SFW explanation of how it works, complete with a demo where Sven Charleer swaps his wife Elke for Anne Hathaway on the Tonight Show.

# 2nd February 2018, 4:06 pm / computer-vision

Using setup.py in Your (Django) Project. Includes this neat trick: if you list manage.py in the setup(scripts=) argument you can call it from e.g. cron using the full path to manage.py within your virtual environment and it will execute in the correct context without needing to explicitly activate the environment first.

# 2nd February 2018, 12:33 pm / django, python

How the Citus distributed database rebalances your data. Citus is a fascinating implementation of database sharding built on top of PostgreSQL primitives. PostgreSQL 10 introduced extremely flexible logical replication—in this post Craig Kerstiens explains how Citus use this new ability to re-balance shards (e.g. when you move from two to four physical PostgreSQL nodes) without downtime.

# 1st February 2018, 10:50 pm / architecture, postgresql, sharding, zero-downtime, craig-kerstiens

Building a Full-Text Search App Using Docker and Elasticsearch. Deep, comprehensive tutorial from Patrick Triest showing how to use docker-compose to run three containers (Node API, nginx static content, elasticsearch) and then use that to build a neat Vue.js web search UI against 100 books from Project Gutenberg.

# 1st February 2018, 3:41 pm / elasticsearch, nodejs, docker

Observable notebook: San Francisco trees from Datasette. I used an Observable notebook to rebuild my San Francisco tree search demo against a Datasette API of a CSV of trees published by the SF Department of Public Works. The map updates live as you type a query, and every cell can be toggled to view the underlying source code.

# 1st February 2018, 12:37 am / datasette, observable

USGS World Earthquake Map (observable notebook). Here’s an extended version of the notebook constructed by Jeremy Ashkenas in that Observable YouTube demo. You really need to check this thing out: the notebook itself has sliders in it that you can manipulate (even on a mobile browser), or you can click to edit the code and see your changes reflected in real-time. If you sign in with GitHub you can fork the project to your own account and save your changes.

# 31st January 2018, 7:07 pm / jeremy-ashkenas, explorables, observable

Observable: An Earthquake Globe in Ten Minutes. Well worth your time. Jeremy Ashkenas uses Observable to live-code an interactive visualization of recent earthquakes around the world, using USGS data (fetched as JSON), d3, topoJSON and an Observable notebook. I’m sold—this is truly ground-breaking new technology.

# 31st January 2018, 5:01 pm / javascript, jeremy-ashkenas, visualization, explorables, d3, observable

Observable Beta (via) Observable just released their beta, and it’s quite something. It’s by Mike Bostock (d3), Jeremy Ashkenas (Backbone, CoffeeScript) and Tom MacWright (Mapbox Studio). The easiest way to describe it is Jupyter notebooks for JavaScript supporting reactive programming—so code is evaluated as you type and you can add interactive widgets (like sliders and canvas views) to construct explorable visualizations on the fly.

# 31st January 2018, 4:46 pm / javascript, jeremy-ashkenas, d3, jupyter, observable, mike-bostock, tom-macwright

SQLite: The Spellfix1 Virtual Table (via) A SQLite extension that lets you create a spellfix1 virtual table which can power “fuzzy” search, by suggesting corrections for misspelled words. I haven’t tried this yet but it looks pretty powerful, including a configurable edit distance and the ability to set up custom “soundslike” terms for words with known unusual spellings.

# 29th January 2018, 5:24 am / sqlite

6M observations total! Where has iNaturalist grown in 80 days with 1 million new observations? Citizen science app iNaturalist is seeing explosive growth at the moment—they’ve been around for nearly a decade but 1/6 of the observations posted to the site were added in just the past few months. Having tried the latest version of their iPhone app it’s easy to see why: snap a photo of some nature and upload it to the app and it will use surprisingly effective machine learning to suggest the genus or even the individual species. Submit the observation and within a few minutes other iNaturalist community members will confirm the identification or suggest a correction. It’s brilliantly well executed and an utter delight to use.

# 28th January 2018, 8:18 pm / computer-vision, crowdsourcing, machine-learning, science, citizenscience, inaturalist

How did the Roman Republic determine its budget? Fascinating answer on the AskHistorians subreddit about how taxation worked in the Roman Republic. Since the republic was almost permanently at war, and was very good at it, no taxes were levied on Roman citizens in Italy from 167 B.C. onwards.

# 27th January 2018, 4:51 pm / historians

Domains Search for Web: Instant, Serverless & Global (via) The team at Zeit are pioneering a whole bunch of fascinating web engineering architectural patterns. Their new domain name autocomplete search uses Next.js and server-side rendering on first load, then switches to client-side rendering from then on. It can then load results asynchronously over a custom WebSocket protocol as the microservices on the backend finish resolving domain availability from the various different TLD providers.

# 26th January 2018, 1:14 am / domains, websockets, zeit-now, microservices

django-postgres-copy (via) Really neat Django queryset add-on which exposes the PostgreSQL COPY statement for importing (and exporting) CSV data, e.g. MyModel.objects.from_csv("filename.csv"). Built by the team of data journalists at the California Civic Data Coalition.

# 26th January 2018, 12:43 am / csv, django, postgresql

Nicaraguan Address System (via) “Instead of street names or numbers Nicaraguans use reference points from where they start describing a certain address. [...] There are instances, however, in which the reference points do not exist anymore!”

# 21st January 2018, 4:32 pm / gis

How to turn a list of JSON objects into a Datasette. ramadis on GitHub cleaned up data on 184,879 crimes reported in Buenos Aires since 2016 and shared them on GitHub as a JSON file. Here are my notes on how to use Pandas to convert JSON into SQLite and publish it using Datasette.
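
The linked notes use Pandas for the conversion. As a dependency-free illustration of the same JSON-to-SQLite step, here is a sketch using only the standard library. The records, the column names and the crimes table name are made up and do not reflect the real dataset.

```python
import json
import sqlite3

# Made-up sample records; the real file has many more rows and fields.
records = json.loads(
    """[
      {"id": 1, "type": "robbery", "neighbourhood": "Palermo"},
      {"id": 2, "type": "theft", "neighbourhood": "Recoleta", "time": "21:30"}
    ]"""
)

# Use the union of keys across all objects as the column list,
# preserving the order in which keys are first seen.
columns = []
for record in records:
    for key in record:
        if key not in columns:
            columns.append(key)

conn = sqlite3.connect(":memory:")
# Simple quoting; assumes keys contain no double-quote characters.
column_sql = ", ".join('"{}"'.format(c) for c in columns)
placeholders = ", ".join("?" for _ in columns)
conn.execute("CREATE TABLE crimes ({})".format(column_sql))
conn.executemany(
    "INSERT INTO crimes ({}) VALUES ({})".format(column_sql, placeholders),
    # Keys missing from an object become NULL in that row.
    [tuple(record.get(c) for c in columns) for record in records],
)
conn.commit()

rows = conn.execute('SELECT id, type, "time" FROM crimes ORDER BY id').fetchall()
print(rows)
```

Pointing sqlite3.connect() at a file path instead of ":memory:" would produce a database file that Datasette can serve.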

# 20th January 2018, 1:07 am / json, pandas, sqlite, datasette

GaretJax/django-click (via) I’ve been using Click to write command-line tools in Python recently (both datasette and csvs-to-sqlite use it) and it’s a delightful way of composing simple and complex CLI interfaces. I’ve always found Django’s default management command syntax hard to fit in my head, and django-click means I can combine the two.

# 19th January 2018, 11:19 pm / cli, django

Generating polygon representing a rough 100km circle around latitude/longitude point using Python. A question I posted to the GIS Stack Exchange—I found my own answer using a Python library called geog, then someone else posted a better solution using pyproj.
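
For a sense of what such a calculation involves, here is a standard-library sketch. It does not use geog or pyproj: it applies the spherical destination-point formula with a mean Earth radius, so it is only an approximation, and the function names and the example coordinates are my own choices.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; treating the Earth as a sphere is an approximation


def circle_polygon(lat, lon, radius_km=100.0, num_points=36):
    """Return a closed ring of (lon, lat) pairs roughly radius_km from the centre."""
    lat1 = math.radians(lat)
    lon1 = math.radians(lon)
    d = radius_km / EARTH_RADIUS_KM  # angular distance in radians
    ring = []
    for i in range(num_points):
        bearing = 2 * math.pi * i / num_points
        lat2 = math.asin(
            math.sin(lat1) * math.cos(d)
            + math.cos(lat1) * math.sin(d) * math.cos(bearing)
        )
        lon2 = lon1 + math.atan2(
            math.sin(bearing) * math.sin(d) * math.cos(lat1),
            math.cos(d) - math.sin(lat1) * math.sin(lat2),
        )
        # Normalise longitude into the range [-180, 180).
        lon_deg = (math.degrees(lon2) + 180.0) % 360.0 - 180.0
        ring.append((lon_deg, math.degrees(lat2)))
    ring.append(ring[0])  # close the ring, as GeoJSON polygons expect
    return ring


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on the same sphere, used here as a sanity check."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


ring = circle_polygon(37.77, -122.42)
print(len(ring), ring[0])
```

A projection-aware library should give better results where the spherical assumption matters, for example for large radii or near the poles.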

# 17th January 2018, 8:57 pm / gis, python

API 2.0: Log-In with ZEIT, New Docs & More. Here’s Zeit’s write-up of their brand new API 2.0, which adds OAuth support and allows anything that can be done with their command-line tools to be achieved via their public API as well. This is the enabling technology that allowed me to build Datasette Publish.

# 17th January 2018, 3:23 pm / zeit-now

A SIM Switch Account Takeover (Mine). Someone walked into a T-Mobile store with a fake ID in Albert Wenger’s name and stole his SIM identity, then used it to gain access to his Yahoo mail account, reset his Twitter password and post a tweet boosting a specific cryptocurrency. His accounts with Google Authenticator 2FA stayed safe.

# 14th January 2018, 8:37 pm / identitytheft, security, sms

How the industry-breaking Spectre bug stayed secret for seven months. It’s pretty amazing that the bug only became public knowledge a week before the intended embargo date, considering the number of individuals and companies that had to be looped in. The biggest public clues were patches being applied in public to the Linux kernel. One smart observer noted that the page table issue “has all the markings of a security patch being readied under pressure from a deadline.”

# 14th January 2018, 4:53 pm / security

Telling stories through your commits. Joel Chippendale’s excellent guide to writing a useful commit history. I spend a lot of time on my commit messages, because when I’m trying to understand code later on they are the only form of documentation that is guaranteed to remain up-to-date against the code at that exact point of time. These tips are clear, concise, readable and include some great examples.

# 13th January 2018, 7:44 pm / git, sourcecontrol

Notes on Kafka in Python. Useful review by Matthew Rocklin of the three main open source Python Kafka client libraries as of October 2017.

# 13th January 2018, 7:40 pm / python, kafka

Incident report: npm. Fascinating insight into the challenges involved in managing a massive scale community code repository. An algorithm incorrectly labeled a legit user as spam, an npm staff member acted on the report and dependent package installations started failing. Because the package had been removed as spam, other users were able to try and fix the bug by publishing fresh copies of the missing package to the same namespace.

# 11th January 2018, 5:27 pm / security, spammers, npm

How to compile and run the SQLite JSON1 extension on OS X. Thanks, Stack Overflow! I’ve been battling this one for a while—it turns out you can download the SQLite source bundle, compile just the json1.c file using gcc and load that extension in Python’s sqlite3 module (or with Datasette’s --load-extension= option) to gain access to the full suite of SQLite JSON functions—json(), json_extract() etc.
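
The loading step in Python might look something like this sketch. The has_json1 and load_json1 helpers and the json1.dylib file name are hypothetical; enable_load_extension() and load_extension() are the real sqlite3 connection methods, but they are only available when Python's sqlite3 module was built with extension loading enabled.

```python
import sqlite3


def has_json1(conn):
    """Return True if the JSON functions are available on this connection."""
    try:
        conn.execute("SELECT json_extract('{\"a\": 1}', '$.a')").fetchone()
        return True
    except sqlite3.OperationalError:
        # Raised as "no such function" when the JSON functions are missing.
        return False


def load_json1(conn, extension_path):
    """Load a compiled json1 extension unless the JSON functions are already built in."""
    if has_json1(conn):
        return
    conn.enable_load_extension(True)
    try:
        conn.load_extension(extension_path)
    finally:
        # Turn extension loading back off once we are done.
        conn.enable_load_extension(False)


conn = sqlite3.connect(":memory:")
available = has_json1(conn)
print("JSON functions already available:", available)
# If they are missing, something like load_json1(conn, "./json1.dylib")
# would load the compiled extension (the path here is a placeholder).
```

Many recent SQLite builds ship with the JSON functions compiled in, so the check may well report that no extension is needed.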

# 10th January 2018, 9:01 pm / json, macos, sqlite, stackoverflow, datasette
