Simon Willison’s Weblog

Subscribe

October 2017

Oct. 1, 2017

Getting the blog back together

Getting this blog up and running again has turned out to be one of those side-projects that keeps threatening to fall down a rabbit hole.

[... 160 words]

Oct. 2, 2017

Running gunicorn behind nginx on Heroku for buffering and logging

Heroku’s default setup for Django uses the gunicorn application server. Each Heroku dyno can only run a limited number of gunicorn workers, which means a limited number of requests can be served in parallel (around 4 per dyno is a good rule of thumb).

[... 400 words]

Should You Build a Webhooks API? (via) We had to solve for pretty much all of these issues when we built Eventbrite’s webhooks—this article would have saved us a lot of time!

# 5:01 am / webhooks

Oct. 3, 2017

Streams: a new general purpose data structure in Redis. Exciting new Redis feature inspired by Kafka: redis streams, which allow you to construct an efficient, in-memory list of messages (similar to a Kafka log) which clients can read sections of or block against and await real-time delivery of new messages. As expected from Salvatore the API design is clean, obvious and covers a wide range of exciting use-cases. Planned for release with Redis 4 by the end of the year!

# 3:25 pm / redis, kafka, salvatore-sanfilippo

Oct. 4, 2017

Feature request: a batch version of mapzen.places.getHierarchiesByLatLon. I’ve been having a lot of fun filing issues against various Mapzen / Who’s On First repositories recently—mainly because the team there are incredibly responsive to suggestions and feature requests. Here’s a fun thread where Aaron Straup Cope and myself have been bouncing around some ideas around batch API design.

# 11:51 pm / geo, mapzen, whosonfirst, aaron-straup-cope

Oct. 5, 2017

Learning a language behind bars is a good way to meet foreign nationals and hear interesting stories about their lives back home and how they ended up doing porridge. I learnt a bit of Spanish with a 70 year-old Columbian grandfather who’d hit hard times and tried to bring a suitcase full of cocaine through Heathrow. He didn’t speak a word of English before jail and had learnt it all from the Cockney geezers on the wing. As a result he didn’t understand basic outside world vocabulary such as ‘traffic cone’ or ‘coat hanger’, but he did talk about ‘avin a bubble with his china plates’

Carl Cattermole

# 1:49 pm / prison

Implementing faceted search with Django and PostgreSQL

Visit Implementing faceted search with Django and PostgreSQL

I’ve added a faceted search engine to this blog, powered by PostgreSQL. It supports regular text search (proper search, not just SQL“like” queries), filter by tag, filter by date, filter by content type (entries vs blogmarks vs quotation) and any combination of the above. Some example searches:

[... 3,103 words]

PostgreSQL 10 Released. Highlights include major improvements to parallelized queries, quorum commit for synchronous replication (sounds reminiscent of Cassandra) and logical replication, which allows modifications to specific tables to be replicated to different clusters. They’re also changing their versioning scheme to Major.Minor, so the next minor release will be 10.1 and the next major release will be 11.

# 3:31 pm / replication, postgresql

Building a statistical profiler in python. Generating flame graphs of production Python code is surprisingly straight-forward. Brian Pitts built a statistical flame graph profiler into our tikibar debugging tool at Eventbrite and it’s proved extremely useful.

# 3:44 pm / profiling, tikibar, django, python

PyPy v5.9 Released, Now Supports Pandas, NumPy. NumPy and Pandas now work on PyPy2.7. “Many other modules based on C-API extensions work on PyPy as well.”

# 4:58 pm / pypy, pandas, numpy

Oct. 6, 2017

SQL Fiddle demonstrating the PostgreSQL to_tsvector() function (via) SQL Fiddle is amazing—it’s an interactive pastebin that lets you execute queries against MySQL, PostgreSQL, Oracle, SQLite & SQL Server, and then share both the input and the results by sending around the resulting URL. Here I’m using it to demonstrate that stripping tags before indexing documents in PostgreSQL is unnecessary because the ts_vector() function already does that for you.

# 10:11 pm / sql, postgresql

Oct. 7, 2017

Should I build my startup’s web-based product as if it’s going to one day be widely adopted and experience high-volume, or instead focus on quick delivery over scalability?

Absolutely the second: build for rapid learning, not for eventual scalability. The vast majority of startups fail, and the number one reason they fail is that they didn’t achieve product-market fit: they failed to build something that customers actually wanted.

[... 169 words]

uvloop: Blazing fast Python networking. “uvloop makes asyncio fast. In fact, it is at least 2x faster than nodejs, gevent, as well as any other Python asynchronous framework. The performance of uvloop-based asyncio is close to that of Go programs.”

# 5:53 pm / async, python

Sanic. “Sanic is a Flask-like Python 3.5+ web server that’s written to go fast [...] On top of being Flask-like, Sanic supports async request handlers. This means you can use the new shiny async/await syntax from Python 3.5, making your code non-blocking and speedy”.

# 6:39 pm / async, sanic, python

Oct. 8, 2017

Facets. New open source visualization and data exploration tool from Google (“Disclaimer: This is not an official Google product”, whatever that means). It’s intended for visualizing machine learning datasets but it’s obviously useful outside of ML as well—any time you need to understand a large dataset this looks like it could be extremely useful. Ships with example jupyter notebooks and an easy mechanism for embedding the Facets interactive UI directly inside a notebook cell.

# 12:21 am / facetedsearch, visualisation, google, jupyter

China Demonstrates Quantum Encryption By Hosting a Video Call. This reads like pure science fiction. “Pan’s team first established a connection and generated a secure key between a ground station in Xinglong and the Micius satellite as it passed overhead, orbiting about 500 kilometers above Earth. [...] Next, the Chinese team waited for Micius to pass over Vienna, where their collaborators at the Austria Academy of Sciences were waiting to also receive the key from the satellite. Then, with the keys in hand, the groups initiated a video conference and used those keys to encrypt the video data through a standard VPN protocol.”

# 2:49 am / quantum, encryption

C is a bit like Latin these days. We no longer write everything in it, but knowing it affords deeper knowledge of more-recent languages.

Norman Wilson

# 4:03 pm / c

Dillinger. I really like this online Markdown editor. It has source syntax highlighting, live previews of the generated HTML and it constantly syncs to localStorage so you won’t lose your work if you accidentally shut your browser window. The code is also available open source on GitHub.

# 6:38 pm / markdown

Recovering missing content from the Internet Archive

When I restored my blog last weekend I used the most recent SQL backup of my blog’s database from back in 2010. I thought it had all of my content from before I started my 7 year hiatus, but in watching the 404 logs I started seeing the occasional hit to something that really should have been there but wasn’t. Turns out the SQL backup I was working from was missing some content.

[... 636 words]

Generate dynamic titles for /search/ pages. Fun little enhancement to my faceted search implementation: I now generate dynamic titles for each search results page describing the search, e.g. “Blogmarks tagged security in Feb, 2005” or ““python” in quotations tagged ruby, python in 2007”.

# 7:47 pm / facetedsearch, django

Removing MediaWiki from SPA: Cool URIs don’t change (via) Detailed write-up from Anna Shipman describing how she archived an old MediaWiki as static content using recursive wget and some cunning application of mod_rewrite.

# 7:54 pm / urls, annashipman

fd (via) “A simple, fast and user-friendly alternative to find.” Written in rust, with a less confusing default command-line syntax than the regular find command. Microbenchmark shows it running 7x faster. Install it on OS X using “brew install fd”.

# 9:27 pm / unix, homebrew, rust

Oct. 9, 2017

I have come to the conclusion that the real heroes of ideas are not the people who have them – they are the people who buy them

David Gluckman

# 1:38 pm / ideas

What happened: Hillary’s view. Lawrence Lessig’s review of Hillary Clinton’s book, including his thoughts on the failures of commercial media in covering the story of Russian interference during those crucial closing weeks before the election.

# 3:25 pm / lawrencelessig

Oct. 10, 2017

The Absurdly Underestimated Dangers of CSV Injection. This is horrifying. A plain old CSV file intended for import into Excel can embed formulas (a value prefixed with an equals symbol) which can execute system commands—with a big honking security prompt that most people will likely ignore. Even worse: they can embed IMPORTXML() functions that can silently leak data from the rest of the sheet to an external URL—and those will work against Google Sheets as well as Excel.

# 4:13 am / csv, security

Changelogs to help understand the fires in the North Bay

The situation in the counties north of San Francisco is horrifying right now. I’ve repurposed some of the tools I built to for the Irma Response project last month to collect and track some data that might be of use to anyone trying to understand what’s happening up there. I’m sharing these now in the hope that they might prove useful.

[... 383 words]

Oct. 11, 2017

WPO Stats (via) “Case studies and experiments demonstrating the impact of web performance optimization (WPO) on user experience and business metrics.“

# 1:36 am / web-performance

My essential django package list. Insightful list of Django packages—many of which I hadn’t seen before—by Serafeim Papastefanos, each with a handy explanation of what it’s useful for and why.

# 2:42 pm / django

Cypress (via) Promising looking new open source testing framework for full-blown web integration testing—a modern alternative to Selenium. I spent five minutes playing with the demo and was really impressed by it—especially their “time travel” feature which lets you hover over a passed test and see the state of the browser when each of those assertions was executed.

# 4:14 pm / testing, selenium, cypress

Oct. 12, 2017

How the CIA Staged Sham Academic Conferences to Thwart Iran’s Nuclear Program. “The importance of a conference may be measured not only by the number of Nobel Prize winners or Oxford dons it attracts, but by the number of spies. U.S. and foreign intelligence officers flock to conferences for the same reason that Army recruiters concentrate on low-income neighborhoods: They make the best hunting grounds. While a university campus may have only one or two professors of interest to an intelligence service, the right conference — on drone technology, perhaps, or ISIS — may have dozens.”

# 12:20 am / conferences, cia

2017 » October

MTWTFSS
      1
2345678
9101112131415
16171819202122
23242526272829
3031