Simon Willison’s Weblog

Blogmarks

Facebook Adds Code for Clickjacking Prevention. Clever technique: Facebook pages check to see if they are being framed (using window.top) and, if they are, add a div covering the whole page which causes a top level reload should anything be clicked on. They also log framing attempts using an image bug.

# 13th March 2010, 10:42 am / clickjacking, facebook, framing, joey-tyson, phishing, security

Reddit is now running on Cassandra. Migrating their persistent cache over from memcacheDB to Cassandra took one developer just ten days.

# 13th March 2010, 12:14 am / caching, cassandra, memcachedb, reddit

Redis weekly update #1—Hashes and... many more! Hashes were the big missing data type in Redis—support is only partial at the moment (no ability to list all keys in a hash or delete a specific key) but at the rate Redis is developed I expect that to be fixed within a week or two.

# 13th March 2010, 12:06 am / hashes, keyvaluestores, nosql, redis

Introducing the PyPy 1.2 release. It’s been a long time coming, but 1.2 is the first PyPy release to ship with a Just-in-Time compiler! Performance looks pretty impressive.

# 12th March 2010, 11:54 pm / jit, performance, pypy, python

RE2: a principled approach to regular expression matching. Google have open sourced RE2, the C++ regular expression library they developed for Google Code Search, Sawzall, Bigtable and other internal projects. Unlike PCRE, it avoids the potential for exponential run time and unbounded stack usage and guarantees that searches complete in linear time, mainly by dropping support for back references.
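
The linear-time guarantee comes from simulating an automaton instead of backtracking. Here's a toy sketch of that idea in Python. It is not RE2's code or API, and `nfa_match` is a hypothetical helper that only understands literals, `.` and a postfix `*`. It tracks the whole set of live pattern positions after each input character, so the work per character is bounded by the pattern length and there is nothing to backtrack into.

```python
def nfa_match(pattern: str, text: str) -> bool:
    """Full-match text against a tiny regex subset: literals, '.', and a postfix '*'.

    All live pattern positions are tracked at once instead of backtracking,
    so the run time is O(len(pattern) * len(text)).
    """
    # Parse into (char, starred) atoms: "a*b" -> [("a", True), ("b", False)]
    atoms = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        atoms.append((pattern[i], starred))
        i += 2 if starred else 1

    def closure(states):
        # A starred atom may match zero times, so its position also implies the next one.
        result = set()
        stack = list(states)
        while stack:
            s = stack.pop()
            if s in result:
                continue
            result.add(s)
            if s < len(atoms) and atoms[s][1]:
                stack.append(s + 1)
        return result

    current = closure({0})
    for ch in text:
        nxt = set()
        for s in current:
            if s < len(atoms) and atoms[s][0] in (".", ch):
                starred = atoms[s][1]
                # A starred atom can keep consuming; a plain atom moves on.
                nxt.add(s if starred else s + 1)
        current = closure(nxt)
        if not current:
            return False  # no live states left, so no match is possible
    return len(atoms) in current  # matched if the end of the pattern is live
```

A pattern like `"a*" * 30 + "b"` against thirty `a`s, the kind of input that can send a backtracking engine exploring many ways to split the string, stays cheap here because there are never more than 31 live states.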

# 12th March 2010, 9:28 am / c-plus-plus, google, open-source, pcre, re2, regex, russcox

Cache Machine: Automatic caching for your Django models. This is the third new ORM caching layer for Django I’ve seen in the past month! Cache Machine was developed for zamboni, the port of addons.mozilla.org to Django. Caching is enabled using a model mixin class (to hook up some post_delete hooks) and a custom caching manager. Invalidation works by maintaining a “flush list” of dependent cache entries for each object—this is currently stored in memcached and hence has potential race conditions, but a comment in the source code suggests that this could be solved by moving to redis.
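
The flush-list idea can be sketched with a plain dictionary standing in for memcached. Everything here (`FlushListCache`, `cache_query`, `invalidate`, the key strings) is hypothetical and not Cache Machine's actual API; it only shows how recording which cached queries depend on each object lets a delete flush exactly those entries.

```python
class FlushListCache:
    """Toy flush-list invalidation; a dict stands in for memcached."""

    def __init__(self):
        self.store = {}        # cache key -> cached query result
        self.flush_lists = {}  # object key -> set of cache keys that include that object

    def cache_query(self, query_key, results, object_keys):
        # Store the result, then add the query to the flush list of every object in it.
        self.store[query_key] = results
        for obj_key in object_keys:
            self.flush_lists.setdefault(obj_key, set()).add(query_key)

    def get(self, query_key):
        return self.store.get(query_key)

    def invalidate(self, obj_key):
        # e.g. called from a post_delete handler: drop every cached query containing this object.
        for query_key in self.flush_lists.pop(obj_key, set()):
            self.store.pop(query_key, None)
```

With memcached, the `setdefault(...).add(...)` step becomes a separate read and write of the flush list, so two processes can overwrite each other's additions; that is the race condition mentioned above, and a redis set (where adding a member is a single atomic command) would avoid it.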

# 11th March 2010, 7:35 pm / cachemachine, caching, django, memcached, mozilla, orm, ormcaching, python, redis

Automate EC2 Instance Setup with user-data Scripts (via) I knew about EC2’s user-data feature—what I didn’t know is that the Alestic and Canonical images are configured so that if the user-data starts with #! the instance will automatically execute it as a shell script as soon as it boots up (after networking has been configured).

# 11th March 2010, 12:31 pm / deployment, ec2, sysadmin, userdata

grammar.coffee (via) The annotated grammar for CoffeeScript, a new language developed by DocumentCloud’s Jeremy Ashkenas that compiles to JavaScript. The linked page is generated using Jeremy’s Docco tool for literate programming, also written in CoffeeScript. CoffeeScript itself is implemented in CoffeeScript, using a bootstrap compiler originally written in Ruby.

# 8th March 2010, 7:27 pm / coffeescript, compilers, docco, documentcloud, javascript, jeremy-ashkenas, literateprogramming, programming, ruby, selfhosting

Scott and Scurvy. Did you know that Scott’s 1911 expedition to the south pole was plagued by scurvy, despite the British navy having discovered an effective cure way back in the 18th century? A fascinating tale of how scientific advances can lead to surprising regressions.

# 8th March 2010, 6:15 pm / maciej-ceglowski, science, scurvy

Announcing django-cachebot. The ORM caching space around Django is heating up. django-cachebot is used in production at mingle.com and takes a lower-level approach to cache invalidation than Johnny Cache, letting you explicitly mark the querysets you wish to cache and providing some advanced options for cache invalidation. Unfortunately it currently relies on a patch to Django core to enable its own manager.

# 6th March 2010, 12:48 pm / cachebot, caching, django, mingle, orm, ormcaching

How To Be A Good Participant On A Panel: Disagree. When I’m on a panel, I always try to have lunch or dinner with the other panelists beforehand to figure out in advance what points we disagree on.

# 5th March 2010, 9 am / panels, speaking

Geospatial Indexing in MongoDB (via) New in version 1.3.3. Handles “order by distance from” queries using a geohash approach under the hood, automatically searching nearby grid squares until the correct number of results has been gathered. Bounding box search is planned for a future release.
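
For anyone who hasn't met geohashes, here's a minimal encoder written from the standard algorithm (not MongoDB's implementation). It repeatedly halves the longitude and latitude ranges, interleaves the resulting bits starting with longitude, and emits one base-32 character per five bits. Each extra character narrows the grid square, and nearby points usually share a prefix, which is what makes "search the neighbouring squares" cheap.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l or o)


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        value, rng = (lon, lon_range) if use_lon else (lat, lat_range)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(42.6, -5.6, precision=5))  # ezs42
```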

# 2nd March 2010, 8:12 pm / geohash, geospatial, gis, mongodb

Is johnny-cache for you? “Using Johnny is really adopting a particular caching strategy. This strategy isn’t always a win; it can impact performance negatively”—but for a high percentage of Django sites there’s a very good chance it will be a net bonus.

# 2nd March 2010, 11:44 am / caching, django, johnnycache, performance, python

Some People Can’t Read URLs. Commentary on the recent “facebook login” incident from Jono at Mozilla Labs. I’d guess that most people can’t read URLs, and it worries me more than any other aspect of today’s web. If you want to stay safe from phishing and other forms of online fraud you need at least a basic understanding of a bewildering array of technologies: URLs, paths, domains, subdomains, ports, DNS, SSL as well as fundamental concepts like browsers, web sites and web servers. Misunderstand any of those concepts and you’ll be an easy target for even the most basic phishing attempts. It almost makes me uncomfortable encouraging regular people to use the web because I know they’ll be at massive risk of online fraud.
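
A small illustration of why this is hard: which site you're talking to is decided by the URL's hostname, read from the right, not by where a familiar name happens to appear. The URLs below are made up for illustration (using the reserved `.example` top-level domain); Python's standard `urllib.parse.urlsplit` shows which host each one really points at.

```python
from urllib.parse import urlsplit

urls = [
    "https://www.facebook.com/login.php",
    "https://www.facebook.com.login-check.example/login.php",  # a subdomain of someone else's site
    "https://www.facebook.com@attacker.example/login.php",     # text before "@" is only a username
]

for url in urls:
    # The hostname decides which server the browser actually connects to.
    print(url, "->", urlsplit(url).hostname)
```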

# 2nd March 2010, 10:16 am / domains, facebook, phishing, security, urls

Running Processes. I’ve been searching for a good solution to this problem (“run this program, and restart it if it falls over”) for years. I’m currently using god, which works pretty well, but according to this article I should be learning upstart instead. It never ceases to amaze me how difficult this is, and how obtuse the tools are.
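
The core of "restart it if it falls over" is just a loop around a child process. This is a deliberately minimal sketch using Python's `subprocess` module; it is not how god or upstart work internally, and `supervise`, `max_restarts` and `delay` are hypothetical names. Real supervisors also need backoff, logging, signal handling and daemonisation, all of which this leaves out.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=1.0):
    """Run cmd, restarting it whenever it exits with a non-zero status.

    Returns the number of restarts performed; gives up after max_restarts.
    """
    restarts = 0
    while True:
        exit_code = subprocess.call(cmd)  # blocks until the child exits
        if exit_code == 0:
            return restarts  # clean exit: nothing to restart
        if restarts >= max_restarts:
            return restarts  # crash-looping: stop trying
        restarts += 1
        time.sleep(delay)  # don't restart a crashing program in a tight loop


if __name__ == "__main__":
    # A child that always crashes gets restarted twice, then we give up.
    print(supervise([sys.executable, "-c", "raise SystemExit(1)"], max_restarts=2, delay=0.1))
```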

# 2nd March 2010, 9:55 am / god, linux, processes, ubuntu, unix, upstart

Internet Explorer: Global Variables, and Stack Overflows. An extremely subtle IE bug—if your recursive JavaScript function is attached directly to the window (global) object, IE won’t let you call it recursively more than 12 times.

# 2nd March 2010, 9:21 am / bugs, internet-explorer, javascript, recursion

GeoPlanet Explorer. Chris Heilmann’s YQL powered explorer for the invaluable Yahoo! GeoPlanet / WhereOnEarth dataset. Every API deserves an explorer of some sort.

# 2nd March 2010, 8:14 am / apis, christian-heilmann, geoplanet, yahoo, yql

jmoiron.net: Johnny Cache. The blog entry announcing Johnny Cache (“a drop-in caching library/framework for Django that will cache all of your querysets forever in a consistent and safe manner”) to the world.

# 1st March 2010, 11:48 am / caching, django, johnnycache, python, querysets

Notes from a production MongoDB deployment. Notes from running MongoDB for 8 months in production, with 664 million documents spread across 72 GB master and slave servers located in two different data centers.

# 28th February 2010, 11:05 pm / mongodb, scaling, sysadmin

Node.js, redis, and resque (via) Paul Gross has been experimenting with Node.js proxies for allowing web applications to be upgraded without missing any requests. Here he places all incoming HTTP requests in a redis queue, then has his backend Rails servers consume requests from the queue and push the responses back onto a queue for Node to deliver. When the backend application is upgraded, requests remain in the queue and users see a few seconds of delay before their request is handled. It’s not production ready yet (POST requests aren’t handled, for example) but it’s a very interesting approach.
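
The queue-in-the-middle design is easy to model. This sketch swaps Node.js and redis for Python's in-process `queue.Queue` and threads, so it only illustrates the data flow, not Paul's code: the front end enqueues each request along with a reply queue, a backend worker consumes requests and pushes responses back, and while no worker is running (as during an upgrade) requests simply wait. The function names are hypothetical.

```python
import queue
import threading

request_queue = queue.Queue()  # stands in for the redis request list


def frontend_handle(path, timeout=5.0):
    """Enqueue a request, then block until some backend pushes the response back."""
    reply = queue.Queue(maxsize=1)  # stands in for a per-request response queue
    request_queue.put((path, reply))
    return reply.get(timeout=timeout)


def backend_worker(version):
    """Consume requests one at a time; a None sentinel stops this worker (an 'upgrade')."""
    while True:
        item = request_queue.get()
        if item is None:
            return
        path, reply = item
        reply.put(f"{version} handled {path}")
```

Because a request lives in the queue rather than in a connection to one particular backend, stopping the `v1` worker and starting `v2` only delays requests that arrive in between; they are never dropped.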

# 28th February 2010, 11:02 pm / highavailability, http, javascript, nodejs, rails, redis

Johnny Cache. Clever twist on ORM-level caching for Django. Johnny Cache (great name) monkey-patches Django’s QuerySet classes and caches the result of every single SELECT query in memcached with an infinite expiry time. The cache key includes a “generation” ID for each dependent database table, and the generation is changed every single time a table is updated. For apps with infrequent writes, this strategy should work really well—but if a popular table is being updated constantly the cache will be all but useless. Impressively, the system is transaction-aware—cache entries created during a transaction are held in local memory and only pushed to memcached should the transaction complete successfully.
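
Here's a sketch of the generation-key scheme, with a dict standing in for memcached. `GenerationCache` and its key format are hypothetical, not Johnny Cache's code, and the transaction handling is left out. Each table has a generation counter, every cache key embeds the generations of the tables the query reads, and a write bumps that table's generation so the old keys are simply never looked up again (they are left for the cache to evict rather than deleted).

```python
import hashlib


class GenerationCache:
    def __init__(self):
        self.store = {}        # stands in for memcached; entries never expire
        self.generations = {}  # table name -> current generation number

    def _key(self, sql, tables):
        # Embed each dependent table's generation, so bumping one changes the key.
        gens = ",".join(f"{t}:{self.generations.get(t, 0)}" for t in sorted(tables))
        return hashlib.md5(f"{gens}|{sql}".encode()).hexdigest()

    def select(self, sql, tables, run_query):
        key = self._key(sql, tables)
        if key not in self.store:
            self.store[key] = run_query()  # cache miss: go to the database
        return self.store[key]

    def write(self, table):
        # Any write to a table invalidates every cached query that reads from it.
        self.generations[table] = self.generations.get(table, 0) + 1
```

This also shows the trade-off described above: a table that is written constantly keeps getting a new generation, so its queries almost never hit the cache.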

# 28th February 2010, 10:55 pm / caching, databases, django, memcached, orm, ormcaching, performance, python

Unit Testing Achievements. A plugin for Python’s nose test runner that adds achievements—“Night Shift: Make a failing suite pass between 12am and 5am.”

# 28th February 2010, 3:56 pm / nose, nosetest, python, testing

ClearMaps: A Mapping Framework for Data Visualization. An open source library for map visualisations using ActionScript, with an Adobe AIR based encoding tool for translating data from shapefiles into vector data suitable for use with the library.

# 28th February 2010, 3:52 pm / clearmaps, flash, mapping, visualisations

kriszyp’s node-promise. Another elegant approach to managing asynchronous flows in Node, including running tasks both in parallel and in series.

# 28th February 2010, 3:50 pm / callbacks, javascript, nodejs, promise

Django Advent: Scaling Django. Mike Malone’s advice on scaling Django applications, including taking advantage of new features in 1.2.

# 26th February 2010, 7:22 pm / django, mike-malone, scaling

PiCloud. An interesting twist on cloud computing for Python. “import cloud; cloud.call(my_function, arguments)” serialises my_function and its arguments, pushes them up to one of their EC2 servers and hands you back a job ID which you can poll (or block on) for a response. They suggest using it for long running tasks such as web crawling or image processing.
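
I won't guess at the rest of PiCloud's client API, but the call-now, collect-later pattern has a local analogue in Python's standard `concurrent.futures`: `submit()` takes a function plus its arguments and returns a `Future` you can poll with `done()` or block on with `result()`. This runs in a local thread pool rather than on remote servers, and `slow_square` is a hypothetical stand-in for a long-running job.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Hypothetical stand-in for a long-running task such as crawling or image processing."""
    time.sleep(0.1)
    return x * x


executor = ThreadPoolExecutor(max_workers=4)
job = executor.submit(slow_square, 12)  # returns immediately with a Future, roughly a job handle

while not job.done():  # poll...
    time.sleep(0.01)
print(job.result())  # ...or just call result(), which blocks until the value is ready; prints 144
executor.shutdown()
```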

# 26th February 2010, 6:25 pm / cloud, cloud-computing, picloud, python

Internet Explorer Cookie Internals (FAQ). Grr... IE 6, 7 and 8 don’t support the max-age cookie argument, forcing you to use an explicit expiry date instead. This appears to affect the cache-busting cookie pattern, where you set a cookie to expire in 30 seconds for any user who posts content and use the presence of that cookie to skip caches and/or send their queries to a master instead of a slave database. If you have to use expires, users with incorrect system clocks may get inconsistent results. Anyone know of a workaround?
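
A sketch of the expires-based version of the pattern using Python's standard `http.cookies` and `email.utils` modules. The cookie name `recently_posted` and the helper `cache_bust_cookie` are hypothetical. The point to notice is that the server turns "30 seconds from now" into an absolute GMT date using its own clock, and the browser then compares that date against the user's clock, which is exactly where a wrong system clock causes trouble.

```python
import time
from email.utils import formatdate
from http.cookies import SimpleCookie


def cache_bust_cookie(seconds=30, now=None):
    """Return a Set-Cookie value that expires `seconds` from now, using an absolute date.

    IE 6-8 ignore max-age, so the lifetime is expressed with expires instead.
    """
    now = time.time() if now is None else now
    cookie = SimpleCookie()
    cookie["recently_posted"] = "1"  # hypothetical cookie name
    cookie["recently_posted"]["path"] = "/"
    # formatdate(usegmt=True) produces an HTTP-style date, e.g. 'Fri, 26 Feb 2010 12:25:30 GMT'
    cookie["recently_posted"]["expires"] = formatdate(now + seconds, usegmt=True)
    return cookie["recently_posted"].OutputString()
```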

# 26th February 2010, 12:25 pm / cachebusting, caching, cookies, internet-explorer

Using Bash’s History Effectively. The HISTIGNORE environment variable is particularly useful, allowing you to suppress certain commands from the history by specifying a pattern. The article also explains how to keep a command out of the history entirely by prefixing it with a space.

# 25th February 2010, 12:21 pm / bash

Hg Init. Joel Spolsky’s guide to Mercurial. If you’re still using a non-distributed version control system like Subversion, this is a great introduction to the new world order.

# 25th February 2010, 10:31 am / distributedvcs, joel-spolsky, mercurial, subversion

“Do” it fast! Tim Caswell’s Do library has been upgraded for compatibility with Node v0.1.30, and now has a clever Do.convert() method which wraps Node’s low-level APIs with the Do library’s “continuable” abstraction.

# 22nd February 2010, 7:02 pm / do, javascript, nodejs, tim-caswell
