Simon Willison’s Weblog


November 2009

Nov. 1, 2009

No PDFs! The Sunlight Foundation point out that PDFs are a terrible way of implementing “more transparent government” due to their general lack of structure. At the Guardian (and I’m sure at other newspapers) we waste an absurd amount of time manually extracting data from PDF files and turning it into something more useful. Even CSV is significantly more useful for many types of information.

# 12:04 pm / opengovernment, sunlightfoundation, adobe, csv, opendata, pdf

pudb. A full-screen, curses-based visual debugger for Python that runs in the console, built using the urwid console UI library.

# 12:09 pm / python, pdb, debugger, urwid, ui, console, pudb

4store Amazon Machine Image. Instructions for firing up an EC2 AMI running the recently released 4store high-performance triple store and loading in 1.14 billion statements collected by crawling the semantic web.

# 12:12 pm / semanticweb, semweb, 4store, triplestore, ec2, ami

Traffic Server. Mark Nottingham explains the release of Traffic Server, a new Apache Incubator open source project donated by Yahoo! using code originally developed at Inktomi around a decade ago. Traffic Server is an HTTP proxy/cache, similar to Squid and Varnish (though Traffic Server acts as both a forward and a reverse proxy, whereas Varnish only handles reverse proxying).
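Whether a proxy sits in front of clients (forward) or in front of servers (reverse), the caching core is similar: serve a stored response while it is still fresh, otherwise go back to the origin. Here is a toy sketch in Python, assuming a hypothetical `fetch` callable in place of a real upstream connection and honouring only `Cache-Control: max-age`. It is not how Traffic Server, Squid or Varnish are implemented.

```python
import re
import time
from typing import Callable, Dict, Tuple

# A response is modelled as (status, headers, body); real proxies track far more.
Response = Tuple[int, Dict[str, str], bytes]

MAX_AGE = re.compile(r"max-age=(\d+)")


class CachingProxy:
    """Toy shared cache in front of an origin server, keyed on the request URL."""

    def __init__(self, fetch: Callable[[str], Response],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch  # hypothetical stand-in for the upstream HTTP request
        self.clock = clock
        self.store: Dict[str, Tuple[float, Response]] = {}

    def get(self, url: str) -> Response:
        cached = self.store.get(url)
        if cached is not None and cached[0] > self.clock():
            return cached[1]  # fresh hit: the origin is never contacted
        response = self.fetch(url)  # miss or stale entry: ask the origin
        status, headers, _ = response
        cache_control = headers.get("Cache-Control", "").lower()
        match = MAX_AGE.search(cache_control)
        storable = "no-store" not in cache_control and "private" not in cache_control
        if status == 200 and match and storable:
            self.store[url] = (self.clock() + int(match.group(1)), response)
        return response
```

Real proxies also revalidate stale entries, evict under memory pressure and handle many more headers. This sketch only shows the freshness check.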

# 12:15 pm / trafficserver, yahoo, inktomi, mark-nottingham, open-source, apache, http, cache, proxy, squid, varnish

Adobe is Bad for Open Government. The problem isn’t just that PDFs are a bad way of sharing data, it’s that Adobe have been actively lobbying the US government to use their PDF and Flash formats for open government initiatives.

# 12:51 pm / opengovernment, adobe, flash, pdf, sunlightfoundation

Cartographer.js. “Thematic mapping for Google Maps”: an easy way of adding heat maps (aka choropleths), pie charts and point clusters as a layer over a Google map.
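A choropleth shades each region by a value, so the first step is deciding which shade each value gets. Below is a minimal Python sketch of one common approach, equal-interval classification. The palette and region names are made up, and Cartographer.js may well classify its data differently.

```python
from typing import Dict, List

# A hypothetical five-step sequential palette, light to dark.
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15"]


def equal_interval_classes(values: Dict[str, float],
                           palette: List[str] = PALETTE) -> Dict[str, str]:
    """Map each region to a colour by splitting the value range into equal bins."""
    low, high = min(values.values()), max(values.values())
    # Guard against a zero-width range when every region has the same value.
    width = (high - low) / len(palette) or 1.0
    colours = {}
    for region, value in values.items():
        index = int((value - low) / width)
        # The maximum value would fall just past the last bin, so clamp it.
        colours[region] = palette[min(index, len(palette) - 1)]
    return colours
```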

# 1:20 pm / google-maps, google, catography, maps, mapping, heatmaps, piecharts, graphs, infographics, visualisation, chloropleths

Nov. 2, 2009

Exploring Python (via) Notes from the introduction to Python presentation I gave today at Stack Overflow DevDays Amsterdam.

# 3:35 pm / stackoverflow, devdays, speaking, python

Nov. 3, 2009

HTML has always been a conversation between browser makers, authors, standards wonks, and other people who just showed up and liked to talk about angle brackets. Most of the successful versions of HTML have been “retro-specs,” catching up to the world while simultaneously trying to nudge it in the right direction. Anyone who tells you that HTML should be kept “pure” (presumably by ignoring browser makers, or ignoring authors, or both) is simply misinformed. HTML has never been pure, and all attempts to purify it have been spectacular failures, matched only by the attempts to replace it.

Mark Pilgrim

# 7:20 am / html, html5, standards, mark-pilgrim

Large Problems in Django, Mostly Solved: Search. Eric Holscher shows how Haystack uses a number of common Django patterns (object registration, pluggable backends, QuerySet-style chaining and class-based views) to great effect in creating a powerful search application for Django. Makes me wonder if more of those patterns should be promoted to first-class concepts within Django.
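QuerySet-style chaining works because each method returns a new query object rather than mutating the old one, so a partial query can be stored and refined later. What follows is a minimal sketch of that pattern plus a simple registry. `SearchQuery`, `register` and the dictionary-based documents are invented for illustration and are not Haystack's API.

```python
from typing import Callable, Dict, List, Optional, Tuple

Document = Dict[str, str]


class SearchQuery:
    """Immutable, chainable query: each method returns a modified copy."""

    def __init__(self, filters: Tuple[Tuple[str, str], ...] = (), order: str = "") -> None:
        self._filters = filters
        self._order = order

    def _clone(self, filters: Optional[Tuple[Tuple[str, str], ...]] = None,
               order: Optional[str] = None) -> "SearchQuery":
        return SearchQuery(self._filters if filters is None else filters,
                           self._order if order is None else order)

    def filter(self, **lookups: str) -> "SearchQuery":
        return self._clone(filters=self._filters + tuple(sorted(lookups.items())))

    def order_by(self, field: str) -> "SearchQuery":
        return self._clone(order=field)

    def run(self, documents: List[Document]) -> List[Document]:
        # Evaluation is deferred: nothing is matched until run() is called.
        hits = [d for d in documents if all(d.get(k) == v for k, v in self._filters)]
        return sorted(hits, key=lambda d: d.get(self._order, "")) if self._order else hits


# Registration: each app tells a central registry how to turn its objects
# into searchable documents (hypothetical; not Haystack's registration API).
registry: Dict[str, Callable[[object], Document]] = {}


def register(model_name: str, to_document: Callable[[object], Document]) -> None:
    registry[model_name] = to_document
```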

# 10:42 am / django, eric-holscher, search, haystack, patterns, classbasedviews, python

I loathe [hardware load balancers]. They’re expensive, restrictive, slow, and generally cause you a lot more pain and suffering than they’re worth. At my last job, one of my projects was to convert most of one of our existing clusters from a load-balancing appliance to use keepalived. Why would we do this? Because the $100k worth of appliance wasn’t capable of doing the job that $15k worth of commodity hardware and an installation of keepalived were handling with ease.

Matt Palmer

# 10:45 am / keepalived, ops, matt-palmer, sysadmin, load-balancing

Using Graphics Card Memory as Swap (via) Interesting idea: “Graphic cards contain a lot of very fast RAM, typically between 64 and 512 MB. With Linux, it’s possible to use it as swap space, or even as RAM disk.”

# 11:01 am / ram, linux, graphicscards, performance, ops, sysadmin, memory

Nov. 4, 2009

Introducing Resque. A new queue for managing background workers, developed at GitHub and using Redis as its persistence layer. The blog post explains at length both the design and the shortcomings of previous solutions. Within 24 hours of the code’s release, an external developer, Adam Cooke, had completely reskinned the UI.
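The core loop of a queue like this is small enough to sketch. The version below mimics the shape of a Resque job (a JSON payload naming the job class and its arguments) but swaps Redis for an in-memory dictionary of deques and uses plain Python functions instead of Ruby classes. `job`, `enqueue` and `work` are hypothetical names, not Resque's API.

```python
import json
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List

# Stand-in for Redis: queue name -> pending JSON payloads.
# A real deployment would push and pop on Redis lists instead.
queues: Dict[str, Deque[str]] = defaultdict(deque)
failed: List[dict] = []
jobs: Dict[str, Callable[..., object]] = {}


def job(func: Callable[..., object]) -> Callable[..., object]:
    """Register a function so workers can look it up by name."""
    jobs[func.__name__] = func
    return func


def enqueue(queue: str, func: Callable[..., object], *args: object) -> None:
    # Only the job's name and JSON-serialisable arguments are stored.
    payload = json.dumps({"class": func.__name__, "args": list(args)})
    queues[queue].append(payload)


def work(queue: str) -> int:
    """Drain one queue, recording failures instead of crashing the worker."""
    processed = 0
    while queues[queue]:
        payload = json.loads(queues[queue].popleft())
        try:
            jobs[payload["class"]](*payload["args"])
        except Exception as exc:
            failed.append({"payload": payload, "error": repr(exc)})
        processed += 1
    return processed
```

Storing only a name and arguments keeps payloads language-neutral and easy to inspect, which is also what makes a separate web UI over the queue practical.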

# 8:20 pm / resque, open-source, redis, github, queue, workers, ruby, sinatra

Frank Wierzbicki: Leaving Sun. Frank performed miracles at Sun and before, helping bring the Jython project out of stasis and turning it into an active, community-maintained modern Python implementation. If you’re looking for an expert Python/Java/dynamic languages guy you should snap him up.

# 10:33 pm / sun, jython, python, java, frank-wierzbicki

clarity. A web interface for tailing and grepping the log files in /var/log, written in Ruby and EventMachine.
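clarity wraps this in a web UI on top of EventMachine, but the underlying filter is tiny. Here is a minimal Python sketch of "grep, then keep the last N matches". Unlike a real tail, it reads the input once rather than following the file as it grows. The path in the usage note is only an example.

```python
import re
from collections import deque
from typing import Deque, Iterable, List


def tail_grep(lines: Iterable[str], pattern: str, limit: int = 10) -> List[str]:
    """Return the last `limit` lines matching `pattern`, like `grep pattern | tail`."""
    regex = re.compile(pattern)
    # A bounded deque keeps memory constant however large the log is.
    recent: Deque[str] = deque(maxlen=limit)
    for line in lines:
        if regex.search(line):
            recent.append(line.rstrip("\n"))
    return list(recent)
```

Usage: `with open("/var/log/syslog") as log: print(tail_grep(log, r"error"))`.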

# 10:36 pm / clarity, ruby, eventmachine, logging

UK Scale Camp. We’re hosting a one-day web performance and scalability unconference at the Guardian on the 4th of December. If you’re involved in running a high-scale website in the UK (or abroad) we’d love you to come along. Spaces are going fast.

# 11:12 pm / ukscalecamp, guardian, scalability, events, unconference, performance

Introducing the YUI 3 Gallery. Write a plugin for YUI 3, BSD-license it and sign a CLA, and Yahoo! will push your module out to their CDN and make it loadable using the YUI().use() statement. They’re coordinating the submissions using GitHub.

# 11:14 pm / cla, bsd, github, javascript, git, open-source, yahoo, yui, yui3

Nov. 5, 2009

Facebook and MySpace security: backdoor wide open, millions of accounts exploitable (via) Amazingly, both services had wide-open holes in their crossdomain.xml files. Facebook were serving allow-access-from domain=“*” in the crossdomain.xml file on one of their subdomains (a subdomain that still had access to the user’s profile information), while MySpace were granting access to farm.sproutbuilder.com, a service which allowed anyone to upload arbitrary SWF files.
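The Facebook-style mistake is mechanical enough to check for. Below is a minimal Python sketch that flags wildcard `allow-access-from` rules in a policy file. The warning wording is my own. The check cannot catch the MySpace-style mistake, an explicitly trusted host that happens to accept user-uploaded SWFs, because that takes a human who knows what the host does.

```python
import xml.etree.ElementTree as ET
from typing import List


def risky_crossdomain_rules(policy_xml: str) -> List[str]:
    """Flag allow-access-from rules that trust any domain or a whole wildcard subtree."""
    root = ET.fromstring(policy_xml)
    warnings = []
    for rule in root.iter("allow-access-from"):
        domain = rule.get("domain", "")
        if domain == "*":
            warnings.append("SWFs from any domain may read responses from this site")
        elif domain.startswith("*."):
            warnings.append(f"every subdomain of {domain[2:]} is trusted")
    return warnings
```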

# 9:47 am / crossdomainxml, flash, security, facebook, myspace, swf

Google Dashboard. A new Google product which shows exactly how much information Google have stored against your account, all on one page. This is a really useful tool, and will hopefully help set a powerful precedent for other sites to follow.

# 2:03 pm / google, dashboard, privacy

Cross-domain policy file usage recommendations for Flash Player. One of the best explanations of the security implications of crossdomain.xml files I’ve seen. If you host a crossdomain.xml file with allow-access-from domain=“*” and don’t understand all of the points described here, you probably have a nasty security vulnerability.

# 4:24 pm / crossdomainxml, flash, security, adobe

If you are demanding registration before checkout, you need to cease this practice immediately. It is costing you a fortune.

Bruce Tognazzini

# 7:22 pm / registration, login, signup, ia, brucetognazzini

Nov. 6, 2009

Introducing Closure Tools. Google have released the pure-JavaScript library apparently used for Gmail, Google Docs and Google Maps. It comes with a powerful JavaScript optimiser with built-in linting, and an accompanying Firebug extension to ensure the obfuscated code it produces can still be debugged. There’s also a template system which precompiles down to JavaScript and can also be called from Java.

# 7:33 am / closure, google, javascript, libraries, firebug, gmail, google-docs

It’s interesting to me how much [Closure] feels like a more advanced version of Dojo in many ways. There's a familiar package system, the widgets are significantly more mature, and Julie and Ojan's Editor component rocks. The APIs will feel familiar (if verbose) to Dojo users, the class hierarchies seem natural, and Closure even uses Acme, the Dojo CSS selector engine.

Alex Russell

# 7:35 am / alex-russell, closure, acme, css, dojo, javascript, google

Python in the Scientific World. Python continues to make strides in the scientific world—and the Hubble Space Telescope team have been using it for 10 years!

# 11:04 am / guido-van-rossum, python, science, scipy, hubblespacetelescope, astronomy

Multitouch on Unibody MacBooks. FingerMgt is a lovely little app that illustrates quite how sensitive the touchpad on modern MacBooks is: it can track up to 11 touch points and measure pressure as well as location.

# 2:44 pm / multitouch, macbook, macbookpro, apple

Nov. 9, 2009

One way to establish that peace-preserving threat of mutual assured destruction is to commit yourself beforehand, which helps explain why so many retailers promise to match any competitor's advertised price. Consumers view these guarantees as conducive to lower prices. But in fact offering a price-matching guarantee should make it less likely that competitors will slash prices, since they know that any cuts they make will immediately be matched. It's the retail version of the doomsday machine.

James Surowiecki

# 10:06 am / james-surowiecki, new-yorker, pricing, amazon, walmart

Django-Jython 1.0.0 released! Now with database backends for PostgreSQL, Oracle and MySQL. The next release (planned for next month) should provide full compatibility with Django 1.1—the current release has 1.1 support for PostgreSQL but only 1.0 support for the other two databases.

# 1:53 pm / oracle, mysql, postgresql, django, jython, python, leosoto

Fabric 0.9.0. A Python-based SSH automation and deployment tool. Released today, 0.9.0 is finally the official “stable” release—which is good, as it breaks API compatibility with previous versions and caused me all sorts of confusion when I tried to learn Fabric recently.

# 2:02 pm / fabric, python, deployment, ssh

Fixing Poor MySQL Default Configuration Values. Some tips from Jeremy Zawodny on configuring MySQL for high traffic environments—he suggests skip-name-resolve, connect_timeout=20, thread_cache_size=not-zero, max_connect_errors=very-high-number, slave_net_timeout=30.

# 5:07 pm / jeremy-zawodny, mysql

node.js. “Evented I/O for V8 JavaScript”—a JavaScript environment built on top of the super-fast V8 engine which provides event-based IO functionality for building highly concurrent TCP and HTTP servers. The API design is superb—everything is achieved using JavaScript events and callbacks (even regular file IO) and the small standard library ships with comprehensive support for HTTP and DNS. Overall it’s very similar to Twisted and friends, but JavaScript’s anonymous function syntax feels more natural than the Python equivalent. It compiles cleanly on Snow Leopard. Definitely a project to watch.
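For readers coming from Python, the callback style being compared here looks roughly like this using the standard library's `asyncio` protocols. They are a later arrival than Twisted but are built on the same idea. The following is a minimal sketch of an evented TCP server that upper-cases whatever it receives. It is not node.js code and says nothing about node's own API.

```python
import asyncio


class UpperEchoProtocol(asyncio.Protocol):
    """Callback-style handler: the event loop calls these methods as events occur."""

    def connection_made(self, transport: asyncio.Transport) -> None:
        self.transport = transport

    def data_received(self, data: bytes) -> None:
        # Runs whenever bytes arrive; no thread sits blocked waiting for them.
        self.transport.write(data.upper())


async def main() -> bytes:
    loop = asyncio.get_running_loop()
    # Port 0 asks the OS for any free port.
    server = await loop.create_server(UpperEchoProtocol, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello")
    await writer.drain()
    reply = await reader.readexactly(5)
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply


if __name__ == "__main__":
    print(asyncio.run(main()))  # b'HELLO'
```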

# 11:25 pm / node, javascript, io, v8, eventbasedio, twisted, http, dns

Nov. 10, 2009

Correct way to handle mobile browsers. If your site has an equivalent “mobile” version running on a different subdomain, how and when should you redirect mobile users to it and how should you let them opt in or opt out?
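The question boils down to a small decision function. The sketch below makes several assumptions. The hostname, cookie name, `full=1` parameter and the crude user-agent substring list are all made up, and user-agent sniffing like this is inherently incomplete. It checks the opt-out before sniffing, so a user who has chosen the full site isn't bounced back. It also redirects to the same path, so deep links keep working.

```python
from typing import Mapping, Optional

# Hypothetical values: a real site would use its own host, cookie and parameter names.
MOBILE_HOST = "m.example.com"
PREFERENCE_COOKIE = "prefer_full_site"
MOBILE_HINTS = ("iphone", "android", "blackberry", "opera mini", "windows ce")


def mobile_redirect(user_agent: str, path: str,
                    cookies: Mapping[str, str],
                    query: Mapping[str, str]) -> Optional[str]:
    """Return a URL to redirect to, or None to serve the requested page."""
    if query.get("full") == "1":
        return None  # explicit opt-out on this request; the caller should also set the cookie
    if cookies.get(PREFERENCE_COOKIE) == "1":
        return None  # the user opted out earlier, so respect that choice
    if any(hint in user_agent.lower() for hint in MOBILE_HINTS):
        return f"http://{MOBILE_HOST}{path}"  # same path on the mobile host
    return None
```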

# 8:57 am / mobile, usability, django, eric-holscher, redirect
