Simon Willison’s Weblog

February 2009

Feb. 2, 2009

microprinter wiki. “Hackers across the country are buying up old receipt printers and imaginatively repurposing them into something new. We call them microprinters.” The CBM 231 is the most common model, and is currently selling on eBay for $20 or so. Something a bit smaller would be fun for travelling with.

# 7:20 am / microprinters, cbm231, papercamp

Feb. 4, 2009

Post-Commit Web Hooks for Google Code Project Hosting (via) I really, really like web hooks (which I’ve been calling “callback APIs”, but it looks like “web hooks” is the term that’s sticking). I’m interested in their scaling challenges—I’ve heard XMPP advocates argue that a web hook style model simply won’t scale for really large sites.

# 10:22 am / webhooks, googlecodehosting, google

juno. An ultra-lightweight Python web framework inspired by Ruby’s Sinatra.

# 10:48 am / python, ruby, sinatra, juno, webframeworks

solango. Another attempt at a Django/Solr integration library, based on code written for “a top 20 newspaper site” (I’d love to know which one). It is well documented, uses a registration model clearly inspired by the Django admin (which keeps search-related metadata out of your regular models), and includes management commands for re-indexing and generating Solr schema.xml files.

# 12:22 pm / solr, django, python, lucene, search

Feb. 5, 2009

OSM needs new servers. OpenStreetMap need to raise £10,000 to buy a new API database server.

# 7:27 pm / openstreetmap, donations, fundraising

Recreating the button. Fascinating article from Doug Bowman on the work that went into creating custom CSS buttons for use across Google’s different applications, avoiding images to improve performance and ensure they could be easily styled using just CSS. I’d love to see the Google Code team turn this into a full open source release; the more sites using these buttons, the more familiar they will become to users at large.

# 9:50 pm / buttons, css, google

Much like an oral agreement, publishing microformats is an informal agreement between you and (hopefully) a developer community that sets up a relationship with plenty of vagueness, inertial resistance to change, and potential landmines to step on. Would you create a real developer API without a TOS, agreement, or at the very least, guidelines? [...] are you prepared to announce all frontend markup changes? Does publishing a microformat without a special agreement mean that you are implicitly allowing comprehensive scraping of your web data?

Gordon Luk

# 10:46 pm / microformats, gordonluk, apis

FireScope. Neat little Firefox / Firebug extension which adds a “Reference” tab showing documentation for the selected element from the comprehensive SitePoint Reference site.

# 10:51 pm / sitepoint, firefox, firebug, firescope, extensions, documentation, reference, css, html

A couple of years ago when I was working on a now defunct search engine, we were using ActiveMQ to pass messages between the frontend and the backend. The system was unreliable, flaky, and hard to debug. It delivered exactly none of the reliability queues promised. [...] More likely there's something wrong with the whole design of network systems based on message queues, and we need to start developing alternatives.

Elliotte Rusty Harold

# 10:53 pm / message-queues, activemq, elliotte-rusty-harold

Apache Qpid. A new open source AMQP message queue with implementations in C++ and Java, developed by engineers from Red Hat, IONA and JP Morgan Chase. Anyone tried this yet? Looks pretty good on paper.

# 11:01 pm / apache, qpid, java, c-plus-plus, amqp, message-queues, red-hat, iona, jpmorganchase, open-source

Announcing the Article Search API. The most interesting API from the NYTimes yet—search against 2.8 million articles from 1981 until today using 35 searchable fields and get back detailed metadata as well as the first paragraph of the articles themselves.

# 11:06 pm / new-york-times, apis, newspapers, search

jessenoller.com—python magazine. Jesse Noller has been sharing his articles originally published in Python Magazine. Topics include SSH programming with Paramiko, context managers and the with statement, and an excellent explanation of Python’s threading support and the global interpreter lock.

# 11:10 pm / jessenoller, python, pythonmagazine, ssh, paramiko, threading, gil, contextmanagers

Load Balancing in Amazon EC2 with HAProxy. Solid tutorial introduction to HAProxy.

# 11:12 pm / ec2, haproxy, load-balancing, griggheorghiu

Feb. 6, 2009

... Facebook will be hosting the second User Experience Summit for OpenID on February 10th. The goal is to convene some of the best designers that leading internet companies can muster, and bring them together to develop a series of guidelines, best practices, iterations, and interfaces for making OpenID not just suck less, but become a great experience

Chris Messina

# 12:19 am / facebook, openid, chris-messina, usability

Pragmatism, purity and JSON content types

I started a conversation about this on Twitter the other day, but Twitter is a horrible place to have an archived discussion so I’m going to try again here.

[... 555 words]

Feb. 9, 2009

Google App Engine: A roadmap update! Receiving e-mail, background tasks and XMPP. I predict a bunch of sites will start building small parts of their overall functionality on App Engine when some of these features land (much easier than hosting your own custom XMPP server).

# 7 pm / appengine, python, google, xmpp, email, cloud-computing

Four reasons why public Facebook status updates won’t kill Twitter. Mike Butcher highlights the importance of “follow” rather than “friend” in social software.

# 7:04 pm / friend, follow, twitter, facebook, mike-butcher, socialsoftware

1901EasternTelegraph.jpg (via) A map of undersea telegraph cables as of 1901.

# 8:44 pm / maps, undersea, cables

When APIs go dark, how do you do a data backup? (Answer: you often can't.) With public, microformatted content, there will likely be a public archive that can be used to reconstitute at least portions of the service. With dynamic APIs and proprietary data formats, all bets are off.

Chris Messina

# 8:46 pm / chris-messina, data-portability, microformats, archiving, apis

Introduction to Information Retrieval (via) This looks excellent—a modern guide to implementing search engines written by some of the engineers behind Yahoo! Search. The full text is available online, but it looks like it’s well worth investing in the dead tree edition.

# 8:54 pm / search, yahoosearch, books, freebooks

YQL opens up 3rd-party web service table definitions to developers. This really is astonishingly clever: you can create an XML file telling Yahoo!’s YQL service how to map an arbitrary API to YQL tables, then make SQL-style queries against it (including joins against other APIs). Another neat trick: doing a SQL “in” query causes API requests to be run in parallel and recombined before being returned to you.
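A rough sketch of the "run in parallel, then recombine" trick, written in plain Python rather than YQL. Everything here is an assumption for illustration: fetch_one is a hypothetical stand-in for a single API request, and run_in_query is not part of any Yahoo! library. It only shows the general shape of fanning out one request per value in an "in" list and merging the results.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_one(value):
    # Hypothetical stand-in for one API request; a real version would
    # make an HTTP call for this value and return the parsed response.
    return {"id": value, "title": f"item {value}"}


def run_in_query(values, fetch=fetch_one, max_workers=4):
    """Issue one request per value in parallel, then recombine the rows.

    pool.map yields results in the same order as the input values,
    regardless of which request finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, values))


# Roughly analogous to: select * from some_table where id in (3, 1, 2)
rows = run_in_query([3, 1, 2])
```

The caller sees a single combined result set, which is what makes the "in" form convenient: the fan-out is hidden behind one query.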

# 9:08 pm / yql, yahoo, apis, sql

Facing up to Fonts. Slides and notes from Richard Rutter’s excellent typography presentation at a recent SkillSwap Brighton. Includes some new thinking about the font stack (comma separated list of fonts provided to the font-family property) you should use to get the best possible implementation of a given font on various different platforms.

# 9:16 pm / fontstacks, richardrutter, skillswap, skillswapbrighton, fonts, typography, design

A Unix Utility You Should Know About: Pipe Viewer. Useful command line utility that adds a progress bar to any unix pipeline.
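To make the idea concrete, here is a minimal Python sketch of a pass-through stage that copies its input to its output unchanged while reporting how many bytes have gone by. This is not how Pipe Viewer itself is implemented; copy_with_progress and its parameters are hypothetical names, and the sketch prints a running byte count rather than a real progress bar.

```python
import io


def copy_with_progress(src, dst, report, chunk_size=64 * 1024):
    """Copy src to dst in chunks, writing a running byte count to report."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        # A carriage return rewrites the same terminal line on each update.
        report.write(f"\r{total} bytes")
    report.write("\n")
    return total


# In-memory demonstration. In a real pipeline the three streams would be
# sys.stdin.buffer, sys.stdout.buffer and sys.stderr, so the progress
# output never mixes with the data flowing through the pipe.
source = io.BytesIO(b"x" * 150_000)
sink = io.BytesIO()
progress = io.StringIO()
copied = copy_with_progress(source, sink, progress)
```

Writing progress to a separate stream is the key design choice: the data on standard output stays byte-for-byte identical to the input.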

# 10:15 pm / unix, pipes, pipeviewer, commandline

Open in Browser Firefox Add-on (via) Solves the “application/json wants to download” problem, among others.

# 10:24 pm / firefox, plugins, json

Yahoo! Query Language thoughts. An engineer on Google’s App Engine provides an expert review of Yahoo!’s YQL. I found this more useful than the official documentation.

# 10:29 pm / google, yahoo, appengine, yql

Feb. 10, 2009

JsonML (JSON Markup Language). An almost non-lossy serialization format for sending XML as JSON (plain text in between elements is ignored). Uses the (element-name, attribute-dictionary, list-of-children) tuple format, which sadly means many common cases end up taking more bytes than the original XML. Still an improvement on serializations that behave differently when a list of children has only one item in it.
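A small Python sketch of the tuple shape described above, using only the standard library. It follows the description in this entry, not the official JsonML specification, which may differ in detail. to_tuple_form is a hypothetical helper, and as a simplification it keeps an element's leading text but drops any text that follows a child element.

```python
import json
import xml.etree.ElementTree as ET


def to_tuple_form(element):
    """Convert an element to [element-name, attribute-dictionary, list-of-children]."""
    children = []
    if element.text and element.text.strip():
        children.append(element.text.strip())
    for child in element:
        children.append(to_tuple_form(child))
        # child.tail (text after the child's closing tag) is dropped here.
    return [element.tag, dict(element.attrib), children]


xml_source = '<ul class="nav"><li>Home</li><li>About</li></ul>'
encoded = json.dumps(
    to_tuple_form(ET.fromstring(xml_source)),
    separators=(",", ":"),  # most compact JSON encoding
)
```

Even with compact separators, encoded comes out longer than xml_source for this input, which illustrates the size cost. The children slot is always a list, so a parent with one child has the same shape as a parent with several.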

# 3:03 pm / json, jsonml, xml, serialization

It may be hard to imagine writing rock solid one-in-a-million-or-better tests that drive Internet Explorer to click ajax frontend buttons executing backend apache, php, memcache, mysql, java and solr. I am writing this blog post to tell you that not only is it possible, it’s just one part of my day job.

Timothy Fitz

# 3:06 pm / timothy-fitz, continuous-deployment, selenium, testing

Feb. 11, 2009

Plaxo sees 92% success rate with OpenID/OAuth hybrid method. Really wish I could have been at the OpenID UX Summit hosted by Facebook yesterday—sounds like an awful lot of important problems are being solved.

# 5:20 pm / facebook, openid, plaxo, comcast, google

Feb. 12, 2009

Django Settings Tip—Setting Relative Paths. This is the first thing I do in every single one of my Django projects: it makes projects relocatable to other machines with just a couple of lines of code. I wouldn’t be at all upset to see it added to the default Django settings.py file created by django-admin.py startproject.
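A minimal sketch of the general technique, assuming the usual approach of deriving every path from the location of settings.py instead of hard-coding absolute paths. The linked article's exact code may differ. build_paths and the example path are hypothetical, and the setting names shown are the ones Django used at the time.

```python
import os


def build_paths(settings_file):
    """Derive project paths from the location of the settings file."""
    project_root = os.path.dirname(os.path.abspath(settings_file))

    def rel(*parts):
        # Join path components onto the project root.
        return os.path.join(project_root, *parts)

    return {
        "PROJECT_ROOT": project_root,
        "TEMPLATE_DIRS": (rel("templates"),),
        "MEDIA_ROOT": rel("media"),
    }


# Inside a real settings.py you would pass __file__ here; a made-up path
# is used so the example runs anywhere.
paths = build_paths("/srv/myproject/settings.py")
```

Because everything hangs off the settings file's own directory, moving the project to another machine or checkout location requires no edits.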

# 12:30 pm / django, python, gareth-rushgrove, settings

Whoosh. A brand new, pure-python full text indexing engine (think Lucene). Claims to offer performance in the same league as wrappers to C or Java libraries. If this works as well as it claims it will be an excellent tool for adding search to projects that wish to avoid a dependency on an external engine.

# 12:49 pm / search, python, whoosh, lucene, open-source, full-text-search
