Simon Willison’s Weblog

Blogmarks

Tokyo Tyrant Tutorial. Buried at the bottom of the Tokyo Tyrant protocol documentation, this is the best resource I’ve seen yet for getting up and running with the database server (including setting up replication).

# 14th February 2009, 11:29 am / databases, keyvaluepairs, replication, tokyocabinet, tokyotyrant

Specify your canonical. You can now use a link rel=“canonical” to tell Google that a page has a canonical URL elsewhere. I’ve run into this problem a bunch of times (on some sites it really does make sense to have the same content shown in two different places) and this seems like a neat solution that could apply to much more than just metadata for external search engines.
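
As a rough illustration of how a consumer other than a search engine might read this metadata, here is a minimal sketch using Python's standard html.parser module. The CanonicalFinder class, the find_canonical helper and the example.com URL are all made up for this example; it only looks for the first link element whose rel attribute includes "canonical".

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except <link> tags, and stop after the first match.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel is a space-separated list of tokens, compared case-insensitively.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in an HTML string, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page used only for illustration.
page = '<html><head><link rel="canonical" href="http://example.com/product"></head></html>'
canonical_url = find_canonical(page)
```

A crawler or aggregator could use something like this to collapse duplicate URLs onto one record before storing them.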

# 14th February 2009, 11:28 am / canonical, google, metadata, relcanonical, search-engines, seo, urls

pytyrant. A pure-Python client library for the Tokyo Tyrant binary protocol (used to access Tokyo Cabinet databases over a network). The library appears to be developed by Bob Ippolito and the team at Mochi Media.

# 14th February 2009, 11:19 am / bob-ippolito, mochimedia, python, pytyrant, tokyocabinet, tokyotyrant

Tokyo Cabinet: Beyond Key-Value Store. Useful overview of Yet Another Scalable Key Value Store. Interesting points: multiple backends (hash table, B-Tree, in memory, on disk), a “table” engine which enables more advanced queries, a network server that supports HTTP, memcached or its own binary protocol and the ability to extend the engine with Lua scripts.

# 14th February 2009, 11:17 am / databases, hash, http, keyvaluepairs, lua, memcached, tokyocabinet

Twitter Don’t Click Exploit. Someone ran a successful ClickJacking exploit against Twitter users, using a transparent iframe holding the Twitter homepage with a status message fed in by a query string parameter. This will definitely help raise awareness of ClickJacking! Twitter has now added framebusting JavaScript to prevent the exploit.

# 12th February 2009, 7:56 pm / chris-shiflett, clickjacking, framebusting, javascript, security, twitter

EuroDjangoCon 2009. Tickets are now on sale for the conference, scheduled for 4th-6th of May (not March as I originally said) in Prague (followed by two days of development sprints).

# 12th February 2009, 4:59 pm / django, djangocon, eurodjangocn, prague, python

Whoosh. A brand new, pure-Python full text indexing engine (think Lucene). Claims to offer performance in the same league as wrappers to C or Java libraries. If this works as well as it claims, it will be an excellent tool for adding search to projects that wish to avoid a dependency on an external engine.

# 12th February 2009, 12:49 pm / full-text-search, lucene, open-source, python, search, whoosh

Django Settings Tip—Setting Relative Paths. This is the first thing I do in every single one of my Django projects—it makes projects relocatable to other machines with just a couple of lines of code. I wouldn’t be at all upset to see it added to the default Django settings.py file created by ./manage.py startproject
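
The linked tip isn't reproduced here, so the following is only a sketch of the general idea, assuming the approach is to derive every path from the location of the settings file itself. The make_rel helper and the "myproject" path are hypothetical names for illustration; MEDIA_ROOT and TEMPLATE_DIRS are real Django settings.

```python
import os


def make_rel(settings_file):
    """Return a helper that builds absolute paths next to settings_file."""
    root = os.path.dirname(os.path.abspath(settings_file))

    def rel(*parts):
        # Join the given path components onto the settings directory.
        return os.path.join(root, *parts)

    return rel


# In a real settings.py this would be: rel = make_rel(__file__)
# A hypothetical path is used here so the sketch runs on its own.
rel = make_rel(os.path.join("myproject", "settings.py"))

MEDIA_ROOT = rel("media")
TEMPLATE_DIRS = (rel("templates"),)
```

Because nothing is hard-coded, moving the project directory to another machine changes the computed paths automatically.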

# 12th February 2009, 12:30 pm / django, gareth-rushgrove, python, settings

Plaxo sees 92% success rate with OpenID/OAuth hybrid method. Really wish I could have been at the OpenID UX Summit hosted by Facebook yesterday—sounds like an awful lot of important problems are being solved.

# 11th February 2009, 5:20 pm / comcast, facebook, google, openid, plaxo

JsonML (JSON Markup Language). An almost non-lossy serialization format for sending XML as JSON (plain text in between elements is ignored). Uses the (element-name, attribute-dictionary, list-of-children) tuple format, which sadly means many common cases end up taking more bytes than the original XML. Still an improvement on serializations that behave differently when a list of children has only one item in it.
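
To make the byte-count trade-off concrete, here is a small sketch that converts XML into the (element-name, attribute-dictionary, list-of-children) shape described above. It follows that description rather than the official JsonML grammar, which may differ in details; the to_triple function and the sample markup are invented for this example, and dropping whitespace-only text is an assumption of the sketch.

```python
import json
import xml.etree.ElementTree as ET


def to_triple(element):
    """Convert an Element into [name, attributes, children]."""
    children = []
    # Text directly inside the element, before any child element.
    if element.text and element.text.strip():
        children.append(element.text)
    for child in element:
        children.append(to_triple(child))
        # Text that follows a child element; whitespace-only runs are dropped.
        if child.tail and child.tail.strip():
            children.append(child.tail)
    # The children list is always a list, even with zero or one item.
    return [element.tag, dict(element.attrib), children]


# Hypothetical input used only for illustration.
xml_source = '<ul class="nav"><li>Home</li><li>About</li></ul>'
encoded = json.dumps(to_triple(ET.fromstring(xml_source)))

# Compare sizes to see the overhead of the nested-list encoding.
xml_size, json_size = len(xml_source), len(encoded)
```

Keeping children in a list regardless of how many there are is what avoids the "single child behaves differently" problem, at the cost of extra brackets and empty attribute dictionaries.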

# 10th February 2009, 3:03 pm / json, jsonml, serialization, xml

Yahoo! Query Language thoughts. An engineer on Google’s App Engine provides an expert review of Yahoo!’s YQL. I found this more useful than the official documentation.

# 9th February 2009, 10:29 pm / google, google-app-engine, yahoo, yql

Open in Browser Firefox Add-on (via) Solves the “application/json wants to download” problem, among others.

# 9th February 2009, 10:24 pm / firefox, json, plugins

A Unix Utility You Should Know About: Pipe Viewer. Useful command line utility that adds a progress bar to any unix pipeline.

# 9th February 2009, 10:15 pm / cli, pipes, pipeviewer, unix

Facing up to Fonts. Slides and notes from Richard Rutter’s excellent typography presentation at a recent SkillSwap Brighton. Includes some new thinking about the font stack (comma separated list of fonts provided to the font-family property) you should use to get the best possible implementation of a given font on various different platforms.

# 9th February 2009, 9:16 pm / design, fonts, fontstacks, richard-rutter, skillswap, skillswapbrighton, typography

YQL opens up 3rd-party web service table definitions to developers. This really is astonishingly clever: you can create an XML file telling Yahoo!’s YQL service how to map an arbitrary API to YQL tables, then make SQL-style queries against it (including joins against other APIs). Another neat trick: doing a SQL “in” query causes API requests to be run in parallel and recombined before being returned to you.

# 9th February 2009, 9:08 pm / apis, sql, yahoo, yql

Introduction to Information Retrieval (via) This looks excellent—a modern guide to implementing search engines written by some of the engineers behind Yahoo! Search. The full text is available online, but it looks like it’s well worth investing in the dead tree edition.

# 9th February 2009, 8:54 pm / books, freebooks, search, yahoo-search

1901EasternTelegraph.jpg (via) A map of undersea telegraph cables as of 1901.

# 9th February 2009, 8:44 pm / cables, maps, undersea

Four reasons why public Facebook status updates won’t kill Twitter. Mike Butcher highlights the importance of “follow” rather than “friend” in social software.

# 9th February 2009, 7:04 pm / facebook, follow, friend, mike-butcher, social-software, twitter

Google App Engine: A roadmap update! Receiving e-mail, background tasks and XMPP. I predict a bunch of sites will start building small parts of their overall functionality on App Engine when some of these features land (much easier than hosting your own custom XMPP server).

# 9th February 2009, 7 pm / cloud-computing, email, google, google-app-engine, python, xmpp

jessenoller.com—python magazine. Jesse Noller has been sharing his articles originally published in Python Magazine. Topics include SSH programming with Paramiko, context managers and the with statement and an excellent explanation of Python’s threading support and the global interpreter lock.

# 5th February 2009, 11:10 pm / contextmanagers, gil, jesse-noller, paramiko, python, pythonmagazine, ssh, threads

Announcing the Article Search API. The most interesting API from the NYTimes yet—search against 2.8 million articles from 1981 until today using 35 searchable fields and get back detailed metadata as well as the first paragraph of the articles themselves.

# 5th February 2009, 11:06 pm / apis, newspapers, new-york-times, search

Apache Qpid. A new open source AMQP message queue with implementations in C++ and Java, developed by engineers from Red Hat, IONA and JP Morgan Chase. Anyone tried this yet? Looks pretty good on paper.

# 5th February 2009, 11:01 pm / amqp, apache, c-plus-plus, iona, java, jpmorganchase, message-queues, open-source, qpid, red-hat

FireScope. Neat little Firefox / Firebug extension which adds a “Reference” tab showing documentation for the selected element from the comprehensive SitePoint Reference site.

# 5th February 2009, 10:51 pm / css, documentation, extensions, firebug, firefox, firescope, html, reference, sitepoint

Recreating the button. Fascinating article from Doug Bowman on the work that went into creating custom CSS buttons for use across Google’s different applications, avoiding images to improve performance and ensure they could be easily styled using just CSS. I’d love to see the Google Code team turn this into a full open source release: the more sites using these buttons, the more familiar they will become to users at large.

# 5th February 2009, 9:50 pm / buttons, css, google

OSM needs new servers. OpenStreetMap need to raise £10,000 to buy a new API database server.

# 5th February 2009, 7:27 pm / donations, fundraising, openstreetmap

solango. Another attempt at a Django/Solr integration library, based on code written for “a top 20 newspaper site” (I’d love to know which one). This is well documented, uses a registration model clearly inspired by the Django admin (which keeps search-related metadata out of your regular models), and includes management commands for re-indexing and generating Solr schema.xml files.

# 4th February 2009, 12:22 pm / django, lucene, python, search, solr

juno. An ultra-lightweight Python web framework inspired by Ruby’s Sinatra.

# 4th February 2009, 10:48 am / juno, python, ruby, sinatra, webframeworks

Post-Commit Web Hooks for Google Code Project Hosting (via) I really, really like web hooks (which I’ve been calling “callback APIs”, but it looks like “web hooks” is the term that’s sticking). I’m interested in their scaling challenges—I’ve heard XMPP advocates argue that a web hook style model simply won’t scale for really large sites.

# 4th February 2009, 10:22 am / google, googlecodehosting, webhooks

microprinter wiki. “Hackers across the country are buying up old old receipt printers and imaginatively repurposing them into something new. We call them microprinters.” The CBM 231 is the most common model, and is currently selling on eBay for $20 or so. Something a bit smaller would be fun for travelling with.

# 2nd February 2009, 7:20 am / cbm231, microprinters, papercamp
