Blogmarks
Tokyo Tyrant Tutorial. Buried at the bottom of the Tokyo Tyrant protocol documentation, this is the best resource I’ve seen yet for getting up and running with the database server (including setting up replication).
Specify your canonical. You can now use a link rel=“canonical” to tell Google that a page has a canonical URL elsewhere. I’ve run into this problem a bunch of times (in some sites it really does make sense to have the same content shown in two different places) and this seems like a neat solution that could apply to much more than just metadata for external search engines.
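A minimal sketch of emitting the tag from Python. The helper name canonical_link and the example.com URL are assumptions for illustration only; they are not part of any library or of Google's announcement.

```python
from html import escape


def canonical_link(url: str) -> str:
    """Return a <link rel="canonical"> tag pointing at the preferred URL.

    canonical_link is a hypothetical helper, not part of any library.
    """
    # quote=True also escapes double quotes, so the value is safe in an attribute.
    return '<link rel="canonical" href="%s">' % escape(url, quote=True)


# Every duplicate view of the page would emit the same tag in its <head>.
print(canonical_link("https://example.com/products"))
```

The point of the sketch is that each duplicate URL names one preferred URL, so the choice is made in one place.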
pytyrant. A pure-python client library for the Tokyo Tyrant binary protocol (used to access Tokyo Cabinet databases over a network). The library appears to be developed by Bob Ippolito and the team at Mochi Media.
Tokyo Cabinet: Beyond Key-Value Store. Useful overview of Yet Another Scalable Key Value Store. Interesting points: multiple backends (hash table, B-Tree, in memory, on disk), a “table” engine which enables more advanced queries, a network server that supports HTTP, memcached or its own binary protocol, and the ability to extend the engine with Lua scripts.
Twitter Don’t Click Exploit. Someone ran a successful ClickJacking exploit against Twitter users, using a transparent iframe holding the Twitter homepage with a status message fed in by a query string parameter. This will definitely help raise awareness of ClickJacking! Twitter has now added framebusting JavaScript to prevent the exploit.
EuroDjangoCon 2009. Tickets are now on sale for the conference, scheduled for 4th-6th of May (not March as I originally said) in Prague (followed by two days of development sprints).
Whoosh. A brand new, pure-python full text indexing engine (think Lucene). Claims to offer performance in the same league as wrappers to C or Java libraries. If this works as well as it claims it will be an excellent tool for adding search to projects that wish to avoid a dependency on an external engine.
Django Settings Tip—Setting Relative Paths. This is the first thing I do in every single one of my Django projects: it makes projects relocatable to other machines with just a couple of lines of code. I wouldn’t be at all upset to see it added to the default Django settings.py file created by django-admin.py startproject.
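A minimal sketch of the usual way to do this, assuming the tip is the common os.path approach (the linked article may differ in detail). The helper make_rel and the /srv/myproject path are hypothetical; TEMPLATE_DIRS and MEDIA_ROOT are ordinary Django settings names.

```python
import os


def make_rel(settings_file):
    """Return a function that builds paths relative to the settings file."""
    # Resolve the directory holding settings.py to an absolute path once.
    base_dir = os.path.dirname(os.path.abspath(settings_file))

    def rel(*parts):
        return os.path.join(base_dir, *parts)

    return rel


# Inside a real settings.py this line would be: rel = make_rel(__file__)
rel = make_rel("/srv/myproject/settings.py")

TEMPLATE_DIRS = (rel("templates"),)
MEDIA_ROOT = rel("media")
```

Because every path is derived from the location of the settings file, moving the project directory needs no further edits.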
Plaxo sees 92% success rate with OpenID/OAuth hybrid method. Really wish I could have been at the OpenID UX Summit hosted by Facebook yesterday—sounds like an awful lot of important problems are being solved.
JsonML (JSON Markup Language). An almost non-lossy serialization format for sending XML as JSON (plain text in between elements is ignored). Uses the (element-name, attribute-dictionary, list-of-children) tuple format, which sadly means many common cases end up taking more bytes than the original XML. Still an improvement on serializations that behave differently when a list of children has only one item in it.
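A rough sketch of the three-part shape described above, written against Python's standard library. It follows the description in this entry rather than the JsonML specification itself, and the function name to_triples is an assumption for illustration.

```python
import json
import xml.etree.ElementTree as ET


def to_triples(element):
    """Convert an Element to [name, attribute-dictionary, list-of-children]."""
    children = []
    # Leading text inside the element is kept when it is not just whitespace.
    if element.text and element.text.strip():
        children.append(element.text.strip())
    for child in element:
        children.append(to_triples(child))
        # child.tail (text between sibling elements) is deliberately dropped.
    return [element.tag, dict(element.attrib), children]


xml_source = '<ul class="nav"><li>Home</li><li>About</li></ul>'
encoded = json.dumps(to_triples(ET.fromstring(xml_source)))
print(encoded)
# Compare the sizes of the two encodings for this small document.
print(len(xml_source), len(encoded))
```

Children are always a list, even when there is only one, which is the consistency the entry praises; the empty attribute dictionaries are part of the byte overhead it laments.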
Yahoo! Query Language thoughts. An engineer on Google’s App Engine provides an expert review of Yahoo!’s YQL. I found this more useful than the official documentation.
Open in Browser Firefox Add-on (via) Solves the “application/json wants to download” problem, among others.
A Unix Utility You Should Know About: Pipe Viewer. Useful command line utility that adds a progress bar to any unix pipeline.
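The idea can be sketched in a few lines of Python: copy data through unchanged and report a running byte count. This is only an illustration of the concept, not how Pipe Viewer itself is implemented, and copy_with_progress is a hypothetical name.

```python
import io


def copy_with_progress(src, dst, report, chunk_size=64 * 1024):
    """Copy src to dst in chunks, calling report(total_bytes) after each one."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        report(total)
    return total


# In-memory streams stand in for the two ends of a pipeline.
source = io.BytesIO(b"x" * 150_000)
sink = io.BytesIO()
seen = []
copy_with_progress(source, sink, seen.append)
print(seen)
```

In a real pipeline the streams would be sys.stdin.buffer and sys.stdout.buffer, with progress written to stderr so it does not mix with the data.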
Facing up to Fonts. Slides and notes from Richard Rutter’s excellent typography presentation at a recent SkillSwap Brighton. Includes some new thinking about the font stack (comma separated list of fonts provided to the font-family property) you should use to get the best possible implementation of a given font on various different platforms.
YQL opens up 3rd-party web service table definitions to developers. This really is astonishingly clever: you can create an XML file telling Yahoo!’s YQL service how to map an arbitrary API to YQL tables, then make SQL-style queries against it (including joins against other APIs). Another neat trick: doing a SQL “in” query causes API requests to be run in parallel and recombined before being returned to you.
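The parallel “in” behaviour is a fan-out-and-recombine pattern, which might look roughly like this in Python. This is a sketch of the pattern only, not YQL's implementation; fetch_one and fetch_in are hypothetical names, and fetch_one fakes the API call instead of making an HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_one(key):
    """Stand-in for one API request; a real version would make an HTTP call."""
    return {"id": key, "title": "item %s" % key}


def fetch_in(keys, fetch=fetch_one, max_workers=4):
    """Run one request per key in parallel and recombine results in key order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input keys.
        return list(pool.map(fetch, keys))


rows = fetch_in(["a", "b", "c"])
print(rows)
```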
Introduction to Information Retrieval (via) This looks excellent—a modern guide to implementing search engines written by some of the engineers behind Yahoo! Search. The full text is available online, but it looks like it’s well worth investing in the dead tree edition.
1901EasternTelegraph.jpg (via) A map of undersea telegraph cables as of 1901.
Four reasons why public Facebook status updates won’t kill Twitter. Mike Butcher highlights the importance of “follow” rather than “friend” in social software.
Google App Engine: A roadmap update! Receiving e-mail, background tasks and XMPP. I predict a bunch of sites will start building small parts of their overall functionality on App Engine when some of these features land (much easier than hosting your own custom XMPP server).
Load Balancing in Amazon EC2 with HAProxy. Solid tutorial introduction to HAProxy.
jessenoller.com—python magazine. Jesse Noller has been sharing his articles originally published in Python Magazine. Topics include SSH programming with Paramiko, context managers and the with statement, and an excellent explanation of Python’s threading support and the global interpreter lock.
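For readers who have not met context managers, here is a small example of the with statement using the standard library's contextlib. It is not taken from the articles themselves, and the timed helper is a made-up name.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record how long the with-block took under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, which is the point of a context manager.
        results[label] = time.perf_counter() - start


timings = {}
with timed("sum", timings):
    total = sum(range(1000))

print(total, timings)
```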
Announcing the Article Search API. The most interesting API from the NYTimes yet—search against 2.8 million articles from 1981 until today using 35 searchable fields and get back detailed metadata as well as the first paragraph of the articles themselves.
Apache Qpid. A new open source AMQP message queue with implementations in C++ and Java, developed by engineers from Red Hat, IONA and JP Morgan Chase. Anyone tried this yet? Looks pretty good on paper.
FireScope. Neat little Firefox / Firebug extension which adds a “Reference” tab showing documentation for the selected element from the comprehensive SitePoint Reference site.
Recreating the button. Fascinating article from Doug Bowman on the work that went into creating custom CSS buttons for use across Google’s different applications, avoiding images to improve performance and to ensure they could be easily styled using just CSS. I’d love to see the Google Code team turn this into a full open source release: the more sites using these buttons, the more familiar they will become to users at large.
OSM needs new servers. OpenStreetMap need to raise £10,000 to buy a new API database server.
solango. Another attempt at a Django/Solr integration library, based on code written for “a top 20 newspaper site” (I’d love to know which one). This is well documented, uses a registration model clearly inspired by the Django admin which keeps search related metadata out of your regular models and includes management commands for re-indexing and generating Solr schema.xml files.
juno. An ultra-lightweight Python web framework inspired by Ruby’s Sinatra.
Post-Commit Web Hooks for Google Code Project Hosting (via) I really, really like web hooks (which I’ve been calling “callback APIs”, but it looks like “web hooks” is the term that’s sticking). I’m interested in their scaling challenges—I’ve heard XMPP advocates argue that a web hook style model simply won’t scale for really large sites.
microprinter wiki. “Hackers across the country are buying up old receipt printers and imaginatively repurposing them into something new. We call them microprinters.” The CBM 231 is the most common model, and is currently selling on eBay for $20 or so. Something a bit smaller would be fun for travelling with.