Simon Willison’s Weblog

March 2003

March 1, 2003

Vector search engines

Building a Vector Space Search Engine in Perl:

[... 304 words]

An interview with Cory

An interview with Cory Doctorow, via Leonard. Provides some great background insight into the world described in Down and Out, along with Cory’s thoughts on such topics as the recording industry and the Disney corporation.

March 2, 2003

Dependencies suck

Sigh. I guess I’ll stick with the console version.

[... 37 words]

Creative Commons query

Aaron Swartz has been talking to Google about indexing Creative Commons licensed works:

[... 120 words]

March 3, 2003

The importance of titles

Gordon Weakliem reminds us that the most important RSS element is <title>. I’m painfully reminded of this each and every time I add a new entry. I have well over 800 entries now, and I’ve promised myself that next time I perform a major upgrade on this blog I’m going to go through and manually add titles to every single one. The task should be made slightly easier by the camelCase permalinks, which I can convert into “suggested” titles to help the task along. It’s still not going to be much fun though.
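
The camelCase conversion could look something like the sketch below. It is only an illustration: the function name and the sample slug are hypothetical, and it assumes a permalink slug is plain camelCase with a lowercase first word.

```python
import re


def suggest_title(slug):
    """Turn a camelCase permalink slug into a suggested entry title."""
    # Insert a space wherever a lowercase letter or digit is followed
    # by a capital letter: "theImportance" -> "the Importance".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", slug)
    # Capitalise the first character and leave the rest untouched.
    return spaced[:1].upper() + spaced[1:]


# Hypothetical slug, used only to show the shape of the result.
suggested = suggest_title("theImportanceOfTitles")
```

The result is only a starting point. Acronyms and punctuation would still need fixing by hand, which is why these are “suggested” titles rather than final ones.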

SitePoint redesigns

I don’t know how I missed it, but SitePoint have redesigned in funky valid structural XHTML and CSS. I quite like the new look (not so keen on the new logo though) and the navigation is definitely a huge improvement: instead of the previous confusing arrangement of several “sister” sites they now divide content into “Articles” and “Forums”, which makes a lot more sense. They have some clever DHTML enhancements as well, such as a collapsible left-hand navigation panel. Unfortunately, the right-hand navigation that scrolls up and down (jerkily) with the window is quite distracting.

[... 155 words]

March 4, 2003

Mozilla for bloggers

Matthew Haughey (freshly redesigned) has published a Mozilla advocacy article explaining why Mozilla (and variants) are excellent tools for bloggers. Spot on.

BCSS

Here’s the reason I’ve been blogging at a relatively low frequency lately: BCSS, the Bath University Computer Science Society. The site is still under heavy development (there’s a surprise) but there’s enough information there now for it to be worth linking to. At the moment the site is valid HTML 4.01 Strict but it works as XHTML as well (try appending ?xhtml=1 to the URL of any page on the site) thanks to an ultra-flexible page template class (outlined here) and a few simple tricks to convert XHTML into HTML before the page is displayed.

[... 386 words]

HTTP status codes

Craig Saila has a minor rant about HTTP error codes. Did you know that a 410 should be served instead of a 404 when a resource has been deliberately, permanently removed? I didn’t.
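
As a rough sketch of the distinction, a site could keep a list of deliberately retired URLs and answer 410 for those, saving 404 for paths it has never heard of. The paths and the function name below are made up for illustration.

```python
from http import HTTPStatus

# Hypothetical site data: pages that exist, and pages removed on purpose.
LIVE_PAGES = {"/about": "About this site"}
REMOVED_PAGES = {"/old-contest"}


def status_for(path):
    """Pick the HTTP status code to serve for a requested path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    if path in REMOVED_PAGES:
        # Deliberately and permanently removed: 410 Gone.
        return HTTPStatus.GONE
    # Never existed, or we simply don't know: 404 Not Found.
    return HTTPStatus.NOT_FOUND
```

The only extra work is remembering which URLs were retired, which is the information a plain 404 throws away.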

Yahoo to one day go Google

Jeremy Zawodny drops a so-subtle-that-I-probably-misread-it hint that the Yahoo search page might some day follow AllTheWeb in Google’s footsteps.

March 6, 2003

Credibility and CSS

James Buckley links to a new report on How people evaluate a web site’s credibility. His comments:

[... 324 words]

Scott Andrew redesigns

Scott Andrew has been hit by redesign fever as well. His reworking is accompanied by a change in direction:

[... 193 words]

Python power

Sam Ruby’s ultra-simple 3-paned aggregator is a great example of the power of high level scripting languages. Using the wxPython cross-platform GUI toolkit and Mark Pilgrim’s ultra-liberal RSS Parser it provides a full application in a mere 107 lines of (highly readable and maintainable) code.

[... 127 words]

Jeff Minter blogs

Wow. Jeff Minter has a blog.

March 8, 2003

Spell check in web applications

Sam Ruby has enabled spell checking for the preview comment tool on his blog. I wonder how it works... I’ve lost track of the scripting language Sam uses for Intertwingly (PHP? Python? Perl?) but I know PHP can be compiled with support for the Pspell module.

[... 204 words]

WThRemix entrants

The WThRemix contest has posted a list of submitted entries. The contest (to design a new homepage for the W3C) asked entrants to use valid tableless XHTML, CSS and meet WAI accessibility level 1. The entries demonstrate a wide variety of layout and design techniques and are well worth browsing through. The winners will be announced on March 17th.

Roogle

Scott Johnson has put together a blog search engine with a difference: it indexes RSS feeds rather than crawling the blogs themselves. Roogle is still under heavy development (and Scott is blogging it as he goes) but is shaping up to be a very neat tool. If your blog isn’t already being indexed, you can add it using this form.
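
To illustrate the general idea of indexing feeds instead of crawling pages, here is a small sketch. It is not Roogle’s implementation: the sample feed, the URLs and the function name are invented, and it assumes a simple RSS 2.0 layout with title, link and description elements inside each item.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample feed; a real indexer would fetch this over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Blosxom rocks</title>
      <link>http://example.com/blosxom</link>
      <description>A tiny blogging tool written in Perl</description>
    </item>
    <item>
      <title>More lightweight software</title>
      <link>http://example.com/sqlite</link>
      <description>An embeddable SQL database engine</description>
    </item>
  </channel>
</rss>"""


def index_feed(xml_text, index=None):
    """Add every item in an RSS feed to a word -> set-of-links index."""
    index = defaultdict(set) if index is None else index
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        text = " ".join([
            item.findtext("title", default=""),
            item.findtext("description", default=""),
        ])
        # Lowercase the text and split it into simple alphanumeric words.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(link)
    return index


index = index_feed(SAMPLE_FEED)
perl_hits = index["perl"]
```

Because the feed already separates titles, links and descriptions, there is no page chrome to strip out before indexing.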

March 9, 2003

Thirty five year old cookies

I’m finding myself slightly confused about the Google backlash washing around the blogosphere, which is summarised quite well by Gavin Sheridan. Most of the arguments against using Google unsurprisingly centre around privacy issues, in particular the “35 year cookie”. I was under the impression that cookies could only be set for a maximum of a year, but having checked Netscape’s Cookie Specification and RFC 2965 it appears I was mistaken.

[... 566 words]

A plea for pings

Blogs I would read a lot more often if only they pinged weblogs.com when they updated:

[... 124 words]

Replacing text with images

Douglas Bowman writes about Guiltless Image Use, describing a technique that uses CSS to cause text to vanish from the page, then replaces it with a background-image. I experimented with this technique (see comments attached to that entry) myself last year but ended up using image tags inside h1 elements instead. Doug’s drop cap example shows that the technique can be applied in interesting ways outside of headers.

March 10, 2003

Web standards for news sites

Adrian Holovaty’s open email to Staci D. Kramer of Online Journalism Review makes an excellent case for the adoption of web standards by online news sites. It’s written in nice, clear, non-technical terms and does a good job of explaining the web standards movement in a short space of time. Could definitely be useful for forwarding on to non-technical people (managers for example?) to help spread the word.

Surviving Slashdot

Scott Johnson’s Roogle RSS search engine got slashdotted yesterday, and survived the storm unharmed thanks to Scott’s quick-thinking server admin Demitrious setting up mod_throttle to help handle the load. Demitrious describes the solution in this post.

March 12, 2003

Blosxom rocks

I’ve been hearing a few good things about Blosxom recently, so a few days ago I decided to see what all the fuss was about. It’s a blogging tool, but it’s a very different species from the average system. Firstly, it’s only 282 lines of Perl (of which only 135 are actual code). Secondly, rather than having a web interface of some sort to add entries it runs entirely from the file system. You specify a data directory, then create entries by dropping .txt files into that directory using your favourite text editor. The first line of each file is the title, the rest of the file is the entry, and the entry’s date is taken from the last-modified time of the file.

[... 392 words]
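
The file layout described above (title on the first line, entry in the rest, date from the last-modified time) is simple enough to sketch in a few lines of Python. This is not Blosxom’s code, which is Perl. The function name, the file names and the newest-first ordering are assumptions made for the example.

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path


def load_entries(data_dir):
    """Read every .txt file in data_dir as one blog entry."""
    entries = []
    for path in Path(data_dir).glob("*.txt"):
        lines = path.read_text(encoding="utf-8").splitlines()
        entries.append({
            "title": lines[0] if lines else "",   # first line is the title
            "body": "\n".join(lines[1:]),         # the rest is the entry
            "date": datetime.fromtimestamp(path.stat().st_mtime),
        })
    # Assumption: show the most recently modified entries first.
    entries.sort(key=lambda entry: entry["date"], reverse=True)
    return entries


# Usage: build a throwaway data directory holding two entries.
with tempfile.TemporaryDirectory() as data_dir:
    first = Path(data_dir, "first.txt")
    second = Path(data_dir, "second.txt")
    first.write_text("First post\nHello world\n", encoding="utf-8")
    second.write_text("Second post\nMore text\n", encoding="utf-8")
    # Set explicit modification times so the ordering is predictable.
    os.utime(first, (1_000_000_000, 1_000_000_000))
    os.utime(second, (1_100_000_000, 1_100_000_000))
    entries = load_entries(data_dir)
    titles = [entry["title"] for entry in entries]
```

One consequence of taking the date from the file system is that editing an old entry changes its date, so touching a file moves it to the top.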

More lightweight software

The other toy I’ve been playing with recently is SQLite. SQLite is an embeddable SQL database engine written in just under 25,000 lines of (heavily commented) C. Don’t let the size fool you—it’s phenomenally powerful and is released under a no-holds-barred public domain license that practically begs you to include it in your applications, commercial or not.

[... 236 words]
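
For a feel of what “embeddable” means in practice, here is a minimal sketch using the sqlite3 module from Python’s standard library (my choice for the example, not something from the original post). The table and its rows are invented. The whole database lives inside the process, with no server to install or connect to.

```python
import sqlite3

# ":memory:" keeps the database in RAM; a file path would persist it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, posted TEXT NOT NULL)"
)
# Invented sample rows, inserted with bound parameters.
conn.executemany(
    "INSERT INTO entries (title, posted) VALUES (?, ?)",
    [
        ("Roogle", "2003-03-08"),
        ("Blosxom rocks", "2003-03-12"),
        ("More lightweight software", "2003-03-12"),
    ],
)
rows = conn.execute(
    "SELECT title FROM entries WHERE posted >= ? ORDER BY posted, id",
    ("2003-03-10",),
).fetchall()
conn.close()
```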

More nukes

[PHP|Post|myPHP]-Nuke has to be one of the most-forked open source projects in history! Xaraya appears to be a fork from Post-Nuke, which itself forked from PHP-Nuke several years ago (and I’m pretty sure there are more). They’ve got an interesting set of RFCs on how they intend to build the next big open source content / community management system (nothing about generating pretty URLs yet). While browsing their site I found a link to PHPXref, a powerful-looking tool for generating PHP source code documentation. Unsurprisingly for such a lot of text munging, it’s written in Perl ;)

[... 119 words]

March 13, 2003

Python and micropayments

Fredrik Lundh has started posting his book The Standard Python Library online, in response to O’Reilly’s decision not to publish a second edition of the book. I’d never read it before, but having sampled the first two chapters I’m hooked. It works a bit like a “cookbook”, with a plethora of code samples explained in detail accompanied by tips and tricks relating to the language. The Lazy Import class, which loads a module only when an attribute of the module is called for the first time, is a classic example:

[... 209 words]
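
For readers who have not seen the lazy import pattern before, the sketch below shows the general idea in modern Python. It is not the listing from the book: the class body is my own reconstruction of the technique as described above, and using importlib is an assumption.

```python
import importlib


class LazyImport:
    """Stand-in for a module that is only imported on first use."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails, so
        # the two attributes set in __init__ never trigger it.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, name)


# Usage: nothing is imported until an attribute is first requested.
json = LazyImport("json")
loaded_before = json._module is not None
encoded = json.dumps([1, 2])
loaded_after = json._module is not None
```

The pay-off is faster start-up for programs that import a lot of modules but only use a few of them on any given run.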

March 16, 2003

Wrox and glasshaus go under

It looks like there’s a shakedown going on in technical book publishing land. Glasshaus are no more, and (so far unsubstantiated) rumours are flying round that Wrox are going bust / have gone bust as well.

[... 373 words]

Clearing out my tabs

I’ve inadvertently discovered a flaw in the tabbed browsing model: if you’re not disciplined about them you can quickly end up lost in a sea of tabs. Right now I have 6 Phoenix windows open with a total of 57 tabs between them. This is the result of about a week’s accumulated browsing, leaving me unable to even think about shutting down or rebooting my computer without clearing them all out first. The fact that Mandrake is churning along happily without any noticeable slowdown (despite me having several other applications running as well) doesn’t help at all as it gives me even less impetus to tidy everything up.

[... 466 words]

March 17, 2003

The Onion gets it spot on

Okay, now I have absolutely no intention of taking this blog in a political direction (for the record I’m anti-war) but I’ve seen a couple of links to the Onion recently that I just can’t resist blogging. First up is Bush: “Our Long National Nightmare of Peace and Prosperity is Finally Over” which was written two years ago but, read now, just looks spookily accurate (link via Back-to-Iraq). The second one is the absolute classic God Angrily Clarifies “Don’t Kill” Rule, linked by Simon Brunning.

Flash Functionality not quite so flash

What Do I Know points to Macromedia’s progress report explaining how they have been responding to feedback on their recent site redesign. Todd Dominey makes the following insightful observation:

[... 200 words]