Simon Willison’s Weblog

Entries in Jul, 2002


Students and the web

Point. Click. Think? (via From the Orient). This is an interesting, well-written piece on the effect that the web is having on student learning and research habits. Looking back at my first year at Uni I am probably a textbook example of the symptoms the article describes—I hardly ventured into the library at all, getting most of my additional course information from the web. Then again, a Computer Science degree is probably better suited to web research than many other courses.

[... 92 words]

My shortest entry ever


[... 7 words]

The CSS bug ring

Killer CSS link: Position Is Everything, part of the CSS bug ring. Big John on CSS-Discuss is probably the single most helpful individual I have ever encountered on a mailing list—he explains the most complicated (and the simplest) of solutions quickly, comprehensibly and in easy-to-understand terms. Position Is Everything is his collection of common but complex CSS browser bugs, complete with full explanations and any effective workarounds. The site links into the CSS bug ring, a small ring of five sites providing more of the same. Truly an indispensable resource for anyone with an interest in CSS.

[... 112 words]

Reasons not to use Access

I’ve never used MS Access for anything web-related, and I certainly don’t intend to at any point in the future. However, I do see a lot of questions regarding Access on various forums and mailing lists. Nine Reasons NOT To Use MS Access To Power A DB-Driven Website (published 29th June 2002) is an excellent article explaining exactly why using Access on the web is a really bad idea.

[... 72 words]


Stuart has updated aqTree, his excellent unordered-list-to-dynamic-tree script. The script is a clever piece of JavaScript that uses the DOM to turn a nested unordered list into a Windows Explorer style tree, without needing to add any extra HTML code. The new version is rather nice and less crufty now.

[... 58 words]

Tabs are not MDI

Dave Hyatt explains why Mozilla’s tabbed browsing is different to (and better than) Opera’s MDI model:

[... 162 words]

Multi-lingual PHP

A thread on SitePoint got me thinking about how PHP’s little-known parse_ini_file() function could be used to easily manage multiple language versions of web site messages. Sections could be set up for each supported language, with message definitions repeated in each section. You could even have a default message section at the top which is used when a message has not been defined for a particular language. The comments on the parse_ini_file manual page suggest that the function is not particularly suitable for large-scale use—PHP exits if the ini file is malformed and it can’t handle files larger than 16,382 bytes. That said, rolling a more reliable native PHP version should be a trivial project.

[... 127 words]
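
The per-language sections idea from the entry above can be sketched roughly as follows. This is Python's configparser standing in for PHP's parse_ini_file(), so it illustrates the file layout and the fallback rule rather than the PHP function itself. The message keys, the language codes and the helper names are all made up for the example.

```python
import configparser

# Hypothetical message file: one section per language, plus a DEFAULT
# section holding the fallback text for any message a language lacks.
MESSAGES_INI = """
[DEFAULT]
welcome = Welcome
goodbye = Goodbye

[fr]
welcome = Bienvenue

[de]
welcome = Willkommen
goodbye = Auf Wiedersehen
"""


def load_messages(ini_text):
    """Parse the ini text into a ConfigParser (interpolation switched off)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser


def get_message(parser, language, key):
    """Return the message for `language`, falling back to the DEFAULT section.

    configparser already inherits DEFAULT values inside every section, so the
    only extra case to handle is a language with no section at all.
    """
    section = language if parser.has_section(language) else configparser.DEFAULTSECT
    return parser.get(section, key)


messages = load_messages(MESSAGES_INI)
```

Here get_message(messages, "fr", "goodbye") falls through to the default text because the fr section does not define that key, which is the behaviour the entry describes.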

Funky stuff coming soon

Peter Van Dijck has unveiled the Secret Metadata Project (here and here), so I guess it’s time for me to come clean as well :) I’m working with him on a proof-of-concept PHP application for XFML, his markup language for exchanging faceted metadata. So far I’m really enjoying the challenge—I get to play with XML and collaborate on an exciting new project with an extremely talented information architect. The application is shaping up at an impressive rate, and we hope to have something live and doing things within a couple of weeks.

[... 115 words]

XHTML 1.1 Woes

Tim Luoma on thelist pointed out this table, which details the media types that can be used when serving XHTML documents. The table shows that XHTML 1.1 should not be served with a text/html Content-Type header. Unfortunately using any of the allowed headers (application/xhtml+xml, application/xml or text/xml) will cause Netscape 4 to pop up a “download file” dialog, and is likely to cause problems in other older browsers as well. Looks like I’ll be sticking with XHTML 1.0 Strict for a good while to come. I don’t really understand the hurry to move to XHTML 1.1 exhibited by some developers—to my mind, the single biggest advantage of XHTML is the fact that it allows documents to be parsed by any XML parsing tool, and this benefit is available in XHTML 1.0.

[... 149 words]
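
One possible way round the problem above would be to choose the Content-Type per request, based on what the browser says it accepts. The entry does not propose this, so treat it as a rough Python sketch of the idea only. The function name is hypothetical, q-values in the Accept header are ignored, and the text/html fallback is only appropriate for XHTML 1.0 documents.

```python
# Media types the entry lists as allowed for XHTML 1.1, in order of preference.
XML_MEDIA_TYPES = ("application/xhtml+xml", "application/xml", "text/xml")


def pick_content_type(accept_header):
    """Return an XML media type the client explicitly lists, else text/html.

    Simplification: parameters such as ";q=0.9" are stripped and ignored,
    and wildcards like "*/*" are not treated as accepting XML types.
    """
    offered = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    for media_type in XML_MEDIA_TYPES:
        if media_type in offered:
            return media_type
    return "text/html"
```

A browser that only advertises image types and a wildcard would get text/html, while one that lists application/xhtml+xml would get the XML media type.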

Back to normal at diveintomark

Mark Pilgrim has made his first update since finishing his accessibility series a week ago. He has launched a new site design (as previewed on css-discuss) in an attractive shade of blue, and posted a catch up of the many events that took place over the weeks that his blog was devoted to accessibility. My favourite quote:

[... 89 words]

XHTML ODP attribution

The ODP require you to display an attribution on any page that reuses ODP data. The recommended attribution fails to validate as XHTML, so I created an XHTML-compliant alternative which looks visually identical (at least in standards-compliant browsers) but uses <div>s with CSS styles. ODP editor-in-chief rdkeating25 has informally approved my alternative version on the ODP editor forums so I’m ready to go—at least as soon as I finish writing a script to parse data from their pages ;)

[... 109 words]

Another free Python book

How to Think Like a Computer Scientist: Learning with Python is a new Python textbook covered by the GNU Free Documentation License and available on the web. The thread discussing it on Slashdot gives mixed reviews, with other recommended free alternatives including Mark Pilgrim’s Dive Into Python and Thinking in Python by Bruce Eckel.

[... 66 words]


Yet another interesting take on XML metadata representations: FacetMaps. A facet map (as I understand it) is a way of combining facets with hierarchies, best explained by the excellent interactive three-minute concept intro on the site. One of the main contrasts to XFML is that in a Facet Map, Facets, rather than Topics, are the principal categorisation element. A resource in a Facet Map is linked directly to one or more facets, rather than going through a topic. The XML format is pretty simple (a lot simpler than XTM and XFML) so I might have a go at a PHP implementation at some point.

[... 124 words]

The mind of God

Red Herring: Dinner with the mind behind the mind of God, an informal interview with Sergey Brin, cofounder of Google. The “mind of God” reference stems from this quote:

[... 74 words]

Facets understood

And suddenly I understand faceted metadata. Sometimes all you need for that final moment of insight is a good example, and Peter Van Dijck’s Columbia Guide Site Map is just what I needed. A facet is simply a “flat”, mutually exclusive (at least as far as the XFML specification is concerned) way of categorising a topic—it can be described as a bottom-up method of categorisation rather than the more common hierarchical top-down approach (as seen on the ODP) which seeks to assign all topics as sub-topics of something else. Peter writes in XFML Background and Concepts that Faceted taxonomies are generally more powerful for websites than classic hierarchical taxonomies—this seems to make a great deal of sense, and it will be interesting to see this demonstrated by XFML in the near future.

[... 163 words]
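
To make the flat, bottom-up idea from the entry above a little more concrete, here is a minimal Python sketch of faceted classification in general. It is not XFML's data model: the facet names, the values and the helper functions are all invented for illustration. Each resource gets exactly one value per facet, and a query simply intersects the matches for each facet.

```python
from collections import defaultdict

# Hypothetical resources, each assigned one value for each facet.
RESOURCES = {
    "Dive Into Python": {"format": "book", "language": "python"},
    "Thinking in Python": {"format": "book", "language": "python"},
    "PHP manual": {"format": "reference", "language": "php"},
}


def build_index(resources):
    """Map each (facet, value) pair to the set of resources carrying it."""
    index = defaultdict(set)
    for name, facets in resources.items():
        for facet, value in facets.items():
            index[(facet, value)].add(name)
    return index


def select(index, **criteria):
    """Return the resources that match every facet=value pair given."""
    matches = [index.get((facet, value), set()) for facet, value in criteria.items()]
    return set.intersection(*matches) if matches else set()


index = build_index(RESOURCES)
```

Nothing here is a sub-topic of anything else. Narrowing a search is just a matter of adding another facet to the call, for example select(index, format="book", language="python").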

Syndicating the ODP

Having looked at some of these tools for syndicating content from the ODP, it seems that the standard method is to grab and parse the actual HTML files from the site rather than grabbing the huge RDF files. This would be a lot easier if the pages of the site were valid XHTML, but unfortunately they don’t even have a DOCTYPE. Luckily I wrote a page-link parser the other day for something else which seems to do a pretty good job on the ODP, so I should be able to put together a decent script without too much trouble.

[... 122 words]
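
The page-link parser mentioned above is not shown in the entry, so the following is only a guess at the general shape of such a thing, written with Python's html.parser module. It collects (href, link text) pairs and copes with uppercase tags, unclosed list items and a missing DOCTYPE. The class name, the function name and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect (href, text) pairs for every <a href="..."> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently open, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased, whatever the source used.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return every link found in `html` as a list of (href, text) tuples."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical fragment in the loose style of an old directory page.
SAMPLE = (
    '<UL><LI><A HREF="http://www.bath.ac.uk/">University of Bath</A> - Official site.'
    '<LI><A HREF="/Computers/">Computers</A></UL>'
)
```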

DMOZ for Bath

I’ve had my application for editorship of the DMOZ University of Bath Category accepted. Bath’s main site has notoriously bad navigation, so hopefully I’ll be able to use DMOZ to build an alternative. I’m also looking into eventually syndicating the DMOZ category via RDF and replicating it elsewhere. Unfortunately it looks like you have to grab the whole 130MB RDF file to do this, but I’ve seen tools that syndicate smaller portions of DMOZ so it must be possible to extract only the information you are interested in.

[... 103 words]

Stanford guidelines

Stanford Guidelines for Web Credibility:

[... 115 words]

Browser specifications

Browser Specifications:

[... 27 words]

Google and the semantic web

I’ve long been wondering what kind of research Google are doing with respect to the Semantic Web. August 2009: How Google beat Amazon and Ebay to the Semantic Web (via From the Orient) is a superbly written essay which explains the concepts behind the Semantic Web through a “history” of how Google has become the world’s largest marketplace by the year 2009. Stuart has written his own essay discussing some of the issues raised by the piece, such as security and classification problems.

[... 88 words]

Here comes another meme

Via Kottke: True Porn Clerk Stories, a hilarious, touching and insightful journal of the trials and tribulations of a female porn video store clerk in Chicago. This is some of the best and most original writing I’ve seen on the web in a long time. It is also quickly turning into something of a meme. When you’ve finished reading the journal, be sure to have a look around the rest of the site it is hosted on—the community there is undergoing something of a trial by fire, with over 180,000 reads of the Porn Clerk journal resulting in massively increased traffic and a flood of new members that threatens the stability of the existing community. They have even had to open a PayPal scheme to help cover the dramatic increases in bandwidth ($15/month is now up to $25/day).

[... 150 words]

Warp factor PHP

I’ve been working on a PHP application that can take an XTM formatted Topic Map and convert it into relational data in MySQL, run queries on it and convert it back to an XTM later. My work on the initial parser has involved some pretty heavy-duty processing, and the speed with which PHP and MySQL are handling the data I’m throwing at them is phenomenal. The classic Italian Opera Topic Map example weighs in over a megabyte of XML, but PHP is munching it up and spitting out (and executing) over 13,000 SQL queries in less than seven seconds.

[... 124 words]
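
For a flavour of what turning a Topic Map into relational data involves, here is a deliberately tiny Python sketch. It is not the PHP application described above. It only reads topic ids and base names from XTM 1.0, it uses an in-memory SQLite database in place of MySQL, and the table layout and sample document are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# XTM 1.0 namespace, bound to a prefix for ElementTree path lookups.
XTM_NS = {"xtm": "http://www.topicmaps.org/xtm/1.0/"}

# Hypothetical two-topic map; a real one would also carry associations,
# occurrences and scopes, none of which are handled here.
SAMPLE_XTM = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/">
  <topic id="puccini">
    <baseName><baseNameString>Giacomo Puccini</baseNameString></baseName>
  </topic>
  <topic id="tosca">
    <baseName><baseNameString>Tosca</baseNameString></baseName>
  </topic>
</topicMap>"""


def load_topics(xtm_text, conn):
    """Insert one row per <topic> into an assumed `topic` table; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS topic (id TEXT PRIMARY KEY, base_name TEXT)"
    )
    root = ET.fromstring(xtm_text)
    rows = []
    for topic in root.findall("xtm:topic", XTM_NS):
        name = topic.findtext(
            "xtm:baseName/xtm:baseNameString", default="", namespaces=XTM_NS
        )
        rows.append((topic.get("id"), name))
    # Parameterised inserts, batched into a single executemany call.
    conn.executemany("INSERT INTO topic (id, base_name) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
topic_count = load_topics(SAMPLE_XTM, conn)
```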

Admirably prompt response from KPMG

A member of KPMG’s web team responded to my query about Mozilla support (sent via their online contact form using IE because the site was unusable in Mozilla) and informed me that a new site is on the way. You never know, they might even embrace web standards ;)

[... 54 words]

Blog Hot or Not

Blog Hot or Not. I’m surprised no one had thought of this before—it’s a clever idea, well implemented. When adding my own blog I was asked to come up with some keywords to describe it, so here they are for posterity and my own future reference:

[... 67 words]

How browsers load images

From the stuff-I-never-knew department: Bill Posters explains how browsers load images simultaneously on the SitePoint forums, in response to a question about getting images to load in a specific order.

[... 34 words]

Another rubbish site

Why is it that badly designed, high-profile sites that completely fall apart in Mozilla never have a “contact our web team” or “send us feedback” link anywhere? You would expect better from a company that provides web technology and customer management consultancy services (well if you’re a cynic like me you wouldn’t, but that’s beside the point).

[... 128 words]


I’ve been trying to get my head around Topic Maps, a powerful but complex standard for building intricate networks of metadata. I couldn’t even begin to describe them myself but the following resources have proved very useful:

[... 147 words]

Windows SSL support in Python

Adding SSL support to Python on Windows is as easy as dropping a couple of DLLs and a .pyd file into your Python DLLs directory. Grab the zip file from this page and off you go. I haven’t tried it out yet but it appears to work—the socket.ssl function miraculously appeared when I installed the new files. Why is this useful? Because it opens the way for secure XML-RPC calls from Python applications...

[... 95 words]
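
As a rough illustration of the secure XML-RPC calls mentioned above, this is how a client might be set up with the xmlrpc.client module from current Python, where SSL support ships with the standard library. It does not use the socket.ssl function from the entry. The endpoint URL and the helper name are placeholders, and no request is actually sent.

```python
import xmlrpc.client
from urllib.parse import urlparse


def make_secure_proxy(url):
    """Build an XML-RPC proxy, refusing any endpoint that is not https."""
    if urlparse(url).scheme != "https":
        raise ValueError("expected an https:// endpoint, got: " + url)
    # Creating the proxy does not open a connection; that only happens
    # when a remote method is first called on it.
    return xmlrpc.client.ServerProxy(url)


# Placeholder endpoint, purely for illustration.
proxy = make_secure_proxy("https://example.com/RPC2")
```

Calling a method on the proxy would then send the request over HTTPS, assuming a server really exists at that address.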

Instant PHP Web Services

XML-RPC Class Server is a really clever piece of code. It consists of a single file which you can drop in a directory full of PHP .class.php files to instantly provide an XML-RPC interface to every class in the directory. Private methods that begin with an underscore are not included in the web service. Unfortunately the system requires PHP’s XML-RPC extensions to be enabled.

[... 76 words]
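
The rule that methods beginning with an underscore stay out of the web service is easy to picture with a small Python sketch. This is not the XML-RPC Class Server code, just an illustration of the filtering idea. The Calculator class and the public_methods helper are invented for the example.

```python
import inspect


class Calculator:
    """Hypothetical class standing in for one of the .class.php files."""

    def add(self, a, b):
        return a + b

    def _log(self, message):
        # Leading underscore: treated as private and never exposed.
        return "internal: " + message


def public_methods(instance):
    """Map 'ClassName.method' to bound methods, skipping names starting with '_'."""
    prefix = type(instance).__name__
    return {
        f"{prefix}.{name}": method
        for name, method in inspect.getmembers(instance, inspect.ismethod)
        if not name.startswith("_")
    }


registry = public_methods(Calculator())
```

Each entry in the registry could then be handed to an XML-RPC server, for instance via register_function() on Python's xmlrpc.server.SimpleXMLRPCServer, to get a similar drop-in effect.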

PHP object overloading

I’m not sure how this one snuck under the radar, but PHP now supports object overloading (as of version 4.2.0). It can be implemented by creating class methods __set(), __get() and __call() and then applying the new overload() function to the class name. The documentation claims that __call() is not yet supported, but the documentation is apparently out of date. Standard warnings about the experimental and unfrozen nature of the extension apply.

[... 75 words]
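
For anyone who has not met the idea, overloading here means intercepting access to properties and methods that were never declared. The sketch below shows the same idea using Python's __getattr__ and __setattr__ hooks. It is an analogy only and says nothing about how PHP's overload() extension behaves in detail. The class name and the behaviour for unknown methods are made up for the example.

```python
class Overloaded:
    """Stores undeclared attributes in a dict and answers any method call."""

    def __init__(self):
        # Bypass our own __setattr__ while creating the backing store.
        object.__setattr__(self, "_data", {})

    def __setattr__(self, name, value):
        # Roughly the role of __set(): every assignment lands in _data.
        self._data[name] = value

    def __getattr__(self, name):
        # Only reached when normal lookup fails, roughly like __get().
        if name.startswith("_"):
            raise AttributeError(name)
        data = object.__getattribute__(self, "_data")
        if name in data:
            return data[name]

        # Unknown name: hand back a callable, roughly like __call().
        def method(*args):
            return (name, args)

        return method


thing = Overloaded()
thing.colour = "blue"
```

After this, thing.colour reads the value back out of the dict, and thing.greet("world") returns the intercepted method name and arguments instead of raising an error.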