Simon Willison’s Weblog

Items in Aug, 2002


A plan for spam

Paul Graham: A Plan for Spam. Paul suggests using content-based filters that learn from users explicitly marking messages as spam or legitimate mail. The system then picks emails apart looking for common terms (in both the body and the headers of the message) that can later be used to identify spam messages. He claims his tests have let through only 5 per 1000 spams, with 0 false positives. Impressive stuff, and great reading for its excellent explanations of some advanced algorithmic and statistical techniques.
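
To make the idea concrete, here is a minimal sketch of a filter that learns from hand-marked messages and then scores new ones. It is not Paul's actual code: the tokenizer, the add-one smoothing and the names (SpamFilter, train, score) are all assumptions made for illustration, and it treats headers and body as one string.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word-like tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class SpamFilter:
    """Toy content-based filter trained on messages marked by the user."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of legitimate messages containing it
        self.spam_total = 0
        self.ham_total = 0

    def train(self, text, is_spam):
        """Record which tokens appear in a message the user has marked."""
        tokens = set(tokenize(text))
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_total += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_total += 1

    def token_probability(self, token):
        """Estimate how spammy a token is, with add-one smoothing (an assumption)."""
        spam_rate = (self.spam_counts[token] + 1) / (self.spam_total + 2)
        ham_rate = (self.ham_counts[token] + 1) / (self.ham_total + 2)
        return spam_rate / (spam_rate + ham_rate)

    def score(self, text):
        """Combine per-token probabilities naive-Bayes style into one spam score."""
        product_spam = 1.0
        product_ham = 1.0
        for token in set(tokenize(text)):
            p = self.token_probability(token)
            product_spam *= p
            product_ham *= 1.0 - p
        return product_spam / (product_spam + product_ham)


spam_filter = SpamFilter()
spam_filter.train("buy cheap pills now", is_spam=True)
spam_filter.train("cheap pills click here", is_spam=True)
spam_filter.train("meeting notes for monday", is_spam=False)
spam_filter.train("lunch on monday?", is_spam=False)
```

A score near 1 suggests spam and a score near 0 suggests legitimate mail. A real filter would need far more training data and a threshold chosen to keep false positives rare.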

[... 89 words]

Fiendish markup quiz

Hixie has posed a fiendish markup quiz—spot the four markup errors in a document that validates. It’s harder than it sounds. I’ve mailed off my answers, but I’m not expecting to get full marks.

[... 43 words]

Comments improvement

I’ve improved the comment system at the behest of Adrian Holovaty. URLs posted in a comment (both those beginning with http:// and those beginning with just www.) will now be converted into links.
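
For the curious, the conversion can be sketched roughly as follows. This is a Python illustration rather than the code running on this site: the pattern, the linkify name and the handling of trailing punctuation are assumptions, and it also accepts https:// even though only http:// and www. are mentioned above.

```python
import html
import re

# Assumed pattern: anything starting with http://, https:// or www.
# up to the next whitespace, quote or angle bracket.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"]+", re.IGNORECASE)


def linkify(comment):
    """Escape a plain-text comment and wrap any URLs in <a> tags."""
    parts = []
    last_end = 0
    for match in URL_PATTERN.finditer(comment):
        # Keep trailing punctuation such as a full stop outside the link.
        url = match.group(0).rstrip(".,;:!?)")
        # URLs typed as www.example.com need a scheme to work as links.
        href = url if url.lower().startswith("http") else "http://" + url
        parts.append(html.escape(comment[last_end:match.start()]))
        parts.append('<a href="{}">{}</a>'.format(html.escape(href), html.escape(url)))
        last_end = match.start() + len(url)
    parts.append(html.escape(comment[last_end:]))
    return "".join(parts)
```

Escaping everything that is not a link means a comment cannot smuggle in its own markup.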

[... 34 words]

Magic quotes solution

Pink Goblin (otherwise known as HarryF) explains why magic quotes are evil. This is an issue that every PHP developer should be aware of, as it can cause all kinds of problems in your scripts if you ignore it. He suggests using a custom myAddSlashes() function which only calls addslashes() if magic quotes are turned off. I have an alternative solution: choose your preferred setting (quotes on or off) and apply it at run time to all incoming data in one go. My code for doing this is available here. By a bizarre coincidence I wrote the script this morning, then spotted a link to the Pink Goblin article on tidak ada literally five minutes after finishing it.

[... 130 words]

New memes make Baby Jesus cry

Things that make Baby Jesus cry (stolen from Mark Pilgrim). Google as social commentary?

[... 21 words]

Python RSS locator

Mark Pilgrim has written an ultra-liberal RSS locator (in Python, naturally). I guess he had to scratch an itch. The amount of work it puts into locating an RSS feed for a site is astonishing, especially when you consider how short the actual code is.

[... 50 words]

Zeldman played by a stand up comic

Eric Meyer has confessed.

[... 7 words]

css-discuss rocks

css-discuss has seen some interesting threads in the past 24 hours and the new archive means I can link straight to them—so here goes. Kentaro Kaji kicked off the topic of techniques for aligning an image with the bottom of a block of text. In the same thread, Benn Nunn advocated avoiding width and height attributes on images and keeping that information in an external style sheet. Other topics included accessible navigation and a tricky absolute positioning problem with Opera. The most informative mailing list I’m currently subscribed to just keeps getting better.

[... 122 words]

Today’s required reading

10 Tips on Writing the Living Web is full of invaluable advice for anyone who wants their weblog to be of interest to other people.

[... 26 words]

PHP numbered code listings

Michael V has written a couple of functions to apply my CSS numbered code listing technique to PHP’s built-in syntax highlighting.

[... 25 words]

Patented IMBots

I wonder if these muppets have heard of eggdrop (created 1993). Something tells me prior art for this one won’t be too hard to find.

[... 34 words]

More mailing list etiquette

Madhu Menon: Avoiding personal conflict on mailing lists.

[... 9 words]

Hacking Las Vegas

Hacking Las Vegas (via Kryogenix): the story of how a bunch of whizz kids from MIT devised the perfect card counting technique and took the casinos for the ride of their lives. Edge-of-your-seat stuff.

[... 44 words]

CSS Trickery

New CSS Experiment: Trickery with Floats and Negative Margins, inspired by this message on css-discuss. By applying both position: relative and a negative margin to a floated element it is possible to pull it out of the flow of text into the margin of the document. I have used a variant of this technique in the third revision of my SitePoint in CSS demonstration.

[... 76 words]

Fun with FOLDOC

The Free On-Line Dictionary of Computing does exactly what it says on the tin. It is available under the GNU Free Documentation License so I grabbed a copy of the archive (which expands to a 4MB text file) and had a go at dumping it into a MySQL database. I haven’t done anything with it yet (apart from putting together a rudimentary interface) but I have a few ideas for interesting ways of reusing the data.

[... 84 words]

Thanks for the link

Stuart has pointed out that this is the second time Jeffrey Zeldman (who is actually Eric Meyer) has spelt my name wrong :)

[... 27 words]

PHP and ID3 tags

MP3 Piranha is a clever application which indexes your MP3 collection and uses the Amazon Web Service API to look up the album cover and related albums, and to provide a link to buy the album from Amazon. Out of curiosity, I ran a search for a PHP library to decode ID3 tags to see if such a thing could be built with PHP, and came up with this script by Leknor. The class is well written and I learnt a lot about ID3 tags looking through it. It seems that ID3v1 tags take up the last 128 bytes of an MP3 file and can be decoded using PHP’s unpack() function.
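
As a rough illustration of the same idea, here is a Python sketch using struct.unpack in place of PHP’s unpack(). It follows the ID3v1 layout (a "TAG" marker, then fixed-width title, artist, album, year and comment fields and a one-byte genre) and is not taken from Leknor’s class. The function names and the latin-1 decoding are assumptions.

```python
import struct

ID3V1_SIZE = 128
# "TAG" marker, title, artist, album, year, comment, genre byte: 128 bytes in total.
ID3V1_FORMAT = "3s30s30s30s4s30sB"


def _clean(raw):
    """Cut a fixed-width field at the first NUL byte and decode it (latin-1 assumed)."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def parse_id3v1(block):
    """Decode a 128-byte ID3v1 block, or return None if it is not a tag."""
    if len(block) != ID3V1_SIZE or not block.startswith(b"TAG"):
        return None
    _, title, artist, album, year, comment, genre = struct.unpack(ID3V1_FORMAT, block)
    return {
        "title": _clean(title),
        "artist": _clean(artist),
        "album": _clean(album),
        "year": _clean(year),
        "comment": _clean(comment),
        "genre": genre,  # numeric index into the ID3v1 genre list
    }


def read_id3v1(path):
    """Read the last 128 bytes of a file and try to decode them as an ID3v1 tag."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)  # jump to the end to find the file size
        if handle.tell() < ID3V1_SIZE:
            return None
        handle.seek(-ID3V1_SIZE, 2)
        return parse_id3v1(handle.read(ID3V1_SIZE))
```

Calling read_id3v1() with the path to an MP3 returns a dictionary of fields, or None when the file has no ID3v1 tag at the end.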

[... 111 words]

Q tag bad

Mark Pilgrim explains why the <q> tag is bad for accessibility.

[... 16 words]

Alchemist contest

AlltheWeb.com introduced an innovative feature called Alchemist a while ago which allows visitors to customise the site by specifying the URL to their own style sheet. They have now announced a CSS design contest for the service, with top prizes of $750 in Amazon vouchers available for three categories (“Simple, Yet Beautiful”, “To CSS Infinity and Beyond” and “So 22nd Century”). This is a great opportunity for advocates of CSS to show just how powerful it really is.

[... 86 words]

Bulletin board spam

My friend Tim recently received a spam from a company called TrafficBBS, who specialise in bulk submissions to 50,000 search engines and 120,000+ BBS (web-based bulletin boards). A quick look at their list of targeted forums reveals that they are spidering and spamming a whole bunch of simple web-based forum scripts that don’t require user authentication, such as WWWBoard. This is a form of spam I wasn’t aware of until now. It’s scary to think how easily the system could be expanded to automatically register on more advanced widespread forum systems such as vBulletin.

[... 140 words]

Controlled vocabularies

Christina Wodtke: Mind your phraseology!, a tutorial on controlled vocabularies. The concept is very similar to that used by TopicMaps: relationships are defined between terms that take into account hierarchies, associated terms and even alternative spellings. I’m planning an overhaul of the category / metadata system used on this blog in the near future and Christina’s tutorial has given me a whole load of new ideas.

[... 69 words]

SitePoint CSS experiment

SitePoint are trialling a new design for their front page. For fun, I had a go at recreating the new design using structural XHTML and CSS. The result isn’t my normal style (I normally avoid fixed pixel font sizes and go for liquid rather than fixed width layouts) but replicates the existing design nicely and looks good in IE 5/6, Mozilla and Opera 6 on Windows. Netscape 4 doesn’t get the stylesheet and I have yet to try it out on a Mac.

[... 92 words]

Tidakada

Spotted in my referrals: tidak ada, a beautifully designed blog covering web development and other related topics. This is another great example of what you can achieve with some creative CSS.

[... 32 words]

Optimising Javascript

A thread on SitePoint led me to these two excellent tutorials: Javascript Optimisation and Tackling JavaScript strict warnings.

[... 27 words]

Zeldman interview

Jeffrey Zeldman: “99 percent of Web sites are obsolete”. An excellent interview covering web standards and the new techniques they encourage.

[... 22 words]

Archivist goes live

After a successful private beta, the new searchable css-discuss archive is ready for use by the general public. If you spot any bugs or have any suggestions for improving the archive please drop me a line.

[... 88 words]

dChat released

Glen Murphy has released the source code to his innovative dChat PHP/DHTML chat system. I’ve been playing around with it this morning and it’s a very nice piece of software. dChat uses an interesting take on the remote scripting concept, using the DOM to append <script> elements to the head of the document in order to grab additional information from the server without refreshing the page. Unfortunately this technique does not work in IE on the Mac, but it works fine on Mozilla and IE/Windows.

[... 104 words]

One for Paul

This one’s for Paul from Uni: Tales of the Plush Cthulhu

[... 107 words]

Benefits of XHTML

Phil Ringnalda is questioning the point of XHTML. The single, huge advantage it has over HTML is that XHTML can be parsed by anything (or any language) with an XML parser. As an example, a few weeks ago I was asked to write a script to grab links from a bunch of HTML pages and insert them into a database. I solved the problem with a combination of PHP’s strip_tags() function and XML parsing abilities, by killing off every tag that wasn’t an <a> tag and slapping on a start and end element to turn the document into valid XML, a step that would not have been necessary had the page used XHTML in the first place.
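
The trick can be sketched in a few lines. This is a Python approximation rather than the original PHP script: a regular expression stands in for strip_tags(), the <links> wrapper element and the extract_links name are made up for the example, and it assumes the anchors themselves are lowercase with quoted attributes.

```python
import re
import xml.etree.ElementTree as ET

# Matches any tag that is not an opening or closing <a> tag.
NON_ANCHOR_TAG = re.compile(r"<(?!/?a[\s>])[^>]*>", re.IGNORECASE)
# Matches ampersands that do not start an entity an XML parser already understands.
BARE_AMPERSAND = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9a-fA-F]+);)")


def extract_links(html_text):
    """Return (href, link text) pairs by reducing a page to its <a> elements."""
    anchors_only = NON_ANCHOR_TAG.sub("", html_text)
    anchors_only = BARE_AMPERSAND.sub("&amp;", anchors_only)
    # Wrap the leftovers in a single root element so they form one XML document.
    root = ET.fromstring("<links>" + anchors_only + "</links>")
    return [
        (anchor.get("href"), "".join(anchor.itertext()).strip())
        for anchor in root.iter("a")
        if anchor.get("href")
    ]


page = ('<html><body><p>Try <a href="http://example.com/">Example</a> &amp; '
        '<b><a href="/about">About <i>us</i></a></b></p></body></html>')
links = extract_links(page)
```

Sloppy markup (unquoted attributes, unclosed anchors) will still make the parser raise an error, which is rather the point: a page that was well-formed XHTML to begin with could be handed to the parser directly.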

[... 190 words]

Offline until Sunday

I’ll probably be offline until Sunday. Have a nice weekend :)

[... 11 words]
