Simon Willison’s Weblog

Entries in 2003


Downloading your hotmail inbox

Adrian just pointed me to a fantastic tool: Gotmail, a utility to download mail from Hotmail accounts. It’s a command line utility, written in Perl and making use of the curl binary, which can connect to Hotmail over the web and grab any new emails, saving them locally as an mbox file and deleting them from the Hotmail server.

[... 289 words]

HTML entities for email addresses: don’t bother

I’ve suspected this for a long time, and now here’s the empirical evidence: Popular Spam Protection Technique Doesn’t Work. If you’re relying on HTML entities to protect your email address from spam harvesters—for example username@example.com—your email address may as well be in plain text. Chip Rosenthal downloaded a tool called “Web Data Extractor v4.0” and tried it on some test data to prove once and for all that the technique doesn’t work.
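
To see why entity encoding offers so little protection, here is a minimal Python sketch. It is not Chip Rosenthal's test and makes no claim about how Web Data Extractor works internally; the regular expression is a deliberately simple assumption, not a full address parser. It just shows that one standard library call undoes the encoding before the usual pattern match runs.

```python
import html
import re

# The example address from the entry, encoded as numeric HTML entities
# the way an entity-"protected" page would write it.
address = "username@example.com"
encoded = "".join(f"&#{ord(ch)};" for ch in address)

# A harvester only has to decode entities before matching as usual.
decoded = html.unescape(encoded)

# Simplified pattern, assumed for illustration only.
found = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", decoded)
```

The encoded form contains no `@` at all, yet `found` ends up holding the original address.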

[... 220 words]

Selectutorial

New from Russ Weakley: Selectutorial, which takes his widely acclaimed step by step CSS tutorial style and applies it to CSS selectors. Having a full understanding of selectors is critical if you’re going to take full advantage of CSS, so if you don’t get them yet you should really check this out.

[... 62 words]

Repartitioning with Knoppix

I’ve long been bemoaning the fact that if you want to repartition your hard drive to install Linux as a dual boot with an existing Windows system, the most frequently recommended method is to buy a copy of Partition Magic. You would have thought the open source software world would have provided a free alternative by now.

[... 217 words]

Un-happened

Charles Miller, in Google, Microsoft and Tall Poppies:

[... 116 words]

IXR 2.0

Harry Fuecks has been hacking on my XML-RPC library, and has released a new version with some significant changes. His article on phpPatterns describes the changes and provides a link to download the updated code. He’s made a bunch of interesting architectural changes which take advantage of a number of useful PEAR classes, including HTTP_Request which provides support for proxies and authentication, two frequently requested features.

[... 127 words]

Why run Windows on an ATM?

So you’re writing the software for an ATM. It needs to display something pretty on the screen, control the hardware that serves out the money and talk securely to your central servers. It also needs to be stable, secure, reliable and allow remote administration. Why on earth would you choose Windows as the operating system?

[... 213 words]

Pyrex

Pyrex is a language for writing Python extension modules. It’s pretty interesting: the syntax looks very similar to Python (the authors claim you can write C extension modules without knowing anything about the Python/C API) but uses additional type hints to compile down to ultra efficient C code, ready to be imported into your Python applications. The prime numbers example makes things a lot more clear:

[... 236 words]

Discovering Berkeley DB

I’m working on a project at the moment which involves exporting a whole bunch of data out of an existing system. The system is written in Perl and uses Berkeley DB files for most of its storage.

[... 339 words]

Feed you

Wow, that’s what I call feedback! It’s a shame pretty much everyone hates the new design but I like it so it stays. I’ve taken a few tips though and tweaked the link colours a bit, as well as making a few other small changes such as a darker green for the header and a 1em margin around the page.

[... 129 words]

PostgreSQL 7.4

Last week’s release of PostgreSQL 7.4 made a great open source project even better; it even managed to impress hard-core MySQL advocate Jeremy Zawodny. The detailed release notes show that most of the improvements concern performance, but the thing that really caught my eye was tsearch2, the new full text indexing suite. A bit of digging brought up the CVS tree for the new module, which in turn led me to this tutorial style overview of its capabilities.

[... 132 words]

Collaborative Redesign

Out with the orange, in with the green. As with my last redesign, only the CSS changed. A fun deviation with this one was that it was a collaboration between myself and Natalie over nearly 5,000 miles, using edit styles and AIM to pass each other snippets of CSS and instantly try them out.

[... 123 words]

Blogmarks

This entry was going to be another list of links, together with a note about how much I really needed to set up a separate link blog. Then I realised that it would make more sense just to set one up so that’s exactly what I’ve done. I still need to implement the archive but it’s getting dark so I’m posting this and heading home.

[... 211 words]

The underscore hack

Via Web-Graphics, Petr Pisar’s Underscore Hack provides a new way of targeting CSS rules specifically at Internet Explorer on Windows. As with all such hacks, the pros and cons of using this approach need to be closely examined before deploying it. The hack takes advantage of the fact that adding an underscore to the start of a property name causes that declaration to be ignored by every browser except IE for Windows. However, the hack takes the dangerous step of using one bug to solve another. Peter-Paul Koch explained why this is a risky thing to do in a recent column for Digital Web magazine:

[... 431 words]

Status Notification

Status Notification is a web application pattern from Ian Bicking that uses sessions to solve the problem of how to display simple status messages without displaying whole pages with just a one line message on them or passing a message in a URL. My current project could certainly benefit from this.
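
As a rough illustration of the idea (a sketch under my own assumptions, not Ian Bicking's code), the pattern boils down to queuing a message in the session and clearing it the first time it is displayed. A plain dict stands in for a real session object here, and the function names are hypothetical.

```python
import html

def set_status(session, message):
    """Queue a one-line status message to show on the next page view."""
    session.setdefault("status_messages", []).append(message)

def pop_status(session):
    """Return any queued messages and clear them so each shows only once."""
    return session.pop("status_messages", [])

def render_page(session, body):
    """Render a page, prepending any pending status messages."""
    notices = "".join(
        f"<p class='status'>{html.escape(m)}</p>" for m in pop_status(session)
    )
    return notices + body

# The handler that saves something sets the message, then redirects;
# the next page rendered for that session displays it exactly once.
session = {}
set_status(session, "Entry saved.")
first = render_page(session, "<h1>Entries</h1>")
second = render_page(session, "<h1>Entries</h1>")
```

The first render carries the notice and the second does not, so there is no need for a dedicated one-line confirmation page or a message in the URL.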

[... 56 words]

cgi_buffer

cgi_buffer is voodoo magic for Perl, Python and PHP scripts that automatically handles a bunch of bandwidth saving HTTP tricks such as Content-Length headers (which enable persistent connections), ETags for caching and GZip content compression. Pretty neat.
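
For a feel of what those tricks involve, here is a minimal Python sketch. It is not cgi_buffer's actual code or API; `buffer_response` and its arguments are hypothetical, and using a SHA-256 digest of the body as the ETag is an assumption made for illustration. The point is that once the whole response is buffered, all three headers fall out almost for free.

```python
import gzip
import hashlib

def buffer_response(body, request_headers):
    """Return (status, headers, payload) for a fully buffered body (bytes)."""
    # Derive a validator from the content itself (simplified: the same
    # ETag is used whether or not the payload ends up compressed).
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    headers = {"ETag": etag}

    # Conditional GET: the client already holds this exact body.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""

    payload = body
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"

    # Knowing the exact length is what allows persistent connections.
    headers["Content-Length"] = str(len(payload))
    return 200, headers, payload

page = b"<p>hello world</p>" * 100
status, headers, payload = buffer_response(page, {"Accept-Encoding": "gzip"})
repeat_status, _, repeat_payload = buffer_response(
    page, {"If-None-Match": headers["ETag"]}
)
```

The second call presents the ETag from the first response and gets an empty 304 back instead of the page.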

[... 44 words]

Contribute / ProFTPd problem solved

After further analysis of the Contribute problem described earlier, we discovered that Contribute was opening a new FTP connection every time we clicked a link within the application even before we had hit the “edit page” button to fire up the editing mode. Switching the connection over to use SFTP instead of FTP had the same problem, with a secure connection being opened for each link we clicked instead. The connections remained open until we shut down Contribute.

[... 137 words]

Teaching CSS: there’s a long way to go

This email to the css-discuss mailing list does a great job of describing the confusion and frustration that still confronts traditional web developers who are only just starting out on the road to mastering CSS. When you’ve “got it”, it’s easy to forget how much of a paradigm shift it is away from old school table methods. Here’s an extract:

[... 314 words]

Sprint PCS goes CSS

The Sprint PCS site has relaunched, using XHTML 1.0 Transitional and CSS. It’s another great example of a mostly web standards compliant commercial/corporate site; there are a few validation errors thanks to a quarrelsome CMS. France Rupert is the lead developer behind the new site and is promising a detailed writeup of the process and challenges behind the redesign. France hails from Kansas City, so hopefully we’ll be able to get him along to a web meetup in the not too distant future.

[... 155 words]

Contribute hammering FTP servers?

We’re having a problem at work with Macromedia Contribute. We host sites for a number of local companies, and one of them wants to use Contribute to update its site. The problem is that whenever Contribute tries to connect to our FTP server, it opens up 30 simultaneous connections, effectively running a denial of service that prevents other clients from logging in during peak times. I’ve searched the ’net and haven’t found any references to this problem; does anyone know anything about the issue? We’re running ProFTPD 1.2.9 and the client is using Macromedia Contribute 2.

[... 100 words]

Linux on the desktop at IBM

Spotted on Slashdot, IBM’s Open Source Desktop—Directions for today... and Tomorrow presentation includes one slide that really caught my attention:

[... 95 words]

High security is low security

Via Crypto-Gram, a great piece from Bruce Tognazzini about how tough security measures can actively reduce the security of a system:

[... 225 words]

Analysing methodologies

Joel Spolsky on analysing development methodologies:

[... 98 words]

An apology

It turns out that the Javascript on PHP.net mentioned previously was not deliberately obfuscated to protect the code from prying eyes; it was merely compressed to reduce the size of the script. See this comment for further details. I’d like to apologise to the maintainers of PHP.net for jumping the gun on this issue. Incidentally, the unobfuscated code is now available in CVS.

[... 72 words]

Click Maps

I’m not a very visual person; complex entity relationship diagrams, data flow diagrams and the like are usually completely lost on me, and I try to avoid them when they are mandated by coursework at University. Give me a text based SQL schema any day. Click Maps, on the other hand, I could learn to like: they’re nice and straightforward and solve the very real problem of planning how different parts of a web application will link to each other.

[... 84 words]

The good and the ugly

PHP.net has a new feature on their search page: a really nice implementation of an auto complete text widget in Javascript. Even better, the search page is valid XHTML 1.0 Strict and uses CSS for the layout. Let’s hope this is an indication of things to come for the rest of the site, which still mostly consists of tag soup.

[... 368 words]

Extracting EXIF data with Python

I’ve been rewriting the photo gallery management system for KUSports.com in Python. One of the new features is that the system can automagically extract caption and photographer information from the photos, provided the information has previously been added to the jpeg file as EXIF data. I tried several methods of doing this but eventually settled on EXIF.py because it worked straight away using a simple process_file() function and doesn’t require any additional software. Recommended.

[... 81 words]

Easy installers for PHP scripts

I tried out FUDforum last night, after Rasmus Lerdorf recommended it in a comment on Jeremy Zawodny’s blog. Feature wise, it’s pretty impressive but still doesn’t quite do it for me—I want something that’s trivial to integrate with an existing authentication system and outputs valid HTML (or XHTML) out of the box. Rasmus says it’s the only board he’s seen that doesn’t have obvious security holes though so it’s probably worth checking out if you need to set up a forum of that kind.

[... 275 words]

The little things

We put together a bookmarklet today that allows our editing staff to jump instantly from looking at a story on one of our web sites to the interface for editing it within our current content management system. It took about 5 minutes development time, plus an extra 15 minutes spent showing it to people, setting it up on machines and demonstrating how it works. It’s hard to overstate how well this new shortcut was received by the people who spend hours every day using the system. For end users, a little feature can sometimes go a very long way.

[... 106 words]

Roundup of roundups

There’s blogging a list of links, and then there’s blogging a list of lists of links:

[... 152 words]