Simon Willison’s Weblog

Stylesheet parsing gets complicated

Craig Saila points to the SearchEngineWatch Webpage Size Checker. It’s a nice tool, but it doesn’t appear to take the size of linked style sheets into account. I was playing around with the idea of a web page cache written in Python over Christmas and I hit the same kind of problem. Finding linked stylesheets using Python’s HTML parser wasn’t too difficult (and could be achieved equally well using a regular expression), but things get a lot hairier when you start to take @import statements and CSS-defined background images and custom bullet images into account. Again, I imagine a solution could be hacked out with regular expressions, but a nicer method would be some kind of CSS parser (the Python standard library has yet to include one). Maybe another project for a rainy day...
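For illustration, here is a rough sketch of the hacked-together approach, not the code from the cache project itself. It assumes present-day Python 3 and its `html.parser` module, and the names `StylesheetLinkFinder`, `find_stylesheets` and `find_css_references` are made up for this example. The regular expressions are deliberately naive: they ignore CSS comments, escapes and media queries, which is exactly where a real CSS parser would earn its keep.

```python
import re
from html.parser import HTMLParser


class StylesheetLinkFinder(HTMLParser):
    """Collects href values from <link rel="stylesheet"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, e.g. "alternate stylesheet"
        rel = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "stylesheet" in rel and href:
            self.hrefs.append(href)


# Naive patterns, an assumption of this sketch rather than a full CSS grammar.
IMPORT_RE = re.compile(r"""@import\s+(?:url\(\s*)?["']?([^"')\s;]+)""", re.IGNORECASE)
URL_RE = re.compile(r"""url\(\s*["']?([^"')\s]+)["']?\s*\)""", re.IGNORECASE)


def find_stylesheets(html):
    """Return the hrefs of all linked stylesheets in an HTML document."""
    finder = StylesheetLinkFinder()
    finder.feed(html)
    return finder.hrefs


def find_css_references(css):
    """Return (imported stylesheets, other url() assets) found in a CSS string."""
    imports = IMPORT_RE.findall(css)
    # url(...) also matches "@import url(...)", so leave those out of the asset list
    assets = [u for u in URL_RE.findall(css) if u not in imports]
    return imports, assets
```

A full size checker would still need to resolve each reference against the URL of the file that contains it, fetch it, and repeat the process for every imported stylesheet.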

This is Stylesheet parsing gets complicated by Simon Willison, posted on 20th January 2003.

Previously hosted at http://simon.incutio.com/archive/2003/01/20/stylesheetParsingGetsComplicat