Stylesheet parsing gets complicated
20th January 2003
Craig Saila points to the SearchEngineWatch Webpage Size Checker. It’s a nice tool, but it doesn’t appear to take the size of linked style sheets into account. I was playing around with the idea of a web page cache written in Python over Christmas and I hit the same kind of problem. Finding linked stylesheets using Python’s HTML parser wasn’t too difficult (and could be achieved equally well using a regular expression), but things get a lot hairier when you start to take @import statements and CSS-defined background images and custom bullet images into account. Again, I imagine a solution could be hacked out with regular expressions, but a nicer method would be some kind of CSS parser (the Python standard library has yet to include one). Maybe another project for a rainy day...
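To give a feel for the hacked-out version, here is a minimal sketch of both halves: pulling linked stylesheets out of a page with the standard library's HTML parser, then using regular expressions to find @import targets and url(...) references in the CSS itself. It assumes the modern Python 3 `html.parser` module, and every name in it (`StylesheetLinkFinder`, `find_linked_stylesheets`, `find_css_references`, the two patterns) is made up for illustration. It does not fetch anything or resolve relative URLs.

```python
import re
from html.parser import HTMLParser


class StylesheetLinkFinder(HTMLParser):
    """Collects href values from <link rel="stylesheet"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel can hold several space-separated values, e.g. "alternate stylesheet"
        rel = (attrs.get("rel") or "").lower().split()
        if "stylesheet" in rel and attrs.get("href"):
            self.stylesheets.append(attrs["href"])


# Matches both  @import "a.css";  and  @import url(a.css);
IMPORT_RE = re.compile(
    r"""@import\s+(?:url\(\s*)?["']?([^"')\s;]+)""", re.IGNORECASE
)
# Matches any url(...) reference, quoted or unquoted
URL_RE = re.compile(r"""url\(\s*["']?([^"')\s]+)["']?\s*\)""", re.IGNORECASE)


def find_linked_stylesheets(html):
    """Return the hrefs of stylesheets linked from an HTML document."""
    finder = StylesheetLinkFinder()
    finder.feed(html)
    return finder.stylesheets


def find_css_references(css):
    """Return (imports, other_urls) found in a chunk of CSS text."""
    imports = IMPORT_RE.findall(css)
    # url(...) also appears inside @import rules, so filter those out
    others = [u for u in URL_RE.findall(css) if u not in imports]
    return imports, others
```

Feeding each imported stylesheet back through `find_css_references` would follow chains of @import rules. The hairy part is everything the patterns ignore: commented-out rules, escaped characters, and URLs that need resolving against the stylesheet's own location rather than the page's, which is why a proper CSS parser would be the nicer method.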