Simon Willison’s Weblog

Blogmarks tagged html


pup. This is a great idea: a command-line tool for parsing HTML on stdin using CSS selectors. It’s like jq but for HTML. Supports a sensible collection of selectors and has a number of output options for the selected nodes, including plain text and JSON. It also works as a simple pretty-printer for HTML. # 14th February 2020, 4:25 pm

Using the HTML lang attribute (via) TIL the HTML lang attribute is used by screen readers to understand how to provide the correct accent and pronunciation. # 18th April 2019, 9:09 pm

kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert that to the equivalent markdown in a one-liner. # 25th February 2018, 4:49 pm

Can I use... input type=color. TIL <input type="color"> has reached 78.83% support globally already; the biggest gap right now is Mobile Safari. # 29th November 2017, 9:56 pm

Paper 5 | Scribd (via) A more impressive example of Scribd’s new HTML/CSS document viewer: a mathematics-heavy LaTeX paper by one of Scribd’s engineers. # 7th May 2010, 12:12 pm

Scribd in HTML5. Outstanding piece of engineering work from Scribd—they can now render documents using HTML, webfonts and a ton of CSS absolute positioning (using ems rather than pixels) instead of Flash. Nothing to do with HTML5 of course, which is rapidly replacing Ajax as the most mis-applied terminology on the Web. That nit-pick feels pretty insignificant compared to their overall achievement though—being able to convert any formatted document (.doc, pdf etc) in to HTML and CSS that displays correctly is a real leap forward. # 7th May 2010, 12:09 pm

Video for Everybody! Reminiscent of the early days of Web Standards, Kroc Camen has created a fiendishly clever chunk of HTML which can play a video on any browser, starting with HTML5 video then falling back on Flash and eventually just an HTML message telling the user where they can download the file. No JavaScript to be seen, but conditional comments abound. Requires you to encode as both Ogg and H.264, but Kroc includes detailed instructions for doing that using Handbrake. # 2nd July 2009, 7:33 pm

Reducing XSS by way of Automatic Context-Aware Escaping in Template Systems (via) The Google Online Security Blog reminds us that simply HTML-escaping everything isn’t enough—the type of escaping needed depends on the current markup context, for example variables inside JavaScript blocks should be escaped differently. Google’s open source Ctemplate library uses an HTML parser to keep track of the current context and apply the correct escaping function automatically. # 14th April 2009, 9:26 am
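
To illustrate why the escaping function has to match the markup context, here is a minimal Python sketch. It is not how Ctemplate works (Ctemplate tracks context with an HTML parser); the two helper names are hypothetical, and the caller picks the context by hand. It only shows that the same value needs different treatment in HTML content and inside a JavaScript block.

```python
import html
import json


def escape_for_html(value: str) -> str:
    """Escape text destined for HTML element content or a quoted attribute."""
    return html.escape(value, quote=True)


def escape_for_js_string(value: str) -> str:
    """Return a quoted JavaScript string literal for use inside a <script> block."""
    literal = json.dumps(value)  # handles quotes, backslashes and newlines
    # json.dumps leaves <, > and & alone, so "</script>" would still end the block.
    for char in "<>&":
        literal = literal.replace(char, "\\u{:04x}".format(ord(char)))
    return literal


user_input = '</script><script>alert("xss")</script>'

# HTML context: angle brackets and quotes become entities.
print("<p>" + escape_for_html(user_input) + "</p>")

# JavaScript context: the value becomes a string literal with no raw "<" in it.
print("<script>var name = " + escape_for_js_string(user_input) + ";</script>")
```

HTML-escaping the value and dropping it into the script block would be wrong in the other direction: JavaScript does not decode entities, so the variable would end up holding text like `&lt;` rather than the original characters.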

FireScope. Neat little Firefox / Firebug extension which adds a “Reference” tab showing documentation for the selected element from the comprehensive SitePoint Reference site. # 5th February 2009, 10:51 pm

Using SVG on the Web. I’ve been having a lot of fun playing with SVG recently. Here are some useful tips for including SVG images in HTML and XHTML documents. # 23rd December 2008, 1 pm

YQL—converting the web to JSON with mock SQL. YQL just got a whole lot more interesting to me—I had no idea they were exposing an HTML and RSS scraping tool over a JSONP API in addition to all of the Yahoo! web service methods. # 13th December 2008, 9:39 am

Conditional classnames. Yahoo!’s internal coding standards still recommend CSS hacks over conditional comments because a separate stylesheet for IE imposes an additional HTTP request. Paul Hammond points out that you can use conditional comments to write out an extra class="ie" attribute on the body element and use that to target the IE specific fixes in your stylesheets. # 17th October 2008, 1:32 pm

XHTML—myths and reality. Useful overview of XHTML from Tina Holmboe of the W3C’s XHTML Working Group, which suggests considering HTML 4.01 strict unless you need mixed namespaces for things like MathML. I’ve been storing this blog’s content as XHTML but serving as HTML for several years now. # 7th October 2008, 4:56 pm

django-html. A small project I’m working on to make Django behave better with regard to HTML vs. XHTML. # 9th September 2008, 11:59 pm

Coding Horror: Protecting Your Cookies: HttpOnly. Jeff Atwood discovers the hard way that writing an HTML sanitizer is significantly harder than you would think. HttpOnly cookies aren’t the solution though: they’re potentially useful as part of a defense in depth strategy, but fundamentally if you have an XSS hole you’re going to get 0wned, HttpOnly cookies or not. Auto-escape everything on output and be extremely cautious with things like HTML sanitizers. # 29th August 2008, 2:01 am

Javascript protocol fuzz results. If your HTML sanitizer uses blacklisting rather than whitelisting here are a few more weird ways of injecting javascript: in to a link that you need to worry about—but you should really switch to whitelisting http:// and https:// instead. # 30th June 2008, 3:57 pm
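
A minimal sketch of the whitelisting approach in Python, assuming the sanitizer calls a check like this on every href before emitting a link. The function name and the decision to reject relative URLs are assumptions made for the example, not something taken from the linked results.

```python
ALLOWED_PREFIXES = ("http://", "https://")


def is_allowed_link(href: str) -> bool:
    """Accept a link only if it visibly starts with http:// or https://."""
    # Strip surrounding whitespace and compare case-insensitively. Anything
    # that does not literally begin with an allowed prefix is rejected, so
    # obfuscated "javascript:" variants never have to be enumerated.
    return href.strip().lower().startswith(ALLOWED_PREFIXES)


for candidate in [
    "https://example.com/",
    "HTTP://example.com/page",
    "javascript:alert(1)",
    "  JaVaScRiPt:alert(1)",
    "java\tscript:alert(1)",
    "data:text/html,<script>alert(1)</script>",
]:
    print(repr(candidate), is_allowed_link(candidate))
```

The point of the design is that the check never needs to know what the dangerous schemes look like. This sketch is deliberately strict and also rejects relative links such as `/about/`; a real sanitizer would need a separate, equally explicit rule if those are wanted.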

James Bennett: Why HTML. Finally, somewhere to point people when they ask why I avoid XHTML that’s a bit more up to date than Hixie’s rant from 2002. # 18th June 2008, 12:27 pm

Elliotte Rusty Harold: Why XHTML. “XHTML makes life harder for document authors in exchange for making life easier for document consumers.”—since there are a lot more document authors than there are tools for consuming, this seems like an argument AGAINST XHTML to me. # 5th June 2008, 9:25 pm

Embedding custom non-visible data in HTML 5. “Every HTML element may have any number of attributes starting with the string ’data-’ specified, with any value.”—this will be incredibly useful for unobtrusive JavaScript where there’s no sensible place to store configuration data as HTML content. It will also mean Dojo has an approved method for adding custom attributes to declaratively instantiate Dojo widgets. # 19th April 2008, 10:58 pm
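
As a rough illustration of how tooling might read such attributes, here is a small Python sketch using the standard library's html.parser. The markup, the attribute names (data-lat, data-lng, data-zoom) and the DataAttributeCollector class are all made up for the example.

```python
from html.parser import HTMLParser


class DataAttributeCollector(HTMLParser):
    """Collect the data-* attributes of every start tag that has any."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag name, {attribute suffix: value})

    def handle_starttag(self, tag, attrs):
        # Keep only attributes whose name starts with "data-", minus the prefix.
        data = {
            name[len("data-"):]: value
            for name, value in attrs
            if name.startswith("data-")
        }
        if data:
            self.found.append((tag, data))


markup = '<div class="map" data-lat="51.5" data-lng="-0.12" data-zoom="10">London</div>'

collector = DataAttributeCollector()
collector.feed(markup)
collector.close()
print(collector.found)
```

Here the configuration for a hypothetical map widget lives on the element itself, and ordinary attributes such as class are ignored by the collector.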

hash. Douglas Crockford: “Any HTML tag that accepts a src= or href= attribute should also be allowed to take a hash= attribute”—to protect against file tampering and (more importantly) provide a truly robust caching mechanism. # 30th March 2008, 6:34 pm
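
A sketch of the check a client could perform if such an attribute existed, written in Python. The "algorithm:hexdigest" value format, the function name and the list of supported algorithms are assumptions made for this example; the quote above does not specify any of them.

```python
import hashlib

SUPPORTED_ALGORITHMS = {"sha256", "sha512"}  # assumed set, for illustration only


def matches_declared_hash(content: bytes, declared: str) -> bool:
    """Check fetched bytes against a declared value like 'sha256:<hex digest>'."""
    algorithm, _, expected = declared.partition(":")
    algorithm = algorithm.lower()
    if algorithm not in SUPPORTED_ALGORITHMS or not expected:
        return False  # unknown algorithm or missing digest: treat as a mismatch
    actual = hashlib.new(algorithm, content).hexdigest()
    return actual == expected.lower()


script = b"console.log('hello');"
declared = "sha256:" + hashlib.sha256(script).hexdigest()

print(matches_declared_hash(script, declared))              # unmodified file
print(matches_declared_hash(script + b"// evil", declared)) # tampered file
```

The same digest could double as a cache key: two URLs that declare the same hash refer to identical bytes, so a copy fetched from one could in principle satisfy a request for the other.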

SVG and text/html. Anne van Kesteren discusses the need for SVG and MathML to be embeddable in HTML 5, not just XHTML. # 17th October 2007, 4:06 pm

The longdesc lottery. Mark Pilgrim is now writing for the WHATWG blog. Here he makes the case for replacing the longdesc attribute with a better solution, based on ten years of developer ignorance and misuse. As always with that site, check the comments for a microcosm of the larger debate. # 14th September 2007, 11:44 am

html4all. New mailing list / advocacy group focusing on accessibility issues relevant to HTML 5. This is something that the core HTML 5 group have taken a lot of criticism for, although it’s unfair to say that they don’t care about accessibility (they are however challenging a lot of sacred cows). # 14th September 2007, 11:35 am

Restructured Text to Anything. Slick set of online tools for converting Restructured Text (one of the more mature wiki-style markup languages) to HTML or PDF. Includes a nice looking API. Powered by Django. # 13th September 2007, 3:54 pm

jQuery 1.2. Lots of neat new stuff; my favourite new feature is “Partial .load()” which lets you pull in HTML with Ajax and then use a CSS selector to grab a subset of that page and inject it in to the DOM. # 11th September 2007, 8:44 am

Why the Alt Attribute May Be Omitted. “The benefit of requiring the alt attribute to be omitted, rather than simply requiring the empty value, is that it makes a clear distinction between an image that has no alternate text (such as an iconic or graphical representation of the surrounding text) and an image that is a critical part of the content, but for which no alt text is available.” # 25th August 2007, 1:11 pm

WebCore Rendering I—The Basics. Dave Hyatt has started a series of posts explaining the internals of WebCore’s rendering system. # 10th August 2007, 3:21 pm

The CSS Redundancy Checker. A tool for checking your markup for outdated CSS rules that don’t match any of your HTML. We were discussing the need for something similar to this at Torchbox a few weeks ago. # 6th July 2007, 12:02 pm

HTML Entity Character Lookup. Look up HTML entities by characters that are a similar shape. # 3rd July 2007, 3:41 pm