Simon Willison’s Weblog

Subscribe

65 items tagged “html”

2024

Streaming HTML out of order without JavaScript (via) A really interesting new browser capability. If you serve the following HTML:

<template shadowrootmode="open">
<slot name="item-1">Loading...</slot>
</template>

Then later in the same page stream an element specifying that slot:

<span slot="item-1">Item number 1</span>

The previous slot will be replaced while the page continues to load.

I tried the demo in the most recent Chrome, Safari and Firefox (and Mobile Safari) and it worked in all of them.

The key feature is shadowrootmode=open, which looks like it was added to Firefox 123 on February 19th 2024—the other two browsers are listed on caniuse.com as gaining it around March last year. # 1st March 2024, 4:59 pm

htmz (via) Astonishingly clever browser platform hack by Lean Rada.

Add this to a page:

<iframe hidden name=htmz onload="setTimeout(() => document.querySelector( this.contentWindow.location.hash || null)?.replaceWith( ...this.contentDocument.body.childNodes ))"></iframe>

Then elsewhere add a link like this:

<a href="/flower.html#my-element" target=htmz>Flower</a>

Clicking that link will fetch content from /flower.html and replace the element with ID of my-element with that content. # 20th February 2024, 1:21 am

Portable EPUBs. Will Crichton digs into the reasons people still prefer PDF over HTML as a format for sharing digital documents, concluding that the key issues are that HTML documents are not fully self-contained and may not be rendered consistently.

He proposes “Portable EPUBs” as the solution, defining a subset of the existing EPUB standard with some additional restrictions around avoiding loading extra assets over a network, sticking to a smaller (as-yet undefined) subset of HTML and encouraging interactive components to be built using self-contained Web Components.

Will also built his own lightweight EPUB reading system, called Bene—which is used to render this Portable EPUBs article. It provides a “download” link in the top right which produces the .epub file itself.

There’s a lot to like here. I’m constantly infuriated at the number of documents out there that are PDFs but really should be web pages (academic papers are a particularly bad example here), so I’m very excited by any initiatives that might help push things in the other direction. # 25th January 2024, 8:32 pm

2023

You can stop using user-scalable=no and maximum-scale=1 in viewport meta tags now. Luke Plant points out that your meta viewport tag should stick to just “width=device-width, initial-scale=1” these days—the user-scalable=no and maximum-scale=1 attributes are no longer necessary, and have a negative impact on accessibility, especially for Android users. # 4th August 2023, 11:41 pm

The Page With No Code (via) A fun demo by Dan Q, who created a web page with no HTML at all—but in Firefox it still renders content, thanks to a data URI base64 encoded stylesheet served in a link: header that uses html::before, html::after, body::before and body::after with content: properties to serve the content. It even has a background image, encoded as a base64 SVG nested inside another data URI. # 21st January 2023, 6:59 pm

2022

Introducing sqlite-html: query, parse, and generate HTML in SQLite (via) Another brilliant SQLite extension module from Alex Garcia, this time written in Go. sqlite-html adds a whole family of functions to SQLite for parsing and constructing HTML strings, built on the Go goquery and cascadia libraries. Once again, Alex uses an Observable notebook to describe the new features, with embedded interactive examples that are backed by a Datasette instance running in Fly. # 3rd August 2022, 5:31 pm

Fastest way to turn HTML into text in Python (via) A light benchmark of the new-to-me selectolax Python library shows it performing extremely well for tasks such as extracting just the text from an HTML string, after first manipulating the DOM. selectolax is a Python binding over the Modest and Lexbor HTML parsing engines, which are written in no-outside-dependency C. # 27th July 2022, 5:55 pm

HTML event handler attributes: down the rabbit hole (via) onclick=“myfunction(event)” is an idiom for passing the click event to a function—but how does it work? It turns out the answer is buried deep in the HTML spec—the browser wraps that string of code in a function(event) { ... that string ... } function and makes the event available to its local scope that way. # 26th April 2022, 8:35 pm

2020

pup. This is a great idea: a command-line tool for parsing HTML on stdin using CSS selectors. It’s like jq but for HTML. Supports a sensible collection of selectors and has a number of output options for the selected nodes, including plain text and JSON. It also works as a simple pretty-printer for HTML. # 14th February 2020, 4:25 pm

2019

Using the HTML lang attribute (via) TIL the HTML lang attribute is used by screen readers to understand how to provide the correct accent and pronunciation. # 18th April 2019, 9:09 pm

2018

If you wrap your main content – that is, the stuff that isn’t navigation, logo and main header etc – in a <main> tag, a screen reader user can jump immediately to it using a keyboard shortcut. Imagine how useful that is – they don’t have to listen to all the content before it, or tab through it to get to the main meat of your page.

Bruce Lawson # 19th December 2018, 1:07 pm

kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert that to the equivalent markdown in a one-liner. # 25th February 2018, 4:49 pm

2017

Can I use... input type=color. TIL <input type="color"> has reached 78.83% support globally already—biggest gap right now is Mobile Safari. # 29th November 2017, 9:56 pm

2013

Why can’t I do style=“padding: 20px” and a border in the same div?

You can’t have two style attributes on the same element—but you can have two styles rules inside the same attribute. Try this instead:

[... 48 words]

Should I store markdown instead of HTML into database fields?

You should store the exact format that was entered by the user.

[... 95 words]

What are the different ways in which web sites can be developed?

There are a few languages that provide an alternative syntax that compiles to HTML (Haml is quite a popular one) but generally you need to have a very good understanding of HTML in order to do any web development at all, no matter what server-side technology you use. Likewise for CSS—Sass and LESS provide alternative syntax that compiles to CSS, but they are no replacement for understanding how CSS actually works.

[... 94 words]

What data structures are used to implement the DOM tree?

You may enjoy this post from Hixie back in 2002 which illustrates how different browsers deal with incorrectly nested HTML. IE6 used to create a tree that wasn’t actually a tree! http://ln.hixie.ch/?start=103791...

[... 49 words]

2012

What’s the best way to handle logins?

First, make sure you’re storing the password as a salted hash, using a deliberately slow hashing algorithm such as bcrypt, scrypt or PBKDF2—here are some recent articles to get you up to speed:

[... 176 words]

What is the difference between XHTML 1.0 strict and transitional?

Not a lot. XHTML transitional lets you use a few presentational attributes and elements that aren’t available in XHTML strict. Here’s a more detailed overview from back in 2005: http://24ways.org/2005/transitio...

[... 59 words]

2011

Could browsers be made to scroll down (e.g. by 67%) if you add #67% to a URL?

I’d say no.

[... 89 words]

2010

Is there any consensus yet on link rel=shorturl vs rev=canonical?

It’s pretty clear from the answers that rev=canonical v.s. rel=canonical is way too confusing—so it’s down to rel=shortlink v.s. rel=shorturl.

[... 38 words]

The Web for me is still URLs and HTML. I don’t want a Web which can only be understood by running a JavaScript interpreter against it.

Me, on Twitter # 27th September 2010, 4:37 pm

Paper 5 | Scribd (via) A more impressive example of Scribd’s new HTML/CSS document viewer: a mathematics-heavy LaTeX paper by one of Scribd’s engineers. # 7th May 2010, 12:12 pm

Scribd in HTML5. Outstanding piece of engineering work from Scribd—they can now render documents using HTML, webfonts and a ton of CSS absolute positioning (using ems rather than pixels) instead of Flash. Nothing to do with HTML5 of course, which is rapidly replacing Ajax as the most mis-applied terminology on the Web. That nit-pick feels pretty insignificant compared to their overall achievement though—being able to convert any formatted document (.doc, pdf etc) in to HTML and CSS that displays correctly is a real leap forward. # 7th May 2010, 12:09 pm

Want to know if your ‘HTML application’ is part of the web? Link me into it. Not just link me to it; link me into it. Not just to the black-box frontpage. Link me to a piece of content. Show me that it can be crawled, show me that we can draw strands of silk between the resources presented in your app. That is the web: The beautiful interconnection of navigable content

Ben Ward # 6th May 2010, 8:53 pm

If HTML is just another bytecode container and rendering runtime, we’ll have lost part of what made the web special, and I’m afraid HTML will lose to other formats by willingly giving up its differentiators and playing on their turf.

Alex Russell # 17th March 2010, 10:37 pm

2009

Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide.

Andrew Clover # 16th November 2009, 10:32 am

HTML has always been a conversation between browser makers, authors, standards wonks, and other people who just showed up and liked to talk about angle brackets. Most of the successful versions of HTML have been “retro-specs,” catching up to the world while simultaneously trying to nudge it in the right direction. Anyone who tells you that HTML should be kept “pure” (presumably by ignoring browser makers, or ignoring authors, or both) is simply misinformed. HTML has never been pure, and all attempts to purify it have been spectacular failures, matched only by the attempts to replace it.

Mark Pilgrim # 3rd November 2009, 7:20 am

Django ponies: Proposals for Django 1.2

I’ve decided to step up my involvement in Django development in the run-up to Django 1.2, so I’m currently going through several years worth of accumulated pony requests figuring out which ones are worth advocating for. I’m also ensuring I have the code to back them up—my innocent AutoEscaping proposal a few years ago resulted in an enormous amount of work by Malcolm and I don’t think he’d appreciate a repeat performance.

[... 1674 words]