Simon Willison’s Weblog

54 items tagged “html”

kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz-grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert that to the equivalent Markdown in a one-liner. # 25th February 2018, 4:49 pm

Can I use... input type=color. TIL <input type="color"> has reached 78.83% support globally already; the biggest gap right now is Mobile Safari. # 29th November 2017, 9:56 pm

Why can’t I do style="padding: 20px" and a border in the same div?

You can’t have two style attributes on the same element, but you can have two style rules inside the same attribute. Try this instead:

[... 48 words]

Should I store markdown instead of HTML into database fields?

You should store the exact format that was entered by the user.

[... 95 words]

What are the different ways in which web sites can be developed?

There are a few languages that provide an alternative syntax that compiles to HTML (Haml is quite a popular one) but generally you need to have a very good understanding of HTML in order to do any web development at all, no matter what server-side technology you use. Likewise for CSS—Sass and LESS provide alternative syntax that compiles to CSS, but they are no replacement for understanding how CSS actually works.

[... 94 words]

What data structures are used to implement the DOM tree?

You may enjoy this post from Hixie back in 2002 which illustrates how different browsers deal with incorrectly nested HTML. IE6 used to create a tree that wasn’t actually a tree! http://ln.hixie.ch/?start=103791...

[... 49 words]

What’s the best way to handle logins?

First, make sure you’re storing the password as a salted hash, using a deliberately slow hashing algorithm such as bcrypt, scrypt or PBKDF2—here are some recent articles to get you up to speed:

[... 176 words]
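
As a rough illustration of the salted, deliberately slow hashing described above, here is a minimal sketch using PBKDF2 from Python's standard library. The function names, the storage format and the iteration count are my own assumptions for the example, not something taken from the linked articles; bcrypt or scrypt would follow the same shape.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; tune it for your own hardware.
ITERATIONS = 600_000


def hash_password(password: str, *, iterations: int = ITERATIONS) -> str:
    """Return a string holding the algorithm, iteration count, salt and hash."""
    salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    # Hypothetical storage format: algorithm$iterations$salt$hash
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    if algorithm != "pbkdf2_sha256":
        return False
    candidate = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        bytes.fromhex(salt_hex),
        int(iterations),
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Storing the iteration count next to the hash means the work factor can be raised later without invalidating existing passwords.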

What is the difference between XHTML 1.0 strict and transitional?

Not a lot. XHTML transitional lets you use a few presentational attributes and elements that aren’t available in XHTML strict. Here’s a more detailed overview from back in 2005: http://24ways.org/2005/transitio...

[... 59 words]

Could browsers be made to scroll down (e.g. by 67%) if you add #67% to a URL?

I’d say no.

[... 89 words]

Is there any consensus yet on link rel=shorturl vs rev=canonical?

It’s pretty clear from the answers that rev=canonical vs. rel=canonical is way too confusing, so it’s down to rel=shortlink vs. rel=shorturl.

[... 38 words]

The Web for me is still URLs and HTML. I don’t want a Web which can only be understood by running a JavaScript interpreter against it.

Me, on Twitter # 27th September 2010, 4:37 pm

Paper 5 | Scribd (via) A more impressive example of Scribd’s new HTML/CSS document viewer: a mathematics-heavy LaTeX paper by one of Scribd’s engineers. # 7th May 2010, 12:12 pm

Scribd in HTML5. Outstanding piece of engineering work from Scribd: they can now render documents using HTML, webfonts and a ton of CSS absolute positioning (using ems rather than pixels) instead of Flash. Nothing to do with HTML5 of course, which is rapidly replacing Ajax as the most mis-applied terminology on the Web. That nit-pick feels pretty insignificant compared to their overall achievement though: being able to convert any formatted document (.doc, PDF etc.) into HTML and CSS that displays correctly is a real leap forward. # 7th May 2010, 12:09 pm

Want to know if your ‘HTML application’ is part of the web? Link me into it. Not just link me to it; link me into it. Not just to the black-box frontpage. Link me to a piece of content. Show me that it can be crawled, show me that we can draw strands of silk between the resources presented in your app. That is the web: The beautiful interconnection of navigable content

Ben Ward # 6th May 2010, 8:53 pm

If HTML is just another bytecode container and rendering runtime, we’ll have lost part of what made the web special, and I’m afraid HTML will lose to other formats by willingly giving up its differentiators and playing on their turf.

Alex Russell # 17th March 2010, 10:37 pm

Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide.

Andrew Clover # 16th November 2009, 10:32 am

HTML has always been a conversation between browser makers, authors, standards wonks, and other people who just showed up and liked to talk about angle brackets. Most of the successful versions of HTML have been “retro-specs,” catching up to the world while simultaneously trying to nudge it in the right direction. Anyone who tells you that HTML should be kept “pure” (presumably by ignoring browser makers, or ignoring authors, or both) is simply misinformed. HTML has never been pure, and all attempts to purify it have been spectacular failures, matched only by the attempts to replace it.

Mark Pilgrim # 3rd November 2009, 7:20 am

Django ponies: Proposals for Django 1.2

I’ve decided to step up my involvement in Django development in the run-up to Django 1.2, so I’m currently going through several years worth of accumulated pony requests figuring out which ones are worth advocating for. I’m also ensuring I have the code to back them up—my innocent AutoEscaping proposal a few years ago resulted in an enormous amount of work by Malcolm and I don’t think he’d appreciate a repeat performance.

[... 1674 words]

Video for Everybody! Reminiscent of the early days of Web Standards, Kroc Camen has created a fiendishly clever chunk of HTML which can play a video on any browser, starting with HTML5 video then falling back on Flash and eventually just an HTML message telling the user where they can download the file. No JavaScript to be seen, but conditional comments abound. Requires you to encode as both Ogg and H.264, but Kroc includes detailed instructions for doing that using HandBrake. # 2nd July 2009, 7:33 pm

Reducing XSS by way of Automatic Context-Aware Escaping in Template Systems (via) The Google Online Security Blog reminds us that simply HTML-escaping everything isn’t enough—the type of escaping needed depends on the current markup context, for example variables inside JavaScript blocks should be escaped differently. Google’s open source Ctemplate library uses an HTML parser to keep track of the current context and apply the correct escaping function automatically. # 14th April 2009, 9:26 am
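
To make the point about markup context concrete, here is a small sketch of the same value escaped two different ways. This is not Ctemplate's API, and it does none of the automatic parser-based context tracking: the caller picks the context by hand, and both function names are hypothetical.

```python
import html
import json


def escape_for_html(value: str) -> str:
    """Escape a value for HTML element content or a quoted attribute."""
    return html.escape(value, quote=True)


def escape_for_js(value: str) -> str:
    """Escape a value as a JavaScript string literal inside a <script> block.

    json.dumps handles quotes, backslashes and non-ASCII characters; the extra
    replacements stop "</script>" or "<!--" from ending the block early.
    """
    literal = json.dumps(value)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


payload = '</script><script>alert("x")</script>'
print(escape_for_html(payload))  # safe between HTML tags
print(escape_for_js(payload))    # safe as: var name = <output>;
```

HTML-escaping the payload and dropping it into a script block would leave entity references that JavaScript never decodes, which is why each context needs its own function.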

FireScope. Neat little Firefox / Firebug extension which adds a “Reference” tab showing documentation for the selected element from the comprehensive SitePoint Reference site. # 5th February 2009, 10:51 pm

Using SVG on the Web. I’ve been having a lot of fun playing with SVG recently. Here are some useful tips for including SVG images in HTML and XHTML documents. # 23rd December 2008, 1 pm

YQL—converting the web to JSON with mock SQL. YQL just got a whole lot more interesting to me—I had no idea they were exposing an HTML and RSS scraping tool over a JSONP API in addition to all of the Yahoo! web service methods. # 13th December 2008, 9:39 am

Conditional classnames. Yahoo!’s internal coding standards still recommend CSS hacks over conditional comments because a separate stylesheet for IE imposes an additional HTTP request. Paul Hammond points out that you can use conditional comments to write out an extra class="ie" attribute on the body element and use that to target the IE-specific fixes in your stylesheets. # 17th October 2008, 1:32 pm

XHTML—myths and reality. Useful overview of XHTML from Tina Holmboe of the W3C’s XHTML Working Group, which suggests considering HTML 4.01 strict unless you need mixed namespaces for things like MathML. I’ve been storing this blog’s content as XHTML but serving as HTML for several years now. # 7th October 2008, 4:56 pm

django-html. A small project I’m working on to make Django behave better with regard to HTML vs. XHTML. # 9th September 2008, 11:59 pm

Coding Horror: Protecting Your Cookies: HttpOnly. Jeff Atwood discovers the hard way that writing an HTML sanitizer is significantly harder than you would think. HttpOnly cookies aren’t the solution though: they’re potentially useful as part of a defense in depth strategy, but fundamentally if you have an XSS hole you’re going to get 0wned, HttpOnly cookies or not. Auto-escape everything on output and be extremely cautious with things like HTML sanitizers. # 29th August 2008, 2:01 am

Javascript protocol fuzz results. If your HTML sanitizer uses blacklisting rather than whitelisting, here are a few more weird ways of injecting javascript: into a link that you need to worry about. You should really switch to whitelisting http:// and https:// instead. # 30th June 2008, 3:57 pm
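
A minimal sketch of the whitelisting approach, assuming the href value has already been entity-decoded by an HTML parser. The function name and the decision to reject relative URLs are my own choices for the example, not part of the linked fuzz results.

```python
from urllib.parse import urlsplit

# Only these schemes are allowed; everything else is rejected.
ALLOWED_SCHEMES = {"http", "https"}


def is_safe_href(href: str) -> bool:
    """Return True only for links with an explicitly allowed scheme."""
    # Drop whitespace and control characters first, since browsers ignore
    # some of them inside a scheme (e.g. "java\tscript:").
    cleaned = "".join(ch for ch in href if ch > " " and ch != "\x7f")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        # Malformed URLs are rejected rather than guessed at.
        return False
    return scheme in ALLOWED_SCHEMES


for link in ["https://example.com/", "JaVa\tScRiPt:alert(1)", "/relative"]:
    print(repr(link), is_safe_href(link))
```

The check never needs to know what a dangerous scheme looks like, so new obfuscations of javascript: fail by default instead of needing another blacklist entry.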

James Bennett: Why HTML. Finally, somewhere to point people when they ask why I avoid XHTML that’s a bit more up to date than Hixie’s rant from 2002. # 18th June 2008, 12:27 pm