Simon Willison’s Weblog

Subscribe

Items tagged html, python

Filters: html × python × Sorted by date


Fastest way to turn HTML into text in Python (via) A light benchmark of the new-to-me selectolax Python library shows it performing extremely well for tasks such as extracting just the text from an HTML string, after first manipulating the DOM. selectolax is a Python binding over the Modest and Lexbor HTML parsing engines, which are written in no-outside-dependency C. # 27th July 2022, 5:55 pm

kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert that to the equivalent markdown in a one-liner. # 25th February 2018, 4:49 pm

Django ponies: Proposals for Django 1.2

I’ve decided to step up my involvement in Django development in the run-up to Django 1.2, so I’m currently going through several years worth of accumulated pony requests figuring out which ones are worth advocating for. I’m also ensuring I have the code to back them up—my innocent AutoEscaping proposal a few years ago resulted in an enormous amount of work by Malcolm and I don’t think he’d appreciate a repeat performance.

[... 1674 words]

django-html. A small project I’m working on to make Django behave better with regards to HTML v.s. XHTML. # 9th September 2008, 11:59 pm

Restructured Text to Anything. Slick set of online tools for converting Restructured Text (one of the more mature wiki-style markup languages) to HTML or PDF. Includes a nice looking API. Powered by Django. # 13th September 2007, 3:54 pm

start.gotapi.com. Lightning fast lookups of API documentation; includes Python docs, YUI, HTML, CSS and lots more. # 5th June 2007, 6:05 pm