Simon Willison’s Weblog

Sunday, 25th February 2018

Publishing history has various examples of advertising-only business models. But they are very much the exception. They mainly exist when there are near monopoly barriers to entry into the market which allow publishers to command and defend robust ad rates.

Josh Marshall

# 4:03 pm / advertising

kennethreitz/requests-html: HTML Parsing for Humans™ (via) Neat and tiny wrapper around requests, lxml and html2text that provides a Kenneth Reitz-grade API design for intuitively fetching and scraping web pages. The inclusion of html2text means you can use a CSS selector to select a specific HTML element and then convert it to the equivalent Markdown in a one-liner.

# 4:49 pm / scraping, html, python, requests

r1chardj0n3s/parse: Parse strings using a specification based on the Python format() syntax. (via) Really neat API design: parse() is almost exactly the inverse of Python’s built-in format(), so you can use format strings as an alternative to regular expressions for extracting specific data from a string.
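
To make the "format() in reverse" idea concrete, here is a toy sketch using only the standard library. It is not the parse library's code or API: `toy_parse` is a hypothetical helper written for illustration, and it assumes named fields with simple identifier names and supports only plain fields and the `d` (integer) format spec.

```python
import re
from string import Formatter


def toy_parse(template, text):
    """Hypothetical helper: recover named fields from text using a format() template."""
    pattern = ""
    wants_int = {}
    # Formatter().parse() splits the template into literal text and replacement fields.
    for literal, field, spec, _conversion in Formatter().parse(template):
        pattern += re.escape(literal)
        if field:  # assumption: only named fields such as {name} are handled
            wants_int[field] = spec == "d"
            body = r"[-+]?\d+" if spec == "d" else r".+?"
            pattern += f"(?P<{field}>{body})"
    match = re.fullmatch(pattern, text)
    if match is None:
        return None
    # Convert fields declared with :d back to integers, mirroring format().
    return {
        name: int(value) if wants_int[name] else value
        for name, value in match.groupdict().items()
    }


template = "{name} is {age:d} years old"
line = template.format(name="Alice", age=30)  # the forward direction
fields = toy_parse(template, line)            # the reverse direction
print(line)
print(fields)
```

The same template string drives both directions, which is the appeal: the pattern you would otherwise write as a regular expression is the format string you already know how to read.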

# 4:58 pm / regular-expressions, python
