Simon Willison’s Weblog

Subscribe

strip-tags 0.6. It's been a while since I updated this tool, but in investigating a tricky mistake in my tutorial for LLM schemas I discovered a bug that I needed to fix.

Those release notes in full:

  • Fixed a bug where strip-tags -t meta still removed <meta> tags from the <head> because the entire <head> element was removed first. #32
  • Kept <meta> tags now default to keeping their content and property attributes.
  • The CLI -m/--minify option now also removes any remaining blank lines. #33
  • A new strip_tags(remove_blank_lines=True) option can be used to achieve the same thing with the Python library function.

Now I can do this and persist the <meta> tags for the article along with the stripped text content:

curl -s 'https://apnews.com/article/trump-federal-employees-firings-a85d1aaf1088e050d39dcf7e3664bb9f' | \
  strip-tags -t meta --minify

Here's the output from that command.