strip-tags 0.6. It's been a while since I updated this tool, but while investigating a tricky mistake in my tutorial for LLM schemas I discovered a bug that I needed to fix.
Those release notes in full:
- Fixed a bug where `strip-tags -t meta` still removed `<meta>` tags from the `<head>` because the entire `<head>` element was removed first. #32
- Kept `<meta>` tags now default to keeping their `content` and `property` attributes.
- The CLI `-m/--minify` option now also removes any remaining blank lines. #33
- A new `strip_tags(remove_blank_lines=True)` option can be used to achieve the same thing with the Python library function.
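To make the behaviour in those notes concrete, here's a minimal sketch built only on Python's standard library `html.parser`. It is not how strip-tags is implemented and it doesn't call the strip-tags library at all: `MetaKeepingStripper`, `strip_keep_meta` and the sample HTML are hypothetical names invented for this illustration. It keeps `<meta>` tags (with only their `content` and `property` attributes), drops every other tag, and optionally removes blank lines.

```python
from html import escape
from html.parser import HTMLParser

# Attributes retained on kept <meta> tags, mirroring the defaults in the notes.
KEPT_META_ATTRS = ("content", "property")
# Elements whose text is dropped entirely (an assumption of this sketch).
SKIPPED = {"script", "style"}


class MetaKeepingStripper(HTMLParser):
    """Hypothetical helper: keep text and <meta> tags, drop all other tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif tag == "meta":
            # Keep only the allowed attributes, in source order.
            kept = [(k, v) for k, v in attrs if k in KEPT_META_ATTRS and v is not None]
            if kept:
                rendered = " ".join(f'{k}="{escape(v, quote=True)}"' for k, v in kept)
                self.parts.append(f"\n<meta {rendered}>\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Text inside <script> and <style> is ignored.
        if not self.skip_depth:
            self.parts.append(data)


def strip_keep_meta(html, remove_blank_lines=True):
    """Return the text of `html` plus its <meta> tags, one item per line."""
    parser = MetaKeepingStripper()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    if remove_blank_lines:
        lines = [line for line in lines if line]
    return "\n".join(lines)


sample = """<html><head>
<meta property="og:title" content="Example headline" charset="x">
<style>body { color: red }</style>
</head><body>

<h1>Example headline</h1>

<p>First paragraph.</p>
</body></html>"""

print(strip_keep_meta(sample))
```

With the sample above this prints the `<meta>` tag (minus its `charset` attribute) followed by the two lines of text, with no blank lines in between. Passing `remove_blank_lines=False` leaves the empty lines in place.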
Now I can do this and persist the <meta> tags for the article along with the stripped text content:
curl -s 'https://apnews.com/article/trump-federal-employees-firings-a85d1aaf1088e050d39dcf7e3664bb9f' | \
strip-tags -t meta --minify
Here's the output from that command.