Simon Willison’s Weblog

Subscribe

24th October 2025

Research blog-tags-scikit-learn — Automatically assigning meaningful tags to historic, untagged blog posts, this project leverages the Simon Willison blog database and scikit-learn to train and compare multi-label text classification models. Four approaches—TF-IDF + Logistic Regression, Multinomial Naive Bayes, Random Forest, and LinearSVC—were tested on posts’ title and body text using the 158 most frequently used tags.

This is a beat by Simon Willison, posted on 24th October 2025.

Monthly briefing

Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.

Pay me to send you less!

Sponsor & subscribe