Super Fast String Matching in Python (via) Interesting technique for calculating string similarity at scale in Python, with much better performance than Levenshtein distances. The trick here uses TF/IDF against N-Grams, plus a CSR (Compressed Sparse Row) scipy matrix to run the calculations. Includes clear explanations of each of these concepts.
Recent articles
- Introducing gisthost.github.io - 1st January 2026
- 2025: The year in LLMs - 31st December 2025
- How Rob Pike got spammed with an AI slop "act of kindness" - 26th December 2025