Simon Willison’s Weblog

4 items tagged “strings”


datasette-jellyfish. I learned about a handy Python library called Jellyfish which implements approximate and phonetic matching of strings—soundex, metaphone, porter stemming, levenshtein distance and more. I’ve built a simple Datasette plugin which wraps the library and makes each of those algorithms available as a SQL function. # 9th March 2019, 6:29 pm
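To illustrate the idea of exposing a string-matching algorithm as a SQL function, here is a minimal sketch using only Python's standard `sqlite3` module. It is not the plugin's code and does not use Jellyfish: the pure-Python Levenshtein implementation and the SQL function name `edit_distance_sketch` are assumptions made up for this example.

```python
import sqlite3


def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if a is None or b is None:
        return None  # propagate SQL NULL
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution (or match)
                )
            )
        previous = current
    return previous[-1]


conn = sqlite3.connect(":memory:")
# Hypothetical function name, chosen for this sketch only.
conn.create_function("edit_distance_sketch", 2, levenshtein)

distance = conn.execute(
    "SELECT edit_distance_sketch('kitten', 'sitting')"
).fetchone()[0]
print(distance)  # 3
```

Once registered, the function can be called from any query on that connection, which is roughly the convenience a plugin of this kind provides.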

String length—Rosetta Code (via) Calculating the length of a string is surprisingly difficult once Unicode is involved. Here’s a fascinating illustration of how that problem can be attacked in dozens of different programming languages. From that page: the string “J̲o̲s̲é̲” (“J\x{332}o\x{332}s\x{332}e\x{301}\x{332}”) has 4 user-visible graphemes, 9 characters (code points), and 14 bytes when encoded in UTF-8. # 22nd February 2019, 3:27 pm
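The three counts quoted above can be reproduced in Python with the standard library. This is a sketch rather than a general solution: the grapheme count here simply skips combining marks, which happens to work for this string but is not full Unicode grapheme-cluster segmentation (UAX #29).

```python
import unicodedata

# The example string, written with explicit escapes for the combining marks.
text = "J\u0332o\u0332s\u0332e\u0301\u0332"

# len() on a Python 3 str counts code points.
code_points = len(text)

# Encoding to UTF-8 and measuring the result gives the byte length.
utf8_bytes = len(text.encode("utf-8"))

# Approximation: count only code points that are not combining marks.
approx_graphemes = sum(1 for ch in text if unicodedata.combining(ch) == 0)

print(approx_graphemes, code_points, utf8_bytes)  # 4 9 14
```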


String types in Python 3. bytes are now immutable (just like the bytestrings they are replacing) and a new mutable buffer type has been introduced. # 9th October 2007, 2:08 am
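A short sketch of the distinction as it can be seen in released Python 3, where the mutable type is named `bytearray` (the variable names are just for illustration):

```python
# bytes is immutable: item assignment raises TypeError.
data = b"abc"
try:
    data[0] = 65
    bytes_mutated = True
except TypeError:
    bytes_mutated = False

# bytearray is the mutable counterpart and can be changed in place.
buf = bytearray(b"abc")
buf[0] = ord("A")
buf.extend(b"def")

print(bytes_mutated)  # False
print(bytes(buf))     # b'Abcdef'
```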

How should JSON strings be represented in Erlang? Erlang’s poor support for strings makes this a surprisingly tricky question. # 14th September 2007, 8:17 am