Simon Willison’s Weblog

Tim Bray on Unicode

Tim Bray’s ongoing really is one of the best technical blogs out there (if it even is a blog). One of his current topics is Unicode, a subject that pretty much every software developer should try to get under their belt. On the Goodness of Unicode gives a thorough, entertaining overview (including why Unicode matters and why it isn’t as scary as it sounds), while Characters vs. Bytes is the first in a promised three-part essay covering the technical details of modern character processing.
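A minimal sketch of the characters-versus-bytes distinction, in Python 3. It is not taken from either essay; the sample string and the two encodings are arbitrary choices made for illustration.

```python
# Illustrative only: the sample text is an arbitrary choice.
# The escapes spell "naïve café" using precomposed characters,
# so the string holds exactly 10 code points.
text = "na\u00efve caf\u00e9"

# The same characters turn into different byte sequences
# depending on the encoding chosen.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

char_count = len(text)    # counts characters (code points)
utf8_bytes = len(utf8)    # counts bytes in the UTF-8 form
utf16_bytes = len(utf16)  # counts bytes in the UTF-16 form

print(char_count, utf8_bytes, utf16_bytes)
```

The character count stays the same whichever encoding is used; only the byte counts differ, which is why treating one byte as one character breaks down outside ASCII.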

This is Tim Bray on Unicode by Simon Willison, posted on 28th April 2003.

4 comments

  1. That's pretty cool - related entries picked up all three previous entries that link to Tim Bray as well as my only other entry mentioning Unicode (and a story about the Guardian for some reason, but nothing's perfect).

    Simon Willison - 28th April 2003 20:40 - #

  2. Have you thought about doing it on the back-end instead? What I mean is, when you submit the post, it comes back with a list of possible related posts - tick the box next to each of them to "relate" them explicitly. It's a little extra work, but negligible, I think.

    Remember that you need to check relations in both directions though - otherwise older posts won't list anything posted after them.

    Jim - 28th April 2003 22:14 - #

  3. I've been looking at how I can implement a similar function in the system I'm trying to get around to starting. Without having looked into it much, I had thought about asking the user to put in a few keywords next to each entry and using MySQL full text indexing on this field instead of the main entry field. Jim, great idea... I'll definitely look into adding that.

    Jon - 28th April 2003 22:44 - #

  4. Jim: That's a really good idea. I quite like the uncertainty of the current method though - it's quirky :) I would definitely use a human-approved version of the system on a commercial site though. The problem of updating related older articles could be solved by having any relationship work as a two-way thing, although that could lead to older entries collecting a large number of related items without me realising.

    One thing that could be really interesting is generating some kind of tree or graph of relationships between entries - it could even lead to auto-forming categories when relationships form sub-graphs of the overall network. Something like that would rely on accurate relationships data so would definitely benefit from human validation of the relationships.

    Simon Willison - 28th April 2003 22:54 - #
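The two ideas raised in the comments above (two-way relationships, and categories that form from sub-graphs) could be sketched roughly as follows. This is a hedged illustration, not how the site works: the entry slugs, `build_graph` and `find_clusters` are all hypothetical names, and "sub-graph" is read here in its simplest sense, as a connected component.

```python
from collections import defaultdict


def build_graph(relations):
    """Store every relationship in both directions, so an older
    entry also lists entries posted after it."""
    graph = defaultdict(set)
    for a, b in relations:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def find_clusters(graph):
    """Return the connected components of the graph; each one is a
    candidate for an auto-formed category."""
    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            # Only follow links to entries not visited yet.
            stack.extend(graph[node] - seen)
        clusters.append(component)
    return clusters


# Hypothetical entry slugs, standing in for human-approved relationships.
relations = [
    ("unicode-intro", "tim-bray-on-unicode"),
    ("tim-bray-on-unicode", "characters-vs-bytes"),
    ("python-fixed-point", "python-threads"),
]

graph = build_graph(relations)
clusters = find_clusters(graph)
print(clusters)
```

With these sample pairs the entries fall into two groups, one about Unicode and one about Python. As the comment notes, a single bad relationship would merge two groups, so this approach depends on accurate relationship data.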

Previously hosted at http://simon.incutio.com/archive/2003/04/28/ongoingOnUnicode
