Simon Willison’s Weblog

Items tagged unicode in 2007


Sam Ruby: Ruby 1.9 Strings—Updated. A follow-up to yesterday’s post: Sam’s principal complaints about Ruby 1.9’s character encoding support were down to a bug which has now been fixed. # 29th December 2007, 7:34 pm

I definitely like Python 3K’s Unicode support better [...] In fact, I think I prefer Ruby 1.8’s non-support for Unicode over Ruby 1.9’s “support”. The problem is one that is all too familiar to Python programmers. You can have a fully unit tested library and have somebody pass you a bad string, and you will fall over.

Sam Ruby # 28th December 2007, 7:05 pm

Ruby 1.9—Right for You? Dave Thomas on the just-released Ruby 1.9. It’s a development release that breaks backwards compatibility in a few minor ways, but new features include the YARV virtual machine (hence significant speed improvements) and unicode support via associating encodings with bytestrings. # 26th December 2007, 12:09 pm

Unicode code converter (via) Richard Ishida’s tool for converting pretty much any unicode representation to any other. # 28th October 2007, 6:26 pm

String types in Python 3. bytes are now immutable (just like the bytestrings they are replacing) and a new mutable buffer type has been introduced. # 9th October 2007, 2:08 am
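
A minimal sketch of the immutable/mutable split described above, assuming a released Python 3 interpreter, where the mutable type is named bytearray rather than buffer. The variable names are illustrative only.

```python
# bytes is immutable: item assignment raises TypeError.
data = b"unicode"
try:
    data[0] = ord("U")
    bytes_mutated = True
except TypeError:
    bytes_mutated = False

# bytearray is the mutable counterpart (assumption: this is the type
# the post refers to as the "mutable buffer type").
buf = bytearray(b"unicode")
buf[0] = ord("U")  # in-place modification works here

# Moving from bytes to text always takes an explicit decode step.
text = buf.decode("ascii")
print(bytes_mutated, buf, text)
```

Keeping bytes immutable means a bytes value can be used as a dictionary key, just as the old bytestrings could; bytearray cannot.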

The larger question is why on earth, in 2007 and ten years after XML came out, we are still using text files that don’t label their encoding?

Rick Jelliffe # 8th October 2007, 12:27 pm

Sam Ruby: 2to3. Sam’s report on an attempt to port the Universal Feed Parser to Python 3.0. The 2to3 tool does most of the work, but it seems the unicode changes can be pretty tricky. # 3rd September 2007, 1:38 am

Announcing Babel. Impressive new Python i18n / l10n package, with improved message extraction and a huge amount of bundled locale data. # 20th July 2007, 12:20 pm

UnicodeBranch: Porting Applications. A checklist for porting Django applications to handle the new unicode changes. If your application only handles ASCII text at the moment you shouldn’t have to change a thing. # 4th July 2007, 2:41 pm

Unicode data in Django. Documentation for Django’s new unicode support. # 4th July 2007, 2:24 pm

Django changeset 5609. “Merged Unicode branch into trunk. This should be fully backwards compatible for all practical purposes.” # 4th July 2007, 2:22 pm

HTML Entity Character Lookup. Look up HTML entities by characters that are a similar shape. # 3rd July 2007, 3:41 pm

Django unicode-branch: testers wanted. Malcolm’s outstanding work on the unicode branch appears to be nearing completion. # 24th May 2007, 11:46 pm