Simon Willison’s Weblog

6 items tagged “regex”

Introduction to Surlex. A neat drop-in alternative for Django’s regular expression based URL parsing, providing simpler syntax for common path patterns. # 11th April 2010, 7:23 pm

RE2: a principled approach to regular expression matching. Google have open sourced RE2, the C++ regular expression library they developed for Google Code Search, Sawzall, Bigtable and other internal projects. Unlike PCRE it avoids the potential for exponential run time and unbounded stack usage and guarantees that searches complete in linear time, mainly by dropping support for back references. # 12th March 2010, 9:28 am
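For readers unfamiliar with the feature RE2 gives up, here is a minimal sketch of a back reference. It uses Python's built-in re module (a backtracking engine, not RE2), and the pattern and sample sentence are made up for illustration.

```python
import re

# \1 is a back reference: it must match the exact text that group 1 captured.
# Supporting this is one of the things that rules out a linear-time guarantee.
doubled_word = re.compile(r"\b(\w+) \1\b")

match = doubled_word.search("this is is a test")
repeated = match.group(1) if match else None
print(repeated)
```

A pattern like this finds accidentally doubled words, which is the sort of thing a back-reference-free engine cannot express directly.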

Request Routing With URI Templates in Node.JS. I quite like this approach (though the implementation is a bit “this” heavy for my taste). JavaScript has no equivalent to Python’s raw strings, so regular expression based routing à la Django ends up being a bit uglier in JavaScript. URI template syntax is more appealing. # 24th November 2009, 9:06 am
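To give a feel for why template syntax reads more cleanly than a hand-written regex, here is a rough sketch in Python that turns a simple template into a regular expression with named groups. The function name template_to_regex and the example route are hypothetical, and it only handles plain {name} placeholders (names must be valid Python identifiers), not the full URI template syntax or the linked Node.JS implementation.

```python
import re

def template_to_regex(template):
    """Compile a simple template such as /users/{id} into a regex with named groups."""
    # Splitting on a pattern with one capturing group alternates
    # literal text (even indexes) and placeholder names (odd indexes).
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)            # literal path text
        else:
            pattern += "(?P<%s>[^/]+)" % part     # one path segment
    return re.compile("^" + pattern + "$")

route = template_to_regex("/users/{user_id}/posts/{post_id}")
params = route.match("/users/42/posts/7").groupdict()
print(params)
```

The template stays readable while the generated regex does the matching, which is roughly the trade the post is praising.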

Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide.

Andrew Clover # 16th November 2009, 10:32 am

Python gems of my own (via) Did you know you can pass 128 as a flag to Python’s re.compile() function to spit out a parse tree? I didn’t. re.compile("pattern", 128) # 3rd November 2008, 11:59 am
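The magic number 128 is the value of the re.DEBUG flag, so the named constant does the same thing. A small sketch (the pattern "a+b" is just an example):

```python
import re

# re.DEBUG has the integer value 128, so both spellings are equivalent.
flag_value = int(re.DEBUG)

# Compiling with the flag prints the parse tree to standard output
# and still returns a normal compiled pattern object.
compiled = re.compile("a+b", re.DEBUG)
matched = compiled.match("aaab") is not None
```

The exact layout of the printed tree varies between Python versions.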

Primality regex. A regular expression that can identify prime numbers. Unsurprisingly, this one comes from the Perl community. # 18th March 2007, 1:17 am
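The pattern that usually circulates under this name works on a number written in unary (a string of n ones). Assuming that is the form the linked post uses, a Python sketch looks like this; the helper name is_prime is made up for illustration.

```python
import re

# Matches unary strings that are NOT prime:
#   ^1?$         zero or one "1" (0 and 1 are not prime)
#   ^(11+?)\1+$  a group of two or more ones repeated at least twice,
#                i.e. the length has a divisor other than 1 and itself
NOT_PRIME = re.compile(r"^1?$|^(11+?)\1+$")

def is_prime(n):
    """Return True if n is prime, by testing a string of n ones."""
    return NOT_PRIME.match("1" * n) is None

primes = [n for n in range(20) if is_prime(n)]
print(primes)
```

The trick relies on a back reference and on backtracking to try every candidate divisor, so it is a curiosity rather than a practical primality test.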