Simon Willison’s Weblog

12 items tagged “regularexpressions”

2022

Why I invented “dash encoding”, a new encoding scheme for URL paths

Datasette now includes its own custom string encoding scheme, which I’ve called dash encoding. I really didn’t want to have to invent something new here, but unfortunately I think this is the best solution to my very particular problem. Some notes on how dash encoding works and why I created it.

[... 1392 words]

2020

The unexpected Google wide domain check bypass (via) Fantastic story of discovering a devious security vulnerability in a bunch of Google products stemming from a single exploitable regular expression in the Google Closure JavaScript library. # 9th March 2020, 11:27 pm

2019

Details of the Cloudflare outage on July 2, 2019 (via) Best retrospective I’ve read in a long time. The outage was caused by a backtracking regex rule that was added to the Web Application Firewall project, which rolls out globally and skips most of Cloudflare’s regular gradual rollout process so that they can deploy counter-measures to newly discovered vulnerabilities as quickly as possible. (That process is delightfully animal themed: DOG for the dogfooding PoP that their employees use, PIG for the Guinea Pig PoPs reserved for free customers, then Canary for the final step.) The real value in the retro, though, is that it provides an extremely deep insight into how Cloudflare organize, test and manage their changes. Really interesting stuff. # 12th July 2019, 5:36 pm

2018

r1chardj0n3s/parse: Parse strings using a specification based on the Python format() syntax. (via) Really neat API design: parse() behaves almost exactly in the opposite way to Python’s built-in format(), so you can use format strings as an alternative to regular expressions for extracting specific data from a string. # 25th February 2018, 4:58 pm
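
To illustrate the "format() in reverse" idea, here is a minimal standard-library sketch. It is not the parse library's implementation or API: `simple_parse` is a hypothetical helper that assumes every field is named (`{user}`, not `{}`) and ignores format specs entirely.

```python
import re
from string import Formatter


def simple_parse(fmt, text):
    """Hypothetical helper: extract named fields from text using a format string.

    Assumes every replacement field has a name that is a valid identifier.
    Format specs and conversions are ignored.
    """
    pattern = ""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    for literal, field, _spec, _conversion in Formatter().parse(fmt):
        pattern += re.escape(literal)           # literal text must match exactly
        if field is not None:
            pattern += f"(?P<{field}>.+?)"      # each field becomes a lazy named group
    found = re.fullmatch(pattern, text)
    return found.groupdict() if found else None


result = simple_parse("{user} logged in from {ip}", "alice logged in from 10.0.0.1")
```

Here `result` is a dictionary keyed by field name, and a string that does not fit the template returns `None`.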

2017

A Regular Expression Matcher: Code by Rob Pike, Exegesis by Brian Kernighan (via) Delightfully clear and succinct 30-line C implementation of a regular expression matcher that supports $, ^, . and * operations. # 5th December 2017, 6:36 pm
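
The structure of that matcher ports to Python almost line for line. The sketch below is my own translation rather than the original C, using hypothetical function names; it supports the same four operators and slices strings where the C code advances pointers.

```python
def match(regexp, text):
    """Return True if regexp matches anywhere in text."""
    if regexp.startswith("^"):
        return match_here(regexp[1:], text)
    # Try every starting position, including the empty string at the end.
    for start in range(len(text) + 1):
        if match_here(regexp, text[start:]):
            return True
    return False


def match_here(regexp, text):
    """Return True if regexp matches at the beginning of text."""
    if not regexp:
        return True
    if len(regexp) > 1 and regexp[1] == "*":
        return match_star(regexp[0], regexp[2:], text)
    if regexp == "$":
        return text == ""
    if text and regexp[0] in (".", text[0]):
        return match_here(regexp[1:], text[1:])
    return False


def match_star(char, regexp, text):
    """Match zero or more occurrences of char, then the rest of regexp."""
    pos = 0
    while True:
        if match_here(regexp, text[pos:]):
            return True
        if pos < len(text) and (char == "." or text[pos] == char):
            pos += 1
        else:
            return False
```

For example, `match("^ab*c$", "abbbc")` is true while `match("x$", "xy")` is false. The slicing copies strings on every call, so this favours clarity over speed.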

2010

Escaping regular expression characters in JavaScript (updated). The JavaScript regular expression meta-character escaping code I posted back in 2006 has some serious flaws—I’ve just posted an update to the original post. # 4th July 2010, 6:23 pm

2009

Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide.

Andrew Clover # 16th November 2009, 10:32 am

Django security updates released. A potential denial of service vulnerability has been discovered in the regular expressions used by the Django form library’s EmailField and URLField: a malicious input could trigger pathological performance. Patches (and patched releases) for Django 1.1 and Django 1.0 have been published. # 10th October 2009, 12:24 am

Introducing Yardbird. I absolutely love it: an IRC bot built on top of Twisted that passes incoming messages off to Django code running in a separate thread. Request and Response objects are used to represent incoming and outgoing messages, and Django’s regex-based URL routing is used to dispatch messages to different handling functions based on their content. # 22nd May 2009, 11:13 pm

2006

Escaping regular expression characters in JavaScript

JavaScript’s support for regular expressions is generally pretty good, but there is one notable omission: an escaping mechanism for literal strings. Say for example you need to create a regular expression that removes a specific string from the end of a string. If you know the string you want to remove when you write the script this is easy:

[... 362 words]
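
For comparison, Python ships the missing piece as `re.escape`. The sketch below shows the same task (removing a specific string from the end of another string) when the suffix is only known at runtime. `strip_suffix` is a hypothetical helper name, and this is the Python analogue rather than the JavaScript code from the post.

```python
import re


def strip_suffix(text, suffix):
    """Hypothetical helper: remove suffix from the end of text, if present."""
    # re.escape neutralises metacharacters such as "." and "(" in the suffix;
    # \Z anchors the match at the very end of the string.
    return re.sub(re.escape(suffix) + r"\Z", "", text)


cleaned = strip_suffix("price: $9.99 (USD)", " (USD)")
```

Without the escaping step, a suffix like `.txt` would also match `xtxt`, because the unescaped `.` matches any character.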

2003

“sexeger”[::-1]

Via Ned Batchelder, an article on Reversing Regular Expressions from Perl.com. Otherwise known as Sexeger, these offer a performance boost over normal regular expressions for certain tasks. The basic idea is pretty simple: searching backwards through a string using a regular expression can be a messy business, but by reversing both the string and the expression, running it, then reversing the result far better performance can be achieved (reversing a string is a relatively inexpensive operation). The example code is in Perl, but I couldn’t resist trying it in Python. The challenge is to find the last number occurring in a string.

[... 384 words]
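
A minimal Python sketch of that challenge follows. It is not necessarily the code from the full post, and `last_number` is a hypothetical name. The pattern `\d+` matches the same set of strings when reversed, so only the input and the result need flipping here.

```python
import re


def last_number(text):
    """Hypothetical helper: return the last run of digits in text, or None."""
    # Search the reversed string: the first match there is the last number
    # in the original, with its digits in reverse order.
    found = re.search(r"\d+", text[::-1])
    return found.group()[::-1] if found else None


value = last_number("abc 123 def 4567 ghi")
```

In this example `value` is the string `"4567"`. A pattern that is not symmetric would have to be reversed by hand before being applied to the reversed text.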

Verbose Regular Expressions

Ned Batchelder describes Verbose Python regular expressions. This is one of the things I’ve known about (as in known that they exist) for ages but have never got around to using. I’ve been working with some pretty heavy regular expressions recently that could really do with the clarity of being defined in verbose format with comments.

[... 96 words]
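
As a small illustration of the verbose format, here is a sketch using the `re.VERBOSE` flag. The date pattern and the name `DATE` are made up for this example. In verbose mode, whitespace in the pattern is ignored and `#` starts a comment.

```python
import re

# Illustrative pattern only: an ISO-style date such as 2003-10-26.
DATE = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,
)

parts = DATE.fullmatch("2003-10-26").groupdict()
```

The compact equivalent is `(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`, which behaves identically but leaves no room for comments.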