Simon Willison’s Weblog


Escaping regular expression characters in JavaScript

20th January 2006

JavaScript’s support for regular expressions is generally pretty good, but there is one notable omission: an escaping mechanism for literal strings. Say, for example, you need to create a regular expression that removes a specific string from the end of another string. If you know the string you want to remove when you write the script, this is easy:


var newString = oldString.replace(/Remove from end$/, '');

But what if the string to be removed comes from a variable? You’ll need to construct a regular expression from the variable, using the RegExp constructor function:


var re = new RegExp(stringToRemove + '$');
var newString = oldString.replace(re, '');

But what if the string you want to remove might contain regular expression metacharacters, characters like $ or . that affect the behaviour of the expression? Languages such as Python provide functions for escaping these characters (see re.escape); with JavaScript you have to write your own.
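To make the problem concrete, here is a quick sketch of what the naive RegExp constructor approach does with such input. The sample strings are made up for illustration:

```javascript
// Hypothetical input: the suffix to strip contains "(" and ")".
var oldString = 'Total: 1.5 (approx)';
var stringToRemove = ' (approx)';

// The parentheses become a capturing group rather than literal characters,
// so the pattern actually looks for " approx" at the end of the string.
var re = new RegExp(stringToRemove + '$');
var newString = oldString.replace(re, '');
console.log(newString); // Total: 1.5 (approx)   (nothing was removed)

// An unescaped "." matches any character, not just a literal dot.
console.log(new RegExp('1.5').test('1x5')); // true

// Unbalanced metacharacters make the RegExp constructor throw.
var unbalanced = 'price (USD';
var threwSyntaxError = false;
try {
  new RegExp(unbalanced + '$');
} catch (e) {
  threwSyntaxError = e instanceof SyntaxError;
}
console.log(threwSyntaxError); // true
```

Each of these problems goes away if the metacharacters are escaped before the string reaches the RegExp constructor.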

Here’s mine:


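RegExp.escape = function(text) {
  // Compile the escaping pattern on the first call only, caching it as a
  // property of this function (arguments.callee refers to the function
  // itself; note that it is unavailable in strict mode).
  if (!arguments.callee.sRE) {
    var specials = [
      '/', '.', '*', '+', '?', '|', '^', '$',
      '(', ')', '[', ']', '{', '}', '\\'
    ];
    // Builds (\/|\.|\*|...|\\): an alternation of the escaped specials.
    arguments.callee.sRE = new RegExp(
      '(\\' + specials.join('|\\') + ')', 'g'
    );
  }
  // Prefix every special character with a backslash.
  return text.replace(arguments.callee.sRE, '\\$1');
};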

This deals with another common problem in JavaScript: compiling a regular expression once (rather than every time you use it) while keeping it local to a function. Inside a function, arguments.callee always refers to the function itself, and since JavaScript functions are objects you can store properties on them. In this case, the first time the function runs it compiles a regular expression and stashes it in the sRE property; subsequent calls reuse the pre-compiled expression.
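One caveat: arguments.callee is not available in strict mode, where accessing it throws a TypeError. The sketch below swaps in a different caching technique, a closure created by an immediately invoked function expression, which compiles the pattern once without storing anything on the function. The name escapeRegExpChars is illustrative, not part of the original code:

```javascript
'use strict';

// Alternative to the arguments.callee cache: an immediately invoked function
// expression (IIFE) compiles the pattern once, when this code first runs,
// and keeps it private inside the closure. This works in strict mode.
// escapeRegExpChars is a hypothetical name chosen for this example.
var escapeRegExpChars = (function () {
  var specials = [
    '/', '.', '*', '+', '?', '|', '^', '$',
    '(', ')', '[', ']', '{', '}', '\\'
  ];
  // Alternation of the escaped specials: (\/|\.|\*|...|\\)
  var sRE = new RegExp('(\\' + specials.join('|\\') + ')', 'g');

  return function (text) {
    // Prefix each special character with a backslash.
    return text.replace(sRE, '\\$1');
  };
})();

console.log(escapeRegExpChars('1.5 (approx)')); // 1\.5 \(approx\)
```

Reusing a global regular expression with String.prototype.replace is safe here, because replace resets its lastIndex before each search.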

In the above snippet I’ve added my function as a property of the RegExp constructor. There’s no pressing reason to do this other than a desire to keep generic functionality relating to regular expression handling in the same place. If you rename the function it will still work as expected, since the use of arguments.callee means the function body never refers to the name it has been assigned to.
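If you would rather not attach anything to the built-in RegExp object, a plain function works just as well. This sketch uses a different, commonly seen formulation: a single character class listing the metacharacters, with $& in the replacement string standing for the whole matched character. The names escapeRegExp and removeFromEnd, and the sample strings, are hypothetical; the second function applies the escaping to the remove-from-the-end problem from the start of this post:

```javascript
// Escape every regular expression metacharacter in `text`.
// The character class lists the specials; "\\$&" inserts a backslash
// followed by the matched character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
}

// Remove a literal suffix, even if it contains metacharacters.
function removeFromEnd(oldString, stringToRemove) {
  var re = new RegExp(escapeRegExp(stringToRemove) + '$');
  return oldString.replace(re, '');
}

console.log(removeFromEnd('Total: 1.5 (approx)', ' (approx)')); // Total: 1.5
console.log(removeFromEnd('Cost: $5.00', '$5.00'));             // Cost: 
console.log(removeFromEnd('keep 1x5', '1.5'));                  // keep 1x5
```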

This is Escaping regular expression characters in JavaScript by Simon Willison, posted on 20th January 2006.

Previously hosted at http://simon.incutio.com/archive/2006/01/20/escape