Of course, the title is a joke... but only kind of. RegEx is a perfect testament to the saying "with great power comes great responsibility." It's extremely easy to misuse and can create hard-to-spot bugs in your code. This is exactly what I faced when developing some backend logic for Tondova.
Here is the faulty code. Can you figure out what's wrong?
const regexp = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/gi;

function isUUID(id: string) {
  return regexp.test(id);
}
This code aims to test whether a particular id matches the structure of a UUID. UUIDs are essentially 128-bit numbers, but are often represented as hexadecimal strings (with hyphens for readability).
The bug?
It's the g modifier at the end! Modifiers alter how an input string is matched against the regular expression. The g or "global" modifier allows us to match all occurrences of a pattern within a string. A simple example would be to get all the phone numbers in a piece of text.
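const text = "...reach me at 524-624-2721 or 651-211-9258...";
const exp = /[0-9]{3}-[0-9]{3}-[0-9]{4}/g;
// matchAll returns an iterator, so spread it to collect the matched strings
const numbers = [...text.matchAll(exp)].map((match) => match[0]);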
This expression would match both the phone number 524-624-2721 and 651-211-9258 (these numbers are made up).
But how does it work?
By adding the global modifier, we actually make our regex stateful! It keeps track of a lastIndex property that points to the position just after the last match. So, to find all occurrences of a pattern, we simply call the regular expression's exec method on our string repeatedly until there are no matches left. Each call resumes from lastIndex, so the expression keeps moving forward past the previous match until it cannot find any more occurrences of the pattern.
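To make the moving lastIndex visible, here is a minimal sketch of that repeated-call loop. The helper name findAll and the sample string are made up for illustration; exec and lastIndex are the standard RegExp members.

```typescript
// Hypothetical helper: collect every match by calling exec() repeatedly,
// recording where lastIndex points after each successful match.
function findAll(
  pattern: RegExp,
  input: string
): { match: string; lastIndex: number }[] {
  if (!pattern.global) {
    throw new Error("findAll expects a regex with the g modifier");
  }
  pattern.lastIndex = 0; // always start from the beginning of the string
  const results: { match: string; lastIndex: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(input)) !== null) {
    results.push({ match: m[0], lastIndex: pattern.lastIndex });
    // Guard against zero-length matches, which would otherwise loop forever.
    if (m[0] === "") {
      pattern.lastIndex++;
    }
  }
  return results;
}

const sample = "call 524-624-2721 or 651-211-9258";
const phone = /[0-9]{3}-[0-9]{3}-[0-9]{4}/g;
const found = findAll(phone, sample);
console.log(found);
```

In this sample, lastIndex lands on 17 after the first number and 33 after the second. Once exec finds nothing more, it returns null and resets lastIndex to 0.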
So what was wrong with the first code snippet? Let's go step by step:
- We define the regular expression as a global variable (not within the function).
  - This means that it will not be reinitialized on every function call. However, it also means its lastIndex property is maintained across function calls.
- Let's call the function on a valid UUID. The expression matches the string, moves the lastIndex property to index 36 (the index right after the match), and returns true.
- Now, if we call the function again on a valid UUID, the regular expression starts looking at index 36, finds nothing, and returns false. The failed match also resets lastIndex to 0, so a third call would return true again.
In other words, the combination of defining the regular expression outside of the function and giving it the global modifier made the lastIndex state propagate across function calls, resulting in unexpected behavior:
const validId = "ca3b2904-2fd1-4402-a903-e0a5457f4164";
console.log(isUUID(validId)); // true
console.log(isUUID(validId)); // false
Calling the function twice in a row on the same valid UUID gives two different answers. The effects of this bug only surfaced at a much higher level in the application's logic, which is the main reason why it was so hard to spot.
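If you want to watch the state change for yourself, the following sketch records lastIndex after each call. It assumes the same pattern and UUID as above; the names uuidPattern and trace are only for illustration.

```typescript
// Same faulty pattern as above, with the g modifier still attached.
const uuidPattern =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/gi;
const id = "ca3b2904-2fd1-4402-a903-e0a5457f4164";

// Record the result of test() and the lastIndex it leaves behind.
const trace: { result: boolean; lastIndex: number }[] = [];
for (let i = 0; i < 3; i++) {
  const result = uuidPattern.test(id);
  trace.push({ result, lastIndex: uuidPattern.lastIndex });
}

console.log(trace);
// call 1: result true,  lastIndex 36
// call 2: result false, lastIndex 0
// call 3: result true,  lastIndex 36
```

The answers alternate because a failed match resets lastIndex to 0, which is what makes the bug look intermittent from the outside.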
Solution
The best way to fix this bug is to remove the global modifier. Since we are only doing a regex test, we just want to ensure the string adheres to the pattern and don't care about the match itself, so the global modifier is not useful at all!
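As a sketch, the fixed version could look like this. The names uuidRegexp and isUUIDFixed are hypothetical, chosen here only to avoid clashing with the faulty snippet.

```typescript
// Same pattern, but only the i (case-insensitive) modifier remains.
// Without g, test() always starts at index 0 and never updates lastIndex.
const uuidRegexp =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isUUIDFixed(id: string): boolean {
  return uuidRegexp.test(id);
}

const someId = "ca3b2904-2fd1-4402-a903-e0a5457f4164";
console.log(isUUIDFixed(someId)); // true
console.log(isUUIDFixed(someId)); // true
console.log(isUUIDFixed("not-a-uuid")); // false
```

The regex can stay outside the function here, since without the global modifier it no longer carries state between calls.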
I hope you found this as interesting as I did! This was such an easy fix, but so elusive.