.Net Regex – Matching Mixed Balanced Parentheses
Yesterday I got thinking about matching different types balanced parentheses using .net regular expression. I assumed it was similar to matching quotes or a single kind of parentheses, but soon realized it isn’t quite that simple. The main problem is looking into the stack – when I see a closed curly brace, for example, how can I tell the top of the stack has an open one?
A quick search brought up Linguistic Forms’ Regex Balancing Group in Depth, which has a clever approach to the problem in the best use I’ve seen yet to balancing groups, and taught me a few new tricks. I’d recommend reading it. The tl;dr version is this: when I see a closed bracket I look-behind my current position using whatever was matched since the last open bracket, revisit it, and check it again. So, if I see I had a { (by matching it again), I know I should now match a }.
While reading the post I had an interesting idea – who says I must push what I matched to the stack? What if wanted to push an arbitrary string to the stack? When I see an open bracket I really want to push a closing bracket – but how can I?
The trick is to use a look-ahead: find the next occurrence of the character I want to push, and push it:
{(?=.*?(<Stack>}))
Next, when I want to match a closing brace, I already have the right one in stack.
Using this approach, here’s a regex that matches tokens with matching balanced parentheses of 3 different typed:
(
[^(){}\[\]]+
| \( (?=[^)]* (?<Stack> \) ) )
| \[ (?=[^\]]* (?<Stack> \] ) )
| \{ (?=[^}]* (?<Stack> \} ) )
| \k<Stack> (?<-Stack>)
)+?
(?(Stack) (?!))
Of course, this approach does have limitations – you may not find the character you’d like to push (which might be a good thing – it allows you to fail early). It also gets much trickier if you want to balance something more complicated constant strings, but that’s another subject.
Here’s a sample program on ideone: http://ideone.com/WaXfZ
Happy matching.