.Net Regex – Matching Mixed Balanced Parentheses

Yesterday I got thinking about matching different types balanced parentheses using .net regular expression. I assumed it was similar to matching quotes or a single kind of parentheses, but soon realized it isn’t quite that simple. The main problem is looking into the stack – when I see a closed curly brace, for example, how can I tell the top of the stack has an open one?

A quick search brought up Linguistic Forms’ Regex Balancing Group in Depth, which has a clever approach to the problem in the best use I’ve seen yet to balancing groups, and taught me a few new tricks. I’d recommend reading it. The tl;dr version is this: when I see a closed bracket I look-behind my current position using whatever was matched since the last open bracket, revisit it, and check it again. So, if I see I had a { (by matching it again), I know I should now match a }.

While reading the post I had an interesting idea – who says I must push what I matched to the stack? What if wanted to push an arbitrary string to the stack? When I see an open bracket I really want to push a closing bracket – but how can I?
The trick is to use a look-ahead: find the next occurrence of the character I want to push, and push it:

{(?=.*?(<Stack>}))

Next, when I want to match a closing brace, I already have the right one in stack.
Using this approach, here’s a regex that matches tokens with matching balanced parentheses of 3 different typed:

(
    [^(){}\[\]]+
    | \( (?=[^)]*  (?<Stack> \) ) )
    | \[ (?=[^\]]* (?<Stack> \] ) )
    | \{ (?=[^}]*  (?<Stack> \} ) )
    | \k<Stack> (?<-Stack>)
)+?
(?(Stack) (?!))

Of course, this approach does have limitations – you may not find the character you’d like to push (which might be a good thing – it allows you to fail early). It also gets much trickier if you want to balance something more complicated than constant strings, but that’s another subject.

Happy matching.

Source Code

Source code and test cases can be found on GitHub: https://github.com/kobi/RecreationalRegex

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s