.Net Regular Expressions – Using the Stack State to Understand Numeric Values

It is common knowledge that regular expressions should handle text and not values. A recent stack overflow question got me thinking though – it is possible to use .Net regular expressions to understand numbers or other values while matching the pattern?
Regular expressions can be used to perform numerical tasks, but that is usually when working in unary base.
It turns out this is possible – .Net keeps a stack for every capture of every matched group while matching the pattern, and that state is available for use while matching. The idea is simple: we can represent numbers as depth of the stack, so the number 0 is an empty stack, 6 is a stack with 6 captures, and so forth.

(?>
    (?=[0-9])   # optimization - don't multiply when we don't have a digit.
    # multiply the content of the stack by 10
    # for each item on Stack, push 10 items to a Temp stack.
    (?(Decimal)
        (?<-Decimal>
            (?<Temp>){10}
        )
    ){100000}
    (?(Decimal)(?!))
    # Push all items from Temp back to Stack
    (?(Temp)
        (?<-Temp>
            (?<Decimal>)
        )
    ){100000}
    (?(Temp)(?!))
    # match a digit, and push its value to the stack
    (?:
        0                 |
        1 (?<Decimal>)    |
        2 (?<Decimal>){2} |
        3 (?<Decimal>){3} |
        4 (?<Decimal>){4} |
        5 (?<Decimal>){5} |
        6 (?<Decimal>){6} |
        7 (?<Decimal>){7} |
        8 (?<Decimal>){8} |
        9 (?<Decimal>){9}
    )
)+

The idea is very simple: when we see a new digit, we multiply the depth of the stack by 10, and add the number represented by the new digit. The value of the number can be verified using:

match.Groups["Decimal"].Captures.Count

A curious bit here is the use of the loop to copy stacks:

(?(Temp)
    (?<-Temp>
        (?<Decimal>)
    )
){100000}

I’d expect this to be enough:

(?<-Temp> (?<Decimal>) )*
(?(Temp)(?!))

It turns out the above loop is only executed once, and the condition always fails. It is probably a documented optimization, I’ll look more into that later. As a proof of concept, the workaround should do.

It is even possible to perform basic arithmetic operations on these stacks such as adding, subtracting, multiplying and such from within the regex engine, but that may be a few extra steps too many.
It should go without saying, of course, that regex isn’t a good option here – this is for recreational use. The run time and complexity are far from ideal.

See also:

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s