Using .Net Regex Balancing Groups to Match Words in Fibonacci Lengths

\b
(?<A>\w)+\s+
(?<A>\w)+\s+
(?<-A>\w)+
(?(A)(?!))
\b

The first example captures three words. Besides spaces and word boundaries, the interesting bits read:

  • (?<A>\w)+ – Capture the first and second word. Push each letter to the A stack. A note here is the it might have been more correct to write (?:\w(?<A>))+, so I don’t push a value I don’t use to the stack, but I think the (?<A>\w) is clearer. I could have also compressed it to ((?<A>\w)+\s+){2}, but then it would have been even less readable.
  • (?<-A>\w)+ – On the third word, this will only match as many letters as matched before. An alternative syntax here would be (\w(?<-A>))+.
  • (?(A)(?!)) – Fail the match if there are still letters in the A stack. This one is a must, in case the third word is shorter than the first two.
  • \b – Last but the least, I need to make sure I match at word boundaries.

Note that In the above regex I don’t care if the second word is longer or of equal length to the first. If this is a problem for you, it can be checked using the following pattern. I won’t explain it too much, but the idea here is to push two letters to the B stack for each letter you remove from the A stack:

\b
(?<A>\w)+   # First word
\s+
(?<B-A>(?<B>)\w)+  # Second word - consume all A's, push an extra B
(?(A)(?!))   # Make sure A is finished
(?<B>\w)*    # Add more B's (or none)
\s+
(?<-B>\w)+ # Third word
(?(B)(?!))
\b

Finally, here’s a regex to capture a sequence of words, where the length of each word is equal to the sum of the lengths of the two previous words:

\b
(?:
    (?<A>\w)+   # First word - push each letter
    \s+
    (?=
        (?<A>\w)+  # Second word - push each letter
        \s+
        (?<-A>\w)+ # Third word - pop letters
        (?(A)(?!))
        \b
    )    #Look Ahead
)+
\w+\s+\w+\b #Capture last two words (already looked at them)

This might look a bit strange: I match a single word at a time, and use the look-ahead to check the following two words. The result is a sequence of words whose lengths make a Fibonacci series.

Source Code

Source code and test cases can be found on GitHub: https://github.com/kobi/RecreationalRegex

See Also:
Regex Balancing Group in Depth – An excellent introduction to balancing groups and the basic concepts and tools used in .Net regular expressions.

Advertisements

One thought on “Using .Net Regex Balancing Groups to Match Words in Fibonacci Lengths

  1. For completeness, it is possible to compress that last regex to

    \b
    (?:
        (?=
            (?:(?<A>\w)+\s+){2}
            (?<-A>\w)+
            (?(A)(?!))
            \b
        )    #Look Ahead
        \w+
        \s+
    )+
    \w+\s+\w+\b
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s