Using .Net Regex Balancing Groups to Match Words in Fibonacci Lengths

\b
(?<A>\w)+\s+
(?<A>\w)+\s+
(?<-A>\w)+
(?(A)(?!))
\b

The first example captures three words. Besides spaces and word boundaries, the interesting bits read:

  • (?<A>\w)+ – Capture the first and second word. Push each letter to the A stack. A note here is the it might have been more correct to write (?:\w(?<A>))+, so I don’t push a value I don’t use to the stack, but I think the (?<A>\w) is clearer. I could have also compressed it to ((?<A>\w)+\s+){2}, but then it would have been even less readable.
  • (?<-A>\w)+ – On the third word, this will only match as many letters as matched before. An alternative syntax here would be (\w(?<-A>))+.
  • (?(A)(?!)) – Fail the match if there are still letters in the A stack. This one is a must, in case the third word is shorter than the first two.
  • \b – Last but the least, I need to make sure I match at word boundaries.

Note that In the above regex I don’t care if the second word is longer or of equal length to the first. If this is a problem for you, it can be checked using the following pattern. I won’t explain it too much, but the idea here is to push two letters to the B stack for each letter you remove from the A stack:

\b
(?<A>\w)+   # First word
\s+
(?<B-A>(?<B>)\w)+  # Second word - consume all A's, push an extra B
(?(A)(?!))   # Make sure A is finished
(?<B>\w)*    # Add more B's (or none)
\s+
(?<-B>\w)+ # Third word
(?(B)(?!))
\b

Finally, here’s a regex to capture a sequence of words, where the length of each word is equal to the sum of the lengths of the two previous words:

\b
(?:
    (?<A>\w)+   # First word - push each letter
    \s+
    (?=
        (?<A>\w)+  # Second word - push each letter
        \s+
        (?<-A>\w)+ # Third word - pop letters
        (?(A)(?!))
        \b
    )    #Look Ahead
)+
\w+\s+\w+\b #Capture last two words (already looked at them)

This might look a bit strange: I match a single word at a time, and use the look-ahead to check the following two words. The result is a sequence of words whose lengths make a Fibonacci series.

Source Code

Source code and test cases can be found on GitHub: https://github.com/kobi/RecreationalRegex

See Also:
Regex Balancing Group in Depth – An excellent introduction to balancing groups and the basic concepts and tools used in .Net regular expressions.

Advertisement