Using .Net Regex Balancing Groups to Match Words in Fibonacci Lengths

\b
(?<A>\w)+\s+
(?<A>\w)+\s+
(?<-A>\w)+
(?(A)(?!))
\b

The first example captures three words. Besides spaces and word boundaries, the interesting bits read:

  • (?<A>\w)+ – Capture the first and second word. Push each letter to the A stack. A note here is the it might have been more correct to write (?:\w(?<A>))+, so I don’t push a value I don’t use to the stack, but I think the (?<A>\w) is clearer. I could have also compressed it to ((?<A>\w)+\s+){2}, but then it would have been even less readable.
  • (?<-A>\w)+ – On the third word, this will only match as many letters as matched before. An alternative syntax here would be (\w(?<-A>))+.
  • (?(A)(?!)) – Fail the match if there are still letters in the A stack. This one is a must, in case the third word is shorter than the first two.
  • \b – Last but the least, I need to make sure I match at word boundaries.

Note that In the above regex I don’t care if the second word is longer or of equal length to the first. If this is a problem for you, it can be checked using the following pattern. I won’t explain it too much, but the idea here is to push two letters to the B stack for each letter you remove from the A stack:

\b
(?<A>\w)+   # First word
\s+
(?<B-A>(?<B>)\w)+  # Second word - consume all A's, push an extra B
(?(A)(?!))   # Make sure A is finished
(?<B>\w)*    # Add more B's (or none)
\s+
(?<-B>\w)+ # Third word
(?(B)(?!))
\b

Finally, here’s a regex to capture a sequence of words, where the length of each word is equal to the sum of the lengths of the two previous words:

\b
(?:
    (?<A>\w)+   # First word - push each letter
    \s+
    (?=
        (?<A>\w)+  # Second word - push each letter
        \s+
        (?<-A>\w)+ # Third word - pop letters
        (?(A)(?!))
        \b
    )    #Look Ahead
)+
\w+\s+\w+\b #Capture last two words (already looked at them)

This might look a bit strange: I match a single word at a time, and use the look-ahead to check the following two words. The result is a sequence of words whose lengths make a Fibonacci series.

Source Code

Source code and test cases can be found on GitHub: https://github.com/kobi/RecreationalRegex

See Also:
Regex Balancing Group in Depth – An excellent introduction to balancing groups and the basic concepts and tools used in .Net regular expressions.

Advertisement

1 thought on “Using .Net Regex Balancing Groups to Match Words in Fibonacci Lengths

  1. For completeness, it is possible to compress that last regex to

    \b
    (?:
        (?=
            (?:(?<A>\w)+\s+){2}
            (?<-A>\w)+
            (?(A)(?!))
            \b
        )    #Look Ahead
        \w+
        \s+
    )+
    \w+\s+\w+\b
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.