Solving mazes using regular expressions

Can the .Net regex engine find its way through a maze?

A maze?

We have a text based maze. We can walk on spaces, and need to find a path from the start point (S) to the end point (E). Anything else in a wall.

Examples:

██████████
█S█      █
█ █ █ █ ██
█   █ █ E█
██████████
███████████████████████████S█████████████
█         █ █           █ █   █     █   █
█ ███ ███ █ █ ███████ █ █ ███ █ ███ ███ █
█ █ █ █     █ █   █   █ █   █   █ █ █   █
█ █ █ █████ █ █ ███ ███ ███ █████ █ █ ███
█ █ █     █ █   █   █   █     █ █   █   █
█ █ █████ █████ █ █████ █ ███ █ █ █████ █
█ █     █     █ █     █   █   █ █       █
█ █████ █████ █ █████ █████ ███ ███████ █
█       █   █ █   █ █ █           █   █ █
█████ ███ █ █ ███ █ █ █████████ ███ █ █ █
█ █   █   █     █   █     █     █   █   █
█ █ ███ ███████████ █████ █ █████ ███████
█ █   █           █ █   █ █   █ █ █     █
█ ███ ███████████ █ ███ █ ███ █ █ █████ █
█   █     █ █     █     █   █   █ █     █
█ █ █████ █ █ █████████ ███ █████ █ █████
█ █   █   █ █   █       █   █     █     █
█ █ ███ ███ ███ █ ███████ ███ █████████ █
█ █           █         █               █
█████████████████████E███████████████████
█S█
█ ███████
█       █
███████ █
      █ █
      █E█

Strategy

We are going to implement the following algorithm for solving a maze:

  1. Find the start point.
  2. Pick a new position to go to. We have four choices: right, left, down, or up.
    The new position is invalid if it crosses our current path through the maze (the path cannot have loops).
  3. If we reached the End, we are done.
  4. If we cannot advance, undo the last choice and make a new one.
  5. Go to step 2.

As you may have noticed, this is backtracking: we’re making choices and reverting them when we know they were wrong.

First, before we proceed, you need to know what Balancing Groups are. I’ll wait.

The Regex:

(?=(?<Pos>.*?(?<Step>S)))
\A
(?:
    # Find next position
    (?:
        (?<End>)
        (?<=               # Stop when Pos reached E.
            \A(?=\k<Pos>(?<=E))    
        .*)
        |
        (?<Right>)
        (?<=                
            \A              # Go back to start.
            (?=\k<Pos>      # Go to the current position.
                (?<Step>[ E])
                (?<=(?<NewPos>\A.*))   # Set new Position.
            )    
        .*)
        |
        (?<Left>)
        (?<=
            \A(?=\k<Pos>(?<=
                (?<Step>[ E])
                (?<=(?<NewPos>\A.*))   # Set new Position
                .                   # One step before the current POS
            ))    
        .*)
        |
        (?<Down>)
        (?<=
            \A(?=\k<Pos>
                (?>
                (?-s:                   # Dot should not match new lines
                    (?<=^(?<X>.)*.)       # Count characters to the left. -1 caracter for current position.
                    .*\n                  # Go to end of this line
                    (?<-X>.)*(?(X)(?!))   # Find same position as above character
                )
                )
                (?<Step>[ E])
                (?<=(?<NewPos>\A.*))   # Set new Position
            )    
        .*)
        |
        (?<Up>)
        (?<=
            \A(?=\k<Pos>
                (?<=        # Lookbehind - read pattern from bottom to top: http://stackoverflow.com/q/13389560/7586
                    (?-s:                   # Dot does not match new lines
                        (?=
                            (?>(?<-X>.)*(?(X)(?!)))  # Find same position as lower character
                            (?<Step>[ E])
                            (?s:(?<=(?<NewPos>\A.*)))   # Set new Position
                        )
                        (?>
                        ^.*\n              # Go to start of the previous line
                        ^(?<X>.)*.         # Count characters to the left. -1 caracter for current position.
                        )
                    )
                )           # End of Lookbehind
            )    
        .*)
    )

. # progress the match.

    # Check the new position is unique - it wasn't matched before.
    # None of the other positions are equal to the new postion.
    (?(End)|
        (?!(?!      # Rollback all changes after matching (Reset Pos)
            (?<=\A
                (?:
                    (?<=\A
                        (?=
                            \k<Pos>             # Move to Pos
                            (?<!\k<NewPos>)     # Check the Pos is different from NewPos
                            (?<-Pos>)           # Pop Pos.
                        )
                    .*)
                .)*
                . # one extra for the new postion
            )
        ))

        # Push NewPos to Pos
        (?<=\A
            (?=
                (?<Pos>\k<NewPos>) # Set NewPos and Temp to Pos
                (?<-NewPos>)       # Pop NewPos. No real reason.
            )
        .*)
    )
)+?
(?(End)|(?!))

Comments and tricks

I won’t go too much in detail about the regex, but here are some pointers:

  • Lookbehinds are matched from right to left: Which is why in some cases the pattern starts at the bottom and goes up, so it looks like I’m popping from a stack before I push to it. It works. (lines 47-59, see also my answer on Stack Overflow)
  • Matching in any direction: In general, a Match is a continued sequence of characters, between the string’s start to its end. How can we match in another direction?
    Well, we can’t – not really. But we can fake it. In my regex, the “real” match (Match.Value) just takes one extra character at each step. However, I have a group called Pos representing the current position in the maze. At each step we capture another Pos relative to the previous one. So we have two positions: the one that the regex engine matches (line 64), and the virtual one, navigating the maze.
  • Lookahead inside lookbehind: (?<=\A(?=\k<Pos>...).*) – Normally, lookbehinds end at the correct position. In some regex engines lookbehind is implemented by truncating the string and looking for a match until the end, but in .net it is implemented using a right-to-left match. We can use a lookahead to see through the current position.
  • No backtracking into zero length assertions:
    Consider the pattern: (?=.|(?<A>..)) (?(A)|(?!))
    I’d expect the pattern to match, but it doesn’t – once the engine matches the lookahead it doesn’t backtrack into it again. With this limitation (optimization, really), A will never capture, so the pattern can never match.
    As an alternative, we can backtrack while still inside the lookahead:
    (?= (?:.|(?<A>..))(?(A)|(?!)) )
    or place the alternation between two lookaheads:
    (?: (?=.) | (?=(?<A>..)) ) (?(A)|(?!))
    In my regex I use the latter; this does mean I have to duplicate more code than I’d like, but at least it is working.
  • Quick backtrack: Two nice tricks from Stack Overflow: “vertical” regex matching in an ASCII “image”, both from M. Buettner’s excellent answer:
    • (?!(?!)()) – Used to add two dummy groups, but not capture them.
    • Use of negative lookaround to reset captures – Negative lookarounds do not change the captured state, so we can pop an whole stack in a negative lookaround, and it will reset once the negative lookaround is not match.

    Combining these two, we can wrap a positive lookahead in `(?!(?!(?= … )))`, resulting in a positive lookahead that resets the capture state. This trick saves the need of saving the whole stack to another stack, and then pushing it back.
    I probably could have used a lookbehind, but it is more complex (at least, I couldn’t think of a straightforward way).

Getting the result

Regex MazeSolverRegex = new Regex(mazeSolverPattern,
  RegexOptions.Singleline | RegexOptions.Multiline |
  RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled);

Match path = MazeSolverRegex.Match(maze);

The solution to the maze is contained in a single Match.

We can get the result in two ways: First, there is the Step group. The group has a Capture for each step along the way, with its position (the index in the string).
Next, as you may have noticed, are four groups: Right, Left, Down, and Up. Normally, this is pretty useless, because it is impossible to align groups correctly (we know we went left 5 times and up 2 time, but when?). However, here it is possible because each Capture happens in a different index. We can use the indices to get the directions from S to E:

private static Dictionary<string, string> Directions =
    new Dictionary<string, string>{{ "Left", "<" }, { "Right", ">" },
                                   { "Up", "↑" },   { "Down", "↓" },
                                   { "End", "E" }};

public static string ShowDirections(this Match mazeMatch)
{
    if (!mazeMatch.Success) return "No solution.";

    var chars = from groupName in Directions.Keys
                let letter = Directions[groupName]
                let regexGroup = mazeMatch.Groups[groupName]
                from capture in regexGroup.Captures.Cast<Capture>()
                orderby capture.Index
                select letter;
    return String.Concat(chars);
}

For the first example maze this results in "↓↓>>↑↑>>>>↓↓>E", which I thought was pretty neat.

Adding this to the Step group, we can see the path through the maze:

public static string ShowPath(this Match mazeMatch)
{
    if (!mazeMatch.Success) return "No solution.";
    // get the input string from the Match: https://goo.gl/T5e8B
    StringBuilder maze = new StringBuilder(mazeMatch.Result("$_"));
    var stepDirections = new Queue<char>(mazeMatch.ShowDirections());
    foreach (Capture step in mazeMatch.Groups["Step"].Captures)
    {
        maze[step.Index] = stepDirections.Dequeue();
    }
    return maze.ToString();
}

Example solutions:

██████████
█↓█>>>>↓ █
█↓█↑█ █↓██
█>>↑█ █>E█
██████████
███████████████████████████↓█████████████
█         █ █↓<<<<<<<<  █ █>>↓█>>>>↓█   █
█ ███ ███ █ █↓███████↑█ █ ███↓█↑███↓███ █
█ █ █ █     █↓█   █>>↑█ █   █>>↑█ █↓█   █
█ █ █ █████ █↓█ ███↑███ ███ █████ █↓█ ███
█ █ █     █ █>>↓█>>↑█   █     █ █↓<<█   █
█ █ █████ █████↓█↑█████ █ ███ █ █↓█████ █
█ █     █     █↓█↑<<<<█   █   █ █>>>>>>↓█
█ █████ █████ █↓█████↑█████ ███ ███████↓█
█       █   █ █>>↓█ █↑█           █↓<<█↓█
█████ ███ █ █ ███↓█ █↑█████████ ███↓█↑█↓█
█ █   █   █     █>>↓█↑<<<<█     █↓<<█↑<<█
█ █ ███ ███████████↓█████↑█ █████↓███████
█ █   █           █↓█   █↑█   █ █↓█     █
█ ███ ███████████ █↓███ █↑███ █ █↓█████ █
█   █     █ █     █>>>>↓█↑<<█   █↓█     █
█ █ █████ █ █ █████████↓███↑█████↓█ █████
█ █   █   █ █   █↓<<<<<<█>>↑█↓<<<<█     █
█ █ ███ ███ ███ █↓███████↑███↓█████████ █
█ █           █  >>>>↓  █↑<<<<          █
█████████████████████E███████████████████
█↓█
█↓███████
█>>>>>>↓█
███████↓█
      █↓█
      █E█

Backtracking can be stupid

In lines 66 – 82 – why do we check that the path does not cross itself?

Consider the maze "....E..S" (dots for spaces). If we allowed the path to cross itself, what would happen?
The algorithm prefers to go to the right, so the path would be: <<><><< – left, (left, right), (left, right), left, left. Not great.
This is how backtracking works: it only reverts a decision when it ought to – which only happens when we reach the end of the string (the string’s length is the limit on the length of the path). This is why this step is so important – it keeps the engine from running in circles.

Performance

The thing is pretty slow on large mazes. Seriously, what did you expect? After all, there many inefficient string operations. Still, I tried to write the lookarounds in a way that looked quicker (I aimed for less failed attempts), and used a few possessive groups.

A special problem for my regex are mazes with rooms, or thick corridors, for example:

+---+-----------------+ 
| S |                 |
|   |               --+
|       |   |    E    |
+-------+---+---------+

This is much more complicated than a single-space maze, because there are dozens of paths you can take – it is much more difficult for the engine to eliminate incorrect paths.

Source Code

Source code and test cases can be found on GitHub: https://github.com/kobi/RecreationalRegex

2 thoughts on “Solving mazes using regular expressions

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.