We have some rectangles:
We also have a bigger, empty rectangle:
The goal is to fill the whole empty rectangle with some of the smaller rectangles, and to do it using a .Net regular expression (obviously).
We may use each rectangle only once, and we are allowed to rotate a rectangle as needed.
For example, these are valid solutions:
But this one isn’t:
We’ll represent the rectangles as alphanumeric blocks, and the empty rectangle as a block of tildes (~). There is an empty line between the rectangles:
I chose this representation because it is relatively simple, and it makes the pattern easier to follow:
~ is part of the target rectangle,
\w is a small rectangle.
An important comment regarding solution representation
There are many ways of representing a solution, and this affects the way our algorithm is written and works.
Consider this solution again:
Either of these work:
(I’m using roman numeral for the small rectangles: I, II, III, and IV)
- (not this) Rectangle number II goes on position (1, 1), rectangle number III goes on position (3,1), and rectangle number IV goes on position (1,2). (we will not be solving it like this though)
- (not this) Rectangle I is not used. Rectangle II is used first, Rectangle III is second, and Rectangle IV is third. (we will not be solving it like this either)
- First use Rectangle II, then Rectangle III, and then Rectangle IV. (this is the one we’ll use)
This is a little nuanced. The third option works best here because it fits backtracking very well.
When I say “next free position” I mean in character order: we go over all tildes from left to right, line by line, until we find one that isn’t already occupied.
Using this representation, the solution above can be written as
(II, III, IV).
(?=(?<NextPos>[^~]*)) # \k<NextPos> always matches the position *before* the next free tilde.
(?<IndexLength>.)+? # Match n characters. We will use rectangle number n in position <Index>.
# 0 is not an option - we need some shape to be first, second, third, etc.
(?<=\A(?=(?<Index>(?<-IndexLength>.)+)).*) # Capture into <Index> the first n characters.
(?<=\A(?<CurrentIndex>\k<Index>).*) # Copy Index into CurrentIndex
(?=.*\Z # Ensure <Index> is unique. We cannot use the same rectangle twice
(?(IndexLength)(?!)|) # Not needed, just an assert.
#Copy the shape of rectangle <Index> to the target area.
#Find rectangle number <Index>
(?<=(?=.(?<IndexLength>.)*?(?<=\A\k<Index>))\A.*) # Populate <IndexLength> again.
# ^-- we are skipping over one character. We want to reach rectangle number <IndexLength>,
# so we're skiping over <IndexLength>-1 rectangles
(?<=(?=\s*(?<-IndexLength>(?:\w+\r?\n)+\r?\n)*(?<=(?<RectangleStart>.*))\w)\A.*) # Capture starting position of this rectangle.
(?<Rotate>) # Capture 0 characters into <Rotate>. Indicates that we are not rotating this rectangle.
(?<Rotate>.) # Capture 1 character into <Rotate>. Indicates that we are rotating this rectangle.
(?<TempRotate>) # Also mark <TempRotate>. This is a flag that is cleared for each rectangle. Allows conditional matching.
(?<=(?=\k<NextPos>(?<=(?<X>~*)))\A.*) # Init <X> with the number of tildes to the left of the starting position.
# Basically `(?:\w+\n)+\n` to match the whole rectangle.
(?<=(?=\k<RectangleStart> # Go to the position before the first character in the current rectangle.
(?<Rectangle> # Capture this rectangle, just so we have it while printing the solution.
\r?\n # Match until we reach the end of this rectangle.
(?<=(?=[^~]+(?<Width>(?<-TempWidth>~)+))\A.*)(?(TempWidth)(?!)) # Capture as many tildes as the current width.
(?=(?<-TempHeight>\S*(?<Height>\r?\n))+)(?(TempHeight)(?!)) # Capture newlines into stack <Height>.
(?(TempRotate)(?<-TempRotate>)) # Clear <TempRotate>.
# Try to fit the rectangle into empty positions in the target rectangle.
(?: # Match tildes
(?<=(?<Filled>\A.*)) # Push the current position to <Filled>
(?<=(?<TempCurrentFilled>\A.*)) # Also push the current position to <TempCurrentFilled>
(?=.*\Z # Ensure <Filled> is unique. No overlap between rectangles.
(?<=^\k<X>\k<Width>) # Match exactly <Width> tildes.
~*(?<-Height>\k<Height>\k<X>|\r?\n) # Match until the same position on the net line (or just the last line).
# Find the next free position - .
(?<=(?<NextPos>\A.*)) # <NextPos> is the position before the next free tilde.
(?<=(?<TempNextPos>\A.*)) # We compare it to <Filled>, which is the position including the tilde.
.*\r?\n\Z (?<Done>) # If we cannot find more an empty position it means we are done. Set the <Done> flag.
At its heart, the pattern works like this:
This is fairly simple, and is the main trick used here.
The pattern matches characters from the beginning of the string. It doesn’t matter what it matches, just how many characters and how many times. The group
IndexLength is captured multiple times – as many as the number of rectangles used in the solution. Its length indicates which rectangle is used. In our example, when the solution is
(II, III, IV), the group will have three captures with lengths 2, 3, and 4:
A few more examples:
(II, III, IV)
(.+)+ pattern usually leads to catastrophic backtracking – but here it allows us to do an exhaustive search: we check all possible orders of the rectangles. To avoid catastrophe we have a large number of asserts that fail as quickly as possible when a match cannot be found.
Tricks and Comments
Getting the Solution
The result is a single
Match. The match is successful when there is a solution.
This is already pretty good.
We are also capturing these groups:
|As shown in the table above, for each rectangle used in the solution there’s a capture of
IndexLength. The length of the group indicates the index of the rectangle.
|Captures the rectangles in the order they were used.
|For each rectangle – captures one character if the rectangle is rotated, and an empty capture if it isn’t rotated.
|Captures the position of each rectangle in the solution, on the target rectangle. The position before the tilde where the rectangle should be placed.
|Contains each character as used in the order they were iterated. In our example with solution (I, II) the captures of
SolutionChar would be
|Contains all tildes in the target rectangle in the order they were filled. This is the same order as
SolutionChar, and we can use both groups to paint the solution if we wanted.
Source code and test cases can be found on GitHub: https://github.com/kobi/RecreationalRegex
Rectangles were drawn in HTML/CSS using JS Bin: http://jsbin.com/bafaxijiqa/edit?html,css,output