Filling a large rectangle with smaller rectangles using regular expressions

We have some rectangles:

Four rectangles. I: 2x2, II: 2x1, III: 1x1, IV: 3x1

We also have a bigger, empty rectangle:

Empty area, 3x2


The goal is to fill the whole empty rectangle with some of the smaller rectangles, and to do it using a .Net regular expression (obviously).
We may use each rectangle only once, and we are allowed to rotate a rectangle as needed.
For example, these are valid solutions:

Use rectangle I and II rotated

Use rectangle II, III, and IV

But this one isn’t:
Use rectangle I and III


We’ll represent the rectangles as alphanumeric blocks, and the empty rectangle as a block of tildes (~). There is an empty line between the rectangles:






I chose this representation because it is relatively simple, and it makes the pattern easier to follow: ~ is part of the target rectangle, \w is a small rectangle.

An important comment regarding solution representation

There are many ways of representing a solution, and this affects the way our algorithm is written and works.
Consider this solution again:

Use rectangle II, III, and IV

Either of these work:
(I’m using roman numeral for the small rectangles: I, II, III, and IV)

  • (not this) Rectangle number II goes on position (1, 1), rectangle number III goes on position (3,1), and rectangle number IV goes on position (1,2). (we will not be solving it like this though)
  • (not this) Rectangle I is not used. Rectangle II is used first, Rectangle III is second, and Rectangle IV is third. (we will not be solving it like this either)
  • First use Rectangle II, then Rectangle III, and then Rectangle IV. (this is the one we’ll use)

This is a little nuanced. The third option works best here because it fits backtracking very well.
When I say “next free position” I mean in character order: we go over all tildes from left to right, line by line, until we find one that isn’t already occupied.
Using this representation, the solution above can be written as (II, III, IV).

The pattern

(?=(?<NextPos>[^~]*))       # \k<NextPos> always matches the position *before* the next free tilde.
        (?<IndexLength>.)+?             # Match n characters. We will use rectangle number n in position <Index>.
                                        # 0 is not an option - we need some shape to be first, second, third, etc.
        (?<=\A(?=(?<Index>(?<-IndexLength>.)+)).*)    # Capture into <Index> the first n characters.
        (?<=\A(?<CurrentIndex>\k<Index>).*)           # Copy Index into CurrentIndex
        (?=.*\Z                                       # Ensure <Index> is unique. We cannot use the same rectangle twice
        (?(IndexLength)(?!)|)   # Not needed, just an assert.
        #Copy the shape of rectangle <Index> to the target area.
        #Find rectangle number <Index>
        (?<=(?=.(?<IndexLength>.)*?(?<=\A\k<Index>))\A.*)    # Populate <IndexLength> again.
             # ^-- we are skipping over one character. We want to reach rectangle number <IndexLength>,
             #     so we're skiping over <IndexLength>-1 rectangles
        (?<=(?=\s*(?<-IndexLength>(?:\w+\r?\n)+\r?\n)*(?<=(?<RectangleStart>.*))\w)\A.*) # Capture starting position of this rectangle.
            (?<Rotate>)     # Capture 0 characters into <Rotate>. Indicates that we are not rotating this rectangle.
            (?<Rotate>.)    # Capture 1 character into <Rotate>. Indicates that we are rotating this rectangle.
            (?<TempRotate>) # Also mark <TempRotate>. This is a flag that is cleared for each rectangle. Allows conditional matching.

        (?<=(?=\k<NextPos>(?<=(?<X>~*)))\A.*)   # Init <X> with the number of tildes to the left of the starting position.

        # Basically `(?:\w+\n)+\n` to match the whole rectangle.
        (?<=(?=\k<RectangleStart>   # Go to the position before the first character in the current rectangle.
            (?<Rectangle>           # Capture this rectangle, just so we have it while printing the solution.
            \r?\n        # Match until we reach the end of this rectangle.

        (?<=(?=[^~]+(?<Width>(?<-TempWidth>~)+))\A.*)(?(TempWidth)(?!)) # Capture as many tildes as the current width.
        (?=(?<-TempHeight>\S*(?<Height>\r?\n))+)(?(TempHeight)(?!))     # Capture newlines into stack <Height>.
        (?(TempRotate)(?<-TempRotate>))                                 # Clear <TempRotate>.

        # Try to fit the rectangle into empty positions in the target rectangle.
                (?:                         # Match tildes
                    (?<=(?<Filled>\A.*))               # Push the current position to <Filled>
                    (?<=(?<TempCurrentFilled>\A.*))    # Also push the current position to <TempCurrentFilled>
                    (?=.*\Z                            # Ensure <Filled> is unique. No overlap between rectangles.
                (?<=^\k<X>\k<Width>)        # Match exactly <Width> tildes.
                ~*(?<-Height>\k<Height>\k<X>|\r?\n)     # Match until the same position on the net line (or just the last line).

        # Find the next free position - .
            (?<=(?<NextPos>\A.*))       # <NextPos> is the position before the next free tilde.
            (?<=(?<TempNextPos>\A.*))   # We compare it to <Filled>, which is the position including the tilde.
            .*\r?\n\Z (?<Done>)         # If we cannot find more an empty position it means we are done. Set the <Done> flag.

The idea

At its heart, the pattern works like this:


This is fairly simple, and is the main trick used here.
The pattern matches characters from the beginning of the string. It doesn’t matter what it matches, just how many characters and how many times. The group IndexLength is captured multiple times – as many as the number of rectangles used in the solution. Its length indicates which rectangle is used. In our example, when the solution is (II, III, IV), the group will have three captures with lengths 2, 3, and 4: (..)(...)(....)

A few more examples:

Solution Rectangles Indices IndexLength Captures
Use rectangle II, III, and IV (II, III, IV) (..)(...)(....)
Use rectangle I and II rotated (I, II⟳) (.)(..⟳)
Use rectangle I and III (I, III) (.)(...)

The (.+)+ pattern usually leads to catastrophic backtracking – but here it allows us to do an exhaustive search: we check all possible orders of the rectangles. To avoid catastrophe we have a large number of asserts that fail as quickly as possible when a match cannot be found.

Tricks and Comments

  • I’m using \r?\n everywhere because that’s what string literals contain.
  • The regex is written in a procedural-programming style – similar to statements and variables. I probably could have written it with more nesting and fewer groups, but I find it easier to write and understand this way.
  • In many groups we need to point to a position in the string. To do that we take all characters from the beginning of the string until the position, and push them to a stack. For example, this is done in group NextPos that points to where the next rectangle should go, and Filled that contains occupied positions.
  • In many cases I use (?<=(?=\k<NextPos>…)\A.*) – this is a lookbehind to the beginning of the string, and a lookahead to our position. I’ve placed \A “after” the lookahead because lookbehinds are matched from end to start.
  • There is no relation between the actual characters we match (a few characters from the beginning) and the actual logic flow. This is very similar to the maze-solving regex.
  • Uniqueness – In three places we’re checking that a new capture is different from all previous captures: We choose an unused rectangle, we fill the rectangle into unoccupied tildes, and then we find the next empty tilde. This is done with this pattern:

    This is fairly neat. The pattern is based on Martin Ender‘s answer on Programming Puzzles & Code Golf (by the way, this is the third time today that I’ve linked one of Martin’s answers – thanks!).
    This is a negative lookbehind (remember – read from the end) which:

    • (?<-Filled>.)* – Allow popping from the stack.
    • \A.* – go to the beginning.
    • (?=\A\k<Filled>(?<=\A\k<TempNextPos>)) – see that the item in the stack is different from our new value.

    Because this is a negative lookbehind all options are checked – we check the new value against all values in the stack.
    Because this is a negative lookaround the changes are discarded – we don’t have to copy the stack, and we are not losing the captured values.

  • Rotation is done as a simple alternation, followed by conditional matches on width/height.
  • We can tweak the pattern to change its behavior. For example, we can quite easily allow multiple usages of the same rectangle, or disable rotation. We can also remove the last assertion – (?(Done)|(?!)): this causes the pattern to match the first few rectangles, and is a nice way of debugging the behavior of the pattern.
  • There’s a subtle problem here: we are using input characters to represent the order of the rectangles we are using. We need an increasing number of characters for each additional rectangle. If we have a large enough number of small rectangles we may run out of input characters and the match will fail. We can fix it by allowing additional spaces at the end, for example. While this is a bug, it is not too interesting.
  • This problem was the second programming assignment I got at university (14 years ago, in Java, not regex).

Getting the Solution

The result is a single Match. The match is successful when there is a solution.
This is already pretty good.
We are also capturing these groups:

IndexLength As shown in the table above, for each rectangle used in the solution there’s a capture of IndexLength. The length of the group indicates the index of the rectangle.
Rectangle Captures the rectangles in the order they were used.
Rotate For each rectangle – captures one character if the rectangle is rotated, and an empty capture if it isn’t rotated.
NextPos Captures the position of each rectangle in the solution, on the target rectangle. The position before the tilde where the rectangle should be placed.
SolutionChar Contains each character as used in the order they were iterated. In our example with solution (I, II) the captures of SolutionChar would be {a,a,a,a,b,b}.
Filled Contains all tildes in the target rectangle in the order they were filled. This is the same order as SolutionChar, and we can use both groups to paint the solution if we wanted.

Source Code

Source code and test cases can be found on GitHub:
Rectangles were drawn in HTML/CSS using JS Bin:,css,output

Thanks! Thanks!

Two Years with Amazon Simple Workflow (SWF)


June 12 mark two years of us using Amazon Simple Workflow Service (SWF) in production, and I thought I’d share the experience.

First, let’s get this out of the way:

What is SWF not?

  • SWF does not execute any code.
  • SWF does not contain the logic of the workflow.
  • SWF does not allow you to draw a workflow or a state machine.

So what is it?

SWF is a web service that keeps the state of your workflow.
That’s pretty much it.

What are we using it for?

Our project is based on C#. We are using the AWS API directly (using the .Net SDK).
If you are using Java Or Ruby amazon provider a higher level library for SWF called Flow Framework. For C#, I wrote what I needed myself, or simply used the “low level” API.
Out project processes a large number of files daily, and it was my task to convert our previous batch-based solution to SWF.

How does it work?

SWF is based on polling. Your code runs on your machines on AWS or on-premises – it doesn’t matter. Your code is polling for tasks from the SWF API (where they wait in queues), receives a task, executes it, and sends the result back to the SWF API.
SWF then issues new tasks to your code, and keeps the history of the workflow (state).

If you’ve read any of the documentation, you probably know there are two kind of tasks: Activity Tasks (processed by workers), and Decision Tasks (process by The Decider). This API naturally encourages and leads you to a nice design of your software, where different components do different things.


Workers handle Activity Tasks.
Workers are simple components that actually do the work of the workflow. These are the building blocks of the workflow, and typically do one simple thing:

  • Take an S3 path as input and calculate the hash of the file.
  • Add a row to the database.
  • Send an email.
  • Take a S3 path to an image and create a thumbnail.

All of my workers implement a simple interface:

public interface IWorker<TInput, TOutput>
    Task<TOutput> Process(TInput input);

An important property of workers is that all the data it needs to perform its task is included in its input.

The Decider

When I first read about SWF I had a concept of tiny workers and deciders working together like ants to achieve a greater goal. Would that it were so simple.
While workers are simple, each type of workflow has a decider with this operation:

  • Poll for a decision task.
  • Receive a decision task with all new events since the previous decision task.
  • Optically load the entire workflow history to get context.
  • Make multiple decisions based on all new events.

For a simple linear workflow this isn’t a problem. The decider code is typically:

if workflow started
  Schedule activity task A
else if activity task A finished
  Schedule activity task B
else if activity task B finished
  Schedule activity task C
else if activity task C finished
  Complete workflow execution.

However, when the logic of the workflow is complicated a decider may be required to handle this event:

Since the previous decision, Activity Task A completed successfully with this result, Activity Task B failed, we wouldn’t start the child workflow you’ve requested because you were rate limited, oh, and that timer you’ve set up yesterday finally went off.

This is a pretty complicated scenario. The Decider has just one chance of reacting to these events, and they all come at the same time. There are certainly many approaches here, but either way the decider is a hairy piece of code.

Code Design

As I’ve mentioned earlier, my workers are simple, and don’t use the SWF API directly – I have another class to wrap an IWorker. This is a big benefit because any programmer can write a worked (without knowing anything of SWF), and because it is easy to reuse the code in any context. When the worker fails I expect it to simply throw an exception – my wrapper class registers the exception as an activity task failure.

To make writing complicated deciders easier I’ve implemented helper functions to get the history of the workflow, parse it, and make new decisions. My decider is separated to a base class that uses the SWF API, and child classes (one for each workflow type) that accept the workflow history and return new decisions. My deciders do not connect to a database or any external resource, and have no side-effects (excepts logs and the SWF API, of course). This allows me to easily unit-test the decider logic – I can record a workflow history at a certain point to JSON, feed it to the decider, and see what decisions it makes. I can also tweak the history to make more test cases easily. These tests are important to me because the decided can contain a lot of logic.


In either case, for deciders and for workers, I keep zero state in the class instance. All state comes from the workflow history in the decider’s case, and task input in the worker’s case. There is no communication between threads and no shared memory. This approach makes writing scalable programs trivial: there are no locks and no race conditions. I can have as many processes running in as many machines as I’d like, and it just works – there is no stage of discovery or balancing. As a proof of concept, I even ran some of the treads in Linux (using Mono), and it all worked seamlessly.


The Flow Framework has built-in retries, but it only took me a few hours to implement retries to failed activity tasks, and a few more hours to add exponential backoff. This works nicely – the worker doesn’t know anything. The decider schedules another activity tasks or fails the workflow. The retry will wait a few minutes, and may run in another server. This does prove itself, and many errors are easily resolved.


SWF has many types of timeouts, and I’ve decided early on that I would use them everywhere. Even on manual steps we have timeouts of a few days.
Timeouts are important. They are the only way the workflow can detect a stuck worker or decider tasks because a process crashed. They also encourage you to think about your business process a little better – what does it mean when it takes four days to handle a request? Can we do something automatically?
Another good (?) property of timeouts is that timeouts can purge your queues when the volume gets too high for your solution.

Integration with other AWS services


SWF can execute an AWS Lambda instead of an activity task, which is a compelling idea. It saves the trouble of writing the worker, polling for tasks, and reduces the overhead of a large number of threads and open connections. In the simple worker examples I gave above, all of them can be written as Lambda functions (except maybe adding a database row, depending on your database and architecture). The combination of Lambda serverless execution and SWF state-fullness can make a robust, and trivially scalabale system.
But – while you can use Lambda to replace your workers, you still need to implement a decider the uses the API, and runs as a process in your servers. This is a shame. Decision tasks are quick and self contained, and deciders can easily be implemented as Lambda functions – if they didn’t have to poll for tasks.
I predict Amazon are going to add this feature: allowing AWS Lambda to work as a decider is a logical next step, and can make SWF a lot more appealing.


CloudWatch show accumulated metadata about your workflows and activity tasks. For example, this chart shows the server CPU (blue) and executions of an Activity Task (orange):
CloudWatch - CPU and an Activity Task
This is nice for seeing exclusion time and how the system handles large volumes. The downside is that while it should accumulated data – there is no drill-down. I can clearly see 25 “look for cats in images” workflows failed, but there is no way of actually seeing them. More on that below.

What can be better

Rate Limiting and Throttling

More specifically, limiting number of operations per second. I don’t get rate limiting. Mostly, rate limiting feels like this:
Little Britain - Computer Says No

I understand rate limiting can be useful, and it’s a good option when faulty code is running amok. However, even when I just started it felt like the SWF rate limiting was too trigger-happy. As a quick example – if I have a workflow that is setting a timer, and I start that workflow several hundreds of times, some workflow will fail setting the timer because of a rate limit. I then have to ask for a timer again and again until I succeed. I can’t even wait before asking for a timer again because, well, waiting means setting a timer… (to add insult to injury, the request to set the time is removed from the history, so I can’t really know exactly which timer failed)
For this reason when I’ve implemented exponential backoff between failures I didn’t use timers at all – I used a dummy activity task with a short schedule-to-start timeout. Activity tasks are not rate-limited per time (looking at the list again – this statement doesn’t look accurate, but that list wasn’t public at the time).
I just don’t get the point. The result isn’t better for Amazon or for the programmers. I understand the motive behind rate limiting, but it should be better tuned.

SWF Monitoring API

The API used for searching workflows is very limiting. A few examples:

  • Find all workflows of type Customer Request – Supported.
  • Find all failed workflows – Supported.
  • Find all failed workflows of type Customer Request – Not supported.
  • Find all workflows that used the “Send SMS” activity task – Nope.
  • Find the 6 workflows where the “Send SMS” activity task timed out – No way, not even close.

This can get frustrating. CloudWatch can happily report 406 workflows used the “Send SMS” activity task between 13:00 and 13:05, and 4 activity tasks failed. There is no way of finding these workflows.
So sure, it isn’t difficult to implement it myself (we do have logs), but a feature like this is missing.

The AWS Console

The AWS management console in poor in general. The UI is dated and riddled with small bugs and oversights: JavaScript based links do not allow middle-clicking, bugs when the workflow history is too big, or missing links where they are obvious, like clicking on RunId of parent or child workflow, number of decision task should link to that decision, link from queue name can count of pending tasks, etc.
And of course, the console is using the API, so everything the API cannot do, the console can’t either.
Working with the console leaves a lot to be desired.


There is virtually no noteworthy discussion on SWF. I’m not sure that’s important.


While SWF has its quirks, I am confident and happy with our solution.

Finding interesting Japanese words using C#

My awesome wife studies Japanese. Recently she showed me a challenge a friend of ours came up with. But first some basic (and over-simplistic) concepts:


(skip if you know anything about Japanese)

  • Japanese has two kinds of symbols:
    kana – which are more like letters (we don’t really care about these at this post). For example: ね, こ, だ.
    kanji – A group of over 12,000 symbols representing words. For example: 猫.
  • Kanjis are made of smaller parts are called radicals. For example, the kanji 災 (disaster) is made of two radicals: 巛 (river) and 火 (fire).
  • Words are made from several kanjis. For example, the word 火山 means volcano and is made from two kanjis: 火 (fire) and 山 (mountain). Kana also has part in words, but again, not the subject here.
  • When it comes to Unicode, each kanji has its own character. A word is a string made of several characters.


The challenge:

Can you think of Japanese words like 妊婦, 爆煙, or 価値 where the left radical is identical in both kanjis?

For example, with the radicals highlighted:

I hardly know any Japanese, but I wondered if I could write a program that can find such words…

Finding the data.

I took the data from a Firefox add-on called Rikaichan.

Rikaichan contains information on words, so we’ll start with that:

I’ve download the Japanese-English Dictionary, which is a separate add-on, and found that Rikaichan has all word in a sqlite database. Using a System.Data.SQLite we quickly connect to it:

var builder = new SQLiteConnectionStringBuilder();
builder.DataSource = _dictionaryPath;
using (var connection = new SQLiteConnection(builder.ToString()))
    /// ...

We can get the schema using select * from sqlite_master;, which reveals just a single table (and two indices):

CREATE TABLE dict (kanji TEXT, kana TEXT, entry TEXT)

We’ve got a list of 236,329 words, their kanji and kana representations, and their English meaning. This is nice, but there is still no data about the radicals…

Kanji structure and radicals

Rikaichan does have that data: when standing over a kanji it shows a lot more information:

We have three things that are interesting here:

  • One radical on top – Some sort of “main” radical. I’m not sure how Rikaichan crowns that radical.
  • List of all radicals.
  • SKIP pattern – we’ll get to that soon.

Poking a little deeper, we can see this data in the main add-on, not the English dictionary (which is unexcpected, because each kanji has its meaning in English).

There is a file called kanji.dat, with this structure:

災|B86 G5 S7 F976 N1448 V3400 H2206 DK1400 L167 IN1335 E680 P2-3-4 I4d3.3 Yzai1|サイ わざわ.い|||disaster, calamity, woe, curse, evil
灾|B86 S7 P2-3-4 Yzai1|サイ わざわ.い|||calamity, disaster, catastrophe
炁|B86 S8 P2-4-4 Yqi4|キ ケ いき|||breath, air, steam, gas, weather, used in Taoist charms
炅|B86 S8 P2-4-4 Yjiong3 Ygui4|ケイ キョウ エイ ヨウ あらわ.れる|||brilliance

So this is promising. For 災, B86 correlated with “radical 86” from the screenshot, and there’s the English meaning (but we don’t need that). There isn’t any information on the other radicals, and nothing about their positions…
One thing that did stuck out is P2-3-4. Looking at the image we can see this is called SKIP pattern – and it looks less opaque than the other kanji indices. A quick search demystifies it. The first digit can have only four options:

  • 1 – The kanji structure is left-and-right, like 炆, 炒, or 炬.
  • 2 – The kanji structure is top-and-bottom, like 災, 灾, or 粂.
  • 3 – The kanji structure is enclosing-and-contained, like 仄, 兦, or 冃.
  • 4 – The kanji structure is “other”, like 冉, 戍, or 火.

Great! Now it looks like we have enough data to find our words: We have a list of words, list of kanjis, “main” radical in each kanji, and the general structure of the kanji. It looks like the “main” radical is (mostly) the one on the left, and that’s good enough.

We can parse this file using a small regular expression:

^(?<Kanji>\w|[^|]{2})\|       # \w doesn't match 𠀋 - surrogate pair

Here’s a pie chart with the distribution of the different kanji structure. The majority of them is of the left-and-right kind, so we will probably find a lot of words:

pie chart

With 64% of kanjis in the 1 category, you have to wonder if SKIP pattern is a good way to organize kanjis.

What are the Radicals

We’ve extracted the “main” radical for each kanji, bu we are still missing something. Some kanjis are also radicals, and according to our data, they are their own radicals. For example:

Here is the source data from kanji.dat

巛|B47 S3 V1527 H9 P1-1-2 Ychuan1|セン かわ||まがりがわ|curving river radical (no.47)
川|B47 G1 S3 F181 N1447 V1526 H6 DK1 L127 IN33 E48 P1-1-2 I0a3.2 Ychuan1|セン かわ|か こ さわ|さんぼんがわ|stream, river, river or 3-stroke river radical (no. 47)
順|B47 G4 S12 F779 N1450 V6619 H18 DK9 L129 IN769 E506 P1-1-11 I9a3.2 Yshun4|ジュン|あや あり おき おさむ しげ したがう とし なお のぶ のり まさ むね もと ゆき よし より||obey, order, turn, right, docility, occasion

This is bad news, because it would introduce false positives – what if there was a word like 順巛? (not a real word, by the way).

Luckily, the Rikaichan developers are really good at naming files, and we can find that data on radicals.dat:

巛	川	まがりがわ	crooked river	侃釧訓慌荒災拶州洲酬巡順疏馴流琉硫剄勁卅廱徑惱旒梳毓獵瑙痙癰碯經緇脛腦臘莖蔬輕輜逕醯錙鑞頸駲鯔
工		たくみ	craft	恐空功巧控攻江紅腔貢項鴻佐嵯左差瑳試式拭尋惰楕築筑虹杢倥儔剄勁啌嗟噐噬墮壽嵳巫弑徑惘扛搓擣杠檮椌槓槎橢汞濤潯熕畭疇痙矼磋穩筮箜籌經縒缸肛脛隋膸莖蕁蛩覡訌誣跫蹉躊軾輕逕隨鑄隱靈鞏頸髓鵐
己	已巳	おのれ	snake	改鞄巻忌紀記起倦圏捲巷港撰選遷巽巴配妃包庖抱泡砲胞飽僊囘匏咆垉惓杞枹炮煕熈爬疱皰祀綣苞萢蚫蜷袍鉋雹靤韆饌髱鮑麭熙
巾		はば	cloth	柿希帰稀錦策刷刺姉市師獅常飾帥制製席匝掃帯滞凧帖帳吊帝締諦蹄逓肺幡帆婦布怖幅幣弊蔽瞥帽幌幕棉綿佩冪唏啻啼嫦帋帚帙帑帛帶帷幄幃幀幎幗幔幟幢幤幇掣敝斃旆晞暼柬棘棗楴楝欷歸沛滯珮箍箒篩緜羃菷蒂蓆蔕乕蟐衞閙霈鬧鯑鰤

Surrogate Pairs

As seen above in the regular expression, it is not trivial to split a string into Unicode characters. you can not assume one Char is one Unicode character. For example, "𠀋" is a surrogate pair, made of two Chars.

It is simple enough to split a string to Unicode character points:

public static IEnumerable<string> SplitBySurrogatePairs(this string str)
    var enumerator = StringInfo.GetTextElementEnumerator(str);
    while (enumerator.MoveNext())
        yield return enumerator.Current as string;

Final code

We can finally write the query to find our words:

var w = from word in _words.Words
        let chars = word.Characters
        where chars.Count >= 2
        where chars.All(_kanji.IsKanji)
        let kanjis = chars.Select(_kanji.GetKanji)
        where kanjis.All(k => k.SkipPatternType == SkipPatternType.LeftRight)
        where kanjis.All(k => !_radicals.IsRadical(k.Kanji))
        let firstKanjiRadicalIndex = kanjis.First().MainRadicalIndex
        where kanjis.All(k => k.MainRadicalIndex == firstKanjiRadicalIndex)
        orderby kanjis.Count() descending
        select word;


We’ve found 1364 such words. Naturally, some of them are quite nice:

  • 喋喋喃喃 – holding an intimate, long-winded conversation in whispers
  • 津津浦浦 – all over the country; every nook and cranny of the land; throughout the land.
  • 流汗淋漓 – profuse perspiration, dripping with sweat.
  • 経緯線網 – graticule
  • 縷縷綿綿 – going on and on in tedious detail
  • 蚯蚓蜥蜴 – worm lizard, amphisbaenian

Source Code

You can see the whole code on GitHub:

Asp.Net MVC – Nicer display for boolean values

How can we get Asp.Net MVC to display boolean values better?

Lets have a simple model:

public class Options
    [Display(Name = "Flux")]
    public bool EnableFlux { get; set; }

    [Display(Name = "Advanced Options")]
    public bool ShowAdvancedOptions { get; set; }

    [Display(Name = "Do you eat well?")]
    public bool UserEatsWell { get; set; }

The default display template isn’t great – it just displays disabled checkboxes:

@Html.LabelFor(m => m.ShowAdvancedOptions)
@Html.DisplayFor(m => m.ShowAdvancedOptions)

@Html.LabelFor(m => m.EnableFlux)
@Html.DisplayFor(m => m.EnableFlux)

@Html.LabelFor(m => m.UserEatsWell)
@Html.DisplayFor(m => m.UserEatsWell)

The result isn’t too pretty:
Mvc Booleans - default

Continue reading

Solving mazes using regular expressions

Can the .Net regex engine find its way through a maze?

A maze?

We have a text based maze. We can walk on spaces, and need to find a path from the start point (S) to the end point (E). Anything else in a wall.


█S█      █
█ █ █ █ ██
█   █ █ E█
█         █ █           █ █   █     █   █
█ ███ ███ █ █ ███████ █ █ ███ █ ███ ███ █
█ █ █ █     █ █   █   █ █   █   █ █ █   █
█ █ █ █████ █ █ ███ ███ ███ █████ █ █ ███
█ █ █     █ █   █   █   █     █ █   █   █
█ █ █████ █████ █ █████ █ ███ █ █ █████ █
█ █     █     █ █     █   █   █ █       █
█ █████ █████ █ █████ █████ ███ ███████ █
█       █   █ █   █ █ █           █   █ █
█████ ███ █ █ ███ █ █ █████████ ███ █ █ █
█ █   █   █     █   █     █     █   █   █
█ █ ███ ███████████ █████ █ █████ ███████
█ █   █           █ █   █ █   █ █ █     █
█ ███ ███████████ █ ███ █ ███ █ █ █████ █
█   █     █ █     █     █   █   █ █     █
█ █ █████ █ █ █████████ ███ █████ █ █████
█ █   █   █ █   █       █   █     █     █
█ █ ███ ███ ███ █ ███████ ███ █████████ █
█ █           █         █               █
█ ███████
█       █
███████ █
      █ █


We are going to implement the following algorithm for solving a maze:

  1. Find the start point.
  2. Pick a new position to go to. We have four choices: right, left, down, or up.
    The new position is invalid if it crosses our current path through the maze (the path cannot have loops).
  3. If we reached the End, we are done.
  4. If we cannot advance, undo the last choice and make a new one.
  5. Go to step 2.

As you may have noticed, this is backtracking: we’re making choices and reverting them when we know they were wrong.

Continue reading

Finding intersecting ranges – Small variant using Expression.Constant

Last time we saw how use expression to find intersecting ranges. Here’s another useful overload of the same method – it is probably common to have one of ranges on the entity object (for example – the employee’s vacation), while knowing the other (for example the week of training). In that case, we can revise the implementation a little to use to concrete values:

public static Expression<Func<TEntity, bool>>
    RangesIntersect<TEntity, TComparable>(TComparable from, TComparable to,
                                        Expression<Func<TEntity, TComparable>> getEntityRangeStart,
                                        Expression<Func<TEntity, TComparable>> getEntityRangeEnd)
    ParameterExpression newLambdaParameter = Expression.Parameter(typeof(TEntity));
    var rewireStart = RewireLambdaExpression(getEntityRangeStart, newLambdaParameter);
    var rewireEnd = RewireLambdaExpression(getEntityRangeEnd, newLambdaParameter);

    return Expression.Lambda<Func<TEntity, bool>>(
        RangesIntersect(Expression.Constant(from), Expression.Constant(to),
                        rewireStart.Body, rewireEnd.Body),

public static IQueryable<TEntity> WhereRangesIntersect<TEntity, TComparable>(
    this IQueryable<TEntity> query,
    TComparable range1Start,
    TComparable range1End,
    Expression<Func<TEntity, TComparable>> getEntityRange2Start,
    Expression<Func<TEntity, TComparable>> getEntityRange2End)
    return query.Where(RangesIntersect(range1Start, range1End, getEntityRange2Start, getEntityRange2End));

Example Use

using (VacationsEntities model = new VacationsEntities())
    var trainingStart = new DateTime(2013, 1, 1);
    var trainingEnd = trainingStart.AddDays(7);
    var query = model.Vacations
                                           v => v.Start.Value,
                                           v => v.End.Value);
    // ...

Resulting SQL

The Entity Framework chose to convert the two Expression.Constants to date literals instead of using SQL parameters:

[Extent1].[VacationId] AS [VacationId], 
[Extent1].[EmployeeId] AS [EmployeeId], 
[Extent1].[Start] AS [Start], 
[Extent1].[End] AS [End]
FROM [Vacations] AS [Extent1]
WHERE (convert(datetime, '2013-01-01 00:00:00.000', 121) <= [Extent1].[End]) AND
      ([Extent1].[Start] <= convert(datetime, '2013-01-08 00:00:00.000', 121))

Finding intersecting ranges using expression trees and the Entity Framework


We’re using the Entity Framework as ORM. We have entities of Employee and Vacation. Employees may go on vacations, that’s fair. A vacation is a range of dates: start and end.
Our company will hold a week long safety training, so we need to find all employees who will go on vacation and will miss any day of the training.

Or, more generally – we have two ranges, and we want to find only the items where the ranges intersect.

Sounds simple enough but as it turns out, there is no immediate way of creating such a query. Naturally, we are more interested in a general solution, not a where clause that we would have to duplicate each time.


Given two ranges, we want to return true if the ranges intersect. A simple implementation would be:

bool intersecting = (range1Start <= range2End) &&
                    (range2Start <= range1End);

<= means that we treat our ranges as inclusive at both ends – {1,2} and {2,3} would intersect at the point 2. < would be exclusive at any end or both.


The Entity Framework works by translating expression trees into SQL. We can’t write regular code like we would with Linq to Objects, that wouldn’t work: the Entity Framework only understands Expressions:

/// <summary>
/// Get two ranges represented by four expressions, and check if they intersect. Inclusive on both ends,
/// and allow intersection in a single point.
/// </summary>
/// <param name="range1From">Start value of the first range</param>
/// <param name="range1To">End value of the first range</param>
/// <param name="range2From">Start value of the second range</param>
/// <param name="range2To">End value of the second range</param>
/// <returns>A BinaryExpression that checks if the two ranges intersect.</returns>
public static BinaryExpression RangesIntersect(Expression range1From, Expression range1To,
                                               Expression range2From, Expression range2To)
        Expression.LessThanOrEqual(range1From, range2To),
        Expression.LessThanOrEqual(range2From, range1To)

Not That Easy

Next, we want to use the Queryable.Where overload that takes a predicate of type Expression<Func<TSource, Boolean>>.
My first attempt, which was wrong, was:

public static Expression<Func<TEntity, bool>> RangesIntersect<TEntity, TComparable>(
                        Expression<Func<TEntity, TComparable>> getEntityRange1Start,
                        Expression<Func<TEntity, TComparable>> getEntityRange1End,
                        Expression<Func<TEntity, TComparable>> getEntityRange2Start,
                        Expression<Func<TEntity, TComparable>> getEntityRange2End)
    (!) Incorrect code
    return Expression.Lambda<Func<TEntity, bool>>(
        RangesIntersect(getEntityRange1Start.Body, getEntityRange1End.Body,
                        getEntityRange2Start.Body, getEntityRange2End.Body),

Let’s make a quick test to see how it doesn’t work:

var trainingStart = new DateTime(2013, 1, 1);
var trainingEnd = trainingStart.AddDays(7);
var vacation = new Vacation {Start = new DateTime(2013, 1, 5),
                             End = new DateTime(2013, 1, 10)};

var intersectExpression = RangeIntersection
      .RangesIntersect<Vacation, DateTime?>(
             v => trainingStart, v => trainingEnd,
             v => v.Start, v => v.End);
bool intersect = intersectExpression.Compile()(vacation);


variable ‘v’ of type ‘Vacation’ referenced from scope ”, but it is not defined

Had we used the Expression in Linq, we would have gotten the exception:

The parameter ‘v’ was not bound in the specified LINQ to Entities query expression.

This fails because the parameter of the first Expression (which used in getEntityRange1Start.Parameters) is not present in the other three Expressions. It looks like all of them should be using the same v, but they are independent.
To get it to work we need all four expressions to use the same parameter. A quick search finds the handy ExpressionSubstitute class, by Marc Gravell:

/// <summary>
/// Visit an Expression, and replace one sub expression with another.
/// </summary>
/// <remarks> </remarks>
class ExpressionSubstitute : ExpressionVisitor
    public readonly Expression from, to;
    public ExpressionSubstitute(Expression from, Expression to)
        this.from = from; = to;
    public override Expression Visit(Expression node)
        if (node == from) return to;
        return base.Visit(node);

Now we can easily change the expressions’ parameters:

public static Expression<Func<TEntity, TReturnType>> RewireLambdaExpression<TEntity, TReturnType>(
    Expression<Func<TEntity, TReturnType>> expression,
    ParameterExpression newLambdaParameter)
    var newExp = new ExpressionSubstitute(expression.Parameters.Single(), newLambdaParameter).Visit(expression);
    return (Expression<Func<TEntity, TReturnType>>)newExp;

Correct Code

We can finally rewire the expressions, and write the extension method:

public static Expression<Func<TEntity, bool>> RangesIntersect<TEntity, TComparable>(
                                Expression<Func<TEntity, TComparable>> getEntityRange1Start,
                                Expression<Func<TEntity, TComparable>> getEntityRange1End,
                                Expression<Func<TEntity, TComparable>> getEntityRange2Start,
                                Expression<Func<TEntity, TComparable>> getEntityRange2End)
    ParameterExpression newLambdaParameter = Expression.Parameter(typeof(TEntity));
    var rewireStart1 = RewireLambdaExpression(getEntityRange1Start, newLambdaParameter);
    var rewireEnd1 = RewireLambdaExpression(getEntityRange1End, newLambdaParameter);
    var rewireStart2 = RewireLambdaExpression(getEntityRange2Start, newLambdaParameter);
    var rewireEnd2 = RewireLambdaExpression(getEntityRange2End, newLambdaParameter);

    return Expression.Lambda<Func<TEntity, bool>>(RangesIntersect(rewireStart1.Body, rewireEnd1.Body,
                                                                  rewireStart2.Body, rewireEnd2.Body),

public static IQueryable<TEntity> WhereRangesIntersect<TEntity, TComparable>(
    this IQueryable<TEntity> query,
    Expression<Func<TEntity, TComparable>> getEntityRange1Start,
    Expression<Func<TEntity, TComparable>> getEntityRange1End,
    Expression<Func<TEntity, TComparable>> getEntityRange2Start,
    Expression<Func<TEntity, TComparable>> getEntityRange2End)
    return query.Where(RangesIntersect(getEntityRange1Start, getEntityRange1End,
                                       getEntityRange2Start, getEntityRange2End));


Now we can write Linq queries that look like:

using (VacationsEntities model = new VacationsEntities())
    var trainingStart = new DateTime(2013, 1, 1);
    var trainingEnd = trainingStart.AddDays(7);
    var query = model.Vacations
        .WhereRangesIntersect(v => trainingStart, v => trainingEnd,
                              v => v.Start, v => v.End);
    // ...

We can also look at the resulting SQL, using ((ObjectQuery<Vacation>)query).ToTraceString():

[Extent1].[VacationId] AS [VacationId], 
[Extent1].[EmployeeId] AS [EmployeeId], 
[Extent1].[Start] AS [Start], 
[Extent1].[End] AS [End]
FROM [Vacations] AS [Extent1]
WHERE (@p__linq__0 <= [Extent1].[End]) AND
      ([Extent1].[Start] <= @p__linq__1)

The expression also works with OData. For example, using Stack Overflow’s endpoint:

var uri = new Uri("");
var entities = new StackOverflowOData.Entities(uri);
var trainingStart = new DateTime(2012, 3, 15);
var trainingEnd = trainingStart.AddDays(7);

var q = entities.Posts
    .WhereRangesIntersect(p => p.CreationDate.Value, p => p.ClosedDate.Value,
                          p => trainingStart, p => trainingEnd).Take(10);

The url filter will be (using q.ToString()):

&$filter=(CreationDate le datetime'2012-03-22T00:00:00') and
         (datetime'2012-03-15T00:00:00' le ClosedDate)

To me, this is has been a nice exercise, and finally gave me a reason to play with expression trees.

See Also

Jenkins CI – Error 500 When Opening Views

After messing with Jenkins a little (upgrade and downgrade), all views were broken (views are the tabs on top that filer jobs). When clicking on them we got a Server Error (500).

The log showed several errors:

Caused by: java.lang.NullPointerException: Cannot invoke method isEmpty() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(
Caused by: org.apache.commons.jelly.JellyTagException: jar:file:/C:/Program Files (x86)/Jenkins/war/WEB-INF/lib/jenkins-core-1.482.jar!/hudson/model/View/index.jelly:44:43:  Cannot invoke method isEmpty() on null object
    at org.apache.commons.jelly.impl.TagScript.handleException(
27/05/2013 16:10:50 org.kohsuke.stapler.compression.CompressionFilter reportException
WARNING: Untrapped servlet exception
javax.servlet.ServletException: org.apache.commons.jelly.JellyTagException: jar:file:/C:/Program Files (x86)/Jenkins/war/WEB-INF/lib/jenkins-core-1.482.jar!/hudson/model/View/index.jelly:44:43:  Cannot invoke method isEmpty() on null object
    at org.kohsuke.stapler.jelly.JellyClassTearOff.serveIndexJelly(

I don’t know exactly what caused it, but I was able to solve the error. The views are saved in Jenkins’s main configuration file, config.xml. (on our server that was C:\Program Files (x86)\Jenkins\config.xml)

First, back up this file.

The views should still be intact in the file. I was able to create another view.
A working view should look like this:

  <owner class="hudson" reference="../../.."/>
  <name>Working View</name>
  <properties class="hudson.model.View$PropertyList"/>
  <jobNames class="tree-set">
	<comparator class="hudson.util.CaseInsensitiveComparator"/>
	<string>Job Name 1</string>
	<string>Job Name 2</string>

On line 7 we have the list of all job names. On a broken view, the class attribute was missing:

    <comparator class="hudson.util.CaseInsensitiveComparator"/>
    <string>Job Name 1</string>
    <string>Job Name 2</string>

Add class="tree-set" to the jobNames element to correct the configuration file.

“Reload Configuration from Disk” didn’t work for me – I had to take the service down, change the XML, save it, and start the service again.
After that everything worked.


Jenkins might delete all jobs in the view (the <string>Job Name 1</string> elements) – that is why we backed up the file. It should be easy to restore these, or simply select the jobs again once the view is working.

Asp.Net MVC – Avoid resetting the form authentication cookie timeout for a request

I have a site that is using Asp.Net’s forms authentication, with sliding expiration. A user may log in and make some operations (requests), and after 30 minutes (for example) from her last action, she will be logged out.
However – what if I want to make a request that will not reset that timeout? For example, I may want to send an Ajax request for tracking the user, periodically check for notices, etc. How can I prevent that “system” request from extending the authentication cookie?

As it turns out, there is a simple way of doing that, once you understand exactly how sliding authentication works in

I won’t go into too much detail, but the basic mechanism is:

  • After authentication, an authentication cookie is sent to the user, with expiration time of 30 minutes (or whatever you defined).
  • The authentication cookie contains an encrypted data, which is really the authentication ticket.
  • The ticket contains information related to the authentication, and the expiration time. Note the browsers don’t send the cookies’ expiration times when making a request, so that data is included in the authentication ticket instead.
  • For actions the user does in the first half of the expiration period (that is, the first 15 minutes), nothing happens. The authentication timeout does not reset.
  • On requests that happen on the second half of the expiration period, the timeout is reset, and the expiration is extended by another 30 minutes.

So, there is a chance our response will contain a new auth cookie, with extended expiration time. If the user is inactive, and our periodical action is the only thing happening, one of the responses will contain a new cookie, and the user will be logged in indefinitely.

So, how an we prevent that? This is as simple as removing the cookie from the response:

/// Prevent the auth cookie from being reset for this action, allows you to
/// have requests that do not reset the login timeout.
/// </summary>
public class DoNotResetAuthCookieAttribute : ActionFilterAttribute
    public override void OnActionExecuting(ActionExecutingContext filterContext)
        var response = filterContext.HttpContext.Response;

Use of this action is pretty simple:

public ActionResult SampleAction()
    return Json(new {Request.IsAuthenticated, CookiesCount = Response.Cookies.Count});


  • You may also want to disable caching for such an action, using OutputCacheAttribute
    [OutputCache(NoStore = true, Duration = 0)]
  • I have some more defensive code in production (some more null checks), but this should be ok. Specifically, you don’t have to test the cookie is actually there to delete it: HttpCookieCollection.Remove
  • I used this to check if the user is still logged in, and get the time she will (presumably) log out – I needed something that works in an Ajax application, with many possible tabs or windows open.

See Also

.Net Regular Expressions – Finding Decimal Numbers that are Divisible by Three

It’s very easy to check a decimal number is divisible by three using a simple DFA with 3 states.

A regex, therefor, is possible, but not too pretty (source):



Here’s another approach, using .Net’s stacks as a simple counter:

(?>             # No regrets - don't backtrack on if/else decisions.
    [0369]      # = 0 (mod 3)
    [147]       # = 1 (mod 3)
    (?:         # if possible pop 2, else push 1
    [258]       # = 2 (mod 3)
    (?:         # if possible pop 1, else push 2
(?(Sum)(?!)) # Assert nothing's left in the stack

Why? Well, I was bored while I shaved. Luckily, this regex is simple enough for Mono. Working example: (ok, maybe not, mono is missing 111222)