Speak Friend and Enter – Do people actually use movie passwords?

Can you think of a famous password, such as one used in a book or a movie? Do people see passwords being typed on screen and think “hey, now that is a pretty good password…”? I downloaded the Exploit.in combo list containing over seven hundred million leaked logins and tried to look how often than happens.

Famous passwords used in real leaked passwords:

The clear winner here is Mission Impossible with 948 password containing AW96B6. A nice second is TrustNo1 from The X-Files which appears in 629 passwords. Next we have 386 passwords containing Swordfish – a classic password from 1932 with its own Wikipedia page.
The safe combination from Inception, 528491 is used in 168 passwords, sometimes accompanied by character names like 528491c0bb or eames528491.

mission-impossible-server-room

“Damn it, Ethan, you don’t have to press Caps Lock for each letter…”

I thought many The Lord of the Rings fans will choose “Speak Friend” as their password – but not even one person did. The elf word for “friend”, Mellon, appears 107 times. We also have 6 YouShallNotPass, and even one SpamShallNotPass.
Finally, there are 11 My Preciouses, including My Cat Precious and My Precious Girl (which still count! we have to be fair toward other phrases that didn’t get this level of scrutiny).

Movie or Book Title Password Count
Mission Impossible AW96B6 948
The X-Files TrustNo1 629
Horse Feathers Swordfish 386
Die Hard Akagi (Whole word) 241
Inception 528491 (Exact number) 168
LotR Mellon 107
Arabian Nights Open Sesame 37
Watchmen Rameses 2 18
Casino Royale 836547 (Exact number) 14
LotR My Precious 11
LotR Shall Not Pass 7
Watchmen Rameses II 6
National Treasure Valley Forge 3
Who Framed Roger Rabbit Walt sent me 1
The Net Natoar23ae 1
Tron Reindeer Flotilla 0
Matrix Zion 0101 0
LotR Cannot Pass 0

Harry Potter

I didn’t plan on having so many Harry Potter passwords, but the good people at wikia made it too easy – they have a page with dozens of passwords used in the books. I’ve split them into three types:

Harry Potter passwords that are also Latin or other known phrases or snacks:

Password Count
Acid Pops 377
Fizzy Pop 259
Baubles 241
Alea iacta est 179
Catweazle 135
Quid Agis 31

Harry Potter Pass-phrases

Password Count
Dumbledore 23
Balderdash 8
Studious Success 6
Tapeworm 3
Caput Draconis 2
Flibbertigibbet 2
Pig Snout 1
Fortuna Major 1

Passwords based on Harry Potter spells:

Spell Brief Description Count
Accio (Contained in password as a whole word) Fetch something. Reasonable password. 885
Accio (The whole password is just “accio” with no additional characters.) 555
Stupefy 223
Alohomora Open locks. Makes sense. 68
Imperio 35
Crucio 17
Avada Kedavra 15
Expelliarmus 9
Sectumsempra 5
Impedimenta 3
Reparo 3
Expecto Patronum 2
Confundus 1

No muggle had used any of the following passwords:
Wattlebird, Oddsbodikins, Scurvy Cur, Banana Fritters, Fairy Lights, Mimbulus Mimbletonia, Abstinence, Dilligrout, Sherbet lemon, Cockroach cluster, Fizzing Whizzbees, Toffee Eclairs, Chocolate Frogs, Dissendium, Slytherins are Supreme, Facta non verba, Sea Serpent, This Password is Absurd, Libraries Liberate, Dragon s Egg, Light against Darkness, Chops and Gravy, Dashing Cadogan, Forget Me Never, Lunartickle, Surreptitiousness, Wanglewort, up to no good, nor Mischief Managed.

I see “This Password is Absurd” is up for grabs, so I’m calling it – this is my password now.

Technical Details

I’ve found the combo list on Google. It took me about an hour before I know the term “combo” or the name “Exploit.in”. I will not link it from here though.
The combo contains 111 files with lines in the format {email}:{password}, for example:

jane@example.com:12345
bob@example.com:pa$$word

From each line I’ve kept just the password and the TLD. I’ve loaded all data to an SQLite database, and grouped identical rows – keeping the count, of course. Having the data SQL made analyzing it very easy.
To automate some of queries I used KNIME – not so much for its analytics, mainly because is automatically saves results to disk.
PasswordsKnimeWorkflow

Password comparisons were all case-insensitive, and multi-word password are filtered using SQL Like, as where password like '%fizzy%pop%'.

Bonus

2,289,587 Spaceballs fans use 12345 as the first number in their passwords. 601,874 have 12345 as their entire password.

Advertisements

Filling a large rectangle with smaller rectangles using regular expressions

We have some rectangles:

Four rectangles. I: 2x2, II: 2x1, III: 1x1, IV: 3x1

We also have a bigger, empty rectangle:

Empty area, 3x2

Goal

The goal is to fill the whole empty rectangle with some of the smaller rectangles, and to do it using a .Net regular expression (obviously).
We may use each rectangle only once, and we are allowed to rotate a rectangle as needed.
For example, these are valid solutions:

Use rectangle I and II rotated

Use rectangle II, III, and IV

But this one isn’t:
Use rectangle I and III

Input

We’ll represent the rectangles as alphanumeric blocks, and the empty rectangle as a block of tildes (~). There is an empty line between the rectangles:

aa
aa

bb

c

ddd

~~~
~~~


I chose this representation because it is relatively simple, and it makes the pattern easier to follow: ~ is part of the target rectangle, \w is a small rectangle.

An important comment regarding solution representation

There are many ways of representing a solution, and this affects the way our algorithm is written and works.
Consider this solution again:

Use rectangle II, III, and IV

Either of these work:
(I’m using roman numeral for the small rectangles: I, II, III, and IV)

  • (not this) Rectangle number II goes on position (1, 1), rectangle number III goes on position (3,1), and rectangle number IV goes on position (1,2). (we will not be solving it like this though)
  • (not this) Rectangle I is not used. Rectangle II is used first, Rectangle III is second, and Rectangle IV is third. (we will not be solving it like this either)
  • First use Rectangle II, then Rectangle III, and then Rectangle IV. (this is the one we’ll use)

This is a little nuanced. The third option works best here because it fits backtracking very well.
When I say “next free position” I mean in character order: we go over all tildes from left to right, line by line, until we find one that isn’t already occupied.
Using this representation, the solution above can be written as (II, III, IV).

The pattern

\A
(?=(?<NextPos>[^~]*))       # \k<NextPos> always matches the position *before* the next free tilde.
(?:
    (?:
        (?<IndexLength>.)+?             # Match n characters. We will use rectangle number n in position <Index>.
                                        # 0 is not an option - we need some shape to be first, second, third, etc.
        (?<=\A(?=(?<Index>(?<-IndexLength>.)+)).*)    # Capture into <Index> the first n characters.
        (?<=\A(?<CurrentIndex>\k<Index>).*)           # Copy Index into CurrentIndex
        (?=.*\Z                                       # Ensure <Index> is unique. We cannot use the same rectangle twice
            (?<!(?=\A\k<Index>(?<=\A\k<CurrentIndex>))\A.*(?<-Index>.)+) 
        )
        (?(IndexLength)(?!)|)   # Not needed, just an assert.
        
        #Copy the shape of rectangle <Index> to the target area.
        #Find rectangle number <Index>
        (?<=(?=.(?<IndexLength>.)*?(?<=\A\k<Index>))\A.*)    # Populate <IndexLength> again.
             # ^-- we are skipping over one character. We want to reach rectangle number <IndexLength>,
             #     so we're skiping over <IndexLength>-1 rectangles
        (?<=(?=\s*(?<-IndexLength>(?:\w+\r?\n)+\r?\n)*(?<=(?<RectangleStart>.*))\w)\A.*) # Capture starting position of this rectangle.
        (?(IndexLength)(?!))
        (?:
            (?<Rotate>)     # Capture 0 characters into <Rotate>. Indicates that we are not rotating this rectangle.
            |
            (?<Rotate>.)    # Capture 1 character into <Rotate>. Indicates that we are rotating this rectangle.
            (?<TempRotate>) # Also mark <TempRotate>. This is a flag that is cleared for each rectangle. Allows conditional matching.
        )

        (?<=(?=\k<NextPos>(?<=(?<X>~*)))\A.*)   # Init <X> with the number of tildes to the left of the starting position.

        # Basically `(?:\w+\n)+\n` to match the whole rectangle.
        (?<=(?=\k<RectangleStart>   # Go to the position before the first character in the current rectangle.
            (?<Rectangle>           # Capture this rectangle, just so we have it while printing the solution.
                (?=(?(TempRotate)(?<TempHeight>\w)|(?<TempWidth>\w))+\r?\n) 
                (?:  
                    (?<Solution>\w)+
                    (?(TempRotate)(?<TempWidth>\r?\n)|(?<TempHeight>\r?\n))
                )+
            )
            \r?\n        # Match until we reach the end of this rectangle.
        )\A.*)

        (?<=(?=[^~]+(?<Width>(?<-TempWidth>~)+))\A.*)(?(TempWidth)(?!)) # Capture as many tildes as the current width.
        (?=(?<-TempHeight>\S*(?<Height>\r?\n))+)(?(TempHeight)(?!))     # Capture newlines into stack <Height>.
        (?(TempRotate)(?<-TempRotate>))                                 # Clear <TempRotate>.

        # Try to fit the rectangle into empty positions in the target rectangle.
        (?<=(?=\k<NextPos>
            (?:
                (?:                         # Match tildes
                    ~
                    (?<=(?<Filled>\A.*))               # Push the current position to <Filled>
                    (?<=(?<TempCurrentFilled>\A.*))    # Also push the current position to <TempCurrentFilled>
                    (?=.*\Z                            # Ensure <Filled> is unique. No overlap between rectangles.
                        (?<!(?=\A\k<Filled>(?<=\A\k<TempCurrentFilled>))\A.*(?<-Filled>.)+) 
                    )
                )+?
                (?<=^\k<X>\k<Width>)        # Match exactly <Width> tildes.
                ~*(?<-Height>\k<Height>\k<X>|\r?\n)     # Match until the same position on the net line (or just the last line).
            )+
            (?(Height)(?!))
        )\A.*)

        # Find the next free position - .
        (?<=(?=.*?
            (?<=(?<NextPos>\A.*))       # <NextPos> is the position before the next free tilde.
            ~
            (?<=(?<TempNextPos>\A.*))   # We compare it to <Filled>, which is the position including the tilde.
            (?=.*\Z
                (?<!(?=\A\k<Filled>(?<=\A\k<TempNextPos>))\A.*(?<-Filled>.)*) 
            )
            |
            .*\r?\n\Z (?<Done>)         # If we cannot find more an empty position it means we are done. Set the <Done> flag.
        )\A.*)
        
    )
)+
(?(Done)|(?!))

The idea

At its heart, the pattern works like this:

^(?<IndexLength>.+)+

This is fairly simple, and is the main trick used here.
The pattern matches characters from the beginning of the string. It doesn’t matter what it matches, just how many characters and how many times. The group IndexLength is captured multiple times – as many as the number of rectangles used in the solution. Its length indicates which rectangle is used. In our example, when the solution is (II, III, IV), the group will have three captures with lengths 2, 3, and 4: (..)(...)(....)

A few more examples:

Solution Rectangles Indices IndexLength Captures
Use rectangle II, III, and IV (II, III, IV) (..)(...)(....)
Use rectangle I and II rotated (I, II⟳) (.)(..⟳)
Use rectangle I and III (I, III) (.)(...)

The (.+)+ pattern usually leads to catastrophic backtracking – but here it allows us to do an exhaustive search: we check all possible orders of the rectangles. To avoid catastrophe we have a large number of asserts that fail as quickly as possible when a match cannot be found.

Tricks and Comments

  • I’m using \r?\n everywhere because that’s what string literals contain.
  • The regex is written in a procedural-programming style – similar to statements and variables. I probably could have written it with more nesting and fewer groups, but I find it easier to write and understand this way.
  • In many groups we need to point to a position in the string. To do that we take all characters from the beginning of the string until the position, and push them to a stack. For example, this is done in group NextPos that points to where the next rectangle should go, and Filled that contains occupied positions.
  • In many cases I use (?<=(?=\k<NextPos>…)\A.*) – this is a lookbehind to the beginning of the string, and a lookahead to our position. I’ve placed \A “after” the lookahead because lookbehinds are matched from end to start.
  • There is no relation between the actual characters we match (a few characters from the beginning) and the actual logic flow. This is very similar to the maze-solving regex.
  • Uniqueness – In three places we’re checking that a new capture is different from all previous captures: We choose an unused rectangle, we fill the rectangle into unoccupied tildes, and then we find the next empty tilde. This is done with this pattern:

    (?=.*\Z
    	(?<!(?=\A\k<Filled>(?<=\A\k<TempNextPos>))\A.*(?<-Filled>.)*) 
    )

    This is fairly neat. The pattern is based on Martin Ender‘s answer on Programming Puzzles & Code Golf (by the way, this is the third time today that I’ve linked one of Martin’s answers – thanks!).
    This is a negative lookbehind (remember – read from the end) which:

    • (?<-Filled>.)* – Allow popping from the stack.
    • \A.* – go to the beginning.
    • (?=\A\k<Filled>(?<=\A\k<TempNextPos>)) – see that the item in the stack is different from our new value.

    Because this is a negative lookbehind all options are checked – we check the new value against all values in the stack.
    Because this is a negative lookaround the changes are discarded – we don’t have to copy the stack, and we are not losing the captured values.

  • Rotation is done as a simple alternation, followed by conditional matches on width/height.
  • We can tweak the pattern to change its behavior. For example, we can quite easily allow multiple usages of the same rectangle, or disable rotation. We can also remove the last assertion – (?(Done)|(?!)): this causes the pattern to match the first few rectangles, and is a nice way of debugging the behavior of the pattern.
  • There’s a subtle problem here: we are using input characters to represent the order of the rectangles we are using. We need an increasing number of characters for each additional rectangle. If we have a large enough number of small rectangles we may run out of input characters and the match will fail. We can fix it by allowing additional spaces at the end, for example. While this is a bug, it is not too interesting.
  • This problem was the second programming assignment I got at university (14 years ago, in Java, not regex).

Getting the Solution

The result is a single Match. The match is successful when there is a solution.
This is already pretty good.
We are also capturing these groups:

IndexLength As shown in the table above, for each rectangle used in the solution there’s a capture of IndexLength. The length of the group indicates the index of the rectangle.
Rectangle Captures the rectangles in the order they were used.
Rotate For each rectangle – captures one character if the rectangle is rotated, and an empty capture if it isn’t rotated.
NextPos Captures the position of each rectangle in the solution, on the target rectangle. The position before the tilde where the rectangle should be placed.
SolutionChar Contains each character as used in the order they were iterated. In our example with solution (I, II) the captures of SolutionChar would be {a,a,a,a,b,b}.
Filled Contains all tildes in the target rectangle in the order they were filled. This is the same order as SolutionChar, and we can use both groups to paint the solution if we wanted.

Source Code

Source code and test cases can be found on GitHub: https://github.com/kobi/RecreationalRegex
Rectangles were drawn in HTML/CSS using JS Bin: http://jsbin.com/bafaxijiqa/edit?html,css,output


Thanks! Thanks!

Jenkins CI – Error 500 When Opening Views

After messing with Jenkins a little (upgrade and downgrade), all views were broken (views are the tabs on top that filer jobs). When clicking on them we got a Server Error (500).

The log showed several errors:

Caused by: java.lang.NullPointerException: Cannot invoke method isEmpty() on null object
    at org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:77)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:45)
    ...
Caused by: org.apache.commons.jelly.JellyTagException: jar:file:/C:/Program Files (x86)/Jenkins/war/WEB-INF/lib/jenkins-core-1.482.jar!/hudson/model/View/index.jelly:44:43:  Cannot invoke method isEmpty() on null object
    at org.apache.commons.jelly.impl.TagScript.handleException(TagScript.java:716)
    at org.apache.commons.jelly.impl.TagScript.run(TagScript.java:282)
    ...
27/05/2013 16:10:50 org.kohsuke.stapler.compression.CompressionFilter reportException
WARNING: Untrapped servlet exception
javax.servlet.ServletException: org.apache.commons.jelly.JellyTagException: jar:file:/C:/Program Files (x86)/Jenkins/war/WEB-INF/lib/jenkins-core-1.482.jar!/hudson/model/View/index.jelly:44:43:  Cannot invoke method isEmpty() on null object
    at org.kohsuke.stapler.jelly.JellyClassTearOff.serveIndexJelly(JellyClassTearOff.java:112)
    ...

I don’t know exactly what caused it, but I was able to solve the error. The views are saved in Jenkins’s main configuration file, config.xml. (on our server that was C:\Program Files (x86)\Jenkins\config.xml)

First, back up this file.

The views should still be intact in the file. I was able to create another view.
A working view should look like this:

<listView>
  <owner class="hudson" reference="../../.."/>
  <name>Working View</name>
  <filterExecutors>false</filterExecutors>
  <filterQueue>false</filterQueue>
  <properties class="hudson.model.View$PropertyList"/>
  <jobNames class="tree-set">
	<comparator class="hudson.util.CaseInsensitiveComparator"/>
	<string>Job Name 1</string>
	<string>Job Name 2</string>
  </jobNames>
  <jobFilters/>
  <columns>
	<hudson.views.StatusColumn/>
	<hudson.views.WeatherColumn/>
	<hudson.views.JobColumn/>
	<hudson.views.LastSuccessColumn/>
	<hudson.views.LastFailureColumn/>
	<hudson.views.LastDurationColumn/>
	<hudson.views.BuildButtonColumn/>
  </columns>
</listView>

On line 7 we have the list of all job names. On a broken view, the class attribute was missing:

<jobNames>
    <comparator class="hudson.util.CaseInsensitiveComparator"/>
    <string>Job Name 1</string>
    <string>Job Name 2</string>
</jobNames>

Add class="tree-set" to the jobNames element to correct the configuration file.

“Reload Configuration from Disk” didn’t work for me – I had to take the service down, change the XML, save it, and start the service again.
After that everything worked.

Warning

Jenkins might delete all jobs in the view (the <string>Job Name 1</string> elements) – that is why we backed up the file. It should be easy to restore these, or simply select the jobs again once the view is working.