Archives for the Month of March, 2013

Hacking Pi with PowerShell

A Facebook friend recently posted a cool picture from the Pi chain that the California Institute of Technology created on Pi Day, 2013:

image

After seeing that, you might wonder – “Where in Pi is that?” And, “What number does each colour represent?”

PowerShell can help here – its support for regular expressions lets you find all kinds of stuff in text. But where do you find the text of Pi? Bing, of course. After copying and pasting the first 100,000 digits of Pi into a text file (or pulling them straight from a web page, as below), you have some text to work with.

If you want to see some of the things you can do with Regular Expressions, check out this quick reference in the PowerShell Cookbook: http://www.powershellcookbook.com/recipe/qAxK/appendix-b-regular-expression-reference.

If you assume that the chain links are 4 inches wide, 100,000 digits will go for a little over 6 miles. It’s doubtful that they made a chain longer than that, so using 100,000 digits is pretty safe.
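As a quick sketch of the arithmetic behind that estimate (the 4-inch link width is only the guess made above, and the variable names are illustrative), the chain length comes out at about 6.3 miles:

```powershell
# Rough chain-length estimate. The 4-inch link width is a guess, not a measurement.
$linkInches = 4
$digitCount = 100000

$feet  = $linkInches * $digitCount / 12        # 12 inches per foot
$miles = [math]::Round($feet / 5280, 1)        # 5280 feet per mile

"About $miles miles"
```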

PS C:\Users\Lee> $page = iwr http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html
PS C:\Users\Lee> $null = "$page" -match '(\d|\.|\n){5,}'
PS C:\Users\Lee> $digits = $matches[0] -replace "\s",""
PS C:\Users\Lee> -join $digits[0..100]
3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067
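If the digits were pasted into a text file instead, the same clean-up works on the file contents. This is a minimal sketch, assuming a file that holds the digits broken up by spaces and line breaks; the temporary file and its short sample contents are made up for illustration.

```powershell
# Create a small sample file standing in for the pasted digits (illustrative only).
$path = [IO.Path]::GetTempFileName()
Set-Content -Path $path -Value "3.14159 26535`n89793 23846"

# Read the whole file as one string and strip all whitespace.
$digits = (Get-Content -Path $path -Raw) -replace '\s', ''
$digits

Remove-Item -Path $path
```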

Now, since we know that each colour represents a different number, and those colours appear in a certain pattern, we can look in $digits for that pattern.

What we see – 15 colours:

Yellow, Yellow, Blue, (Orange?), Green, Blue, Yellow, (Red?), Purple, Purple, Orange, Pink, Red, Yellow, Yellow

Now, here’s the regular expression to match any 15 characters (15 dots):

$digits -match "..............."

It does, of course (I hear Pi goes on for a while), so let’s get more specific.
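Counting dots by hand is error-prone. As a small sketch, PowerShell's string multiplication can build the pattern instead; the variable names and the short sample string here are illustrative only.

```powershell
# Build "any 15 characters" without typing the dots by hand.
$any15   = '.' * 15       # 15 dots
$grouped = '(.)' * 15     # the same, with each character in its own group

$any15.Length             # number of dots in the pattern

# A 16-character sample, so a 15-character window fits.
$sample = '3.14159265358979'
$sample -match $any15
$sample -match $grouped
$Matches.Count            # whole match plus one entry per group
```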

Regular expressions let you group characters by surrounding them in parentheses:

$digits -match "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)"

And also give them names. Let’s name the first group:

$digits -match "(?<Yellow>.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)"

And rather than using the “dot” to match any character, you can refer to a previous group – by either name or number. Here’s the chunk that names the first character “Yellow”, and then says that the second character must be the same as the first:

$regex = "(?<Yellow>.)(\k<Yellow>)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)"
$digits | Select-String $regex -AllMatches | % { $_.Matches.Value } | Select -First 1
33832795028841
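To see the backreference on its own, here is a minimal sketch against just the opening digits of Pi. The sample string and variable names are illustrative; the first doubled digit it finds is the same 33 that starts the match above.

```powershell
$sample = '3.14159265358979323846264338327950288'

# Named group plus named backreference: any character followed by itself.
if ($sample -match '(?<Yellow>.)\k<Yellow>') {
    $pair  = $Matches[0]          # the doubled pair: 33
    $digit = $Matches['Yellow']   # the digit both links share: 3
    "$pair is a pair of ${digit}s"
}

# The same idea with a numbered backreference.
$sample -match '(.)\1'
```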

This is how we’ll “crack the code” – find patterns where the digits are the same at the same place the colours are the same. Since most of the colours aren’t repeated, they don’t help us much and we can ignore them for now. Let’s expand our example:

$regex = "(?<Yellow>.)(\k<Yellow>)" +
    "(?<Blue>.)(?<Orangey>.)(<?Green>.)(\k<Blue>)(\k<Yellow>)" +
    "(?<Pinkish>.)(?<Purple>.)(\k<Purple>)(?<Orange>.)(?<Pink>.)" +
    "(?<Red>.)(\k<Yellow>)(\k<Yellow>)"
$digits | Select-String $regex -AllMatches | % { $_.Matches.Value } | sort

PS >

Hrrm. No results.
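Typing fifteen groups by hand is fiddly, so here is a sketch of generating the pattern from the colour list instead. New-ColourRegex is a hypothetical helper, not something from this post or from PowerShell itself. It captures a colour the first time it appears and emits a backreference every time the colour repeats. It does not require different colours to be different digits.

```powershell
# Hypothetical helper: turn a list of colour names into a regex in which
# repeated colours must be the same character.
function New-ColourRegex {
    param([string[]] $Colours)

    $seen = @{}
    $parts = foreach ($colour in $Colours) {
        if ($seen.ContainsKey($colour)) {
            "\k<$colour>"        # repeat: must equal the earlier capture
        } else {
            $seen[$colour] = $true
            "(?<$colour>.)"      # first sighting: capture any character
        }
    }
    -join $parts
}

# Shortened colour list for illustration.
$colours = 'Yellow', 'Yellow', 'Blue', 'Orangey', 'Green', 'Blue', 'Yellow'
$regex = New-ColourRegex -Colours $colours
$regex

'0051250' -match $regex    # True: the repeats line up
'0051260' -match $regex    # False: the second Blue differs from the first
```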

Let’s relax some constraints. Maybe some colours we thought matched really didn’t? With my man eyes to the rescue, those blues look subtly different.

$regex = "(?<Yellow>.)(\k<Yellow>)" +
    "(?<Blue>.)(?<Orangey>.)(<?Green>.)(?<Blue2>.)(\k<Yellow>)" +
    "(?<Pinkish>.)(?<Purple>.)(\k<Purple>)(?<Orange>.)(?<Pink>.)" +
    "(?<Red>.)(\k<Yellow>)(\k<Yellow>)"
$digits | Select-String $regex -AllMatches | % { $_.Matches.Value } | sort
PS >

Grr! No results again.

Maybe the yellows are different? There are three groups. The first group of two is certainly the same colour. The last group of two is certainly the same colour. But maybe those groups are different yellows. Let’s see – call the other ones Yellow2 and Yellow3:

$regex = "(?<Yellow>.)(\k<Yellow>)" +
    "(?<Blue>.)(?<Orangey>.)(<?Green>.)(?<Blue2>.)(?<Yellow2>.)" +
    "(?<Pinkish>.)(?<Purple>.)(\k<Purple>)(?<Orange>.)(?<Pink>.)" +
    "(?<Red>.)(?<Yellow3>.)(\k<Yellow3>)"
$digits | Select-String $regex -AllMatches | % { $_.Matches.Value } | sort

PS >

Hmm. This doesn’t seem to be working out so well.
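One detail in the three patterns above is worth a second look: the Green group is written (<?Green>.) rather than (?<Green>.). As written, it is an ordinary group that matches an optional literal <, then the literal text Green>, then one character. A string of digits never contains that text, so those patterns cannot match no matter what the colours are. The sketch below shows the difference on short made-up strings. It does not re-run the searches above, so what the corrected patterns would find in the 100,000 digits is left open.

```powershell
# Patterns shortened for illustration: a doubled character, then the Green group.
$typo  = '(?<Yellow>.)\k<Yellow>(<?Green>.)'   # "<?" is an optional literal "<"
$fixed = '(?<Yellow>.)\k<Yellow>(?<Green>.)'   # "?<" starts a named group

'338' -match $typo          # False: needs the literal text "Green>"
'338' -match $fixed         # True: 33 followed by any character
'33Green>8' -match $typo    # True: only text containing "Green>" can match
```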

image

Brain flash. Oh wait, this is just a picture. What if it was taken from the other side? The colours would be reversed. Maybe we’re looking at the digits of Pi in the wrong order?
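One way to test that idea without touching the pattern is to reverse the text being searched. This is only a sketch: Get-ReversedString is a hypothetical helper, and the sample string and three-link pattern are made up.

```powershell
# Hypothetical helper: return the characters of a string in reverse order.
function Get-ReversedString {
    param([string] $Text)
    $chars = $Text.ToCharArray()
    [array]::Reverse($chars)     # reverses the array in place
    -join $chars
}

$sample  = '1234500'
$pattern = '(?<Yellow>.)\k<Yellow>5'    # Yellow, Yellow, then a 5

$sample -match $pattern                         # False reading left to right
(Get-ReversedString $sample) -match $pattern    # True: the reversed text is 0054321
$found = Get-ReversedString $Matches[0]         # 500, as it appears in the original
$found
```

Matches found this way come back reversed, so they need flipping again before looking them up in the original digits.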

Let’s rewrite the regex since we might actually be looking at:

Yellow, Yellow, Red, Pink, Orange, Purple, Purple, (Red?), Yellow, Blue, (Orange?), Green, Blue, Yellow, Yellow

$regex = "(?<Yellow3>.)(\k<Yellow3>)" +
    "(?<Red>.)(?<Pink>.)(?<Orange>.)(?<Purple>.)(\k<Purple>)" +
    "(?<Pinkish>.)(?<Yellow2>.)(?<Blue2>.)(?<Green>.)(?<Orangey>.)" +
    "(?<Blue>.)(?<Yellow>.)(\k<Yellow>)"
$digits | Select-String $regex -AllMatches | % { $_.Matches.Value } | sort

000139962344455
004489917210022
006079992795411
006447771837022
006734420647477
006758838043211
008368842671022
009318804549333
112821107094422

(…)

Bingo!

That’s a lot of options, so let’s add back in some things that we are more sure of – specifically, those yellows:

$regex = "(?<Yellow3>.)(\k<Yellow3>)" +
    "(?<Red>.)(?<Pink>.)(?<Orange>.)(?<Purple>.)(\k<Purple>)" +
    "(?<Pinkish>.)(?<Yellow3>.)(?<Blue2>.)(?<Green>.)(?<Orangey>.)" +
    "(?<Blue>.)(?<Yellow2>.)(\k<Yellow2>)"
$digits | Select-String $regex -AllMatches | % { $_.Matches.Value } | sort

000139962344455
004489917210022
006079992795411
006447771837022
006734420647477
006758838043211
008368842671022
009318804549333
112821107094422
113829928776911

(…)
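Note that the ninth position in this pattern is written as a second (?<Yellow3>.) group rather than as \k<Yellow3>. In .NET regular expressions a repeated group name captures a fresh value instead of comparing against the earlier one, so the ninth link is not actually tied to the first two. The first result above, 000139962344455, has a 2 in that spot. Here is a small sketch of the difference, using a throwaway group name Y and made-up strings:

```powershell
$recapture = '(?<Y>.)\k<Y>(?<Y>.)'   # third position captures anything under the same name
$backref   = '(?<Y>.)\k<Y>\k<Y>'     # third position must equal the first

'005' -match $recapture    # True: the 5 is simply captured
'005' -match $backref      # False: the 5 is not a 0
'000' -match $backref      # True
```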

But that first yellow has a zero. Maybe that’s a hint?

$regex = "(?<Yellow3>0)(\k<Yellow3>)" +
    "(?<Red>.)(?<Pink>.)(?<Orange>.)(?<Purple>.)(\k<Purple>)" +
    "(?<Pinkish>.)(?<Yellow3>.)(?<Blue2>.)(?<Green>.)(?<Orangey>.)" +
    "(?<Blue>.)(?<Yellow2>.)(\k<Yellow2>)"
$digits | Select-String $regex -AllMatches | % { $_.Matches.Value } | sort

000139962344455
004489917210022
006079992795411
006447771837022
006734420647477
006758838043211
008368842671022
009318804549333

Not very many, so we’re getting closer. Let’s take a look at the patterns and rule out the ones that don’t make sense:

000139962344455    # No, triple start digit
004489917210022    # No, the second pair of digits is doubled too
006079992795411    # No, has another zero ("Yellow") on the fourth digit
006447771837022    # No, has another doubling right after the red
006734420647477    # No, has four 4s and four 7s
006758838043211    # No, 8 means purple, and has another purple in the middle
008368842671022    # No, has another zero ("Yellow") where we thought Blue was
009318804549333    # No, has a tripled final colour

Well, evidently none of them make sense. This is truly confusing. Maybe the numbers happen after 100,000 digits of Pi?
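The elimination by eye can also be sketched in code. Test-ColourMapping is a hypothetical helper, not part of the original post. It assumes a strict one-to-one mapping, where the same colour always means the same digit and different colours always mean different digits. That is stricter than a photo really allows, so treat it as a first filter only.

```powershell
# Hypothetical helper: does a candidate digit string fit a colour sequence
# under a strict one-colour-per-digit mapping?
function Test-ColourMapping {
    param([string] $Candidate, [string[]] $Colours)

    $digitOf  = @{}    # colour -> digit
    $colourOf = @{}    # digit  -> colour
    for ($i = 0; $i -lt $Colours.Count; $i++) {
        $colour = $Colours[$i]
        $digit  = [string] $Candidate[$i]
        if ($digitOf.ContainsKey($colour) -and $digitOf[$colour] -ne $digit)  { return $false }
        if ($colourOf.ContainsKey($digit) -and $colourOf[$digit] -ne $colour) { return $false }
        $digitOf[$colour] = $digit
        $colourOf[$digit] = $colour
    }
    $true
}

# Shortened colour list for illustration.
$colours = 'Yellow', 'Yellow', 'Blue', 'Orangey', 'Green', 'Blue', 'Yellow'
Test-ColourMapping -Candidate '0051250' -Colours $colours   # True
Test-ColourMapping -Candidate '0050250' -Colours $colours   # False: 0 is both Yellow and Orangey
Test-ColourMapping -Candidate '0051260' -Colours $colours   # False: Blue is both 5 and 6
```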

Taking a different approach, it turns out that the original California Institute of Technology page shows a few pictures where there are numbers drawn on the links. Could we use those for a key?

 

Let’s try this again from the start. The pictures tell us:

0 = Yellow
1 = Pink
2 = Goldenrod
3 = Green
4 = ?
5 = Light Blue
6 = Orange
7 = Purple
8 = ?
9 = Dark Blue

Wait. There’s a Goldenrod?

This might help. Let’s do this match against the left-to-right version:

Yellow, Yellow, Blue, (Orange?) Green, Blue, Yellow, (Red?), Purple, Purple, Orange, Pink, Red, Yellow, Yellow

We can now substitute some numbers into the regex. Yellow and Goldenrod are pretty close, so let’s use regex alternation (a|b) to let it select either. The light blue is pretty clear, so let’s call that out specifically:

$regex = "(0|2)(0|2)5..5.......(0|2)(0|2)"
$digits | Select-String $regex -AllMatches | % { $_.Matches.Value } | sort

225695968815920

Wow! It seems like we have a match!

Goldenrod, Goldenrod, Light Blue, Orange, Dark Blue, Light Blue, Dark Blue, …

But there are two problems:

  • It implies that either 9 is Green, or that the picture showing Green is actually Dark Blue
  • It says that the colour before the second blue is the same as the colour after it. That’s not the case.

Maybe they messed up assigning colours? If you follow the colour chart through further, there are almost a dozen more mistakes. Let’s go back to the big brain flash, and reverse the regex.

$regex = ".........5..5(0|2)(0|2)"
$digits | Select-String $regex -AllMatches | % { $_.Matches.Value } | sort

005476022525520
008169980536520
020695124539500
036407573587502
048466277504500
068006422512520
090988702550520
101024365553522

(…)

Tons of matches again. Let’s get more specific with those yellows:

$regex = "(0|2)(0|2)......(0|2)5..5(0|2)(0|2)"
$digits | Select-String $regex -AllMatches | % { $_.Matches.Value } | sort

005476022525520
008169980536520

Much better. Apply the process of elimination again:

005476022525520    ## Says that the red is colour 5, but also that orange and light blue are too.
008169980536520    ## Seems legit

One was way off, the other one seemed legit. If we apply our colour key, we get:

Yellow, Yellow, ?, Pink, Orange, Dark Blue, Dark Blue, ?, Yellow, Light Blue, Green, Orange, Light Blue, Goldenrod, Yellow

That seems very reasonable. The unknown colour (8) is evidently red: it lands on both the link we called Red and the one we called “Pinkish”, which was just red in the sunlight. The two that we thought were purple were either actually dark blue, or the assembly person made a mistake. The last two yellows were actually not two yellows – they were one Goldenrod, and then one Yellow. Or the person putting them together made a mistake.
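Applying the key by hand is easy to get wrong, so here is a minimal sketch that does the lookup with a hashtable. The key contains only the digits the pictures actually labelled. Digits 4 and 8 are left out on purpose and come out as ?. The variable names are illustrative.

```powershell
# Colour key read off the numbered links. 4 and 8 were not visible in the pictures.
$key = @{
    '0' = 'Yellow';     '1' = 'Pink';   '2' = 'Goldenrod'; '3' = 'Green'
    '5' = 'Light Blue'; '6' = 'Orange'; '7' = 'Purple';    '9' = 'Dark Blue'
}

$colours = foreach ($ch in '008169980536520'.ToCharArray()) {
    $digit = [string] $ch
    if ($key.ContainsKey($digit)) { $key[$digit] } else { '?' }
}

$line = $colours -join ', '
$line
# Yellow, Yellow, ?, Pink, Orange, Dark Blue, Dark Blue, ?, Yellow, Light Blue, Green, Orange, Light Blue, Goldenrod, Yellow
```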

Now, we can get smart. How far along were they on the chain?

$digits.IndexOf("008169980536520")

12309

Going back to our 4-inch estimate (3 of those in a foot), that’s 4103 feet.
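A short sketch of that conversion, again assuming the guessed 4-inch links. Keep in mind that IndexOf is zero-based and that $digits still contains the leading "3.", so the index is a character offset rather than a count of decimal places. The tiny sample string shows the offset.

```powershell
# IndexOf counts characters from zero, including the "3" and the ".".
$offset = '3.14159'.IndexOf('15')     # 4: the characters 3 . 1 4 come first

# Distance along the chain for the index reported above.
$index      = 12309                   # value returned by IndexOf in the post
$linkInches = 4                       # guessed link width
$feet  = $index * $linkInches / 12    # 12 inches per foot
$miles = [math]::Round($feet / 5280, 2)

"$feet feet, or about $miles miles"
```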

What went wrong in the pattern-based regex?

If you take a look at the final regex we played with, we assumed that the last two were the same colour:

$regex = "(?<Yellow3>0)(\k<Yellow3>)" +
    "(?<Red>.)(?<Pink>.)(?<Orange>.)(?<Purple>.)(\k<Purple>)" +
    "(?<Pinkish>.)(?<Yellow3>.)(?<Blue2>.)(?<Green>.)(?<Orangey>.)" +
    "(?<Blue>.)(?<Yellow2>.)(\k<Yellow2>)"
$digits | Select-String $regex -AllMatches | % { $_.Matches.Value } | sort

If we assume that “Orangey” is actually “Orange”, and say that the final two digits are either Yellow or Goldenrod, then we get the proper result after the process of elimination:

$regex = "(?<Yellow3>0)(\k<Yellow3>)" +
    "(?<Red>.)(?<Pink>.)(?<Orange>.)(?<Purple>.)(\k<Purple>)" +
    "(?<Pinkish>.)(?<Yellow3>.)(?<Blue2>.)(?<Green>.)(\k<Orange>)" +
    "(?<Blue>.)(?<Yellow2>.)(\k<Yellow2>|\k<Yellow3>)"
$digits | Select-String $regex -AllMatches | % { $_.Matches.Value } | sort

001502298330798
008169980536520
009743300884990

Why do I care?

Why not spend some time messing around in Pi?

ScanSnap ix500 Scanning into Excel

Following up on my previous post about the ScanSnap ix500, one thing I couldn’t find when searching for information about it was how well it did scanning information into Excel.

So, here’s a quick video that demonstrates it:

 

ScanSnap ix500 Scanning Speed and Document Processing Workflow

If you’ve ever played with the Getting Things Done system, you may have heard David Allen’s excited advice for getting your paperwork organized: a filing system and an automatic labeler.

One of the hardest parts of a paperwork filing system is the nomenclature. If you just have files that are labeled “A-Z”, you can never remember – did you put your car insurance information under “C” for Car? “I” for insurance? “G” for Geico?

Figuring that out is an annoying process. It means manually skimming through (at least) each of those folders.

So, the Getting Things Done approach is simpler – label a new folder with “Geico Car Insurance”. As you skim through your folder tabs, you’ll see that and not have to dig in other folders.

That’s a great philosophy, and I stuck with it for a while. However, I declared “paperwork filing bankruptcy” a few years ago. Taking my filing box out, filing paperwork, creating new labels – it doesn’t sound painful, but it was enough that I instead adopted the “sedimentation approach”. In a filing cabinet drawer, old bills are on the bottom, new bills are on the top. When you need to find a bill or document, you spend an annoyingly large amount of time digging through it.

You organize the REALLY important stuff, but most other stuff just goes in the pile.

In any case, this approach isn’t exactly optimal. Even with the clearly labeled system, I am well and truly in trouble if I want to find the little receipt from two years ago, from some electronics store I no longer remember, that had the purchase information about several things – only one of which I care about.

This kind of dissatisfaction is increasingly common, and the “home paperless office” has recently started becoming a viable answer.

The idea behind the “paperless office” is that you scan in all of your documents / paperwork, have them digitized, and software automatically recognizes the text in those documents through OCR (optical character recognition). Nowadays, most software stores your documents in searchable PDFs – the kind of PDF that lets you select text, but also shows the original document in full fidelity.

Once you’ve gone through this process, you can use software that acts like a virtual filing cabinet to search through your paperwork, or use the built-in search capabilities of Windows.

To get a feel for the “paperless office” approach, I first downloaded a trial of ABBYY FineReader. I used my current flatbed scanner, scanned in a few bills (flipping each page so I get both sides, pushing “Scan” each time), and watched the PDFs get generated, recognized, and saved into Windows. From there, the power is amazing – you can search for nearly any text in the document. You know that huge 50 page car insurance policy? The software has absolutely no problem finding a random word from page 38.

I was completely sold on this being the way to manage my paperwork. But the scanning. Oh, the horrible scanning.

Fortunately, there’s an answer to that unbelievable pain. Document scanners have arrived for that purpose. Unlike your multi-function printer / flatbed scanner combo, these things are designed for one thing: scanning documents. Quickly, efficiently, and with no fuss.

In this world, there’s a clear line of champions – the Fujitsu ScanSnap line of document scanners. They scan 50 pages per minute, double sided, and have great text recognition capabilities.

When looking around for reviews, I couldn’t believe the amount of love customers had for a freaking document scanner.

image

As I roamed the web for reviews of their newest model – the ScanSnap iX500 – I continually stumbled on blogs that said things like this:

I've been using the previous model, the ScanSnap S1500 for almost 4 years now. I liked it so much that I bought a second one. To me, it completely transformed the concept of going paperless: from painful and time consuming to fun and even cool. (…)

That said, I bought the iX500 as soon as I learned it was available on Amazon.

Reviews of the iX500 are even more glowing:

image

Without further ado, I got the ScanSnap iX500 document scanner the other day in an attempt to tame the vast piles of paperwork sedimentation. And no surprise, this thing is fast. I’ve spent the last few nights scanning in paperwork, and have been averaging about 100 documents per hour. Most of the time is spent unfolding paper.

One thing I couldn’t find a good example of when researching the ScanSnap iX500 was its scanning speed and document processing workflow. So, I recorded a quick two minute example last night, and uploaded it. Enjoy!

 

 

I’ve also blogged an example of the ScanSnap scanning paper and bringing tabular data into Excel here: http://www.leeholmes.com/blog/2013/03/09/scansnap-ix500-scanning-into-excel/