Regular Expressions in Monad

Tue, Jul 19, 2005

SteveX has a Monad forum on his site, and one of the first topics posted (by Ayende) was an example script on using regular expressions in Monad.

This is a helpful example of how well Monad integrates with the .Net framework.  You can call out to [System.Text.RegularExpressions.Regex] and port your C# regular expressions almost effortlessly.
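For reference, here is roughly what calling the .Net class directly looks like. This is a minimal sketch in current PowerShell syntax, not checked against every Monad build. The pattern and input strings are only illustrations.

```powershell
# Construct a Regex object exactly as you would in C#
$re = New-Object System.Text.RegularExpressions.Regex("W(or)ld")
$m = $re.Match("Hello World")

$m.Success            # True
$m.Value              # World
$m.Groups[1].Value    # or

# Static methods work too; RegexOptions is an ordinary .Net enum
$ignoreCase = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
[System.Text.RegularExpressions.Regex]::IsMatch("Hello World", "hello", $ignoreCase)   # True
```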

But wait, it only gets better.

Regular expressions are a glorious beast of burden in scripting languages, as you might have noticed if you’ve read much Perl.  Because of that, we’ve given them first-class language support via the -match operator.

Let’s start with a simple example:

MSH:40 C:\Temp >"Hello World" -match "hello"
True

The -match operator returns $true if the match succeeds and $false otherwise.  Notice that -match is case-insensitive by default.  For case-sensitive matching, use the -cmatch operator instead:

MSH:41 C:\Temp >"Hello World" -cmatch "hello"
False
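The rest of the operator family follows the same naming pattern. This sketch assumes current PowerShell, where -imatch (explicitly case-insensitive) and -notmatch (the negation) sit alongside -match and -cmatch. Early Monad builds may differ.

```powershell
"Hello World" -match "hello"      # True  (case-insensitive by default)
"Hello World" -imatch "hello"     # True  (explicitly case-insensitive)
"Hello World" -cmatch "hello"     # False (case-sensitive)
"Hello World" -notmatch "xyz"     # True  (negated match)

# The result is an ordinary Boolean, so it drops straight into an if statement
if ("Hello World" -cmatch "^Hello") { "Starts with a capital H" }
```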

When a match succeeds, the -match operator stores the results (the matched text and any captured groups) in an appropriately named hashtable variable, $matches:

MSH:43 C:\Temp >$matches

Key                            Value
---                            -----
0                              Hello

MSH:46 C:\Temp >"Hello World" -match "hello.*or"
True

MSH:47 C:\Temp >$matches

Key                            Value
---                            -----
0                              Hello Wor
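Two details about $matches are worth knowing: it records only the first occurrence of the pattern, and a failed match leaves it holding the result of the previous successful match. A minimal sketch, assuming current PowerShell behavior:

```powershell
# Only the first occurrence is recorded: "ll" from Hello, not the "l" in World
$null = "Hello World" -match "l+"
$matches[0]        # ll

# A failed match returns $false and leaves $matches holding the previous result
$null = "Hello World" -match "xyz"
$matches[0]        # ll

# So check the Boolean before trusting $matches
if ("Hello World" -match "W\w+") { $matches[0] }    # World
```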

As with the Groups collection on .Net's Match object, $matches[0] always holds the text matched by the entire regular expression.  Patterns can get much more complex, for example when they contain several capture groups, including named ones:

MSH:50 C:\Temp >"Hello World" -match "h(ell)o(?<named1>.*)(?<named2>or)"
True

MSH:51 C:\Temp >$matches

Key                            Value
---                            -----
named2                         or
named1                          W
1                              ell
0                              Hello Wor

Here is how Monad (via .Net) assigned those groups:

  • Group ‘0’, as always, is the text matched by the entire regular expression.
  • Group ‘1’ is the unnamed capture, (ell), near the beginning of the regular expression.
  • Group ‘named1’ is the first named capture, in the middle of the regular expression.
  • Group ‘named2’ is the second named capture, at the end of the regular expression.
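You read individual groups back out of $matches by number or by name. A minimal sketch in current PowerShell, reusing the pattern above:

```powershell
if ("Hello World" -match "h(ell)o(?<named1>.*)(?<named2>or)")
{
    $matches[0]          # Hello Wor
    $matches[1]          # ell
    $matches["named1"]   # ' W', including the leading space
    $matches.named2      # or  (hashtable keys also work as properties)
}
```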

Monad uses .Net’s regular expression facilities, so keep this great MSDN Regex Reference handy.  From there, memorize obscure minutiae such as the following to dazzle family and soon-to-be-ex-friends:

Named captures are numbered sequentially, based on the left-to-right order of the opening parenthesis (like unnamed captures), but numbering of named captures starts after all unnamed captures have been counted.
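You can see that rule in action by asking the Regex class for a group's number. In this sketch (current PowerShell; the pattern is a made-up illustration), the named group opens first in the pattern but is still numbered after the unnamed one:

```powershell
# (?<first>a) opens first, but as a named group it is numbered after (b)
$re = New-Object System.Text.RegularExpressions.Regex("(?<first>a)(b)")
$re.GroupNumberFromName("first")   # 2

$m = $re.Match("ab")
$m.Groups[1].Value                 # b  (the unnamed group)
$m.Groups["first"].Value           # a  (the same group as $m.Groups[2])
```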

This becomes a pretty powerful mechanism for slicing and dicing text content.  How about a command-line address book?

MSH:79 C:\Temp >get-content address_book.csv
Joe,555-1212
Frank,555-1234
Ella,555-314159265359

MSH:80 C:\Temp >$outputBlock = { "Name: " + $matches["name"] + ", Number: " + $matches["number"] }
MSH:81 C:\Temp >get-content address_book.csv | `
>> foreach-object { if ($_ -match "(?<name>Ella),(?<number>.*)") `
>> { & $outputBlock } }
>>
Name: Ella, Number: 555-314159265359
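To make that one-off pipeline reusable, you might wrap it in a function. Find-Contact is a hypothetical name, and the sketch assumes current PowerShell, including its [regex] shorthand for System.Text.RegularExpressions.Regex. It anchors the pattern with ^ so a lookup for "Ella" can't also hit a line such as "Bella,...", and it escapes the name so characters like '.' or '+' match literally.

```powershell
# Hypothetical helper: look up one name among "name,number" lines
function Find-Contact([string[]] $lines, [string] $name)
{
    # Single quotes keep PowerShell from treating the trailing $ as a variable
    $pattern = '^(?<name>' + [regex]::Escape($name) + '),(?<number>.*)$'
    foreach ($line in $lines)
    {
        if ($line -match $pattern)
        {
            "Name: " + $matches["name"] + ", Number: " + $matches["number"]
        }
    }
}

# Inline stand-in for: get-content address_book.csv
$book = "Joe,555-1212", "Frank,555-1234", "Ella,555-314159265359"
Find-Contact $book "ella"
```

Because -match is case-insensitive, looking up "ella" still finds Ella, and $matches["name"] returns the name as it is spelled in the file.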

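So, to close the loop, here is the original script, which calls the .Net Regex class directly:

if ($args.Count -lt 2)
{
   "Arguments are: "
   # return (rather than break) exits the script without breaking out of a caller's loop
   return
}

$file = [string]$args[0]
$pattern = [string]$args[1]
# Build the Regex object once, outside the loop
$re = new-object System.Text.RegularExpressions.Regex($pattern)
foreach ($line in $(Get-Content $file))
{
   $match = $re.Match($line)
   if ($match.Success)
   {
      # Emit only the matched portion of the line
      $match.Value
   }
}

And here it is again, this time using Monad’s built-in language support: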

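param(
   [string] $filename = $(throw "Please specify a filename."),
   [string] $regex = $(throw "Please specify a regular expression.")
)

foreach ($line in (get-content $filename))
{
   # -match is case-insensitive; use -cmatch to mirror the Regex class's case-sensitive default
   if ($line -match $regex)
   {
      # $matches[0] holds the text matched by the whole pattern
      $matches[0]
   }
}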

Pretty nice!
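If you'd rather drop the same logic into a pipeline, a function with a process block does the job. Select-RegexMatch is a hypothetical name, and this is a sketch in current PowerShell rather than a tested Monad script:

```powershell
# Hypothetical pipeline-friendly variant of the script above
function Select-RegexMatch([string] $regex)
{
    process
    {
        # $_ is the current pipeline input line; emit only the matched text
        if ($_ -match $regex) { $matches[0] }
    }
}

# Inline stand-in for: get-content $filename
"Joe,555-1212", "Frank,555-1234", "Ella,555-314159265359" | Select-RegexMatch "^\w+"
```

Applying -match to a whole array, as in (get-content $filename) -match $regex, is another option, but it returns the matching lines themselves rather than just the matched text.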

[Edit: Monad has now been renamed to Windows PowerShell. This script or discussion may require slight adjustments before it applies directly to newer builds.]