Creating SQL's "join"-like functionality in MSH

Wed, Jun 22, 2005 3-minute read

An interesting thread came up on our BetaPlace newsgroup today, asking if MSH supports a SQL-like join command.  After all, we do already support the select-object cmdlet, which allows you to pick individual properties for display.  Given our display model (that often looks and feels like a SQL grid output,) this is a totally great question.  It’s a feature I’ve always wanted in a shell, too.

We don’t support the feature natively, but the MSH language sure does.

Although a true SQL join combines the colums of the tables you select, we don’t really need that in the command line.  Instead, we’ll implement the set-intersection capabilities of the join.  Given two sets of items, find the ones that meet some common criteria.  For those that match, output a result.

Let’s look at a simple example:

Set 1

NameNumber
Lee555-1212
Joe555-5555
Sam555-1234

Set 2

NameNumber
Lee’s Sister555-1212
Frank555-9876
Jim555-1029
Jed555-1234

Now, using SQL, let’s see what numbers are being shared:

SELECT Number FROM Set1 WHERE Set1.Number = Set2.Number

Would give us the results:

Number
555-1212
555-1234

Now, let’s do that in MSH.  We’ll pick all the items from our system32 directory that are also in the windows root:

MSH:171 C:\temp > $dirset1 = (get-childitem c:\winnt\system32)
MSH:172 C:\temp > $dirset2 = (get-childitem c:\winnt\)
MSH:173 C:\temp > join-object $dirset1 $dirset2 -where:{$firstItem.Name -eq $secondItem.Name}

    Directory: FileSystem::C:\winnt\system32

Mode    LastWriteTime            Length Name
-—    ————-            —— —-
-a—   Mar 25  2003                  2 desktop.ini
-a—   Mar 24 17:08              68608 notepad.exe
-a—   Mar 25  2003               8704 winhlp32.exe
d—-   Jun 21 21:52                    config
d—-   Apr 05 05:00                    IME
d—-   Apr 05 12:05                    mui
d—-   Apr 05 16:20                    SoftwareDistribution

As is the bane of a SQL join, this takes a very long time to complete on large data sets.  However, it works suprisingly well.  Here’s the script:

## join-object.msh
## Outputs the intersection of two lists, based on a given property
##
## Parameters:
##    -First:  The first set to join with.  Defaults to the pipeline if not specified
##    -Second: The second set to join with.
##    -Where:  A script block by which to compare the elements of the two sets.
##       -$firstItem refers to the element from 'First'
##       -$secondItem refers to the element from 'Second'
##    -Output: An expression to execute when the 'Where' expression evaluates to 'True".
##             Defaults to outputting $firstItem if not specified.
##
## Examples:
## "Hello","World" | join-object -second:"World"
## join-object -first:"A","E","I" -second:"BAT","BUG","BIG" -where:{$secondItem.Contains($firstItem)} -output:{$secondItem}
##
## $dirset1 = (get-childitem c:\winnt)
## $dirset2 = (get-childitem c:\winnt\system32)
## join-object $dirset1 $dirset2 -where:{$firstItem.Name -eq $secondItem.Name}

param($first=@(), $second = $(throw "Please specify a target to join"), $where={$firstItem -eq $secondItem}, $output={$firstItem})

if(-not $first)
{
    foreach($element in $input) { $first += $element }
}

foreach($firstItem in $first)
{
   foreach($secondItem in $second)
   {
      if(& $where) { & $output }
   }
}

First, we check if the user has specified the ‘First’ dataset to use.  If not, we collect the data from the pipeline.  Then, we simply go through every item in the ‘First’ dataset.  For each of those items, we go through every item in the ‘Second’ dataset.  If our ‘Where’ clause evaluates to true, then we output as desired. 

What’s mind-blowing is that the script could literally be written as a one-liner if you removed the error handling and comments.

[Edit: Monad has now been renamed to Windows PowerShell. This script or discussion may require slight adjustments before it applies directly to newer builds.]