PowerShell Cookbook

Twitter Updates

    follow me on Twitter

    Search

    Categories

     

    On this page

    Creating SQL's "join"-like functionality in MSH

    Archive

    Blogroll

    Disclaimer
    I work for Microsoft.

    The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

    RSS 2.0 | Atom 1.0 | CDF

    Send mail to the author(s) E-mail

    Total Posts: 235
    This Year: 12
    This Month: 0
    This Week: 0
    Comments: 634

    Sign In

     Tuesday, June 21, 2005
    Wednesday, June 22, 2005 6:45:53 AM (Pacific Daylight Time, UTC-07:00) ( )

    An interesting thread came up on our BetaPlace newsgroup today, asking if MSH supports a SQL-like join command.  After all, we do already support the select-object cmdlet, which allows you to pick individual properties for display.  Given our display model (that often looks and feels like a SQL grid output,) this is a totally great question.  It's a feature I've always wanted in a shell, too.

    We don't support the feature natively, but the MSH language sure does.

    Although a true SQL join combines the colums of the tables you select, we don't really need that in the command line.  Instead, we'll implement the set-intersection capabilities of the join.  Given two sets of items, find the ones that meet some common criteria.  For those that match, output a result.

    Let's look at a simple example:

    Set1
    Name, Number
    Lee, 555-1212
    Joe, 555-5555
    Sam, 555-1234

    Set2
    Name, Number
    Lee's Sister, 555-1212
    Frank, 555-9876
    Jim, 555-1029
    Jed, 555-1234

    Now, using SQL, let's see what numbers are being shared:

    SELECT Number FROM Set1 WHERE Set1.Number = Set2.Number

    Would give us the results:

    Result
    Number
    555-1212
    555-1234

    Now, let's do that in MSH.  We'll pick all the items from our system32 directory that are also in the windows root:

    MSH:171 C:\temp > $dirset1 = (get-childitem c:\winnt\system32)
    MSH:172 C:\temp > $dirset2 = (get-childitem c:\winnt\)
    MSH:173 C:\temp > join-object $dirset1 $dirset2 -where:{$firstItem.Name -eq $secondItem.Name}


        Directory: FileSystem::C:\winnt\system32


    Mode    LastWriteTime            Length Name
    ----    -------------            ------ ----
    -a---   Mar 25  2003                  2 desktop.ini
    -a---   Mar 24 17:08              68608 notepad.exe
    -a---   Mar 25  2003               8704 winhlp32.exe
    d----   Jun 21 21:52                    config
    d----   Apr 05 05:00                    IME
    d----   Apr 05 12:05                    mui
    d----   Apr 05 16:20                    SoftwareDistribution

    As is the bane of a SQL join, this takes a very long time to complete on large data sets.  However, it works suprisingly well.  Here's the script:

    ## join-object.msh
    ## Outputs the intersection of two lists, based on a given property
    ##
    ## Parameters:
    ##    -First:  The first set to join with.  Defaults to the pipeline if not specified
    ##    -Second: The second set to join with.
    ##    -Where:  A script block by which to compare the elements of the two sets.
    ##       -$firstItem refers to the element from 'First'
    ##       -$secondItem refers to the element from 'Second'
    ##    -Output: An expression to execute when the 'Where' expression evaluates to 'True".
    ##             Defaults to outputting $firstItem if not specified.
    ##
    ## Examples:
    ## "Hello","World" | join-object -second:"World"
    ## join-object -first:"A","E","I" -second:"BAT","BUG","BIG" -where:{$secondItem.Contains($firstItem)} -output:{$secondItem}
    ##
    ## $dirset1 = (get-childitem c:\winnt)
    ## $dirset2 = (get-childitem c:\winnt\system32)
    ## join-object $dirset1 $dirset2 -where:{$firstItem.Name -eq $secondItem.Name}

    param($first=@(), $second = $(throw "Please specify a target to join"), $where={$firstItem -eq $secondItem}, $output={$firstItem})

    if(-not $first)
    {
        foreach($element in $input) { $first += $element }
    }

    foreach($firstItem in $first)
    {
       foreach($secondItem in $second)
       {
          if(& $where) { & $output }
       }
    }

    First, we check if the user has specified the 'First' dataset to use.  If not, we collect the data from the pipeline.  Then, we simply go through every item in the 'First' dataset.  For each of those items, we go through every item in the 'Second' dataset.  If our 'Where' clause evaluates to true, then we output as desired. 

    What's mind-blowing is that the script could literally be written as a one-liner if you removed the error handling and comments.

    [Edit: Monad has now been renamed to Windows PowerShell. This script or discussion may require slight adjustments before it applies directly to newer builds.]