Will it Pipe? Brevity and Readability

Fri, Nov 2, 2007 2-minute read

Scott Hanselman and I recently chatted about using PowerShell for a bit of log analysis. The majority of his solution (in green) ended up being quite elegant, flowing into a pipeline nearly as easily as you might speak it:

PS C:\> $re =[regex]"\d{2}(?=[_.])"; import-csv file.csv |
    select File, Hits, @{Name="Show";Expression={$re.matches($_.File)[0] } } |
    sort Show -desc | group Show |
    select Name,{($_.Group | Measure-Object -Sum Hits).Sum }

(Of course, one should never pronounce a regex aloud in polite company.)

The bit in red took a little hammering on, but eventually produced the desired results. At this point, I mentioned that I didn’t think the final solution was a good demonstration of PowerShell’s pipeline power. 80% of it, absolutely. But since the last bit took some fussing, the solution stopped being an example of how easy the pipeline makes everything, and instead became an example of how you could write a pipeline to do anything.
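Part of the fuss in that last stage is that a bare script block passed to select produces a column named after the script text itself. As a minimal sketch of the same idea with a named calculated property (the same hashtable form the pipeline already uses for Show), here is a version run against made-up in-memory rows. The `$rows` data and the `Hits` column name on the output are assumptions for illustration, and `New-Object -Property` needs PowerShell 2.0 or later:

```powershell
# Hypothetical in-memory rows standing in for the parsed CSV data.
$rows = @(
    New-Object PSObject -Property @{ Show = '26'; Hits = '10' }
    New-Object PSObject -Property @{ Show = '26'; Hits = '5' }
    New-Object PSObject -Property @{ Show = '27'; Hits = '7' }
)

# Group by show, then give the per-show total a readable column name
# by using a named calculated property instead of a bare script block.
$totals = $rows | Group-Object Show |
    Select-Object Name, @{ Name = 'Hits'; Expression = { ($_.Group | Measure-Object -Sum Hits).Sum } }

$totals
```

The output has two columns, Name and Hits, which is easier to sort or filter further down the pipeline than a column whose name is a chunk of script.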

If I were to blog it with the intent to educate, I would probably have written a more scripty function:

function Get-ShowHits
{
    $regex = '/hanselminutes_(\d+).*'
    $shows = Import-Csv File.csv | Select File,Hits | Group { $_.File -replace $regex,'$1' }
    foreach($show in $shows)
    {
        $showOutput = New-Object System.Management.Automation.PsObject
        $showOutput | Add-Member NoteProperty Name $show.Name
        $showOutput | Add-Member NoteProperty Hits ($show.Group | Measure-Object -Sum Hits).Sum
        $showOutput
    }
}
Get-ShowHits | Sort -Desc Hits

This example illustrates a couple of great points about PowerShell:

  • You can write functions and scripts to wrap complex functionality into a more usable form
  • You can write pipelines to easily express powerful object flows (in the $shows line)
  • You can create your own objects with their own properties – and manipulate them just as easily

But the most important point about these two examples is how easy they are to modify and extend.
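As one hedged illustration of that kind of extension, the function could take the CSV path as a parameter instead of hard-coding File.csv. This is only a sketch: the `-Path` parameter, the temporary file name, and the sample rows are all assumptions made up for the example, not part of the original solution.

```powershell
# Hypothetical sample data standing in for the real log export.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'showhits-sample.csv'
@'
File,Hits
/hanselminutes_0026_lo.mp3,10
/hanselminutes_0026.mp3,5
/hanselminutes_0027.mp3,7
'@ | Set-Content -Path $csvPath

function Get-ShowHits
{
    # Assumed parameter; the version above reads File.csv directly.
    param([string] $Path)

    $regex = '/hanselminutes_(\d+).*'
    $shows = Import-Csv $Path | Select-Object File,Hits |
        Group-Object { $_.File -replace $regex,'$1' }

    foreach($show in $shows)
    {
        # One output object per show, carrying the show number and total hits.
        $showOutput = New-Object System.Management.Automation.PsObject
        $showOutput | Add-Member NoteProperty Name $show.Name
        $showOutput | Add-Member NoteProperty Hits ($show.Group | Measure-Object -Sum Hits).Sum
        $showOutput
    }
}

$results = Get-ShowHits -Path $csvPath | Sort-Object -Descending Hits
$results
```

The body of the function is untouched apart from the `$Path` variable, which is the point: the scripty version gives you an obvious place to make the change.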

Jon Udell keyed in on Scott’s post, and in the comments of the two blogs, language comparisons quickly blossomed. Hey, we’ve been here before!

Specifically,

These types of problems are like mosquito bites for me - I can’t stop itching at them. A better ruby one-liner. Doesn’t print a header but not a big deal…

CSV.read("test.csv").inject(Hash.new(0)) {|h,row| h[row[0][/\d{4}/].to_i] += row[1].to_i;h}.sort.each {|i| puts("#{i[0]}\t#{i[1]}")}

Once you have a solution that works, a natural scripter’s passion is to tinker it down to one line. It’s no longer educational, intelligible, or extendable, but it’s fun. You can do that in PowerShell, too:

$foo = @{}; ipcsv test.csv | % {
    $foo[0+($_.File -replace '.*?(\d+).*','$1')] += (0+$_.Hits) };
    $foo.GetEnumerator() | Sort Value

Mmmm. Pipeline smoke.