burn-console Digression: An MSH Script Profiler Part 2

Mon, Dec 5, 2005 6-minute read

In the last article, we talked about the theory behind a sampling profiler.  Let’s go over the code to implement one.  We’ll use this to guide our future optimizations.  You can also download the script here: profile-transcript.msh.

################################################################################
##
## profile-transcript.msh
##
## Computes the performance characteristics of a script, based on the transcript
## of it running at trace level 1.
##
## To profile a script:
##    1) Turn on MSH script tracing in the window that will run the script: 
##       set-mshdebug –trace 1
##    2) Turn on the transcript for the window that will run the script:
##       start-transcript
##       (Note the filename that Monad provides as the logging destination.)
##    3) Type in the script name, but don't actually start it.
##    4) Open another MSH window, and navigate to the directory holding 
##       this script.  Type in '.\profile-transcript.msh <path-to-transcript>',
##       replacing <path-to-transcript> with the path given in step 2.  Don't
##       press <Enter> yet.
##    5) Switch to the profiled script window, and start the script.
##       Switch to the window containing profile-transcript, and press <Enter>
##    6) Wait until your profiled script exits, or has run long enough to be
##       representative of its work.  To be statistically accurate, your script
##       should run for at least ten seconds.
##    7) Switch to the window running profile-transcript.msh, and press a key.
##    8) Switch to the window holding your profiled script, and type:
##       stop-transcript
##    9) Delete the transcript.
##
## Note: You can profile regions of code (ie: functions) rather than just lines
## by placing the following call at the start of the region:
##       write-debug "ENTER <region_name>"
## and the following call and the end of the region:
##       write-debug "EXIT"
## This is implemented to account exclusively for the time spent in that 
## region, and does not include time spent in regions contained within the
## region.  For example, if FunctionA calls FunctionB, and you've surrounded
## each by region markers, the statistics for FunctionA will not include the
## statistics for FunctionB.
##
################################################################################
param($logFilePath = $(throw "Please specify a path to the transcript log file."))

function Main
{
    ## Run the actual profiling of the script.  $uniqueLines gets
    ## the mapping of line number to actual script content.
    ## $samples gets a hashtable mapping line number to the number of times
    ## we observed the script running that line.
    $uniqueLines = @{}
    $samples = GetSamples $uniqueLines
    
    "Breakdown by line:"
    "----------------------------"

    ## Create a new hash table that flips the $samples hashtable -- 
    ## one that maps the number of times sampled to the line sampled.
    ## Also, figure out how many samples we got altogether.
    $counts = @{}
    $totalSamples = 0;
    foreach($item in $samples.Keys) 
    { 
       $counts[$samples[$item]] = $item 
       $totalSamples += $samples[$item]
    }

    ## Go through the flipped hashtable, in descending order of number of 
    ## samples.  As we do so, output the number of samples as a percentage of
    ## the total samples.  This gives us the percentage of the time our script
    ## spent executing that line.
    foreach($count in ($counts.Keys | sort -desc))
    {
       $line = $counts[$count]
       $percentage = "{0:#0}" -f ($count * 100 / $totalSamples)
       "{0,3}%: Line {1,4} -{2}" -f $percentage,$line,
          $uniqueLines[$line]
    }

    ## Go through the transcript log to figure out which lines are part of any
    ## marked regions.  This returns a hastable that maps region names to
    ## the lines they contain.
    ""
    "Breakdown by marked regions:"
    "----------------------------"
    $functionMembers = GenerateFunctionMembers
    
    ## For each region name, cycle through the lines in the region.  As we
    ## cycle through the lines, sum up the time spent on those lines and output
    ## the total.
    foreach($key in $functionMembers.Keys)
    {
        $totalTime = 0
        foreach($line in $functionMembers[$key])
        {
            $totalTime += ($samples[$line] * 100 / $totalSamples)
        }
        
        $percentage = "{0:#0}" -f $totalTime
        "{0,3}%: {1}" -f $percentage,$key
    }
}

## Run the actual profiling of the script.  $uniqueLines gets
## the mapping of line number to actual script content.
## Return a hashtable mapping line number to the number of times
## we observed the script running that line.
function GetSamples($uniqueLines)
{
    ## Open the log file.  We use the .Net file I/O, so that we keep monitoring
    ## just the end of the file.  Otherwise, we would make our timing innacurate
    ## as we scan the entire length of the file every time.
    $logStream = [System.IO.File]::Open($logFilePath, "Open", "Read", "ReadWrite")
    $logReader = new-object System.IO.StreamReader $logStream

    $random = new-object Random
    $samples = @{}

    $lastCounted = $null
    
    ## Gather statistics until the user presses a key.
    while(-not $host.UI.RawUI.KeyAvailable)
    {
       ## We sleep a slightly random amount of time.  If we sleep a constant
       ## amount of time, we run the very real risk of improperly sampling 
       ## scripts that exhibit periodic behaviour.
       $sleepTime = [int] ($random.NextDouble() * 100.0)
       start-sleep -m $sleepTime

       ## Get any content produced by the transcript since our last poll.
       ## From that poll, extract the last DEBUG statement (which is the last
       ## line executed.)
       $rest = $logReader.ReadToEnd()
       $lastEntryIndex = $rest.LastIndexOf("DEBUG: ")

       ## If we didn't get a new line, then the script is still working on the
       ## last line that we captured.
       if($lastEntryIndex -lt 0) 
       { 
          if($lastCounted) { $samples[$lastCounted]++ }
          continue; 
       }
       
       ## Extract the debug line.
       $lastEntryFinish = $rest.IndexOf("\n", $lastEntryIndex)
       if($lastEntryFinish -eq -1) { $lastEntryFinish = $rest.length }

       $scriptLine = $rest.Substring($lastEntryIndex, ($lastEntryFinish - $lastEntryIndex)).Trim()
       if($scriptLine -match 'DEBUG:[ \t]*([0-9]*)\+(.*)')
       {
           ## Pull out the line number from the line
           $last = $matches[1]
           
           $lastCounted = $last
           $samples[$last]++
           
           ## Pull out the actual script line that matches the line number
           $uniqueLines[$last] = $matches[2]
       }

       ## Discard anything that's buffered during this poll, and start waiting
       ## again
       $logReader.DiscardBufferedData()
    }

    ## Clean up
    $logStream.Close()
    $logReader.Close()
    
    return $samples
}

## Go through the transcript log to figure out which lines are part of any
## marked regions.  This returns a hastable that maps region names to
## the lines they contain.
function GenerateFunctionMembers
{
    ## Create a stack that represents the callstack.  That way, if a marked
    ## region contains another marked region, we attribute the statistics
    ## appropriately.
    $callstack = new-object System.Collections.Stack
    $currentFunction = "Unmarked"
    $callstack.Push($currentFunction)

    $functionMembers = @{}

    ## Go through each line in the transcript file, from the beginning
    foreach($line in (get-content $logFilePath))
    {
        ## Check if we're entering a monitor block
        ## If so, store that we're in that function, and push it onto
        ## the callstack.
        if($line -match 'write-debug "ENTER (.*)"')
        {
            $currentFunction = $matches[1]
            $callstack.Push($currentFunction)
        }
        ## Check if we're exiting a monitor block
        ## If so, clear the "current function" from the callstack,
        ## and store the new "current function" onto the callstack.
        elseif($line -match 'write-debug "EXIT"')
        {
            [void] $callstack.Pop()
            $currentFunction = $callstack.Peek()
        }
        ## Otherwise, this is just a line with some code.
        ## Add the line number as a member of the "current function"
        else
        {
            if($line -match 'DEBUG:[ \t]*([0-9]*)\+')
            {
                ## Create the arraylist if it's not initialized
                if(-not $functionMembers[$currentFunction])
                {
                    $functionMembers[$currentFunction] = new-object `
                        System.Collections.ArrayList
                }
                
                ## Add the current line to the ArrayList
                if(-not $functionMembers[$currentFunction].Contains($matches[1]))
                {
                    [void] $functionMembers[$currentFunction].Add($matches[1])
                }
            }
        }
    }
    
    return $functionMembers
}

. Main

[Edit: Monad has now been renamed to Windows PowerShell. This script or discussion may require slight adjustments before it applies directly to newer builds.]