PowerShell Cookbook

    Counting Lines of Source Code in PowerShell

    Disclaimer
    I work for Microsoft.

    The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.


    Wednesday, October 18, 2006 8:49:13 PM (Pacific Daylight Time, UTC-07:00)

    Oren Eini recently ran into some performance problems while using PowerShell to count the number of lines in a source tree:

    I wanted to know how many lines of code NHibernate has, so I run the following PowerShell command...
    (gci -Recurse | select-string . ).Count
    The result:
    (Graphic of PowerShell still working after about 5 minutes, using 50% of his CPU.)
    Bummer.

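    The performance problem with this command comes from us preparing for a rich pipeline experience in PowerShell that this command never uses: Select-String builds a full match object (path, line number, matched text) for every line, and all the command asks for is the count.  With only a little more text, you could have run even more powerful reports: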

    Line count per path:
       gci . *.cs -Recurse | select-string . | Group Path
    Min / Max / Averages:
       gci . *.cs -Recurse | select-string . | Group Filename | Measure-Object Count -Min -Max -Average
    Comment ratio: 
       $items = gci . *.cs -rec; ($items | select-string "//").Count / ($items | select-string .).Count
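
    To make the "rich pipeline" point concrete, here is a minimal sketch of what Select-String hands down the pipeline. The temporary folder and the Sample.cs file are made up for illustration; only the cmdlets themselves come from the reports above.

```powershell
# Create a throwaway folder with one small C# file (names are made up).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
[System.IO.File]::WriteAllText((Join-Path $dir 'Sample.cs'), "// comment`nclass A`n{`n}`n")

# Each matching line comes back as an object, not just a tally.
$hits = @(Get-ChildItem $dir *.cs -Recurse | Select-String .)

# One object per non-empty line.
$hits.Count

# The metadata that every one of those objects carries.
$hits[0] | Format-List Path, LineNumber, Line

# That metadata is what makes a follow-up report such as "lines per file" a one-liner.
$perFile = $hits | Group-Object Path | Select-Object Count, Name
$perFile

Remove-Item $dir -Recurse -Force
```

    If all you want is the final number, every one of those objects is created only to be counted and thrown away.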

    But if you don't need that power, there are alternatives that perform better.  Let's look at some of them.  As a baseline, we'll use the command that Oren started with, restricted to *.cs files:

    [C:\temp]
    PS:14 > $baseline = Measure-Command { (gci . *.cs -Recurse | Select-String .).Count }

    … and a comparison to the LineCount.exe he pointed to:

    PS:15 > $lineCountExe = Measure-Command { C:\temp\linecount.exe *.cs /s }
    PS:16 > $baseline.TotalMilliseconds / $lineCountExe.TotalMilliseconds
    41.5567286307833

    (The Select-String approach is about 41.6x slower than LineCount.exe.)
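
    The same division comes up for every approach in this post, so it can be wrapped in a small helper. This is only a sketch: Get-SlowdownFactor is a hypothetical name, and the two Start-Sleep calls are stand-ins for real workloads so that the example runs anywhere.

```powershell
# Hypothetical helper: how many times slower is $Candidate than $Reference?
function Get-SlowdownFactor([scriptblock] $Candidate, [scriptblock] $Reference)
{
    $candidateTime = Measure-Command $Candidate
    $referenceTime = Measure-Command $Reference

    # Same ratio as in the transcripts: candidate time divided by reference time.
    $candidateTime.TotalMilliseconds / $referenceTime.TotalMilliseconds
}

# Stand-in workloads: sleeps replace the real line counters here.
$factor = Get-SlowdownFactor { Start-Sleep -Milliseconds 300 } { Start-Sleep -Milliseconds 100 }
"{0:N1}x slower" -f $factor
```

    With the real commands, the call would look like Get-SlowdownFactor { (gci . *.cs -Recurse | Select-String .).Count } { C:\temp\linecount.exe *.cs /s }. A single run is noisy (the first pass over a source tree also warms the disk cache), so treat any one ratio as a rough guide.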

    Since we don't need all of the PowerShell metadata generated by Select-String, and we don't need the regular expression matching power of Select-String, we can instead use the [System.IO.File]::ReadAllText() method from the .NET Framework:

    PS:17 > $readAllText = Measure-Command { gci . *.cs -rec | % { [System.IO.File]::ReadAllText($_.FullName) } | Measure-Object -Line }
    PS:18 > $readAllText.TotalMilliseconds / $lineCountExe.TotalMilliseconds
    3.30927987204783

    This is now about 3.3x slower – but is only 87 characters!  With a PowerShell one-liner, you were able to implement an entire linecount program.
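
    One thing to keep in mind when swapping approaches: they do not all count the same thing. The sketch below uses a made-up five-line file with one blank line. Select-String with the pattern "." only reports lines containing at least one character, while [System.IO.File]::ReadAllLines() (used here purely as a reference point, not taken from the commands above) returns every line.

```powershell
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$file = Join-Path $dir 'Sample.cs'

# Five lines, the second of which is blank.
[System.IO.File]::WriteAllText($file, "using System;`n`nclass A`n{`n}`n")

# The regular expression "." needs at least one character, so the blank line never matches.
$selectStringCount = @(Select-String -Path $file -Pattern '.').Count

# ReadAllLines returns every line, blank or not.
$readAllLinesCount = [System.IO.File]::ReadAllLines($file).Length

"Select-String: $selectStringCount, ReadAllLines: $readAllLinesCount"

# The approach from the text; its result is printed rather than assumed here.
([System.IO.File]::ReadAllText($file) | Measure-Object -Line).Lines

Remove-Item $dir -Recurse -Force
```

    On a real source tree the totals will therefore differ by roughly the number of blank lines, which is worth knowing before comparing numbers between tools.
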
    If you want to go further, you can write a linecount program yourself:

    ## Get-LineCount.ps1
    ## Count the number of lines in all C# files in (and below)
    ## the current directory.

    function CountLines($directory)
    {
        $pattern = "*.cs"
        $directories = [System.IO.Directory]::GetDirectories($directory)
        $files = [System.IO.Directory]::GetFiles($directory, $pattern)

        $lineCount = 0

        foreach($file in $files)
        {
            ## Note: Split counts the empty string after a trailing newline as one more line.
            $lineCount += [System.IO.File]::ReadAllText($file).Split("`n").Count
        }

        foreach($subdirectory in $directories)
        {
            $lineCount += CountLines $subdirectory
        }

        $lineCount
    }

    CountLines (Get-Location)

    This is now about 2.7x slower than LineCount.exe, but in an easy-to-read, easy-to-modify format that saves you from having to open up your IDE and compiler:

    PS:19 > $customScript = Measure-Command { C:\temp\Get-LineCount.ps1 }
    PS:20 > $customScript.TotalMilliseconds / $lineCountExe.TotalMilliseconds
    2.73733204860216
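
    Because the script is plain text, variations are cheap to try. The following is one possible variant, not the script that was timed above: Get-LineCount and its parameter names are hypothetical, it lets the .NET Framework do the recursion through [System.IO.SearchOption]::AllDirectories, and it counts with ReadAllLines() so that a trailing newline does not add an extra line. The timings in this post do not apply to it.

```powershell
# Hypothetical variant of the script above, with the directory and pattern as parameters.
function Get-LineCount([string] $Directory = (Get-Location).Path, [string] $Pattern = '*.cs')
{
    # Let .NET walk the subdirectories instead of recursing by hand.
    # Pass a full path: .NET's current directory is not PowerShell's current location.
    $files = [System.IO.Directory]::GetFiles(
        $Directory, $Pattern, [System.IO.SearchOption]::AllDirectories)

    $lineCount = 0

    foreach($file in $files)
    {
        # ReadAllLines returns one array element per line of the file.
        $lineCount += [System.IO.File]::ReadAllLines($file).Length
    }

    $lineCount
}
```

    Calling Get-LineCount C:\temp '*.cs' counts a specific tree, and calling it with no arguments counts C# files below the current location. One trade-off to be aware of: with AllDirectories, a single subdirectory you cannot read may fail the whole GetFiles call, whereas the hand-written recursion gives you a place to handle that per directory.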

    And to nip an annoying argument in the bud, here is the same program in Ruby:

    ## Get-LineCount.rb
    ## Count the number of lines in all C# files in (and below)
    ## the current directory

    require 'find'

    def filelines(file)
      count = 0
      while line = file.gets
         count += 1
      end
      count
    end

    def countFile(filename)
        file = File.open(filename)
        totalCount = filelines(file)
        file.close()
        totalCount
    end   

    totalCount = 0

    files = Dir['**/*.cs']
    files.each { |filename| totalCount += countFile(filename) }

    puts totalCount

    Which gives:

    PS:21 > $rubyScript = Measure-Command { C:\temp\Get-LineCount.rb }
    PS:22 > $rubyScript.TotalMilliseconds / $lineCountExe.TotalMilliseconds
    3.0709602651302
