Counting Lines of Source Code in PowerShell

Wed, Oct 18, 2006 3-minute read

Oren Eini recently ran into some performance problems while using PowerShell to count the number of lines in a source tree:

_I wanted to know how many lines of code NHibernate has, so I run the following PowerShell command…
(gci -Recurse | select-string . ).Count
The result:
(Graphic of PowerShell still working after about 5 minutes, using 50% of his CPU.)
Bummer.
_

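The performance problem with this command comes from the work we do to prepare a rich pipeline experience that the command never uses: Select-String attaches metadata to every matching line, and all of it is thrown away once the lines are counted. With only a little more text, you could have run even more powerful reports: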

Line count per path:

gci . *.cs -Recurse | select-string . | Group Path

Min / Max / Averages:

gci . *.cs -Recurse | select-string . | Group Filename |
    Measure-Object Count -Min -Max -Average

Comment ratio:

$items = gci . *.cs -rec
($items | select-string "//").Count / ($items | select-string .).Count
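
These reports are possible because every match that Select-String emits is a structured object rather than a bare line of text. As a minimal sketch (the sample file name and its contents are made up for illustration), this shows a few of the properties each match carries:

```powershell
# Write a small throwaway file so the example is self-contained.
# The file name is hypothetical, not something from the original post.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'linecount-sample.cs'
Set-Content -Path $sample -Value 'class A', '{', '}'

# Each match is an object that records where the line came from.
$match = Select-String -Path $sample -Pattern . | Select-Object -First 1
$match | Format-List Path, Filename, LineNumber, Line

# Clean up the temporary file.
Remove-Item $sample
```

Grouping by Path or Filename in the reports above relies on exactly these properties.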

But if you don’t need that power, there are alternatives that perform better.  Let’s look at some of them.  We’ll use a baseline of the command that Oren started with:

[C:\temp]
PS:14 > $baseline = Measure-Command { (gci . *.cs -Recurse | Select-String .).Count }

… and a comparison to the LineCount.exe he pointed to:

PS:15 > $lineCountExe = Measure-Command { C:\temp\linecount.exe *.cs /s }
PS:16 > $baseline.TotalMilliseconds / $lineCountExe.TotalMilliseconds
41.5567286307833

(The Select-String approach is about 41.6x slower.)
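
The same measure-and-divide pattern applies to every candidate in this kind of comparison, so it can be wrapped in a small function. This is only a sketch: Get-SlowdownRatio and its parameter names are hypothetical, not something from the measurements here.

```powershell
# Hypothetical helper: time two script blocks and report how many
# times slower the candidate is than the reference.
function Get-SlowdownRatio([scriptblock] $Candidate, [scriptblock] $Reference)
{
    $candidateTime = Measure-Command $Candidate
    $referenceTime = Measure-Command $Reference

    $candidateTime.TotalMilliseconds / $referenceTime.TotalMilliseconds
}
```

With that in place, the comparison above could be written as Get-SlowdownRatio { (gci . *.cs -Recurse | Select-String .).Count } { C:\temp\linecount.exe *.cs /s }. A single run of each block is a rough measurement, since disk caching can favor whichever command runs second.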

Since we don’t need all of the PowerShell metadata generated by Select-String, and we don’t need the regular expression matching power of Select-String, we can instead use the [File]::ReadAllText() method from the .NET Framework:

PS:17 > $readAllText = Measure-Command { gci . *.cs -rec | % { [System.IO.File]::ReadAllText($_.FullName) } | Measure-Object -Line }
PS:18 > $readAllText.TotalMilliseconds / $lineCountExe.TotalMilliseconds
3.30927987204783

This is now about 3.3x slower – but is only 87 characters!  With a PowerShell one-liner, you were able to implement an entire linecount program.
If you want to go further, you can write a linecount program yourself:

## Get-LineCount.ps1
## Count the number of lines in all C# files in (and below)
## the current directory.
function CountLines($directory)
{
    $pattern = "*.cs"
    $directories = [System.IO.Directory]::GetDirectories($directory)
    $files = [System.IO.Directory]::GetFiles($directory, $pattern)

    $lineCount = 0

    foreach($file in $files)
    {
        $lineCount += [System.IO.File]::ReadAllText($file).Split("`n").Count
    }

    foreach($subdirectory in $directories)
    {
        $lineCount += CountLines $subdirectory
    }

    $lineCount
}

CountLines (Get-Location)

Now, about 2.7x slower – but in an easy-to-read, easy-to-modify format that saves you from having to open up your IDE and compiler.

PS:19 > $customScript = Measure-Command { C:\temp\Get-LineCount.ps1 }
PS:20 > $customScript.TotalMilliseconds / $lineCountExe.TotalMilliseconds
2.73733204860216
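
One caveat about the script's count: splitting on "`n" treats the newline as a separator, so a file that ends with a trailing newline contributes one extra, empty element. A minimal sketch of the difference, using a made-up two-line string rather than a real source file:

```powershell
# Two lines of text, ending with a newline (made-up sample content).
$text = "line one`nline two`n"

# Split treats the newline as a separator, so the empty string after
# the final newline is counted as an extra element.
$splitCount = $text.Split("`n").Count          # 3

# Measure-Object -Line is the counting method used by the one-liner.
$measureCount = ($text | Measure-Object -Line).Lines

"Split: $splitCount  Measure-Object: $measureCount"
```

For a rough size estimate the difference rarely matters, but it explains why the script and the one-liner may not report identical totals on the same tree.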

And to nip an annoying argument in the bud:

## Get-LineCount.rb
## Count the number of lines in all C# files in (and below)
## the current directory
require 'find'

def filelines(file)
  count = 0
  while line = file.gets
    count += 1
  end
  count
end

def countFile(filename)
  file = File.open(filename)
  totalCount = filelines(file)
  file.close()
  totalCount
end

totalCount = 0

files = Dir['**/*.cs']
files.each { |filename| totalCount += countFile(filename) }

puts totalCount

Which gives:

PS:21 > $rubyScript = Measure-Command { C:\temp\Get-LineCount.rb }
PS:22 > $rubyScript.TotalMilliseconds / $lineCountExe.TotalMilliseconds
3.0709602651302