Counting Lines of Source Code in PowerShell

Disclaimer
I work for Microsoft.

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

Wednesday, October 18, 2006 8:49:13 PM (Pacific Daylight Time, UTC-07:00)

Oren Eini recently ran into some performance problems while using PowerShell to count the number of lines in a source tree:

I wanted to know how many lines of code NHibernate has, so I run the following PowerShell command...
(gci -Recurse | select-string . ).Count
The result:
(Graphic of PowerShell still working after about 5 minutes, using 50% of his CPU.)
Bummer.

The performance problem with this command comes from PowerShell preparing a rich pipeline experience that the command never uses: Select-String builds a full match object for every line, only for .Count to throw them all away.  With only a little more text, you could have run even more powerful reports:

Line count per path:
   gci . *.cs -Recurse | select-string . | Group Path
Min / Max / Averages:
   gci . *.cs -Recurse | select-string . | Group Filename | Measure-Object Count -Min -Max -Average
Comment ratio: 
   $items = gci . *.cs -rec; ($items | select-string "//").Count / ($items | select-string .).Count
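
To see the metadata these reports group on, here is a minimal sketch that inspects what Select-String puts on the pipeline. The sample file name and its contents are made up for illustration; they are not from Oren's source tree.

```powershell
# Create a small throwaway file so the example is self-contained.
# (The file name is hypothetical.)
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'select-string-sample.cs'
Set-Content -Path $sample -Value 'class A', '{', '}'

# Select-String emits one MatchInfo object per matching line.
$found = @(Select-String -Path $sample -Pattern '.')

$found.Count                     # 3: one object for each non-empty line
$found[0].GetType().FullName     # Microsoft.PowerShell.Commands.MatchInfo

# Each object carries the properties that Group Path / Group Filename use.
$found | Select-Object Filename, LineNumber, Line

Remove-Item $sample
```

Every one of those objects gets created even when all you ask for at the end is .Count.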

But if you don't need that power, there are alternatives that perform better.  Let's look at some of them.  We'll use a baseline of the command that Oren started with:

[C:\temp]
PS:14 > $baseline = Measure-Command { (gci . *.cs -Recurse | Select-String .).Count }

… and a comparison to the LineCount.exe he pointed to:

PS:15 > $lineCountExe = Measure-Command { C:\temp\linecount.exe *.cs /s }
PS:16 > $baseline.TotalMilliseconds / $lineCountExe.TotalMilliseconds
41.5567286307833

(The Select-String approach is about 41.6x slower than LineCount.exe.)
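
The same time-two-things-and-divide pattern comes up again below, so it can be wrapped in a small function. This is only a sketch: Get-SpeedRatio is a hypothetical name, and the Start-Sleep calls stand in for real workloads so the example runs anywhere.

```powershell
# Hypothetical helper: how many times slower is $Candidate than $Reference?
function Get-SpeedRatio([scriptblock] $Candidate, [scriptblock] $Reference)
{
    $candidateTime = Measure-Command $Candidate
    $referenceTime = Measure-Command $Reference

    # Same arithmetic as the prompt above: a ratio of TotalMilliseconds.
    $candidateTime.TotalMilliseconds / $referenceTime.TotalMilliseconds
}

# Stand-in workloads; substitute the commands you actually want to compare.
Get-SpeedRatio { Start-Sleep -Milliseconds 200 } { Start-Sleep -Milliseconds 50 }
```

A single run like this is noisy, so for anything you plan to quote, run it a few times and compare.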

Since we don't need all of the PowerShell metadata generated by Select-String, and we don't need its regular expression matching power, we can instead use the [System.IO.File]::ReadAllText() method from the .NET Framework:

PS:17 > $readAllText = Measure-Command { gci . *.cs -rec | % { [System.IO.File]::ReadAllText($_.FullName) } | Measure-Object -Line }
PS:18 > $readAllText.TotalMilliseconds / $lineCountExe.TotalMilliseconds
3.30927987204783

This is now about 3.3x slower – but is only 87 characters!  With a PowerShell one-liner, you were able to implement an entire linecount program.
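
A related sketch, for when the files are large: stream each file line by line instead of reading it whole. This assumes .NET Framework 4.0 or later and PowerShell 3.0 or later (for File.ReadLines and the -File switch). Get-StreamedLineCount is a hypothetical name, and this variant was not part of the measurements above, so treat any speed difference as untested.

```powershell
# Hypothetical helper: count lines by streaming each file.
# Assumes .NET 4.0+ (File.ReadLines) and PowerShell 3.0+ (-File).
function Get-StreamedLineCount([string] $Path = '.', [string] $Filter = '*.cs')
{
    $total = 0

    foreach ($file in Get-ChildItem -Path $Path -Filter $Filter -Recurse -File)
    {
        # ReadLines yields one line at a time, so the whole file
        # is never held in memory at once.
        foreach ($line in [System.IO.File]::ReadLines($file.FullName))
        {
            $total++
        }
    }

    $total
}

Get-StreamedLineCount .
```

Note that this counts blank lines too, whereas Select-String . only matches lines that contain at least one character.
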
If you want to go further, you can write a linecount program yourself:

## Get-LineCount.ps1
## Count the number of lines in all C# files in (and below)
## the current directory.

function CountLines($directory)
{
    $pattern = "*.cs"
    $directories = [System.IO.Directory]::GetDirectories($directory)
    $files = [System.IO.Directory]::GetFiles($directory, $pattern)

    $lineCount = 0

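    foreach($file in $files)
    {
        ## Split also counts the empty string after a trailing newline,
        ## so a file that ends with a newline is counted one line high.
        $lineCount += [System.IO.File]::ReadAllText($file).Split("`n").Count
    }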

    foreach($subdirectory in $directories)
    {
        $lineCount += CountLines $subdirectory
    }

    $lineCount
}

CountLines (Get-Location)

This version is about 2.7x slower than LineCount.exe, but it comes in an easy-to-read, easy-to-modify format that saves you from having to open up your IDE and compiler.

PS:19 > $customScript = Measure-Command { C:\temp\Get-LineCount.ps1 }
PS:20 > $customScript.TotalMilliseconds / $lineCountExe.TotalMilliseconds
2.73733204860216
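
As an illustration of the "easy to modify" point, here is a sketch of one possible change to the script: the file pattern becomes a parameter and blank lines are skipped. CountNonBlankLines is a hypothetical name, and this variant was not timed, so the 2.7x figure does not apply to it.

```powershell
## Hypothetical variant of CountLines: takes the file pattern as a
## parameter and ignores lines that are empty or only whitespace.
function CountNonBlankLines($directory, $pattern = "*.cs")
{
    $lineCount = 0

    foreach($file in [System.IO.Directory]::GetFiles($directory, $pattern))
    {
        foreach($line in [System.IO.File]::ReadAllLines($file))
        {
            if($line.Trim().Length -gt 0) { $lineCount++ }
        }
    }

    ## Recurse into subdirectories, passing the same pattern along.
    foreach($subdirectory in [System.IO.Directory]::GetDirectories($directory))
    {
        $lineCount += CountNonBlankLines $subdirectory $pattern
    }

    $lineCount
}

CountNonBlankLines (Get-Location).Path "*.cs"
```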

And to nip an annoying argument in the bud:

## Get-LineCount.rb
## Count the number of lines in all C# files in (and below)
## the current directory

require 'find'

def filelines(file)
  count = 0
  while line = file.gets
    count += 1
  end
  count
end

def countFile(filename)
  file = File.open(filename)
  totalCount = filelines(file)
  file.close()
  totalCount
end

totalCount = 0

files = Dir['**/*.cs']
files.each { |filename| totalCount += countFile(filename) }

puts totalCount

Which gives:

PS:21 > $rubyScript = Measure-Command { C:\temp\Get-LineCount.rb }
PS:22 > $rubyScript.TotalMilliseconds / $lineCountExe.TotalMilliseconds
3.0709602651302
