Counting Lines of Source Code in PowerShell

Oren Eini recently ran into some performance problems while using PowerShell to count the number of lines in a source tree:



I wanted to know how many lines of code NHibernate has, so I run the following PowerShell command…
(gci -Recurse | select-string . ).Count
The result:
(Graphic of PowerShell still working after about 5 minutes, using 50% of his CPU.)
Bummer.


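The performance problem with this command comes from the work PowerShell does to support a rich pipeline experience that you never use: Select-String outputs a MatchInfo object (carrying the file path, line number, and matching line) for every line it matches. With only a little more text, you could have run even more powerful reports: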



Line count per path:
   gci . *.cs -Recurse | select-string . | Group Path
Minimum, maximum, and average lines per file:
   gci . *.cs -Recurse | select-string . | Group Path | Measure-Object Count -Min -Max -Average
Comment ratio (lines containing "//" divided by non-empty lines):
   $items = gci . *.cs -rec; ($items | select-string "//").Count / ($items | select-string .).Count
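For readers who want to see exactly what those three reports compute, here is a minimal sketch in Ruby (the language of the comparison script later in this post), assuming Ruby 2.4 or later. The helper names are hypothetical, and lines are counted the way "select-string ." counts them: only lines containing at least one character.

```ruby
# Hypothetical helpers: a rough Ruby analogue of the three reports above.

# Count lines the way "select-string ." does: only lines with at least one
# character (chomp strips a trailing "\n" or "\r\n" before the check).
def nonempty_line_count(text)
  text.each_line.count { |line| !line.chomp.empty? }
end

# Per-file counts keyed by path (like "Group Path"), so same-named files in
# different folders are kept apart.
def line_counts_by_path(pattern = "**/*.cs")
  files = Dir.glob(pattern).select { |p| File.file?(p) }
  files.each_with_object({}) do |path, counts|
    counts[path] = nonempty_line_count(File.binread(path))
  end
end

# Minimum, maximum, and average of the per-file counts (nil if no files).
def summarize(counts)
  values = counts.values
  return nil if values.empty?
  { min: values.min, max: values.max, average: values.sum.to_f / values.size }
end

# Lines containing "//" divided by non-empty lines, across all texts.
def comment_ratio(texts)
  comments = texts.sum { |t| t.each_line.count { |line| line.include?("//") } }
  total = texts.sum { |t| nonempty_line_count(t) }
  total.zero? ? 0.0 : comments.to_f / total
end
```

Run from the root of a source tree, summarize(line_counts_by_path) returns a hash with :min, :max, and :average keys, which is the same shape of answer the Measure-Object report gives.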


But if you don’t need that power, there are alternatives that perform better. Let’s look at some of them. As a baseline, we’ll use the command Oren started with, restricted to *.cs files:



[C:\temp]
PS:14 > $baseline = Measure-Command { (gci . *.cs -Recurse | Select-String .).Count }


… and a comparison to the LineCount.exe he pointed to:



PS:15 > $lineCountExe = Measure-Command { C:\temp\linecount.exe *.cs /s }
PS:16 > $baseline.TotalMilliseconds / $lineCountExe.TotalMilliseconds
41.5567286307833


(The Select-String approach is about 41.5x slower)
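Part of that cost is the object Select-String builds for every matching line. The Ruby sketch below (Ruby 2.4 or later) is only an analogy, not PowerShell's implementation: the Match struct is a hypothetical stand-in for MatchInfo, and the two functions count the same lines, one by materializing a record per line and one by simply counting. Note also that the pattern "." only matches lines with at least one character, so the baseline counts non-empty lines rather than every line.

```ruby
# Hypothetical stand-in for Select-String's MatchInfo: path, line number,
# and line text. This is not the real .NET type.
Match = Struct.new(:path, :line_number, :line)

# "Rich" approach: build one record per line that matches the pattern.
# The default pattern /./ matches any line with at least one character.
def matches(path, text, pattern = /./)
  results = []
  text.each_line.with_index(1) do |line, number|
    body = line.chomp
    results << Match.new(path, number, body) if pattern.match?(body)
  end
  results
end

# "Lean" approach: count the same lines without keeping any per-line objects.
def match_count(text, pattern = /./)
  text.each_line.count { |line| pattern.match?(line.chomp) }
end
```

Both give the same number; the difference is how many objects the first version allocates along the way, which is the kind of overhead the rest of this post avoids.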


Since we don’t need all of the PowerShell metadata generated by Select-String, and we don’t need its regular expression matching, we can instead use the [System.IO.File]::ReadAllText() method from the .NET Framework:



PS:17 > $readAllText = Measure-Command { gci . *.cs -rec | % { [System.IO.File]::ReadAllText($_.FullName) } | Measure-Object -Line }
PS:18 > $readAllText.TotalMilliseconds / $lineCountExe.TotalMilliseconds
3.30927987204783


This is now about 3.3x slower, but the whole command is under 90 characters! With a PowerShell one-liner, you were able to implement an entire linecount program.
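The same read-the-whole-file idea looks like this in Ruby (2.4 or later). This is a sketch under the assumption that a "line" means a newline-terminated run of text plus any unterminated final line; it does not claim to reproduce Measure-Object -Line's exact counting rules. Like the one-liner, it holds each entire file in memory while counting. The function names are hypothetical.

```ruby
# Count lines in a whole-file string: one per "\n", plus one more when the
# text does not end with a newline (an unterminated last line).
def count_lines_in_text(text)
  count = text.count("\n")
  count += 1 unless text.empty? || text.end_with?("\n")
  count
end

# Read each matching file completely into memory, then count its lines.
def total_lines(pattern = "**/*.cs")
  files = Dir.glob(pattern).select { |p| File.file?(p) }
  files.sum { |path| count_lines_in_text(File.binread(path)) }
end
```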
If you want to go further, you can write a linecount program yourself:




## Get-LineCount.ps1
## Count the number of lines in all C# files in (and below)
## the current directory.

function CountLines($directory)
{
    $pattern = "*.cs"
    $directories = [System.IO.Directory]::GetDirectories($directory)
    $files = [System.IO.Directory]::GetFiles($directory, $pattern)

    $lineCount = 0

    foreach($file in $files)
    {
        ## Split() keeps a trailing empty element, so a file that ends
        ## with a newline is counted as one line longer than it is.
        $lineCount += [System.IO.File]::ReadAllText($file).Split("`n").Count
    }

    ## Recurse into each subdirectory and add its total
    foreach($subdirectory in $directories)
    {
        $lineCount += CountLines $subdirectory
    }

    $lineCount
}

CountLines (Get-Location)

Now, about 2.7x slower, but in an easy-to-read, easy-to-modify format that saves you from having to open up your IDE and compiler.



PS:19 > $customScript = Measure-Command { C:\temp\Get-LineCount.ps1 }
PS:20 > $customScript.TotalMilliseconds / $lineCountExe.TotalMilliseconds
2.73733204860216
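One caveat about the script above: .NET's String.Split keeps a trailing empty element, so "a`nb`n".Split("`n") has three elements even though the text holds two lines. Ruby's String#split normally drops trailing empty fields, so the sketch below passes a limit of -1 to imitate the .NET behaviour and compares the result with String#lines. The function names are hypothetical.

```ruby
# Imitate .NET's text.Split("\n").Count: a limit of -1 keeps trailing empty
# fields. .NET also returns one (empty) element for an empty string, whereas
# Ruby returns none, so that case is handled explicitly.
def split_count(text)
  return 1 if text.empty?
  text.split("\n", -1).size
end

# Count real lines: a trailing newline ends the last line rather than
# starting a new one.
def real_line_count(text)
  text.lines.size
end
```

For a file that ends in a newline, split_count is one higher than real_line_count. In the PowerShell script, [System.IO.File]::ReadAllLines($file).Count is one way to avoid the extra line, though its speed relative to the timing above has not been measured here.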


And to nip an annoying argument in the bud:




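## Get-LineCount.rb
## Count the number of lines in all C# files in (and below)
## the current directory

# Count the lines in an already-open file by reading it line by line
def filelines(file)
  count = 0
  while file.gets
    count += 1
  end
  count
end

# Open a file, count its lines, and close it again (the block form of
# File.open closes the file even if counting raises an error)
def countFile(filename)
  File.open(filename) { |file| filelines(file) }
end

totalCount = 0

# Dir['**/*.cs'] finds C# files in the current directory and all subdirectories
files = Dir['**/*.cs']
files.each { |filename| totalCount += countFile(filename) }

puts totalCount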

Which gives:



PS:21 > $rubyScript = Measure-Command { C:\temp\Get-LineCount.rb }
PS:22 > $rubyScript.TotalMilliseconds / $lineCountExe.TotalMilliseconds
3.0709602651302
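The Ruby script above mirrors the PowerShell one closely. A more idiomatic sketch, assuming Ruby 2.4 or later for Array#sum, lets File.foreach open each file, read it one line at a time, and close it; the method names are hypothetical, and this variant was not part of the timings above.

```ruby
# Count the lines in one file. File.foreach without a block returns an
# enumerator that reads line by line and closes the file when done.
def file_line_count(path)
  File.foreach(path).count
end

# Total lines across all regular files matching the glob; "**/" makes the
# default pattern search the current directory and every subdirectory.
def total_line_count(pattern = "**/*.cs")
  files = Dir.glob(pattern).select { |p| File.file?(p) }
  files.sum { |path| file_line_count(path) }
end
```

Saved as a script that ends with puts total_line_count, it prints the same kind of total as Get-LineCount.rb.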

7 Responses to “Counting Lines of Source Code in PowerShell”

  1. Ben Hollis writes:

    How’s it compare to what I’d do in Linux?

    find **/*.cs | xargs cat | wc -l

    Speaking of which, is there a UNIX alias set for Powershell that would help those of us more familiar with the UNIX tools use Powershell? And is there a PowerShell equivalent of cat that’s easier than [System.IO.File]::ReadAllText(filename)?

  2. Lee writes:

    Ben,

    If performance wasn’t the top concern, another PowerShell command closer to the Unix command would be:

    Get-ChildItem . *.cs -Recurse | Get-Content | Measure-Object -Line

    (or less verbose)

    dir . *.cs -rec | gc | Measure-Object -Line

    It is actually a little slower than the version Oren started with.

    The equivalent of cat is Get-Content, which already has a cat alias. I didn’t use that in the "more efficient" version because I was trying to show a way to get performance gains. I think you’ll be surprised at how many Unix-like aliases are built into PowerShell. Browse the output of Get-Alias to see them all.

    I’ve seen some efforts around the web to make references for "Old DOS hackers" and "Old Unix hackers," but I’m not aware of anything definitive at the moment.

    As for the performance of the Unix (actually, Win32 ports thereof) variant, it’s about 40x slower than the baseline version:

    $unix = measure-command { find.exe -path *.cs | xargs cat.exe | wc -l }

    This seems to be dominated by the performance of find.exe — I would be surprised if it performed this poorly on a Unix system.

  3. Ben Hollis writes:

    At least in Darwin, find is dreadfully slow (the first time) so I don’t know how much better it would be. Thanks!

    I’ve been holding off on Powershell mainly because there doesn’t seem to be a good terminal app for it – I’ve seen the Powershell Analyzer, but I’d love something with just some simple abilities – resizable window, reasonable tab completion, decent text selection (for copy and paste) and emacs key bindings. The shell itself looks amazing, but working in the cmd.exe host is just too primitive for me.

  4. Lee writes:

    The most recent PowerShell release (RC2) actually goes a long way toward satisfying this desire. You can resize the windows (although not completely dynamically), and use DOS’ QuickEdit mode for Copy and Paste. Some fairly good tab completion exists out of the box, but the kicker is that it is fully customizable. The community has been going nuts on it, with a great deal of success: http://mow001.blogspot.com/2006/10/powershell-tabcompletion-part-5.html.

    Also, see how Console treats you: http://www.canerten.com/console2-tabbed-command-prompt/

    About the Emacs keybindings — I dearly wish for them, too, but it’s not in the cards just yet.

  5. falstaff writes:

    I’m not a PowerShell hacker, but when I ran the optimized command over my source, I got an out of memory exception… I think he is loading _every_ file into memory first, and then counting the lines… So I optimized it a little bit…

    $var = 0; gci . -include ("*.cpp", "*.h") -rec | % { [System.IO.File]::ReadAllText($_.FullName) | Measure-Object -Line } | % { $var+=$_.Lines } ; $var

  6. Pelonomi writes:

    Can I use the program to count a *.vb program?

  7. Counting Lines of Source Code in PowerShell | Precision Computing « The Wiert Corner – irregular stream of stuff writes:

    [...] The Counting Lines of Source Code in PowerShell entry is on counting C# code lines (and shows some great performance optimization tips). [...]
