Using PowerShell to Compare / Diff Files

If you’ve tried to diff files in PowerShell before, you might have seen the Compare-Object cmdlet. The Compare-Object cmdlet lets you compare two sets of items, giving you a report on the differences between those two sets:

PS G:\lee\tools> cd c:\temp
PS C:\temp> $set1 = "A","B","C"
PS C:\temp> $set2 = "C","D","E"
PS C:\temp> Compare-Object $set1 $set2

InputObject SideIndicator
----------- -------------
D           =>
E           =>
A           <=
B           <=

From this output, we can see that “A” and “B” only show up in $set1, while “D” and “E” only show up in $set2. For sets of objects, this is all you need to know.
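Compare-Object can also report the items the two sets share. The sketch below reuses the sets from above and adds the -IncludeEqual switch, which marks shared items with a "==" side indicator. The variable names $result, $onlyInFirst, $onlyInSecond, and $inBoth are illustrative only.

```powershell
# Same sample sets as in the example above.
$set1 = "A","B","C"
$set2 = "C","D","E"

# -IncludeEqual adds items present in both sets, tagged with "==".
$result = Compare-Object $set1 $set2 -IncludeEqual

# Split the report by side indicator (sorted so the order is predictable).
$onlyInFirst  = $result | Where-Object SideIndicator -eq "<=" |
    ForEach-Object InputObject | Sort-Object
$onlyInSecond = $result | Where-Object SideIndicator -eq "=>" |
    ForEach-Object InputObject | Sort-Object
$inBoth       = $result | Where-Object SideIndicator -eq "==" |
    ForEach-Object InputObject | Sort-Object

"Only in set1: $($onlyInFirst -join ', ')"
"Only in set2: $($onlyInSecond -join ', ')"
"In both:      $($inBoth -join ', ')"
```

The three side indicators ("<=", "=>", and "==") are the same values the Compare-File script further down branches on.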

However, one common “set of objects” that people like to compare is the lines of a text file. When you compare lines in a file, you usually care about more than which lines were added or deleted. You also care about where in the file they changed, a job usually handled by a special-purpose tool such as WinMerge, ExamDiff, WinDiff, or simply the Windows port of diff.exe.

Special-purpose file comparison tools have lots of tricks to compare files efficiently and logically, but PowerShell does let you implement a basic file comparison through a trick of its own: the Get-Content cmdlet tags each of its output objects with the line number it came from.

PS C:\temp> (Get-Content .\test.txt)[5] | Format-List * -Force

PSPath       : C:\temp\test.txt
PSParentPath : C:\temp
PSChildName  : test.txt
PSDrive      : C
PSProvider   : Microsoft.PowerShell.Core\FileSystem
ReadCount    : 6
Length       : 0

That gives the nifty one-liner:

PS C:\temp> Compare-Object (Get-Content files.txt) (Get-Content files2.txt) |
    Sort { $_.InputObject.ReadCount }

InputObject                                                      SideIndicator
-----------                                                      -------------
-a---        11/26/2013   9:52 PM          0 files.txt       … <=
-a---        11/26/2013   9:52 PM      75702 files.txt       … =>
-a---        11/26/2013   9:52 PM          0 files2.txt      … =>
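To try the one-liner without files of your own, here is a minimal sketch that builds two small sample files first. The directory name compare-demo, the file names before.txt and after.txt, and their contents are all made up for illustration.

```powershell
# Create two small sample files in the temp directory (hypothetical names).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) "compare-demo"
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$before = Join-Path $dir "before.txt"
$after  = Join-Path $dir "after.txt"
Set-Content -Path $before -Value "alpha", "beta", "gamma"
Set-Content -Path $after  -Value "alpha", "gamma", "delta"

# Each string returned by Get-Content carries a ReadCount property
# holding the 1-based line number it was read from.
$lines = Get-Content $after
$lines | ForEach-Object { "{0}: {1}" -f $_.ReadCount, $_ }

# Compare the files and order the differences by their source line number.
$diff = Compare-Object (Get-Content $before) (Get-Content $after) |
    Sort-Object { $_.InputObject.ReadCount }
$diff | Format-Table -AutoSize
```

Note that each ReadCount refers to the file the line came from, so a "<=" entry is numbered by the first file and a "=>" entry by the second.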

If you want to pretty up the output a bit and make the syntax cleaner, let me introduce Compare-File:

##############################################################################
##
## Compare-File
##
##############################################################################

<#

.SYNOPSIS

Compares two files, displaying differences in a manner similar to traditional
console-based diff utilities.

#>

param(
    ## The first file to compare
    $file1,

    ## The second file to compare
    $file2,

    ## The pattern (if any) to use as a filter for file
    ## differences
    $pattern = ".*"
)

## Get the content from each file
$content1 = Get-Content $file1
$content2 = Get-Content $file2

## Compare the two files. Get-Content annotates output objects with
## a 'ReadCount' property that represents the line number in the file
## that the text came from.
$comparedLines = Compare-Object $content1 $content2 -IncludeEqual |
    Sort-Object { $_.InputObject.ReadCount }

$lineNumber = 0
$comparedLines | foreach {

    ## Keep track of the current line number, using the line
    ## numbers in the "after" file for reference.
    if($_.SideIndicator -eq "==" -or $_.SideIndicator -eq "=>")
    {
        $lineNumber = $_.InputObject.ReadCount
    }

    ## If the text matches the pattern, output a custom object
    ## that displays text like this:
    ##
    ## Line Operation Text
    ## ---- --------- ----
    ##   59 added     New text added
    ##
    if($_.InputObject -match $pattern)
    {
        if($_.SideIndicator -ne "==")
        {
            if($_.SideIndicator -eq "=>")
            {
                $lineOperation = "added"
            }
            elseif($_.SideIndicator -eq "<=")
            {
                $lineOperation = "deleted"
            }

            [PSCustomObject] @{
                Line = $lineNumber
                Operation = $lineOperation
                Text = $_.InputObject
            }
        }
    }
}

One Response to “Using PowerShell to Compare / Diff Files”

  1. Dirk writes:

    Hi Lee,
    that’s a great idea; I’d never thought about that. The line numbering was not working for me after lines where the content is equal in both files, though. I’ve modified your script to show the content of both files rather than the operation. Maybe this can be of use to you, too.
    Thanks a lot for sharing,
    Dirk

    function Compare-Files{
        param(
            $file1,
            $file2,
            [switch]$IncludeEqual
        )
        $content1 = Get-Content $file1
        $content2 = Get-Content $file2
        $comparedLines = Compare-Object $content1 $content2 -IncludeEqual:$IncludeEqual |
            group { $_.InputObject.ReadCount } | sort Name
        $comparedLines | foreach {
            $curr = $_
            switch ($_.Group[0].SideIndicator){
                "==" { $right = $left = $curr.Group[0].InputObject; break }
                "=>" { $right,$left = $curr.Group[0].InputObject,$curr.Group[1].InputObject; break }
                "<=" { $right,$left = $curr.Group[1].InputObject,$curr.Group[0].InputObject; break }
            }
            [PSCustomObject] @{
                Line = $_.Name
                Left = $left
                Right = $right
            }
        }
    }
