Using PowerShell to Compare / Diff Files
Friday, 29 November 2013
If you’ve tried to diff files in PowerShell before, you might have seen the Compare-Object cmdlet. The Compare-Object cmdlet lets you compare two sets of items, giving you a report on the differences between those two sets:
PS G:\lee\tools> cd c:\temp
PS C:\temp> $set1 = "A","B","C"
PS C:\temp> $set2 = "C","D","E"
PS C:\temp> Compare-Object $set1 $set2

InputObject SideIndicator
----------- -------------
D           =>
E           =>
A           <=
B           <=
From this output, we can see that “A” and “B” only show up in $set1, while “D” and “E” only show up in $set2. For sets of objects, this is all you need to know.
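To work with that report in a script rather than read it by eye, you can filter on the SideIndicator property. The following is a minimal sketch using the same two sets; the variable names $onlyInSet1 and $onlyInSet2 are made up for illustration, and the results are sorted only to make the output order predictable.

```powershell
$set1 = "A","B","C"
$set2 = "C","D","E"

$diff = Compare-Object $set1 $set2

# "<=" marks items that appear only in the first (reference) set
$onlyInSet1 = @($diff |
    Where-Object { $_.SideIndicator -eq '<=' } |
    ForEach-Object { $_.InputObject } |
    Sort-Object)

# "=>" marks items that appear only in the second (difference) set
$onlyInSet2 = @($diff |
    Where-Object { $_.SideIndicator -eq '=>' } |
    ForEach-Object { $_.InputObject } |
    Sort-Object)

"Only in set1: $($onlyInSet1 -join ', ')"
"Only in set2: $($onlyInSet2 -join ', ')"
```

Wrapping each pipeline in @( ) keeps the result an array even when only one item (or none) differs.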
However, one common “set of objects” that people like to compare is the lines of a text file. When you compare lines in a file, you usually care about more than which lines were added or deleted. You also care about where in the file they changed, a situation usually handled by a special-purpose tool such as WinMerge, ExamDiff, WinDiff, or simply the Windows port of diff.exe.
Special-purpose file comparison tools have lots of tricks to compare files efficiently and logically, but PowerShell does let you implement a basic file comparison through a special trick – realizing that the Get-Content cmdlet tags its output objects with the line number they came from.
PS C:\temp> (Get-Content .\test.txt)[5] | Format-List * -Force
PSPath : C:\temp\test.txt
PSParentPath : C:\temp
PSChildName : test.txt
PSDrive : C
PSProvider : Microsoft.PowerShell.Core\FileSystem
ReadCount : 6
Length : 0
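To see the ReadCount tagging on every line rather than just one, here is a small sketch. It assumes a throwaway scratch file (the name readcount-demo.txt is hypothetical and created only for this demonstration) and projects each line into an object with its line number.

```powershell
# Hypothetical scratch file, created only for this demonstration
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Path $path -Value 'first','second','third'

$lines = Get-Content -Path $path

# Each string that Get-Content emits carries a ReadCount property:
# the 1-based number of the line it was read from
$numbered = $lines | ForEach-Object {
    [PSCustomObject] @{
        Line = $_.ReadCount
        Text = [string] $_
    }
}

$numbered

Remove-Item -Path $path
```

Note that this relies on Get-Content reading one line at a time, which is its default behavior.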
That gives the nifty one-liner:
PS C:\temp> Compare-Object (Get-Content files.txt) (Get-Content files2.txt) |
    Sort { $_.InputObject.ReadCount }

InputObject                                        SideIndicator
-----------                                        -------------
-a--- 11/26/2013 9:52 PM 0 files.txt ...           <=
-a--- 11/26/2013 9:52 PM 75702 files.txt ...       =>
-a--- 11/26/2013 9:52 PM 0 files2.txt ...          =>
If you want to pretty up the output a bit and make the syntax cleaner, let me introduce Compare-File:
##############################################################################
##
## Compare-File
##
##############################################################################

<#

.SYNOPSIS

Compares two files, displaying differences in a manner similar to traditional
console-based diff utilities.

#>

param(
    ## The first file to compare
    $file1,

    ## The second file to compare
    $file2,

    ## The pattern (if any) to use as a filter for file
    ## differences
    $pattern = ".*"
)

## Get the content from each file
$content1 = Get-Content $file1
$content2 = Get-Content $file2

## Compare the two files. Get-Content annotates output objects with
## a 'ReadCount' property that represents the line number in the file
## that the text came from.
$comparedLines = Compare-Object $content1 $content2 -IncludeEqual |
    Sort-Object { $_.InputObject.ReadCount }

$lineNumber = 0
$comparedLines | foreach {

    ## Keep track of the current line number, using the line
    ## numbers in the "after" file for reference.
    if($_.SideIndicator -eq "==" -or $_.SideIndicator -eq "=>")
    {
        $lineNumber = $_.InputObject.ReadCount
    }

    ## If the text matches the pattern, output a custom object
    ## that displays text like this:
    ##
    ##  Line Operation Text
    ##  ---- --------- ----
    ##    59 added     New text added
    ##
    if($_.InputObject -match $pattern)
    {
        if($_.SideIndicator -ne "==")
        {
            if($_.SideIndicator -eq "=>")
            {
                $lineOperation = "added"
            }
            elseif($_.SideIndicator -eq "<=")
            {
                $lineOperation = "deleted"
            }

            [PSCustomObject] @{
                Line = $lineNumber
                Operation = $lineOperation
                Text = $_.InputObject
            }
        }
    }
}
No. 1 — November 30th, 2013 at 10:48 am
Hi Lee,
That’s a great idea; I’d never thought about that. The line numbering was not working for me after lines where the content is equal in both files, though. I’ve modified your script to show the content of both files rather than the operation. Maybe this can be of use to you, too.
Thanks a lot for sharing,
Dirk
function Compare-Files {
    param(
        $file1,
        $file2,
        [switch]$IncludeEqual
    )
    $content1 = Get-Content $file1
    $content2 = Get-Content $file2
    $comparedLines = Compare-Object $content1 $content2 -IncludeEqual:$IncludeEqual |
        group { $_.InputObject.ReadCount } | sort Name
    $comparedLines | foreach {
        $curr = $_
        switch ($_.Group[0].SideIndicator) {
            "==" { $right = $left = $curr.Group[0].InputObject; break }
            "=>" { $right, $left = $curr.Group[0].InputObject, $curr.Group[1].InputObject; break }
            "<=" { $right, $left = $curr.Group[1].InputObject, $curr.Group[0].InputObject; break }
        }
        [PSCustomObject] @{
            Line = $_.Name
            Left = $left
            Right = $right
        }
    }
}
No. 2 — May 15th, 2014 at 12:21 pm
One small change to the custom object:
[PSCustomObject] @{
    Line = [int]$_.Name
    Left = $left
    Right = $right
}
This allows for sorting by line as an integer rather than as a string.
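The difference between the two sort orders can be sketched as follows. The sample values are made up; they stand in for the Name values that Group-Object produces, which are strings.

```powershell
# Group-Object exposes group names as strings, so these stand in for Name values
$names = '1','2','10','20','3'

# Sorted as strings: compared character by character, so "10" lands before "2"
$asStrings = $names | Sort-Object

# Cast to [int] first: sorted by numeric value
$asInts = $names | ForEach-Object { [int] $_ } | Sort-Object

"As strings:  $($asStrings -join ', ')"
"As integers: $($asInts -join ', ')"
```

With the string sort, line 10 is listed before line 2; with the integer cast, the lines come out in file order.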
No. 3 — September 15th, 2015 at 1:29 pm
Hmm… seems like something’s wrong somewhere… off by one?
Take two files test1 and test2 where the content is
4 lines:
l1
l2
l3
l4
Switch the content of one line in file test2, and -IncludeEqual will report the content as equal, suggesting that the lines of both files are identical when in fact they are not.
No. 4 — September 15th, 2015 at 2:01 pm
Mmm… I missed the obvious: the -SyncWindow parameter set to 0. Reading the explanation from Microsoft (https://technet.microsoft.com/da-DK/library/dd347568.aspx), this makes sense… but really…
/Mikael
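The -SyncWindow behavior mentioned in the comment above can be sketched like this. The sample data is an assumption: it uses the four lines from the earlier comment with lines 2 and 3 swapped in the second set, which may not be exactly the change the commenter made.

```powershell
$reference  = 'l1','l2','l3','l4'
$difference = 'l1','l3','l2','l4'   # lines 2 and 3 swapped

# Default SyncWindow: an item can be matched against items at other
# positions, so a reordered line still counts as present in both sets
$default = @(Compare-Object $reference $difference)

# -SyncWindow 0: an item is only compared with the item at the same position
$strict = @(Compare-Object $reference $difference -SyncWindow 0)

"Differences with default SyncWindow: $($default.Count)"
"Differences with -SyncWindow 0:      $($strict.Count)"
$strict
```

With the default window the swap goes unreported, while -SyncWindow 0 flags both swapped lines on each side. The trade-off is that a single inserted line then shifts every later line out of position, so everything after it is reported as different.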
No. 5 — December 24th, 2015 at 3:33 pm
Does Compare-Object work with the TFS file compare too?