Hopefully you’ve been following along with Eric’s regular expression exercises, because we’re about to add another cool tool to your Monad toolbox. If your regex-fu is strong, you will soon dice text streams with ease.
As you well know, one of the strongest features of Monad is that the pipeline is object-based. You don’t waste your energy creating, destroying, and recreating the object representation of your data. In past shells, you destroy the full-fidelity representation of data when the pipeline converts it to pure text. You can regain some of it through excessive text parsing, but not all of it.
However, we still often have to interact with low-fidelity input originating from outside of Monad. Text-based data files and legacy programs are two examples.
If you’re used to searching through files with grep, you’ve hopefully discovered Monad’s match-string cmdlet. If you’re used to dynamically replacing patterns in a stream of text with sed, you’ve hopefully discovered the [Regex]::Replace() method. If you’re used to extracting fields from a stream of text with awk, you’ve hopefully discovered… the string Split() method? OK, it’s the best you have so far, but it gets better.
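By default, awk treats any run of whitespace as a single field separator, while String.Split() breaks on every single occurrence of its separator. The minimal sketch below contrasts the two on a made-up line in the style of the sd output shown later. [Regex]::Split() with the pattern "\s+" is also what the Convert-TextObject script later in this post falls back to when you give it neither a Delimiter nor a ParseExpression.

```powershell
# A made-up line with two spaces between each field.
$line = "//depot/main/dirs#2  edit  text"

# String.Split on a single space keeps an empty string between repeated spaces.
$naive = $line.Split([char] ' ')

# [Regex]::Split treats each run of whitespace as one delimiter, like awk's default.
$fields = [Regex]::Split($line, '\s+')

"String.Split produced $($naive.Count) pieces"
"Regex.Split produced $($fields.Count) fields: $($fields -join ', ')"
```

The first split yields five pieces (two of them empty), and the second yields the three fields you actually want.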
The following Convert-TextObject script converts many text streams into a meaningful object-based representation. From there, you can use all of Monad’s powerful object-based filtering cmdlets as you normally would.
Here’s an example, using the output of a source control system we use at work.
MSH:48 D:\enlistment > sd opened
//depot/main/dirs#2 - edit default change (text)
//depot/main/output.txt#0 - add default change (text)
//depot/main/sdb.ini#1 - delete default change (text)
MSH:49 D:\enlistment >
MSH:49 D:\enlistment > $sdObjectDefinition = @("Path","Revision","Change","ChangeList","Type")
MSH:50 D:\enlistment > $sdFormat = "(.*)#([^ \t]+) - ([^ \t]+) (.*) \((.*)\)"
MSH:51 D:\enlistment > $results = sd opened | Convert-TextObject -ParseExpression:$sdFormat `
>> -PropertyName:$sdObjectDefinition
>>
MSH:52 D:\enlistment > $results | format-table
Path                    Revision Change ChangeList     Type
----                    -------- ------ ----------     ----
//depot/main/dirs       2        edit   default change text
//depot/main/output.txt 0        add    default change text
//depot/main/sdb.ini    1        delete default change text
MSH:53 D:\enlistment > $results | where { $_.Revision -gt 1 }
Path       : //depot/main/dirs
Revision   : 2
Change     : edit
ChangeList : default change
Type       : text
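For comparison, here is a minimal sketch of the same parsing done inline, without the script, assuming PowerShell 3.0 or later for the [PSCustomObject] type accelerator. It reuses the three lines of sd output and the $sdFormat pattern from the transcript above, and casts Revision to [int] so that the filter compares numbers rather than strings.

```powershell
# Sample lines copied from the "sd opened" output above.
$lines = @(
    "//depot/main/dirs#2 - edit default change (text)"
    "//depot/main/output.txt#0 - add default change (text)"
    "//depot/main/sdb.ini#1 - delete default change (text)"
)

# Same pattern as $sdFormat in the transcript.
$sdFormat = '(.*)#([^ \t]+) - ([^ \t]+) (.*) \((.*)\)'

$results = foreach ($line in $lines) {
    if ($line -match $sdFormat) {
        # $Matches[0] is the whole match; keys 1..5 hold the capture groups.
        [PSCustomObject] @{
            Path       = $Matches[1]
            Revision   = [int] $Matches[2]   # typed, so -gt compares numerically
            Change     = $Matches[3]
            ChangeList = $Matches[4]
            Type       = $Matches[5]
        }
    }
}

$results | Where-Object { $_.Revision -gt 1 }
```

The typing matters once revisions reach two digits. In the transcript, Revision is a string, so PowerShell converts the right-hand 1 to a string and compares alphabetically; that happens to work for single digits, but "10" -gt 2 returns False. With the script itself, passing -PropertyType ([string],[int]) should give Revision the same [int] type.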
And here's the script:
##############################################################################
##
## Convert-TextObject.ps1 -- Convert a simple string into a custom PowerShell
## object.
##
## From Windows PowerShell Cookbook (O'Reilly)
## by Lee Holmes (http://www.leeholmes.com/guide)
##
## Parameters:
##
## [string] Delimiter
##   If specified, gives the .NET Regular Expression with which to
##   split the string. The script generates properties for the
##   resulting object out of the elements resulting from this split.
##   If not specified, defaults to splitting on runs of whitespace
##   ("\s+"), as long as ParseExpression is not specified either.
##
## [string] ParseExpression
##   If specified, gives the .NET Regular Expression with which to
##   parse the string. The script generates properties for the
##   resulting object out of the groups captured by this regular
##   expression.
##
## ** NOTE ** Delimiter and ParseExpression are mutually exclusive.
##
## [string[]] PropertyName
##   If specified, the script pairs the names from this list with the
##   elements from the parsed string. If not specified (or the generated
##   object contains more properties than you specify), the script uses
##   property names in the pattern of Property1,Property2,...,PropertyN.
##
## [type[]] PropertyType
##   If specified, the script pairs the types from this list with the
##   properties from the parsed string. If not specified (or the
##   generated object contains more properties than you specify), the
##   script sets the properties to be of type [string].
##
##
## Example usage:
##   "Hello World" | Convert-TextObject
##   Generates an Object with "Property1=Hello" and "Property2=World"
##
##   "Hello World" | Convert-TextObject -Delimiter "ll"
##   Generates an Object with "Property1=He" and "Property2=o World"
##
##   "Hello World" | Convert-TextObject -ParseExpression "He(ll.*o)r(ld)"
##   Generates an Object with "Property1=llo Wo" and "Property2=ld"
##
##   "Hello World" | Convert-TextObject -PropertyName FirstWord,SecondWord
##   Generates an Object with "FirstWord=Hello" and "SecondWord=World"
##
##   "123 456" | Convert-TextObject -PropertyType $([string],[int])
##   Generates an Object with "Property1=123" and "Property2=456"
##   The second property is an integer, as opposed to a string
##
##############################################################################

param(
    [string] $delimiter,
    [string] $parseExpression,
    [string[]] $propertyName,
    [type[]] $propertyType
)

function Main(
    $inputObjects, $parseExpression, $propertyType,
    $propertyName, $delimiter)
{
    $delimiterSpecified = [bool] $delimiter
    $parseExpressionSpecified = [bool] $parseExpression

    ## If they've specified both ParseExpression and Delimiter, show usage
    if($delimiterSpecified -and $parseExpressionSpecified)
    {
        Usage
        return
    }

    ## If they enter no parameters, assume a default delimiter of whitespace
    if(-not $($delimiterSpecified -or $parseExpressionSpecified))
    {
        $delimiter = "\s+"
        $delimiterSpecified = $true
    }

    ## Cycle through the $inputObjects, and parse each one into an object
    foreach($inputObject in $inputObjects)
    {
        if(-not $inputObject)
        {
            $inputObject = ""
        }

        foreach($inputLine in $inputObject.ToString())
        {
            ParseTextObject $inputLine $delimiter $parseExpression `
                $propertyType $propertyName
        }
    }
}

function Usage
{
    "Usage: "
    "   Convert-TextObject"
    "   Convert-TextObject -ParseExpression parseExpression " + "[-PropertyName propertyName] [-PropertyType propertyType]"
    "   Convert-TextObject -Delimiter delimiter " + "[-PropertyName propertyName] [-PropertyType propertyType]"
    return
}

## Function definition -- ParseTextObject.
## Perform the heavy-lifting -- parse a string into its components.
## For each component, add it as a note to the Object that we return
function ParseTextObject
{
    param(
        $textInput, $delimiter, $parseExpression,
        $propertyTypes, $propertyNames)

    $parseExpressionSpecified = -not $delimiter

    $returnObject = New-Object PSObject

    $matches = $null
    $matchCount = 0

    if($parseExpressionSpecified)
    {
        ## Populates the matches variable by default
        [void] ($textInput -match $parseExpression)
        $matchCount = $matches.Count
    }
    else
    {
        $matches = [Regex]::Split($textInput, $delimiter)
        $matchCount = $matches.Length
    }

    if(-not $matchCount)
    {
        return
    }

    $counter = 0
    if($parseExpressionSpecified)
    {
        ## Group 0 is the entire match, so start with the first capture group
        $counter++
    }

    for(; $counter -lt $matchCount; $counter++)
    {
        $propertyName = "None"
        $propertyType = [string]

        ## Parse by Expression
        if($parseExpressionSpecified)
        {
            $propertyName = "Property$counter"

            ## Get the property name
            if($counter -le $propertyNames.Length)
            {
                if($propertyNames[$counter - 1])
                {
                    $propertyName = $propertyNames[$counter - 1]
                }
            }

            ## Get the property type
            if($counter -le $propertyTypes.Length)
            {
                if($propertyTypes[$counter - 1])
                {
                    $propertyType = $propertyTypes[$counter - 1]
                }
            }
        }
        ## Parse by delimiter
        else
        {
            $propertyName = "Property$($counter + 1)"

            ## Get the property name
            if($counter -lt $propertyNames.Length)
            {
                if($propertyNames[$counter])
                {
                    $propertyName = $propertyNames[$counter]
                }
            }

            ## Get the property type
            if($counter -lt $propertyTypes.Length)
            {
                if($propertyTypes[$counter])
                {
                    $propertyType = $propertyTypes[$counter]
                }
            }
        }

        Add-Note $returnObject $propertyName `
            ($matches[$counter] -as $propertyType)
    }

    $returnObject
}

## Add a note to an object
function Add-Note ($object, $name, $value)
{
    $object | Add-Member NoteProperty $name $value
}

Main $input $parseExpression $propertyType $propertyName $delimiter
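Most of ParseTextObject comes down to the difference between its two parsing modes. The short snippet below replays the header's "Hello World" examples by hand. -match fills the automatic $Matches hashtable, whose key 0 holds the entire match, which is why the script starts its counter at 1 in ParseExpression mode. [Regex]::Split, by contrast, returns an ordinary zero-based string array. The snippet also shows that -as, which the script uses to apply PropertyType, returns $null instead of throwing when a conversion fails.

```powershell
$text = "Hello World"

# ParseExpression mode: -match populates the automatic $Matches hashtable.
$null = $text -match 'He(ll.*o)r(ld)'
$groupCount = $Matches.Count   # the whole match (key 0) plus two groups
$group1 = $Matches[1]          # first capture group
$group2 = $Matches[2]          # second capture group

# Delimiter mode: [Regex]::Split returns a plain array indexed from 0.
$pieces = [Regex]::Split($text, 'll')

# PropertyType conversion: -as yields $null on failure instead of an error.
$number = '456' -as [int]
$bad    = 'abc' -as [int]

"Groups: $groupCount ('$group1', '$group2'); pieces: $($pieces -join ' | ')"
```

One consequence of using -as: a value that doesn't fit its requested type quietly becomes an empty property rather than stopping the pipeline.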
[Edit: Monad has now been renamed to Windows PowerShell. This script or discussion may require slight adjustments before it applies directly to newer builds.]
[Edit 2: Updated script to work with new builds]
[Edit 3: Updated script to add type constraints, and consistent parameter naming]
[Edit 4: Updated again]