PowerShell Cookbook

Disclaimer
I work for Microsoft.

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.


Monday, October 17, 2005 7:20:52 AM (Pacific Daylight Time, UTC-07:00)

Marcel has been posting some interesting articles on using Monad to generate the MD5 hashes of files.  Now, an MD5 hash of a file is just an array of bytes.  Typical hashing programs display this in a more friendly manner:

MSH:15 C:\Temp >md5sum 71-59-B7.bmp
a05805e638741bb767f97c0e88962952 *71-59-B7.bmp

Although Marcel’s scripts could certainly be adapted to produce this format, they currently output the string representation of a byte array:

MSH:19 C:\Temp >get-md5 (get-childitem 71-59-B7.bmp)
160 88 5 230 56 116 27 183 103 249 124 14 136 150 41 82
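As a rough sketch of how those bytes map onto the md5sum-style string, each byte can be formatted as two lowercase hex digits and the results joined together. This is written for current Windows PowerShell rather than the Monad build used here, and the $hash variable is simply the byte values printed above, typed in by hand.

```powershell
## Hypothetical input: the 16 hash bytes from the get-md5 output above.
[byte[]] $hash = 160, 88, 5, 230, 56, 116, 27, 183, 103, 249, 124, 14, 136, 150, 41, 82

## Format every byte as two lowercase hex digits, then join the pieces
## into a single string.
$hex = -join ($hash | ForEach-Object { "{0:x2}" -f $_ })

## Emits a05805e638741bb767f97c0e88962952, the same digest md5sum printed.
$hex
```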

One of the comments in response to Marcel’s post was that Monad should, by default, output byte arrays as hex.  This is a good suggestion, and we can go even further with it.  Let’s write a script to give us a full hex editor-like view of a byte array:

MSH:20 C:\Temp >get-md5 (get-childitem 71-59-B7.bmp) | format-hex


            0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F


00000000   A0 58 05 E6 38 74 1B B7 67 F9 7C 0E 88 96 29 52   X.æ8t.•gù|.??)R

Or even better, let’s use it to dump out a very small bitmap – 10 pixels of the colour (R=0x71 G=0x59 B=0xB7):

MSH:21 C:\Temp >get-content 71-59-B7.bmp -encoding byte | format-hex


            0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F


00000000   42 4D 5E 00 00 00 00 00 00 00 36 00 00 00 28 00  BM^.......6...(.
00000010   00 00 0A 00 00 00 01 00 00 00 01 00 20 00 00 00  ............ ...
00000020   00 00 00 00 00 00 C4 0E 00 00 C4 0E 00 00 00 00  ......Ä...Ä.....
00000030   00 00 00 00 00 00 B7 59 71 FF B7 59 71 FF B7 59  ......•Yq.•Yq.•Y
00000040   71 FF B7 59 71 FF B7 59 71 FF B7 59 71 FF B7 59  q.•Yq.•Yq.•Yq.•Y
00000050   71 FF B7 59 71 FF B7 59 71 FF B7 59 71 FF        q.•Yq.•Yq.•Yq.

To make it easier to determine byte offsets, files are usually broken down into 16-byte rows.  The left-hand section gives the offset of the 16-byte chunk.  The middle section gives the hex representation of the data at that location.  The bytes are also aligned in columns that correspond to their position within the 16-byte chunk.  So column “E” in row 0x40 means a file offset of (0x40 + 0x0E) = 0x4E.  The last section gives a character representation of the data, with a dot standing in for any byte that the script does not treat as printable.
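The row and column arithmetic can be checked with a few lines of current Windows PowerShell. This is only an illustration of the addressing scheme described above; the variable names are made up for the example.

```powershell
## From a row label and a column label to a file offset.
$rowOffset = 0x40
$column = 0x0E
$fileOffset = $rowOffset + $column
"0x{0:X2}" -f $fileOffset        ## 0x4E

## And back again: from a file offset to its row and column.
$offset = 0x4E
$col = $offset % 16              ## position within the 16-byte row
$row = $offset - $col            ## offset of the first byte in that row
"row 0x{0:X2}, column {1:X}" -f $row, $col    ## row 0x40, column E
```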

In this representation, it becomes possible to see some of the underlying structure of the bitmap format:

Offset  Length  Comment
0x00    2       “BM”, the magic bitmap header.
0x02    4       “0x5E”, the length of the file. Notice that our last data byte is at 0x5D.  Since we started counting from zero, this means that we have 0x5E bytes of data.
(...)   (...)   (...)
0x0A    4       “0x36”, the absolute start of the bitmap data. Notice that the data begins at offset (0x30 + 0x06).
0x36    40      Ten 4-byte pixel representations. In bitmaps, they are laid out as (B=0xB7 G=0x59 R=0x71 <reserved>).
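The same fields can be pulled out of the bytes programmatically. The sketch below is for current Windows PowerShell, retypes the hex dump above as strings instead of reading the file, and assumes a little-endian machine (which is what [BitConverter] expects on ordinary Windows hardware). The variable names are hypothetical.

```powershell
## The 0x5E bytes of the bitmap, copied from the hex dump above.
$dump = @(
    "42 4D 5E 00 00 00 00 00 00 00 36 00 00 00 28 00"
    "00 00 0A 00 00 00 01 00 00 00 01 00 20 00 00 00"
    "00 00 00 00 00 00 C4 0E 00 00 C4 0E 00 00 00 00"
    "00 00 00 00 00 00 B7 59 71 FF B7 59 71 FF B7 59"
    "71 FF B7 59 71 FF B7 59 71 FF B7 59 71 FF B7 59"
    "71 FF B7 59 71 FF B7 59 71 FF B7 59 71 FF"
)
[byte[]] $bytes = ($dump -join ' ') -split ' ' |
    ForEach-Object { [Convert]::ToByte($_, 16) }

## Offset 0x00, length 2: the magic header.
$magic = [Text.Encoding]::ASCII.GetString($bytes, 0, 2)      ## BM

## Offset 0x02, length 4: the file length, stored little-endian.
$fileSize = [int] [BitConverter]::ToUInt32($bytes, 0x02)     ## 94 (0x5E)

## Offset 0x0A, length 4: where the pixel data starts.
$dataStart = [int] [BitConverter]::ToUInt32($bytes, 0x0A)    ## 54 (0x36)

## The first pixel is stored blue, green, red.
$blue, $green, $red = $bytes[$dataStart..($dataStart + 2)]
"R=0x{0:X2} G=0x{1:X2} B=0x{2:X2}" -f $red, $green, $blue    ## R=0x71 G=0x59 B=0xB7
```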

 

Now, for the script:

## format-hex.msh
## Convert a byte array into a hexadecimal dump
##
## Example usage:
## get-content 'c:\windows\Coffee Bean.bmp' -encoding byte | format-hex | more

## Convert the input to an array of bytes.  This is a strongly-typed variable,
## so that we're not trying to iterate over strings, directory entries, etc.
[byte[]] $bytes = $(foreach($byte in $input) { $byte })

## Store our header, and formatting information
$counter = 0
$header = "            0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F"
$nextLine = "{0}   " -f
    [Convert]::ToString($counter, 16).ToUpper().PadLeft(8, '0')
$asciiEnd = ""

## Output the header
"`r`n$header`r`n"

foreach($byte in $bytes)
{
   ## Display each byte, in 2-digit hexadecimal, and add that to the left-hand
   ## side.  Notice the use of the '-f' operator here.  This provides access
   ## to the facilities offered by [String]::Format.
   $nextLine += "{0:X2} " -f $byte

   ## If the byte falls in the range 0x20-0xFE, add its character
   ## representation to the right-hand side (note that this range goes beyond
   ## 7-bit ASCII).  Otherwise, add a dot to the right-hand side.
   if(($byte -ge 0x20) -and ($byte -le 0xFE))
   {
      $asciiEnd += [char] $byte
   }
   else
   {
      $asciiEnd += "."
   }

   $counter++;

   ## If we've hit the end of a line, combine the right half with the left half,
   ## and start a new line.
   if(($counter % 16) -eq 0)
   {
      "$nextLine $asciiEnd"
      $nextLine = "{0}   " -f
        [Convert]::ToString($counter, 16).ToUpper().PadLeft(8, '0')
      $asciiEnd = "";
   }
}

## At the end of the file, we might not have had the chance to output the end
## of the line yet.  Only do this if we didn't exit on the 16-byte boundary,
## though.
if(($counter % 16) -ne 0)
{
   while(($counter % 16) -ne 0)
   {
      $nextLine += "   "
      $asciiEnd += " "
      $counter++;
   }
   "$nextLine $asciiEnd"
}

""

[Edit: Monad has now been renamed to Windows PowerShell. This script or discussion may require slight adjustments before it applies directly to newer builds.]
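One possible adjustment for current PowerShell, offered as a sketch and not as the definitive port: a function with begin, process, and end blocks can keep its state between pipeline items while still emitting each row as soon as it is complete. The name Format-HexDump is hypothetical and was picked to avoid clashing with the Format-Hex cmdlet that ships with PowerShell 5.0 and later. Unlike the script above, this version only treats 0x20-0x7E as printable and prints the header without surrounding blank lines.

```powershell
function Format-HexDump
{
    param([Parameter(ValueFromPipeline = $true)] [byte] $Byte)

    begin
    {
        ## State shared by every pipeline item.
        $counter = 0
        $hexPart = ""
        $asciiPart = ""
        "            0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F"
    }
    process
    {
        ## Runs once per incoming byte.
        $hexPart += "{0:X2} " -f $Byte
        if(($Byte -ge 0x20) -and ($Byte -le 0x7E)) { $asciiPart += [char] $Byte }
        else { $asciiPart += "." }
        $counter++

        ## Emit a finished 16-byte row immediately.
        if(($counter % 16) -eq 0)
        {
            "{0:X8}   {1} {2}" -f ($counter - 16), $hexPart, $asciiPart
            $hexPart = ""
            $asciiPart = ""
        }
    }
    end
    {
        ## Flush a partial last row, padding the hex section to 48 characters.
        if(($counter % 16) -ne 0)
        {
            $rowStart = $counter - ($counter % 16)
            "{0:X8}   {1} {2}" -f $rowStart, $hexPart.PadRight(48), $asciiPart
        }
    }
}

## Example usage with three bytes typed in by hand.
0x42, 0x4D, 0x5E | Format-HexDump
```

To feed it a file, Windows PowerShell 5.1 still accepts get-content with -encoding byte as used above, while PowerShell 6 and later replace that switch with -AsByteStream.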

Wednesday, October 19, 2005 4:23:59 AM (Pacific Daylight Time, UTC-07:00)
Very cool!
Jeffrey Snover
Friday, October 21, 2005 11:10:44 PM (Pacific Daylight Time, UTC-07:00)
Quite interesting as always, but raises some questions.

What's get-content? Is it like unix "cat"? What happens if you omit the default encoding? How do you pass C# style Stream objects around?

Something like
"openread vsts.iso | readudf "autorun.exe" -data | format"
openread passes Read mode opened Stream to readudf which passes autorun.exe as Stream to format, which in turn determines it's binary and outputs as hex by default. If format is omitted I am not sure what would happen. Maybe it could ask if you'd like to open autorun.exe and keep the original .iso open such that if autorun.exe wants to execute setup.exe, that would be read through readudf.
ac
Saturday, October 22, 2005 10:58:09 AM (Pacific Daylight Time, UTC-07:00)
get-content takes a while to pass even a smallish zip file to the output.

Perhaps get-content should instead return an IEnumerable, which wraps a yield return loop. If I'm right, it should return immediately, without loading the file into memory before passing it to format-hex. format-hex will see the enumerator, and access it just fine. The file will be getting its read hits during formatting, so some buffering could reduce the disk activity.

This probably increases the *total* time spent, but spreading out the loading time should increase responsiveness.

sketched out for byte:

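using System.Collections;
using System.Collections.Generic;
using System.IO;

class FileContentEnumerable : IEnumerable<byte>
{
    private readonly string _fileName;

    public FileContentEnumerable(string fileName) { _fileName = fileName; }

    // Lazily yields one byte at a time; the file is only opened once
    // the caller starts enumerating.
    public IEnumerator<byte> GetEnumerator()
    {
        using (FileStream file = File.OpenRead(_fileName))
        {
            int nextByte;
            // ReadByte() returns -1 at the end of the stream.
            while ((nextByte = file.ReadByte()) != -1)
            {
                yield return (byte) nextByte;
            }
        }
    }

    // Non-generic overload required by IEnumerable.
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}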
Keith J. Farmer
Saturday, October 22, 2005 11:03:45 AM (Pacific Daylight Time, UTC-07:00)
Okay, I think I was smoking crack.

It's that initial conversion of $input to byte[], isn't it?

A similar technique should apply, though. Create an IEnumerable[byte] that takes input and yields the converted byte. Then the conversion happens during formatting, yadda yadda.

I forget -- does MSH *have* a yield?
Keith J. Farmer
Monday, October 24, 2005 12:09:24 AM (Pacific Daylight Time, UTC-07:00)
AC: Yes, get-content is the way to get content from any provider that supports it. In the FileSystem, that's the equivalent of "type" or "cat." On the Function provider, it returns the content of a function. For example, "get-content Function:\Prompt."

The reason that this script takes so long is that MSH currently buffers the output of scripts and functions, but not filters (get-help about_filter.) The get-content operation itself actually returns content byte-by-byte.

Since I wrote it as a script, we have to wait for it to process the whole file before we see any output.

If I had written it as a filter, you would see the output stream immediately. That's the closest we have to a yield, but it's not exactly the same. However, filters are stateless, and do not provide the ability to write a single header, or lines of 16 bytes, for example.
Tuesday, October 25, 2005 4:15:02 AM (Pacific Daylight Time, UTC-07:00)
Right.. If you can define a filter that knew about state, or a function that didn't buffer all the input, it'd probably work well for this.

How easy is it to define to categories of pipeline elements (function, filter...)?
Keith J. Farmer