BinShred - Parsing Arbitrary Binary Data in PowerShell

Tue, Apr 6, 2021 3-minute read

When working with raw binary data (especially in security forensics), you often need to write a parser for it: for example, to extract file contents from the NTFS data structures on disk. Parsers already exist for many common data structures, but you’ll still sometimes need to write your own.

BinShred is a PowerShell module that lets you do this.

BinShred uses a custom parsing language called a BinShred Template (.bst). Unlike the code-heavy templates used by tools like 010 Editor, this grammar (implemented in ANTLR) is designed to be as close as possible to the language people actually use when describing file formats informally.

image

You can install it from the PowerShell Gallery:

Install-Module -Name BinShred

For a full treatment of how to write these binary parsers, please see the included help topic. To get started, here’s a very simple example.

Consider the following binary content:
	
	PS C:\> Format-Hex words.bin

			   Path: C:\words.bin

			   00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

	00000000   4C 48 02 00 00 00 05 00 00 00 48 65 6C 6C 6F 05  LH........Hello.
	00000010   00 00 00 57 6F 72 6C 64                          ...World
	
	From either documentation or investigation, we've determined that the file
	format has two main portions: a header, followed by a list of words. The
	header itself has 2 bytes in ASCII as the magic signature, followed by a
	4-byte integer giving the number of words. After that, each word entry has
	a 4-byte integer giving the word length, followed by the word itself (of
	that length) in UTF8.
	
	A BinShred Template (.bst) for this file looks like this:
	
		header :
			magic (2 bytes as ASCII)
			wordCount (4 bytes as UINT32)
			words (wordCount items);
		words :
			wordLength (4 bytes as UINT32)
			word (wordLength bytes as UTF8);	
	
	A region is identified by its name followed by a colon. Within a region,
	you define each property by writing its name followed by its length and
	data type in parentheses. A semicolon marks the end of a region.
	
	When you supply this template to the ConvertFrom-BinaryData cmdlet (invoked
	below as binshred), the result is an object whose properties mirror the
	data structures contained in that binary file.
	
	PS > binshred -Path .\words.bin -TemplatePath .\wordParser.bst

	Name                           Value
	----                           -----
	magic                          LH
	wordCount                      2
	words                          (...)

	PS > (binshred -Path .\words.bin -TemplatePath .\wordParser.bst).Words[0]

	Name                           Value
	----                           -----
	wordLength                     5
	word                           Hello
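
For comparison, here is a rough sketch of what the same parse might look like when written by hand against .NET's BinaryReader. It is not part of BinShred: the function name Read-WordFile and the temp-file location are hypothetical, chosen only for illustration. It also assumes the integers are little-endian, which is what the hex dump above suggests. The first part recreates the 24 sample bytes so you can follow along.

```powershell
# Recreate the 24 bytes shown in the Format-Hex output above.
[byte[]] $bytes = @(
    0x4C, 0x48,                       # magic: "LH" (ASCII)
    0x02, 0x00, 0x00, 0x00,           # wordCount: 2 (little-endian UINT32)
    0x05, 0x00, 0x00, 0x00,           # wordLength: 5
    0x48, 0x65, 0x6C, 0x6C, 0x6F,     # word: "Hello" (UTF8)
    0x05, 0x00, 0x00, 0x00,           # wordLength: 5
    0x57, 0x6F, 0x72, 0x6C, 0x64      # word: "World" (UTF8)
)

# Hypothetical location; a full path is used because .NET file APIs do not
# follow PowerShell's current location.
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) 'words.bin'
[System.IO.File]::WriteAllBytes($samplePath, $bytes)

# Hypothetical hand-written parser for the header + words layout.
function Read-WordFile {
    param([Parameter(Mandatory)] [string] $Path)

    $stream = [System.IO.File]::OpenRead($Path)
    $reader = [System.IO.BinaryReader]::new($stream)
    try {
        # header: 2-byte ASCII magic, then a 4-byte word count
        $magic = [System.Text.Encoding]::ASCII.GetString($reader.ReadBytes(2))
        $wordCount = $reader.ReadUInt32()

        # words: a 4-byte length followed by that many bytes of UTF8 text
        $words = for ($i = 0; $i -lt $wordCount; $i++) {
            $wordLength = $reader.ReadUInt32()
            [pscustomobject]@{
                wordLength = $wordLength
                word       = [System.Text.Encoding]::UTF8.GetString(
                                 $reader.ReadBytes([int] $wordLength))
            }
        }

        [pscustomobject]@{
            magic     = $magic
            wordCount = $wordCount
            words     = @($words)
        }
    }
    finally {
        # Disposing the reader also closes the underlying stream.
        $reader.Dispose()
    }
}

$parsed = Read-WordFile -Path $samplePath
$parsed.words[1].word
```

The last line should return the second word in the file. Every offset, length, and encoding here is spelled out in code, which is the bookkeeping the six-line template above expresses declaratively.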

While BinShred is capable of processing fairly complicated binary formats (such as the BMP example above), you will likely run into data structures that require much more advanced parsing logic. For these, be sure to check out Kaitai Struct (https://kaitai.io/), a very robust binary parsing engine. While it does not support parsing directly from PowerShell, you can compile file format parsers one at a time into C# files, which you can then load into PowerShell and use.