Searching for Content in Base-64 Strings

You might have run into situations in the past where you’re looking for some specific text or binary sequence, but that content is encoded with Base-64. Base-64 is an incredibly common encoding format in malware and binary obfuscation tools alike.

The basic idea behind Base-64 is that it takes arbitrary binary data and encodes it using an alphabet of 64 (naturally) printable ASCII characters that can be transmitted safely over any normal transmission channel. Wikipedia goes into the full details.

Some tooling supports decoding of Base-64 automatically, but that requires some pretty detailed knowledge of where the Base-64 starts and stops.

The Problem

Pretend you’re looking for the string, “Hello World” in a log file or SIEM system, but you know that it’s been Base-64 encoded. You might use PowerShell’s handy Base-64 classes to tell you what to search for:
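The post demonstrates this with PowerShell’s [Convert]::ToBase64String; here’s the same idea sketched in Python for illustration:

```python
import base64

# Base-64 encode the search term to see what to look for in logs
search_term = base64.b64encode(b"Hello World").decode("ascii")
print(search_term)  # SGVsbG8gV29ybGQ=
```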


That seems useful. But what if “Hello World” is in the middle of a longer string? Can you still use ‘SGVsbG8gV29ybGQ=’? It turns out, no. Adding a single character to the beginning changes almost everything:


Now, we’ve got ‘IEhlbGxvIFdvcmxk’.

The main problem here is the way that Base-64 works. When encoding, Base-64 takes 3 bytes (24 bits) and re-interprets them as 4 segments of 6 bits each. It then encodes each of those 6-bit segments as one of the 64 characters that you know and love. Here’s a graphical example from Wikipedia:


So when we add a character to the beginning, we shift the whole bit pattern to the right and change the encoding of everything that follows!

Another feature of Base-64 is padding. If your content isn’t evenly divisible into 3-byte (24-bit) groups, Base-64 encoding pads the remainder with zero bits. It then uses the “=” character to denote how many padding bytes were needed:


When final padding is added, you can’t just remove those “=” characters. If additional content is added to the end of your string (i.e.: “Hello World!”), that additional content will influence both the padding bytes, as well as the character before them.

Another major challenge is when the content is Unicode rather than ASCII. All of these points still apply – but the bit patterns change. Windows typically stores Unicode text as UTF-16, which represents most characters as two bytes (16 bits). This is why the Base-64 encoding of Unicode content representing ASCII text has so many ‘A’ characters: ‘A’ is the Base-64 representation of a NULL byte.
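You can see the effect with a quick sketch in Python (UTF-16-LE is the “Unicode” encoding .NET refers to):

```python
import base64

# UTF-16-LE interleaves each ASCII byte with a NULL byte, so the Base-64
# output is sprinkled with 'A' characters (the encoding of an all-zero 6-bit group)
utf16 = "Hello".encode("utf-16-le")   # b'H\x00e\x00l\x00l\x00o\x00'
encoded = base64.b64encode(utf16).decode("ascii")
print(encoded)  # SABlAGwAbABvAA==
```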


The Solution

When you need to search for content that’s been Base-64 encoded, then, the solution is to generate the encoded text at each of the three possible byte alignments, and remove the characters that might be influenced by the context: content either preceding what you are looking for, or content that follows. Additionally, you should do this for both the ASCII and Unicode representations of the string.
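The algorithm can be sketched in Python (Get-Base64RegularExpression does the equivalent in PowerShell; the exact characters to trim depend on the alignment and padding, as shown in the comments):

```python
import base64
import re

def base64_variants(data: bytes):
    """Yield regex-escaped Base-64 fragments that match `data` at any of the
    three possible byte alignments inside encoded content."""
    for offset in range(3):
        # Encode with dummy leading bytes to shift the alignment
        encoded = base64.b64encode(b"\x00" * offset + data).decode("ascii")
        # Drop leading characters influenced by the (unknown) preceding bytes
        encoded = encoded[(0, 2, 3)[offset]:]
        # Drop trailing characters influenced by padding or following bytes
        remainder = (offset + len(data)) % 3
        if remainder == 1:
            encoded = encoded[:-3]   # partial character plus "=="
        elif remainder == 2:
            encoded = encoded[:-2]   # partial character plus "="
        yield re.escape(encoded)

pattern = "|".join(base64_variants(b"Hello World"))
```

A full implementation, like the Get-Base64RegularExpression script below, would also emit the UTF-16 variants.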

An Example

One example of Base-64 content is in PowerShell’s –EncodedCommand parameter. This shows up in Windows Event Logs if you have command-line logging enabled (and of course shows up directly if you have PowerShell logging enabled).

Here’s an example of an event log like that:


Here’s an example of launching a bunch of PowerShell instances with the –EncodedCommand parameter, as well as the magical Get-Base64RegularExpression command. That command will generate a regular expression that you can use to match against that content:


As you can see in this example, searching for the Base-64 content of “My voice is my” returned all four log entries, while the “My voice is my passport” search returned the single event log that contained the whole expression.

The Script

Get-Base64RegularExpression is a pretty simple script. You can use this in PowerShell, or any event log system that supports basic regular expression searches.

You can find it on the PowerShell Gallery:

Install-Script Get-Base64RegularExpression

Searching for Content in XOR “Encrypted” Data

A while back, we talked about a common challenge in the security industry – searching for some known bad content (i.e.: “Invoke-WebRequest”) in content that you know has been encoded with Base64. In a really cool bout of co-discovery, others simultaneously wrote similar implementations, and this approach is now in the process of being integrated into YARA. Very cool times!

Another situation you might have run across is malware authors “encrypting” their content with a static XOR key – a process I like to call “encraption”. One of the neat things about XOR encraption is that a single-byte key encodes the data by simply applying the XOR operator to each byte; to reverse the process, you just apply it again. Despite being horrible from a security perspective, it provides basic obfuscation that reliably breaks string searching and simple signatures.
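The round trip can be sketched in a few lines of Python (the key value here is arbitrary):

```python
def xor_encrapt(data: bytes, key: int) -> bytes:
    """Apply a single-byte XOR key to every byte; running it twice
    restores the original data, since (b ^ k) ^ k == b."""
    return bytes(b ^ key for b in data)

obfuscated = xor_encrapt(b"Invoke-WebRequest", 0x42)   # 0x42: arbitrary key
recovered = xor_encrapt(obfuscated, 0x42)
```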

This pattern of decoding content (Base64, XOR, etc.) before running it is extremely common – and is a major driver behind why we added the Antimalware Scan Interface in Windows. This is great at stripping these layers of obfuscation from content at runtime.

But what about static analysis or log hunting?

Like the challenge we had with Base64, SIEM systems don’t generally offer a way to decrapt embedded XOR content to let you search within it. But they do offer regular expressions. Can we take a similar approach to what we did in Base64 – generate a regular expression that matches content in XOR-encoded strings? It turns out, yes!

[Aside – in another wonderful bout of co-discovery, YARA added XOR encoding for files in August 2018.]

Let’s take a simple example – data that has been encrapted directly.


So a little script that reverses this and emits the output looks like this:


One of the key weaknesses of XOR is that there are only 255 possible single-byte XOR keys. If this script’s content made it into our SIEM, we could simply brute force the search. We could search for (“encrapted” BXOR 1) and then (“encrapted” BXOR 2) and then … and then (“encrapted” BXOR 40). Eventually, we would end up searching for “MFKZIX\ML” and find it. And fortunately, regular expressions support searching for multiple patterns all at once, so we can have a script simply generate a regular expression for all possible XOR keys.
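That brute-force union is easy to sketch in Python (Get-XorRegularExpression does the equivalent in PowerShell):

```python
import re

def xor_regex(text: bytes) -> str:
    """Build one regular expression alternative per possible single-byte
    XOR key (1-255; key 0 leaves the text unchanged)."""
    alternatives = []
    for key in range(1, 256):
        encoded = bytes(b ^ key for b in text)
        # latin-1 maps each byte straight to a code point, so nothing is lost
        alternatives.append(re.escape(encoded.decode("latin-1")))
    return "|".join(alternatives)

pattern = xor_regex(b"encrapted")
```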



The full regex is pretty long (255 elements), but this is a portion of what it looks like under the hood:


Now, XOR content is rarely encoded in scripts directly. Depending on the XOR key, the content will usually end up containing bytes that are not valid for use within a string. Usually, you’ll find that scripts have base64-encoded the XOR encraption.

For this scenario, we can leverage the “-Raw” parameter of Get-XorRegularExpression. This will return the raw bytes (rather than the escaped Regex representation), which we can then feed into our base64 regex generator. The result is quite a beast (765 elements: 3 base64 representations of each XOR key), but still a valuable source to hunt with.

Here’s an example of this happening in a script directly (taken from the AMSI blog post earlier):

In this example, the malware author uses a Unicode encoding of the string, so we use the “-Unicode” parameter of Get-XorRegularExpression to have it operate against the Unicode string.


While large, this is a regex we can now use against SIEM systems as well. Here’s an example of searching (and finding!) this content in PowerShell’s Script Block logs in Azure Sentinel:


And for some additional fun, we can even use the –Raw parameter of Get-Base64RegularExpression to generate Yara rules out of these byte sequences.



So, with a bit of creativity, we can now search for base64 content, XOR encoded content, and more in any SIEM that supports regular expressions. Enjoy!

You can download these scripts from the PowerShell Gallery:

Install-Script Get-Base64RegularExpression –Scope CurrentUser
Install-Script Get-XorRegularExpression –Scope CurrentUser
Install-Script New-YaraStringSearchRule –Scope CurrentUser

Star Trek TOS Science, Engineering, Command Embroidery Patterns

If you’re looking to get one of the Star Trek TOS patches (Science, Engineering, or Command) embroidered on something, they often charge a lot of money to convert the logo to the format that embroidery machines understand.


Here are versions I created in the Brother PES format, as well as in the native format I created them in (EmbroideryWare STICH).

Command - [Brother PES] [EmbroideryWare STICH]

Science - [Brother PES] [EmbroideryWare STICH]

Engineering - [Brother PES] [EmbroideryWare STICH]

Dragon Ball Z Logo Embroidery Pattern

If you’re looking to get the Dragon Ball Z logo embroidered on something, they often charge a lot of money to convert the logo to the format that embroidery machines understand.


Here’s a version I created in the Brother PES format, as well as in the native format I created it in (EmbroideryWare STICH).

[Brother PES]

[EmbroideryWare STICH]

PowerShell Logo Embroidery Pattern

If you’re looking to get the PowerShell logo embroidered on something, they often charge a lot of money to convert the logo to the format that embroidery machines understand.


Here’s a version I created in the Brother PES format, as well as in the native format I created it in (EmbroideryWare STICH).

[Brother PES]

[EmbroideryWare STICH]

WebSockets from Scratch


In the web application world – especially single-page applications – smooth and fluid interaction is key. For many years, these applications have been doing a pretty good job of getting this fluid interaction through AJAX techniques and browser support for XMLHttpRequest. One issue, however, is that XMLHttpRequest requires that all of your communication go through a text-based HTTP protocol. Another issue is that XMLHttpRequest doesn’t let a server initiate communication back to connected clients. Instead, clients need to continuously poll the server to find out if it has anything to say.

To solve that need, browser vendors introduced the WebSocket protocol. The WebSocket protocol is a persistent, two-way TCP connection between a WebSocket client (traditionally a browser) and a server.

You might wonder: “That’s all well and good, but why should I care?” That’s a good question with a simple answer 🙂 This is a really cool project designed by Tim McGuffin (@NotMedic). Whatever you draw on the website goes onto his badge – via a 64x32 LED board, and a bunch of more cool software.


Tim carried this around DEF CON this year, and people had lots of fun 🙂


But can we have some _real_ fun? 🙂

Digging into the Implementation

When you dig into the code a little bit, you can see how the browser does the communication. For every pixel you draw on the canvas, it converts it into a JSON object with a command called “DRAW”, and “DATA” that represents an X, Y, Color combination. Then it sends that string data to a connected WebSocket (exampleSocket.send(…)).
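In Python terms, the message looks roughly like this (the exact field names are an approximation of the badge’s protocol, not taken verbatim from its code):

```python
import json

# Approximation of the badge's protocol: a "DRAW" command whose data
# holds an (x, y, color) triple
message = json.dumps({"COMMAND": "DRAW", "DATA": [10, 4, "#3366FF"]})
# A browser client would then call exampleSocket.send(message)
```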


Partial Automation – Code Generation

The first step to having fun with Tim’s badge was through code generation. It is really easy in PowerShell and .NET to iterate through the pixels of an image (bitmap, PNG, etc.). Each pixel you access gives you its color. From there, I took a picture I manually resized to the correct dimensions and had PowerShell generate JavaScript code that I could copy + paste into the Developer Tools console.
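A sketch of that generation step in Python, with a hard-coded pixel list standing in for the image-reading code (and with illustrative JSON field names):

```python
import json

# Stand-in for pixels read from a resized image; each entry is (x, y, color)
pixels = [(0, 0, "#112233"), (1, 0, "#445566"), (2, 0, "#778899")]

# Emit one JavaScript statement per pixel, ready to paste into the
# browser's Developer Tools console
js_lines = [
    "exampleSocket.send(%s);" % json.dumps({"COMMAND": "DRAW", "DATA": list(p)})
    for p in pixels
]
```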


And from there, the Mona Lisa made her first appearance on Tim’s badge 🙂


That was fun, but copying + pasting code that PowerShell generated into a browser still felt a little hacky. Why don’t I just talk to the WebSocket directly from PowerShell? There are a few C# libraries for doing this out there, but I thought it would be a fun / interesting project to implement the protocol from scratch. So that’s what we’re going to do 🙂

WebSockets from Scratch

The WebSocket protocol is defined by RFC6455, which goes through the protocol in great detail.

Initial Upgrade Request

For the first phase of the connection, the browser makes a standard TCP connection to the remote server. Here’s how to do that in PowerShell:


In that TCP connection, it makes a standard HTTP request, requesting an upgrade to the websocket protocol. Part of this HTTP request includes a Sec-WebSocket-Key header, which is intended to ensure that random HTTP requests can’t be retargeted to WebSocket servers, and that WebSocket client requests can’t be targeted to arbitrary other TCP servers. Here’s an example, which hard-codes the key for demonstration purposes.


Once the server accepts the connection upgrade, a well-written client will verify that the key included in the server’s response was correctly derived from the Sec-WebSocket-Key you provided. This is what a server response looks like:


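The derivation itself is straightforward: the server appends a fixed GUID defined by RFC 6455 to the client’s key, SHA-1 hashes the result, and Base64-encodes the digest. A sketch:

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed constant from RFC 6455

def expected_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server must return."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# The sample key from RFC 6455 itself:
print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```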
Data Frames

Once the connection has been upgraded, client applications send frames of data. This isn’t the raw data itself – there is a structure around the data to describe it appropriately to the server accepting the connection.


To hold the frame data in PowerShell, we start with defining a byte array. The first 8 bits are:

  • FIN: 1 bit describing whether this is the final frame or not. In my experimentation with Chrome, this was always set to ‘1’.
  • RSV1, RSV2, and RSV3: 1 bit each that should always be zero unless you’re implementing a special protocol on top of WebSockets.
  • OPCODE: 4 bits. The one used by Chrome by default is “TEXT FRAME”, which has the value of 0001.

Putting these 8 bits together gives you a single byte with the bits of 10000001. So that’s how we start each frame:

The next 8 bits (the next byte) are:

  • MASK: Whether the data is “masked” by a random obfuscation key. Masking is required of browser clients so that malicious web applications can’t cause arbitrary attacker-controlled content to be written to the underlying TCP connection itself.
  • Payload Length: If the content is less than 126 bytes, this represents the payload length directly. Otherwise, it needs to be the value ‘126’ with the next two bytes representing the actual payload length (or ‘127’ with the next eight bytes, for payloads larger than 65,535 bytes).

Since we are always supposed to mask the data, we define our mask as ‘1’. However, this is supposed to be a bit at the leftmost position in the byte, so we need to shift it left by 7 places.


Since the mask and the payload length need to share the same byte (MASK being the leftmost bit and Payload Length being the rest), we use the –bor (Binary Or) bitwise operator to combine them into a single byte and then add that byte to the frame.


In the situation where the message length is greater than 126, though, the next two bytes need to also say how long the payload is. In C# and PowerShell, the way to get the bytes of a number is through the BitConverter.GetBytes() API. This API returns the bytes as they are represented by your processor. On most processors, this is Little Endian, where the least significant digits (think the 1s and 10s columns in decimal) come first.


The WebSocket protocol, however, requires that these bytes are in “Network Order” (Big Endian – the opposite of Little Endian), so we need to reverse them.


After the frame length, we need to provide the actual masking key. This is 4 bytes. As mentioned earlier, these are values used to obfuscate the content being transmitted to the server so that malicious web applications can’t communicate at the TCP level directly with the websocket server. The obfuscation algorithm is a simple XOR, where the four bytes in the masking key are used in a round-robin fashion against bytes in the content. Because this was just for fun and the security protection is irrelevant, I provided a static masking key of “0”, since anything XOR’d with 0 doesn’t change. That way, I didn’t have to implement the masking algorithm and the server wouldn’t notice 🙂


Next, we get to add our actual content to the frame – in this case, a message representing the JSON of pixels that we want to draw.


Finally, we write this frame to the TCP connection and flush it


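Putting those framing steps together (the post does this in PowerShell; here is an equivalent sketch in Python, using the same all-zero masking key):

```python
import struct

def build_text_frame(message: str) -> bytes:
    """Build a client-to-server WebSocket TEXT frame with a zero masking key
    (XOR with 0 leaves the payload bytes unchanged)."""
    payload = message.encode("utf-8")
    frame = bytearray([0b10000001])       # FIN=1, RSV1-3=0, opcode=0x1 (TEXT)
    mask_bit = 1 << 7                     # MASK flag in the leftmost position
    if len(payload) < 126:
        frame.append(mask_bit | len(payload))
    elif len(payload) <= 0xFFFF:
        frame.append(mask_bit | 126)
        frame += struct.pack(">H", len(payload))   # network (big-endian) order
    else:
        frame.append(mask_bit | 127)
        frame += struct.pack(">Q", len(payload))
    frame += b"\x00\x00\x00\x00"          # static all-zero masking key
    frame += payload                      # payload XOR 0 == payload
    return bytes(frame)
```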
When all is said and done, pretty pictures show up on Tim’s badge 🙂

When I sent Rick Astley, Defender ATP alerted Microsoft IT that one of its computers had PowerShell making raw TCP connections to a newly-registered domain. Amazing!


I tried to WannaCry Tim’s badge, and he still hasn’t paid 🙁


And with true raw WebSockets control, I was able to give Tim’s badge a demoscene 🙂 🙂

So that’s how to go about WebSockets from Scratch. Hope this helps!

Efficiently Generating Python Hash Collisions

In 2003, Crosby and Wallach wrote an excellent paper demonstrating a new form of Denial of Service vulnerability against applications by abusing algorithmic complexity in the data structures that they depend on. For example, some data structures operate quite efficiently when given everyday input, but the performance degrades precipitously in certain edge cases.

If an application provides user-controlled data to one of these data structures, attackers can intentionally provide this worst case form of input to generate a Denial of Service attack against the application.

In 2011, Alexander Klink and Julian demonstrated this form of attack quite effectively against web servers that take attacker-provided HTTP headers and represent them internally as hashtables.

Hashtables work extremely quickly for most input by distributing items into different buckets based on the hash value of the item’s key. There is a degenerate case, however, when hash collisions cause the algorithm to place items into the same bucket. Once hundreds or thousands of items start being placed into the same bucket due to having the same hash value, the performance of a hashtable degrades quadratically. A web server that used to be able to handle 1,000 requests per second may now only be able to handle 10. Or 1. Or worse.
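You can feel this degenerate case in any current Python by forcing every key into the same bucket (a contrived class, purely to illustrate the behavior):

```python
class CollidingKey:
    """A key whose hash is always 0, forcing every entry into one bucket."""
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 0

    def __eq__(self, other):
        return self.value == other.value

# Each insertion must now walk past all previous entries: O(n^2) total work
table = {CollidingKey(i): i for i in range(1000)}
```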


Many programming languages and frameworks now account for this type of attack. Python fixed this by default in Python 3.3+, released over 7 years ago. Despite that, many applications still build on Python 2.7 – including an application I was testing. Alexander and Julian never released the code they used to DOS Python, so I embarked on an interesting journey to find the most efficient way possible to generate hashtable collisions to DOS Python.

Before this research, the most efficient mechanism I could find was from Robert Grosse. He focused on 64-bit hashes, and his approach took about 80 hours to generate 50,000 hash collisions. The result of my research (against 32-bit Python) generates billions of collisions essentially instantaneously (as fast as your computer can print them to the screen, write them to a file, etc.).

Background: Understanding Python’s hash implementation

Python (remember: 3.2 and below) uses a FNV-style algorithm to calculate the hash values of strings. The hash value is a single 32-bit number that is the result of running a calculation over all of the characters in the input string. Here’s a simple re-implementation in C that takes the string to hash as its first argument:


The variable ‘x’ represents the resulting hash value. The variable ‘a’ represents the string to be hashed. The variable ‘p’ represents the value of the current character being processed.
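The same algorithm can be sketched directly in Python, with the 32-bit overflow emulated by a mask:

```python
def py2_hash32(s: bytes) -> int:
    """Re-implementation of the 32-bit FNV-style string hash used by
    Python 3.2 and below (result returned unsigned)."""
    if not s:
        return 0
    x = s[0] << 7                                  # fold in the first byte, shifted left 7
    for c in s:
        x = ((1000003 * x) ^ c) & 0xFFFFFFFF       # multiply by a large prime, XOR next byte
    x ^= len(s)                                    # final XOR with the string length
    if x == 0xFFFFFFFF:                            # CPython reserves -1 as an error value
        x = 0xFFFFFFFE
    return x
```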

Let’s walk through an example of this algorithm.


On line 21, the hash value is initialized to zero. The PowerShell console on the right-hand side shows the state of the hash value throughout the algorithm, and the highlighted line shows the bits of the current value of the hash. They are all zero, of course, since we just initialized it to zero.


Line 22 takes the value of the first letter (‘L’ is 1001100 in binary), shifts it left by 7 places, and then blends it into the hash value with the XOR operator.


Line 25 is where we start the character-by-character processing. Python takes the current value of the hash, and multiplies it by 1000003. This is a large prime number, which largely and unpredictably churns the bits in the leftmost digits (the red square – I call that area the “bit blender”). Then, it uses the XOR operator to bring this character into the hash value in the rightmost digits.



Then, we rinse and repeat until the end of the string.


For the final stage, Python takes the current hash value and uses the XOR operation to incorporate the length of the string, which is 3 in our case. This step is done to reduce the amount of collisions for short strings.


When we convert this final value to an integer, we get the hash value that you know and love from Python 3.2 and below.

Meet in the Middle Attacks

Now that we have an understanding of how Python implements its hash algorithm, we can explore the previous state-of-the-art technique used to attack the algorithm.

Pretend that you’re trying to generate hundreds or thousands of strings that all hash to the value ‘0’ as a way to generate a hashtable denial of service. One way to do this is to brute force your way through billions and billions of strings and hope to find hundreds or thousands that have the hash value of ‘0’. This is incredibly inefficient and will take you many hundreds of years of computation.

Another way is called a “Meet in the Middle Attack”. This is the method used by Alexander Klink and Julian in their CCC presentation, as well as the technique explored more deeply for 64-bit Python by Robert Grosse.


As Python iterates through a string, it keeps modifying and altering a running hash value. Once it gets to the end of the string, it outputs the hash value that it’s calculated. For example, imagine that the running hash value of the first part of a string – QCMWaIOFFgH – was 3B847A2A. Well, it turns out that Python’s FNV hash algorithm can partially be reversed as well. If you try to work backwards from a desired hash value (i.e.: 0), you can figure out a set of intermediate hash values such that “if the intermediate hash value is <x> and the rest of the string is <y>, then the resulting hash value will be 0”. This isn’t fully deterministic: for a given hash value or internal state, there are many characters and previous internal states that can lead to it.

You can combine both of these tricks with a lot of brute force to massively speed up a naïve brute force approach.

As the first step, we calculate the hash values of a bunch of strings. We record the internal hash value as Python processes each character. For each internal state value, we save the characters in the string that led up to it.

As the second step, we take a desired hash value (i.e.: 0) and reverse the hashing process. We iterate through a bunch of hypothetical preceding characters and figure out the internal state hash value that could have led to it.
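The backward step works because 1000003 is odd, and therefore invertible modulo 2^32. A sketch of a single forward and backward step:

```python
M = 2 ** 32
PRIME = 1000003
PRIME_INV = pow(PRIME, -1, M)   # modular inverse (requires Python 3.8+)

def step_forward(state: int, char: int) -> int:
    """One iteration of the hashing loop."""
    return ((PRIME * state) ^ char) % M

def step_backward(state: int, char: int) -> int:
    """Undo one iteration, assuming `char` was the byte just processed."""
    return ((state ^ char) * PRIME_INV) % M
```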

Both of these steps require a lot of computational power and a lot of memory.

The example above demonstrates a situation where we found:

  1. A string, QCMWaIO, where processing it up until that point generated an internal hash value of 3B847A2A
  2. Two strings, vpl and wQl, where if the internal hash value was 3B847A2A then it would result in a hash value of 0

If you combine these two paths together, you get a hash collision: two strings (QCMWaIOvpl and QCMWaIOwQl) that both hash to the value of 0. Apply enough time, memory, and computational power and you have thousands of hash collisions! This is many thousands or millions of times faster than the basic brute force approach.

Hash Stretching: Making Meet-in-the-Middle attacks 250 times more efficient

One thing that became clear after analyzing Python’s FNV-style algorithm is that it has a critical weakness: intermediate hash values of zero.


If the intermediate hash state (the variable x in this sample code) ever gets to 0, then multiplying it by 1000003 doesn’t change the intermediate hash value at all and you don’t get the unpredictable mixing that used to previously happen with that step. Then, you blend in an attacker-controlled character (the variable p in this sample code) and attackers have fine-grained control over the internal hash state.

It only takes a few seconds of brute forcing to find strings that at some point have an internal hash state of zero.

This introduces two major attacks:


  1. Control of the final hash value. If the internal hash state ever reaches zero, the next character of the string completely defines the new internal hash state. If there are no more characters in the string, then the final hash value (as in the first example above) will simply be the length of the string due to Python’s final XOR with the length of the string. Additionally, since you know that Python will do a final XOR by the length of the string, you can use your next character after the state hits ‘0’ to make this final XOR generate a final hash value of anything between 0 and 255. For example, if you supply the next character as the decimal value ‘5’, an XOR by the new string length of ‘5’ will result in a hash value of zero.
  2. Hash collision stretching. If the internal hash state ever hits zero, processing a character with the value of zero doesn’t change anything except the string length. Because of this, you can stretch a single hash collision into more hash collisions by simply adding new NULL characters when the internal hash value reaches zero. This changes the string length, so you also need to use technique #1 (controlling the final hash value) to remove the string length from the final calculation. This works for strings of length up to 255 (the maximum value of a byte you control), which lets you generate about 250 more collisions for every collision you find where the internal state reaches zero.

Using this approach, I was able to generate 10,000+ collisions with only a few hours of brute force. A fun and huge improvement to the state of the art, but we’re not done yet 🙂

Generating Billions of Collisions: The Null Points method

As mentioned, Python’s FNV implementation has a critical weakness whenever the internal hash state reaches zero. But finding those strings with an internal hash state of zero required a couple of hours of brute force. This leads to a question: Since the algorithm starts the hash state at zero and we find a part of a string (i.e. f8ds3k3kk) that leads to an internal hash state of zero, couldn’t we just repeat that substring to keep on finding new strings with internal hash states of zero? Like f8dsf8dsf8dsf8ds3k3kk?

Initially, the answer appeared to be ‘No’.


The big issue is Python’s processing of the initial character. If you recall the very first step, Python shifts the initial character to the left by 7 bits, thoroughly placing it into the unpredictable Bit Blender. If we tried to place that character again after we hit the internal state of zero, we wouldn’t be able to replicate the left shift by 7 places. So it seems we’re out of luck.

Except for one initial character!


When we shift left by seven bits, there is exactly one byte that doesn’t shift anything into the bit blender. The value ‘1’. When the algorithm shifts ‘1’ left by 7 bits, you get the value ‘128’ (0x80) with nothing in the bit blender – leaving an intermediate hash value that we can still completely influence with further characters under our control. So if we ever find a string that starts with the byte ‘1’ and eventually gets to an internal state of ‘0’, we can absolutely use these little islands of characters to get back to an internal hash state of ‘0’.

Here’s an example. After a few seconds of brute force, we find the string \x01\x02\xBC\x6F\xA4\xB6 that gets to an internal hash state of ‘0’.


As our next character, we use 0x80 (the initial ‘1’ shifted left 7 bits), and then we can clone our attack string again (“\x01\x02\xBC\x6F\xA4\xB6”) to get back to an internal hash state of zero. We use our final byte to influence the string length XOR to produce a final hash result of our choice, and we now have a repeatable way to influence the hashing operation from beginning to end.

More importantly, our little attack meta-pattern (“\x01\x02\xBC\x6F\xA4\xB6”) is only one example of a character sequence vulnerable to the null points method. With a bit more brute forcing, it’s possible to find these meta-patterns pretty quickly. With a few hours of computation, I easily found 16 patterns that met both criteria:

  • They started with the byte ‘1’
  • They caused Python’s internal hash state to reach ‘0’ at some point

The true beauty of this attack is that the meta-patterns are interchangeable. The example above demonstrates that we can generate a lot of collisions from strings in the form of “<Pattern1>0x80<Pattern1>0x80<Pattern1>”. But since we’re just abusing the fact that we got to an internal state of ‘0’, we can also generate them in the form of “<Pattern1>0x80<Pattern2>0x80<Pattern1>”. Or “<Pattern1>0x80<Pattern1>0x80<Pattern2>”. Start combining these meta-patterns systematically, and you have nearly infinite collisions at your disposal.
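Generating the combinations is then pure combinatorics – a sketch, with only the one meta-pattern quoted in the post (the other fifteen have to be brute-forced):

```python
import itertools

# Meta-patterns: strings that start with 0x01 and drive the internal hash
# state back to zero. Only the first is from the post; the rest must be
# found by brute force.
META_PATTERNS = [b"\x01\x02\xBC\x6F\xA4\xB6"]

def collision_strings(patterns, blocks):
    """Chain `blocks` interchangeable meta-patterns, joined by the 0x80
    byte that replays the initial '1 << 7' step."""
    for combo in itertools.product(patterns, repeat=blocks):
        yield b"\x80".join(combo)
```

With 16 six-byte patterns and 8 blocks, that yields 16^8 (about 4.3 billion) distinct strings.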


With these 16 meta-patterns of only 6 bytes each, you can generate 4 billion hash collisions against Python’s hash algorithm using only 48 bytes each. Due to the magic of combinatorics, the sky really is the limit: this technique will let you generate hash collisions against Python 3.2 and below as fast as your hard drive or network can process them. Have a 256-byte limit on key length? That’s 3.74144419156711E+50 (374144419156711000000000000000000000000000000000000) collisions! And if you want to experience a true embarrassment of riches, we can make this attack 250x more efficient with the hash collision stretching approach discussed earlier. But with this many zeros, who’s counting 🙂

Impact on Python Dictionary Insertion Performance

In raw numbers, Python’s dictionary insertion performance is pretty fast – even with collisions. But at only 50,000 colliding entries in a hashtable, Python starts to peg a CPU for 80 or 90 seconds. This represents about 2 MB of raw data. So with a second or two of data transfer, you can keep a CPU busy for a minute or two.


The critical problem is that this performance degrades quadratically. Submit 4 MB of data (100,000 collisions), and you may never see that CPU again. Submit 4 billion collisions? 3.74144419156711E+50 collisions? It’s safe to say that if you are exposing Python 3.2 (or below) hash tables to untrusted input, you have a significant security vulnerability on your hands. And by the way – json.loads() generates a dictionary backed by a hashtable.

Coordinated Vulnerability Disclosure

While this research demonstrates a fundamental break in the Python 3.2 (or below) hash algorithm, Python fixed this issue 7 years ago. It’s time to upgrade.

Extracting Forensic Script Content from PowerShell Process Dumps

After posting Extracting Activity History from PowerShell Process Dumps, I got an interesting follow up question: “Is it possible to extract the content of scripts (from disk) that were executed, even if those files were not captured?”

The answer is “Yes”, but it’s also complicated. And to make it even more complicated, we’re going to go down a path showing how to do some of this detective work from scratch. This is going to require a lot of WinDbg automation, so for a first step, install the WinDbg module.

To set up our forensics experiment, create this simple script. Save it somewhere like c:\temp:


Open a PowerShell session, run the script, and create a dump file.


Now, use the WinDbg module to connect to the dump file:

Connect-DbgSession -ArgumentList '-z "C:\Users\lee\AppData\Local\Temp\powershell.DMP"'

Begin our investigation

To begin our investigation, let’s cast a really wide net. We know we want to extract objects (if they exist) that represent scripts that were run in that session. But how do we find these?

First, let’s use SOS’s “Dump Object” command to dump everything it knows about every single object in the process. So, we’ll start with the !DumpHeap command to find all object instances (i.e.: we won’t even use the –Type filter). There are “smarter” ways to do it, but this step and the next will take a very long time, so maybe get it going before bed or something.

$allReferences = dbg !dumpheap -short

Once we have all object references, let’s use the !do (Dump Object) command to have SOS visualize them all. The output of Dump Object doesn’t include the address of the object being dumped, so we’ll use Add-Member to keep track of that as well.

$allObjects = $allReferences | Foreach-Object { $object = dbg "!do $_"; Add-Member -InputObject $object Address $_ -PassThru -Force }

(The next day) That’s a mighty haystack indeed! On my system, there are about a million objects that SOS knows about in this process instance. Do any of them have any part of the GUID in the way that SOS would visualize them? Let’s find out!
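The search itself is a one-liner – a sketch, assuming the marker GUID from our test script is in $guid:

```powershell
## Search SOS's visualization of every object for our marker GUID.
## $guid is a hypothetical placeholder for whatever unique string
## your test script embedded.
$guid = 'REPLACE-WITH-YOUR-GUID'
$matchingObjects = $allObjects | Where-Object { "$_" -match $guid }
$matchingObjects.Count
```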


Looks like we’re in luck! Out of those million objects, we managed to narrow it down to 7 System.String objects in PowerShell’s memory that somehow referenced the GUID. If we think the information might have been in a System.String all along, we could have made our initial “$allObjects” query faster by using “$allReferences = dbg !dumpheap -type System.String -short”. But how do we figure out what’s holding these GUIDs?

To find out, we’ll use SOS’s !gcroot command. This is commonly used to diagnose managed memory leaks – for example, “What am I doing that’s causing the CLR to hold onto 10 million instances of this string?” For any given object, the !gcroot command tells you what object is referencing it and what object is referencing that one - all the way until you hit the root of the object tree. Let’s explore some of these roots.
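As a sketch (assuming a $matchingObjects variable holding the strings found above, each carrying the Address property we attached with Add-Member):

```powershell
## Walk the reference chain for each matching string. The Address
## property is the one we attached with Add-Member earlier.
foreach($match in $matchingObjects) {
    dbg "!gcroot $($match.Address)"
}
```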


Ok, so the last one (item #6 in the array) wasn’t actually rooted. It was no longer referenced, and would be cleaned up by the garbage collector shortly.

Item #5 was rooted through an object array (System.Object[]), where one of those elements was a ConcurrentDictionary, which held a ScriptBlock, which held CompiledScriptBlockData, which held nodes in a PowerShell AST, bottoming out in a CommandAst node that referenced this GUID.

Sounds cool. What about any others? Here’s item #4 in my instance:


This is interesting! This one starts with the same root object array (0000026e101e9a40), the same ConcurrentDictionary (0000026e003bc440), but this time bottoms out into a tuple (a simple pairing of two items) that contains our string and another string. Let’s dive into that tuple and the strings it contains.


So this tuple has two elements. The first element looks to be the path to the script that was executed, and the second element looks to be the content that was in that script. Let’s see what the PowerShell Source has to say about these data structures. I’ll search for ConcurrentDictionary to see what I can find. On the third page, we can see exactly what we’re looking at:


There’s a class called CompiledScriptBlock. It contains a static (process-wide) cache called “s_cachedScripts”. This is a dictionary that maps a pair of strings to an instance of a ScriptBlock. And if you read the source, you can see exactly what the Tuple is as well – a mapping of a script’s path to the content it contained at the time the ScriptBlock was cached:


This data structure is what we ended up poking around in. For performance reasons, PowerShell maintains an internal script block cache so that it doesn’t need to re-compile the script blocks every time it sees a script. That cache is keyed off of the path and script contents. The thing stored in the cache is an instance of a ScriptBlock class, which contains (among other things) the AST of the script that was compiled.

So now that we know this thing exists, we can be much smarter in our automation and extract this stuff intentionally! Now we’re getting into real scripting, so this is what we’ll do:

  1. Use !dumpheap to find instances of this Tuple class. The dumpheap command does a substring search, so we’ll do a bit of post-processing with a regex.
  2. This gives us the MT of the tuple class that we actually want to investigate.
  3. Run !dumpheap again with that MT as a filter.
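Those three steps can be sketched like this, mirroring the style the finished function below uses (the regex is an assumption about how the tuple type name renders in !dumpheap output):

```powershell
## Step 1: substring search for the Tuple type we care about,
## narrowed with a regex to the String/String pairing
$tupleType = dbg !dumpheap -type Tuple |
    Select-String 'Tuple.*String.*String.*\]\]$'

## Step 2: the MT (MethodTable) is the first column of that line
$tupleMT = $tupleType | ConvertFrom-String | Foreach-Object P1

## Step 3: dump only instances of that exact type
$tupleAddresses = dbg !dumpheap -mt $tupleMT -short
```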


Now we can explore one of these nodes. It has an m_key that we can dive into.


Almost there! Let’s extract out the two items from those resulting keys, and emit a pretty PowerShell object:


It’s been a long journey. But: we investigated a hypothesis from scratch, followed it through, and now are able to forensically recover the content of all scripts from the PowerShell process memory even if you no longer have access to the files in question. Awesome 🙂

Here’s a script that packages all of this into a function.

function Get-ScriptBlockCache
{
    $nodeType = dbg !dumpheap -type ConcurrentDictionary |
        Select-String 'ConcurrentDictionary.*Node.*Tuple.*String.*String.*\]\]$'
    $nodeMT = $nodeType | ConvertFrom-String | Foreach-Object P1
    $nodeAddresses = dbg !dumpheap -mt $nodeMT -short
    $keys = $nodeAddresses | % { dbg !do $_ } | Select-String m_key
    $keyAddresses = $keys | ConvertFrom-String | Foreach-Object P7

    foreach($keyAddress in $keyAddresses) {
        $keyObject = dbg !do $keyAddress

        $item1 = $keyObject | Select-String m_Item1 | ConvertFrom-String | % P7
        $string1 = dbg !do $item1 | Select-String 'String:\s+(.*)' | % { $_.Matches.Groups[1].Value }

        $item2 = $keyObject | Select-String m_Item2 | ConvertFrom-String | % P7
        $string2 = dbg !do $item2 | Select-String 'String:\s+(.*)' | % { $_.Matches.Groups[1].Value }

        [PSCustomObject] @{ Path = $string1; Content = $string2 }
    }
}

Extracting Activity History from PowerShell Process Dumps

Imagine that you’re investigating the compromise of a system. The system doesn’t have PowerShell Logging enabled, but you did capture a process dump while activity was happening.

This memory dump is forensic gold, and the managed code debugging extension for WinDbg (“SOS” – Son of Strike) gives you all the tools you need to mine it. After using File | Open Crash Dump, this is what you see:


From there, load the SOS extension, fix symbols, and reload:

.loadby sos clr
.symfix
.reload

Use !Help to see all of the features offered by the SOS CLR Debugging Extension, but one useful one is !DumpHeap. It enumerates all of the objects in managed memory, and lets you filter these by type.

PowerShell stores command history in HistoryInfo objects, so let’s look for those. We can use !DumpHeap -Type HistoryInfo for this. SOS sometimes generates an exception the first time you use it, so just do it again 🙂


We can see that there were 6 items of HistoryInfo (commands typed by the attacker), and 7 arrays of HistoryInfo (internal data structures that hold them). The heap includes objects that are temporarily in use, so not all of these represent unique commands.

If we click on the MT column for HistoryInfo itself, we’ll get that type of object explicitly. Then, we can click on any of the Addresses to see what that HistoryInfo contains. Clicking is just a shortcut for the commands that WinDbg ultimately shows in the output anyways: !DumpHeap /d -mt 00007ff8a140be70 and !DumpObj /d 0000024226256100.


There’s a string at offset 8 that contains the _cmdline itself. What’s in it? You can click (!DumpObj /d 0000024226255a30) to find out.


Ah, yes. The eternal existential question. Who am I?

Doing this manually for each HistoryInfo can get annoying, so you can use WinDbg’s “scripting language” to automate some of it. The “.foreach” statement lets us iterate over debugger output, assign each item to a variable, and then act on that variable. So here’s a little recipe we’ll run:

.foreach (historyinfo { !dumpheap -type HistoryInfo -short }) { .echo ${historyinfo}; !DumpObj poi(${historyinfo}+8) }

We’ll iterate through the !DumpHeap results, assigning each memory address to a ‘historyinfo’ variable. We’ll echo that original memory address to the screen, and then remember that the _cmdline was offset 8 from that memory address. So, we’ll use the poi function to calculate the address of that new object, and use !DumpObj to dump it.

That gives us a bunch of output that we can then review, or script even further.


In our forensic analysis of this image, we can see that we are in trouble indeed. After they ran ‘whoami’, they knew the username and password of the Domain Admin, and then used PowerShell remoting to connect to the Domain Controller with it.


User interfaces are great for initial analysis, but terrible for scale. To automate scenarios like this, you can use this WinDbg Automation module I shared a little while ago. It leverages cdb.exe – the console version of WinDbg. From there, you can do all kinds of crazy stuff.

As cool as that module is, I’m most excited by far about the recent open sourcing of DbgShell. This is the hobby-time project of @JazzDelightsMe, and exposes much of the debugging engine into PowerShell as PowerShell objects. Incredible!


Producer / Consumer Parallelism in PowerShell

After it’s done indexing your data, Scour is blazingly fast at local content searches – far faster than Select-String, grep, or anything else that rips through files when you search. It accomplishes this through the power of the popular Lucene search engine. Similar to the way that publishers have made searching physical books fast, Lucene must first create an index of your content before you can search it.

So while Scour was incredibly fast at searching content, the indexing process took much longer than I wanted it to. In one directory (20,000 files), it took about 13 minutes.

Most of this was my fault – I had put in a call to [GC]::Collect() after every file. This was temporary code I had put in when I was investigating some memory issues, and it destroyed performance. After removing that mistake, indexing my test directory was now down to 45 seconds. After spending an afternoon on further tweaks, the final performance measurement was down to 9 seconds. That’s a 90x improvement if you include the sandbag, and merely a 5x improvement if you don’t 🙂

The majority of this performance came from introducing multi-threading into the indexing process.

There are lots of great threading techniques in PowerShell – both built-in, and from the community: Jobs, PoshRSJob, and Thread Jobs, to name a few. All of these do an excellent job of running multiple blobs of code in parallel.

Managing Parallelism

One thing that’s not often talked about is how you manage this parallelism. Once you have half a dozen threads running, what do you do with them? If you have hundreds of thousands of work items, how do you dispatch them to the threads and receive the output? How do you track overall status?


One approach to this is the Producer-Consumer pattern. Many cloud architectures use components like Azure Service Bus to accomplish this at a large scale, but it’s equally helpful for work management among threads. In the Producer-Consumer pattern, you have one thread generating work. In this situation, it was a PowerShell script generating lists of file names for Lucene to index. Then, you have multiple threads consuming this shared pipeline – fetching whatever work is available, doing the work itself, and then submitting the output back to a shared output pipeline. In this situation, these were background threads receiving file names and then doing the work to add those files to the index.

This pattern can be successful when the description of the work (i.e.: a filename, or coordinate of an image) is much smaller than the work itself (indexing that filename, or rendering that portion of the image).

Example Implementation

You can see the implementation of this in Scour if you are interested, but the pattern is somewhat obscured by the details and context of the actual implementation. Instead, here is a heavily-commented example of implementing this pattern in PowerShell, simply passing around GUIDs and printing them out.


I’ve also shared this on the PowerShell Gallery (Install-Script Invoke-ProducerConsumer -Scope CurrentUser).

<#PSScriptInfo
.GUID bfb939b9-03f0-433e-ad0f-e4e12f4a009c
.AUTHOR Lee Holmes
#>

<#
.DESCRIPTION
Example implementation of producer / consumer parallelism in PowerShell
#>

## The script block we want to run in parallel. Threads will all
## retrieve work from $InputQueue, and send results to $OutputQueue
$parallelScript = {
    param(
        ## An Input queue of work to do
        $InputQueue,

        ## The output buffer to write responses to
        $OutputQueue,

        ## State tracking, to help threads communicate
        ## how much progress they've made
        $OutputProgress, $ThreadId, $ShouldExit
    )

    ## Continually try to fetch work from the input queue, until
    ## the 'ShouldExit' flag is set
    $processed = 0
    $workItem = $null
    while(! $ShouldExit.Value)
    {
        if($InputQueue.TryDequeue([ref] $workItem))
        {
            ## If we got a work item, do something with it. In this
            ## situation, we just create a string and sleep a bit.
            $workItemResult = "Processing $workItem in thread $ThreadId"
            Start-Sleep -Seconds (Get-Random -Max 3)

            ## Add the result to the output queue
            $OutputQueue.Enqueue($workItemResult)

            ## Update our progress
            $processed++
            $OutputProgress[$ThreadId] = $processed
        }
        else
        {
            ## If there was no work, wait a bit for more.
            Start-Sleep -m 100
        }
    }
}

## Create a set of background PowerShell instances to do work, based on the
## number of available processors.
$threads = Get-WmiObject Win32_Processor | Foreach-Object NumberOfLogicalProcessors
$runspaces = 1..$threads | Foreach-Object { [PowerShell]::Create() }
$outputProgress = New-Object 'Int[]' $threads
$inputQueue = New-Object 'System.Collections.Concurrent.ConcurrentQueue[String]'
$outputQueue = New-Object 'System.Collections.Concurrent.ConcurrentQueue[String]'
$shouldExit = $false

## Spin up each of our PowerShell runspaces. Once invoked, these are actively
## waiting for work and consuming once available.
for($counter = 0; $counter -lt $threads; $counter++)
{
    $null = $runspaces[$counter].AddScript($parallelScript).
        AddParameter("InputQueue", $inputQueue).
        AddParameter("OutputQueue", $outputQueue).
        AddParameter("OutputProgress", $outputProgress).
        AddParameter("ThreadId", $counter).
        AddParameter("ShouldExit", [ref] $shouldExit).BeginInvoke()
}

## Some fake work - send 50 GUIDs into our worker threads
$estimated = 50
1..$estimated | Foreach-Object {
    $currentInput = New-Guid
    $inputQueue.Enqueue($currentInput)
}

## Wait for our worker threads to complete processing the
## work.
do
{
    ## Update the status of how many items we've processed, based on adding up the
    ## output progress from each of the worker threads
    $totalProcessed = $outputProgress | Measure-Object -Sum | Foreach-Object Sum
    Write-Progress "Processed $totalProcessed of $estimated" -PercentComplete ($totalProcessed * 100 / $estimated)

    ## If there were any results, output them.
    $scriptOutput = $null
    while($outputQueue.TryDequeue([ref] $scriptOutput))
    {
        $scriptOutput
    }

    ## If the threads are done processing the input we gave them, let them know they can exit
    if($inputQueue.Count -eq 0)
    {
        $shouldExit = $true
    }

    Start-Sleep -m 100

    ## See if we still have any busy runspaces. If not, exit the loop.
    $busyRunspaces = $runspaces | Where-Object { $_.InvocationStateInfo.State -ne 'Complete' }
} while($busyRunspaces)

## Clean up our PowerShell instances
foreach($runspace in $runspaces)
{
    $runspace.Dispose()
}