Searching for Content in XOR "Encrypted" Data
A while back, we talked about a common challenge in the security industry: searching for known bad content (e.g., “Invoke-WebRequest”) in data that you know has been base64-encoded. In a really cool bout of co-discovery, others wrote similar implementations at the same time, and the approach is now in the process of being integrated into YARA. Very cool times!
Another situation you might have run across is malware authors “encrypting” their content with a static XOR key, a process I like to call “encraption”. The neat thing about XOR encraption is how simple it is: you pick a single-byte key and XOR it with each byte of the data. To reverse the process, you just do it again, because XORing a byte with the same key twice returns the original byte. Despite being horrible from a security perspective, it works reasonably well as basic obfuscation: it breaks plain string searches and simple signatures.
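To make that concrete, here is a minimal Python sketch of single-byte XOR encraption. It assumes the whole payload is XORed with one key byte. The helper name `xor_bytes` and the key value 40 are illustrative choices, not taken from the original scripts. Applying the same function twice gives back the input.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (0-255)."""
    return bytes(b ^ key for b in data)


plaintext = b"encrapted"
key = 40  # hypothetical key; any value from 1 to 255 works the same way

ciphertext = xor_bytes(plaintext, key)
print(ciphertext)  # b'MFKZIX\\ML'

# XOR is its own inverse: applying the same key again restores the input.
assert xor_bytes(ciphertext, key) == plaintext
```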
This pattern of decoding content (Base64, XOR, etc.) before running it is extremely common, and it was a major driver behind adding the Antimalware Scan Interface (AMSI) to Windows. AMSI is great at stripping these layers of obfuscation from content at runtime.
But what about static analysis or log hunting?
As with Base64, SIEM systems don’t generally offer a way to decrapt embedded XOR content so that you can search within it. But they do offer regular expressions. Can we take the same approach we used for Base64 and generate a regular expression that matches content inside XOR-encoded strings? It turns out, yes!
[Aside – in another wonderful bout of co-discovery, YARA added XOR encoding for files in August 2018.]
Let’s take a simple example – data that has been encrapted directly.
So a little script that reverses this and emits the output looks like this:
One of the key weaknesses of single-byte XOR is that there are only 255 useful keys (a key of 0 leaves the data unchanged). If this script’s content made it into our SIEM, we could simply brute-force the search: search for (“encrapted” BXOR 1), then (“encrapted” BXOR 2), and so on up to (“encrapted” BXOR 40). At that point we would be searching for “MFKZIX\ML” and would find it. Fortunately, regular expressions support searching for multiple patterns at once, so we can have a script generate a single regular expression that covers all possible XOR keys.
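The brute-force idea can be sketched in Python, assuming a byte-oriented regex engine (Python's `re` operating on `bytes`). The function name `xor_regex` is hypothetical, and this is not the implementation of `Get-XorRegularExpression`. Each key from 1 to 255 contributes one escaped alternative to a single alternation.

```python
import re


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def xor_regex(needle: bytes) -> bytes:
    """Build one alternation that matches needle XORed with any key 1-255."""
    # re.escape protects bytes such as '.', '|' or '\\' that the XOR may produce.
    alternatives = [re.escape(xor_bytes(needle, key)) for key in range(1, 256)]
    return b"|".join(alternatives)


pattern = re.compile(xor_regex(b"encrapted"))
print(bool(pattern.search(b"junk MFKZIX\\ML junk")))  # True
```

Working on bytes instead of text matters here, because many keys produce non-printable output that a text-only engine might reject or mangle.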
The full regex is pretty long (255 elements), but this is a portion of what it looks like under the hood:
Now, XOR-encrapted content rarely appears in scripts directly. Depending on the XOR key, the output usually ends up containing bytes that are not valid within a string. Instead, you’ll typically find that scripts have base64-encoded the XOR encraption.
For this scenario, we can use the “-Raw” parameter of Get-XorRegularExpression. It returns the raw bytes (rather than the escaped regex representation), which we can then feed into our base64 regex generator. The result is quite a beast (765 elements: 3 base64 representations, one for each possible byte alignment, for each of the 255 XOR keys), but it is still a valuable regex to hunt with.
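Here is a Python sketch of how the two steps can compose. It assumes the usual three-offset base64 approach: a needle can start at any of three byte positions within a 3-byte base64 group. For each offset, the sketch encodes the needle behind 0, 1, or 2 filler bytes and keeps only the characters that depend solely on the needle. The names `base64_variants` and `xor_base64_regex` are hypothetical. They are not the PowerShell scripts' API.

```python
import base64
import re


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


def base64_variants(needle: bytes) -> list[str]:
    """Return the three base64 substrings that can represent needle,
    one for each starting offset (0, 1, 2) inside a 3-byte group."""
    variants = []
    for offset in range(3):
        encoded = base64.b64encode(b"\x00" * offset + needle).decode("ascii")
        # Skip leading characters that mix in bits from the filler bytes.
        start = (8 * offset + 5) // 6  # ceil(8 * offset / 6)
        # Drop trailing characters that mix in padding bits, or bits from
        # whatever follows the needle in real data.
        end = 8 * (offset + len(needle)) // 6
        variants.append(encoded[start:end])
    return variants


def xor_base64_regex(needle: bytes) -> str:
    """Alternation of base64 variants for needle XORed with every key 1-255."""
    alternatives = []
    for key in range(1, 256):
        for variant in base64_variants(xor_bytes(needle, key)):
            alternatives.append(re.escape(variant))  # escapes '+' in base64
    return "|".join(alternatives)


needle = b"encrapted"
pattern = re.compile(xor_base64_regex(needle))
# Hypothetical payload: filler bytes, the needle XORed with key 40, more filler.
payload = base64.b64encode(b"xx" + xor_bytes(needle, 40) + b"yy").decode("ascii")
print(bool(pattern.search(payload)))  # True
```

The alternation has 3 × 255 = 765 entries. Trimming the edge characters is what lets a single variant match no matter which bytes surround the needle in the real payload.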
Here’s an example of this in an actual script (taken from the earlier AMSI blog post):
In this example, the malware author uses a Unicode encoding of the string, so we use the “-Unicode” parameter of Get-XorRegularExpression to have it operate against the Unicode string.
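The encoding matters because it changes which bytes get XORed. The Python illustration below assumes that “Unicode” means UTF-16LE, which is what .NET's `Encoding.Unicode` produces. Under that encoding every ASCII character becomes two bytes: the character followed by a zero byte. After XOR, every other byte is therefore the key itself, and the ASCII-derived pattern never appears. The key value and variable names are illustrative only.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


needle = "encrapted"
key = 40  # hypothetical key

ascii_form = xor_bytes(needle.encode("ascii"), key)
utf16_form = xor_bytes(needle.encode("utf-16-le"), key)

print(ascii_form)      # b'MFKZIX\\ML'
print(utf16_form[:4])  # b'M(F(' (each zero byte XORs to the key, chr(40) == '(')

# A search built from the ASCII bytes misses the UTF-16LE payload, so the
# needle has to be encoded the same way before the regex is generated.
print(ascii_form in utf16_form)  # False
```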
While large, this is a regex we can now use against SIEM systems as well. Here’s an example of searching (and finding!) this content in PowerShell’s Script Block logs in Azure Sentinel:
And for some additional fun, we can even use the -Raw parameter of Get-Base64RegularExpression to generate YARA rules from these byte sequences.
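For the YARA direction, here is a minimal Python sketch that renders a list of byte sequences as one rule. The sequences could be raw XOR output or its base64 forms. Each sequence becomes one hex string, and the rule fires on `any of them`. The function name `to_yara_rule` and the rule layout are assumptions for illustration. They are not the output format of `New-YaraStringSearchRule`. Hex strings are used so that non-printable XOR output needs no escaping.

```python
def to_yara_rule(rule_name: str, sequences: list[bytes]) -> str:
    """Render byte sequences as a YARA rule that fires if any sequence matches."""
    lines = [f"rule {rule_name}", "{", "    strings:"]
    for index, sequence in enumerate(sequences):
        hex_bytes = " ".join(f"{b:02X}" for b in sequence)
        lines.append(f"        $s{index} = {{ {hex_bytes} }}")
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines)


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key."""
    return bytes(b ^ key for b in data)


# One hex string per non-zero XOR key for a hypothetical needle.
needle = b"encrapted"
rule = to_yara_rule("XorEncrapted", [xor_bytes(needle, k) for k in range(1, 256)])
print(rule.splitlines()[3].strip())  # $s0 = { 64 6F 62 73 60 71 75 64 65 }
```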
So, with a bit of creativity, we can now search for base64 content, XOR encoded content, and more in any SIEM that supports regular expressions. Enjoy!
You can download these scripts from the PowerShell Gallery:
Install-Script Get-Base64RegularExpression -Scope CurrentUser
Install-Script Get-XorRegularExpression -Scope CurrentUser
Install-Script New-YaraStringSearchRule -Scope CurrentUser