TripleAgent: Even Zeroer-Tay Code Injection and Persistence Technique

Overview

We'd like to introduce a new Zero-Tay technique for injecting code and maintaining persistence against common advanced attacker toolkits, dubbed TripleAgent. We discovered this by ourselves in our very advanced labs, and are in the process of registering a new vanity domain as we speak.

TripleAgent can exploit:

  • Every toolkit version
  • Every toolkit architecture (x86 and x64)
  • Every toolkit user (RED / PURPLE / APT / NATION STATE / etc.)
  • Every toolkit process (including PoC, GTFO, PoC||GTFO, METASPLOIT, UNICORN)

TripleAgent exploits a fundamental flaw in the design of commonly used advanced attacker toolkits, and therefore cannot be patched.

Code Injection

TripleAgent gives the defender the ability to inject any DLL into any attacker toolkit. The code injection occurs extremely early during the victim's process boot, giving the defender full control over the process and no way for the process to protect itself. The code injection technique is so unique that it's not detected or blocked by even the most advanced threaty threats.

Attack Vectors

  • Attacking persistence toolkits - Taking full control of ANY persistence toolkit by injecting code into it while bypassing all of its self-protection mechanisms. The attack has been verified and works on all bleeding-edge attacker toolkits including but not limited to: DoubleAgent.

Technical Deep, Deep, Deep, Dive

An example of an advanced attacker toolkit is known as DoubleAgent. This attacker toolkit exploits a fundamental issue in Windows, nay computing, NAY HUMANITY itself.

When this advanced toolkit runs, it is widely acknowledged to provide complete control over other unwitting applications. However, we can apply our new TripleAgent framework to this toolkit to completely neutralize it. Rather than have it infect target systems, we can write a few simple lines of code to make it instead launch the Windows Update settings dialog!

static BOOL main_DllMainProcessAttach(VOID)
{
    PROCESS_Create(L"c:\\windows\\system32\\cmd.exe"L"/c start ms-settings:windowsupdate");

    return TRUE;
}

 

Once run, we can see the significant impact of our new zero-tay technique. The first invocation installs our TripleAgent exploit, rendering the advanced "DoubleAgent" threat completely harmless during its second invocation.

Mitigations

Unfortunately, there are no mitigations or bypasses for this extremely advanced defensive technique. We do however offer highly-advanced next generation cyber threat intel cloud machine learning offensive services. Just putting that out there.

Adding a Let’s Encrypt Certificate to an Azure-Hosted Website

If you host your website in Azure, you might be interested in adding SSL support via Let's Encrypt. Azure doesn't offer built-in functionality to automate this, but thankfully the PowerShell community has the tools you need:

  1. ACMESharp - A PowerShell module to interact with Let's Encrypt.
  2. Azure PowerShell - A set of PowerShell modules to interact with Azure.

What's been missing (until now!) is the glue. So now, here's the glue: Register-LetsEncryptCertificate.ps1.

So the steps:

  1. Install-Module ACMESharp, Azure, AzureRM.Websites
  2. Install-Script Register-LetsEncryptCertificate.ps1
  3. Register-LetsEncryptCertificate -Domain www.example.com -RegistrationEmail admin@example.com -ResourceGroup exampleResourceGroup -WebApp exampleWebApp
  4. Visit https://www.example.com

Done!

 

Why is SeDebugPrivilege enabled in PowerShell?

We sometimes get the question: Why is the SeDebugPrivilege enabled by default in PowerShell?

This is enabled by .NET when PowerShell uses the System.Diagnostics.Process class in .NET, which it does for many reasons. One example is the Get-Process cmdlet. Another example is the method it invokes to get the current process PID for the $pid variable. Any .NET application that uses the System.Diagnostics.Process class also enables this privilege.

 

You can see the .NET code that enables this here:

NativeMethods.LUID luid = default(NativeMethods.LUID);
if (!NativeMethods.LookupPrivilegeValue(null, "SeDebugPrivilege", out luid))
{
    return;
}
IntPtr zero = IntPtr.Zero;
try
{
    // 32 (0x20) is TOKEN_ADJUST_PRIVILEGES
    if (NativeMethods.OpenProcessToken(new HandleRef(null, NativeMethods.GetCurrentProcess()), 32, out zero))
    {
        NativeMethods.TokenPrivileges tokenPrivileges = new NativeMethods.TokenPrivileges();
        tokenPrivileges.PrivilegeCount = 1;
        tokenPrivileges.Luid = luid;
        // 2 is SE_PRIVILEGE_ENABLED
        tokenPrivileges.Attributes = 2;
        NativeMethods.AdjustTokenPrivileges(new HandleRef(null, zero), false, tokenPrivileges, 0, IntPtr.Zero, IntPtr.Zero);
    }
}

https://github.com/dotnet/corefx/blob/master/src/System.Diagnostics.Process/src/System/Diagnostics/ProcessManager.Windows.cs#L129

 

Detecting and Preventing PowerShell Downgrade Attacks

With the advent of PowerShell v5’s awesome new security features, old versions of PowerShell have all of a sudden become much more attractive for attackers and Red Teams.

PowerShell Downgrade Attacks

There are two ways to do this:

Command Line Version Parameter

The simplest technique is: “PowerShell –Version 2 –Command <…>” (or of course any of the –Version abbreviations).

PowerShell.exe itself is just a simple native application that hosts the CLR, and the –Version switch tells PowerShell which version of the PowerShell assemblies to load.

Unfortunately, the PowerShell v5 enhancements did NOT include time travel, so the v2 binaries that were shipped in 2008 did NOT include the code we wrote in 2014. The .NET Framework 2.0 (which is required for PowerShell’s V2 engine) is not included by default in Windows 10 and later, but an attacker or Red Teamer could enable or install it. On versions prior to Windows 10, where it is available by default, they could just use it.
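
Because the -Version switch can be abbreviated, anything that inspects process command lines has to accept every prefix of it. The following is a rough sketch of such a check. The function name Test-VersionDowngradeArgument is hypothetical, and the pattern assumes the switch may be introduced by '-', '/', or a Unicode dash. Command-line matching like this is easy to evade, so treat it as a supplement to engine-level detection rather than a replacement.

```powershell
function Test-VersionDowngradeArgument {
    param([Parameter(Mandatory)] [string] $CommandLine)

    # Matches -v, -ve, -ver ... -version (any casing) followed by 2 or 2.0.
    # This pattern is an illustrative assumption, not an exhaustive parser.
    $pattern = '(^|\s)[-/\u2013\u2014\u2015]v(e(r(s(i(o(n)?)?)?)?)?)?\s+["'']?2(\.0)?["'']?(\s|$)'
    $CommandLine -match $pattern
}

Test-VersionDowngradeArgument 'powershell.exe -Version 2 -Command Get-Date'   # True
Test-VersionDowngradeArgument 'powershell.exe -ve 2.0 -NoProfile'             # True
Test-VersionDowngradeArgument 'powershell.exe -NoProfile -Command Get-Date'   # False
```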

 

Hosting Applications Compiled using V2 Reference Assemblies

When somebody compiles a C# application that hosts the PowerShell engine, they link against reference assemblies. If they link against the PowerShell v2 reference assemblies during development, Windows will use the PowerShell v2 engine (if available) when the application runs. Otherwise, PowerShell's type forwarding will run the application using the currently installed PowerShell engine.

This is what happens when PowerShell Empire's "psinject" module attempts to load PowerShell into another process (such as notepad).

 

Detection and Prevention

You have several options to detect and prevent PowerShell Downgrade Attacks.

Event Log

As a detection mechanism, the “Windows PowerShell” classic event log has event ID 400. This is the “Engine Lifecycle” event, and includes the Engine Version. Here is an example query to find lower versions of the PowerShell engine being loaded:

Get-WinEvent -LogName "Windows PowerShell" |
    Where-Object Id -eq 400 |
    Foreach-Object {
        $version = [Version] ($_.Message -replace '(?s).*EngineVersion=([\d\.]+)*.*','$1')
        if($version -lt ([Version] "5.0")) { $_ }
    }
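
The heart of that query is pulling EngineVersion out of the event message and comparing it as a [Version]. To see that step in isolation, here is a small sketch that runs against a hand-written, abbreviated sample message rather than a real event. The helper name Get-EngineVersion and the sample text are illustrative assumptions.

```powershell
function Get-EngineVersion {
    param([Parameter(Mandatory)] [string] $Message)

    # Capture the dotted version number that follows "EngineVersion=".
    $match = [regex]::Match($Message, 'EngineVersion=(\d+(\.\d+){1,3})')
    if ($match.Success) { [Version] $match.Groups[1].Value }
}

# Abbreviated, hand-written stand-in for an event ID 400 message.
$sample = "Engine state is changed from None to Available.`n`tHostName=ConsoleHost`n`tEngineVersion=2.0`n`tRunspaceId=..."

$version = Get-EngineVersion $sample
$version -lt [Version] '5.0'    # True: this engine predates PowerShell v5
```

Comparing as [Version] rather than as text matters here: a string comparison would rank "10.0" below "5.0".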

 

AppLocker / File Auditing

When the CLR loads PowerShell assemblies, it will first load the managed assemblies from the GAC (if they are available). It will also load the native images that contain pre-jitted code if the assemblies are NGEN’d (which they are). Here is what loading PowerShell v2 looks like:

These can either be an audit trigger, or can be blocked outright.

Be careful to not be too selective on the directories you monitor, as the CLR can also load assemblies from specific directories. For example, it is possible to use the CLR’s undocumented / unsupported DEVPATH environment variable to force the CLR to use a specified version of the assemblies rather than the GAC’d version. And if you don’t have a GAC’d version to override, PowerShell will do regular LoadLibrary() probing to find one – including its installation directory.

In addition, PowerShell can either be launched as a 32-bit process, or 64-bit process. A 64-bit system will load 64-bit PowerShell by default. A 32-bit system will load 32-bit PowerShell. On a 64-bit system, though, Windows will implicitly change the version of PowerShell that gets launched by looking at the bitness of the launching application: a 32-bit app will load other 32-bit apps. It is also possible for users or applications to do this explicitly by launching PowerShell from the WOW directory: c:\windows\syswow64\windowspowershell\v1.0\powershell.exe.
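
When checking which PowerShell a launcher actually started, it can help to have the process report its own bitness. A minimal sketch using standard .NET properties (the variable names are arbitrary):

```powershell
# 64 for a 64-bit process, 32 for a 32-bit (for example, SysWOW64) process.
$bitness = if ([Environment]::Is64BitProcess) { 64 } else { 32 }

# The pointer size gives the same answer: 8 bytes in 64-bit, 4 bytes in 32-bit.
$pointerBits = [IntPtr]::Size * 8

# The OS bitness can differ from the process bitness on a 64-bit system.
$osBits = if ([Environment]::Is64BitOperatingSystem) { 64 } else { 32 }

"Process: $bitness-bit, OS: $osBits-bit"
```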

PS > dir *.dll -rec -ea ig | % FullName | ? { $_ -match 'System\.Management\.Automation\.(ni\.)?dll' }
C:\windows\assembly\GAC_MSIL\System.Management.Automation\1.0.0.0__31bf3856ad364e35\System.Management.Automation.dll
C:\windows\assembly\NativeImages_v2.0.50727_64\System.Management.A#\8b1355a03394301941edcbb9190e165b\System.Management.Automation.ni.dll
C:\windows\assembly\NativeImages_v4.0.30319_32\System.Manaa57fc8cc#\08d9ad8b895949d2a5f247b63b94a9cd\System.Management.Automation.ni.dll
C:\windows\assembly\NativeImages_v4.0.30319_64\System.Manaa57fc8cc#\4072bc1c91e324a1f680e9536b50bad4\System.Management.Automation.ni.dll
C:\windows\Microsoft.NET\assembly\GAC_MSIL\System.Management.Automation\v4.0_3.0.0.0__31bf3856ad364e35\System.Management.Automation.dll

 

If you’re going down the enforcement route via AppLocker or Device Guard, the most robust solution is to block earlier versions of the PowerShell engine by version. Be sure to block both the native image and MSIL assemblies:

C:\Users\leeholm>powershell -version 2 -noprofile -command "(Get-Item ([PSObject].Assembly.Location)).VersionInfo"

ProductVersion   FileVersion      FileName
--------------   -----------      --------
6.1.7600.16385   6.1.7600.16385   C:\WINDOWS\assembly\GAC_MSIL\System.Management.Automation\1.0.0.0__31bf3856ad364e3...


C:\Users\leeholm>powershell -noprofile -command "(Get-Item ([PSObject].Assembly.Location)).VersionInfo"

ProductVersion   FileVersion      FileName
--------------   -----------      --------
10.0.14986.1000  10.0.14986.1000  C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Management.Automation\v4.0_3.0.0...


C:\Users\leeholm>powershell -version 2 -noprofile -command "(Get-Item (Get-Process -id $pid -mo | ? { $_.FileName -match 'System.Management.Automation.ni.dll' } | % { $_.FileName })).VersionInfo"

ProductVersion   FileVersion      FileName
--------------   -----------      --------
6.1.7600.16385   6.1.7600.16385   C:\WINDOWS\assembly\NativeImages_v2.0.50727_64\System.Management.A#\8b1355a0339430...


C:\Users\leeholm>powershell -noprofile -command "(Get-Item (Get-Process -id $pid -mo | ? { $_.FileName -match 'System.Management.Automation.ni.dll' } | % { $_.FileName })).VersionInfo"

ProductVersion   FileVersion      FileName
--------------   -----------      --------
10.0.14986.1000  10.0.14986.1000  C:\WINDOWS\assembly\NativeImages_v4.0.30319_64\System.Manaa57fc8cc#\4072bc1c91e324...
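
In the listings above, the v2 engine's System.Management.Automation files carry a 6.1.x file version, while the current engine's carry 10.0.x. As a sketch of the comparison behind a "block earlier versions" rule, the following treats anything below 10.0 as a legacy engine. The 10.0 threshold and the function name are assumptions drawn only from this sample output, so verify the versions present on your own systems before building policy on them.

```powershell
function Test-LegacyEngineFileVersion {
    param(
        [Parameter(Mandatory)] [Version] $FileVersion,
        # Assumed cut-off, taken from the sample output only.
        [Version] $MinimumAllowed = '10.0'
    )
    $FileVersion -lt $MinimumAllowed
}

Test-LegacyEngineFileVersion '6.1.7600.16385'     # True  (v2 engine in the listing)
Test-LegacyEngineFileVersion '10.0.14986.1000'    # False (current engine in the listing)
```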

Differences between Visual Studio 2003, 2005, 2008, 2010, 2012, 2013, and 2015

If you're interested in knowing when specific Visual Studio compiler options have been introduced, here you go.

2003 to 2005


Option                 Purpose                                                                                             
------                 -------                                                                                             
/analyze               Enable code analysis.                                                                               
/bigobj                Increases the number of addressable sections in an .obj file.                                       
/doc                   Process documentation comments to an XML file.                                                      
/errorReport           Allows you to provide internal compiler error (ICE) information directly to the Visual C++ team.    
/favor                 Produces code that is optimized for a specific x64 architecture or for the specifics of             
                       micro-architectures in both the AMD64 and Extended Memory 64 Technology (EM64T) architectures.      
/FC                    Display full path of source code files passed to cl.exe in diagnostic text.                         
/Fp                    Specifies a precompiled header file name.                                                           
/G1                    Optimize for Itanium processor. Only available in the IPF cross compiler or IPF native compiler.    
/G2                    Optimize for Itanium2 processor (default between /G1 and /G2). Only available in the IPF cross      
                       compiler or IPF native compiler.                                                                    
/GF                    Enables string pooling.                                                                             
/homeparams            Forces parameters passed in registers to be written to their locations on the stack upon function   
                       entry. This compiler option is only for the x64 compilers (native and cross compile).               
/hotpatch              Creates a hotpatchable image.                                                                       
/LN                    Creates an MSIL module.                                                                             
/openmp                Enables #pragma omp in source code.                                                                 
/QIPF_B                Does not generate sequences of instructions that give unexpected results, according to the errata   
                       for the B CPU stepping. (IPF only).                                                                 
/QIPF_C                Does not generate sequences of instructions that give unexpected results, according to the errata   
                       for the C CPU stepping. (IPF only).                                                                 
/QIPF_fr32             Do not use upper 96 floating-point registers. (IPF only).                                           
/QIPF_noPIC            Generates an image with position dependent code (IPF only).                                         
/QIPF_restrict_plabels Enhances performance for programs that do not create functions at runtime. (IPF only).              
/Zx                    Generates debuggable optimized code. Only available in the IPF cross compiler or IPF native         
                       compiler.                                                                                           
                                                                          

2005 to 2008


Option                 Purpose                                                    
------                 -------                                                    
/MP                    Compiles multiple source files by using multiple processes.
/Qfast_transcendentals Generates fast transcendentals.                            
/Qimprecise_fwaits     Removes fwait commands inside try blocks.                  

2008 to 2010


Option Purpose                                
------ -------                                
/Fi    Sets the preprocessed output file name.

2010 to 2012


Option                                         Purpose                                                                     
------                                         -------                                                                     
/kernel                                        The compiler and linker will create a binary that can be executed in the    
                                               Windows kernel.                                                             
/Qpar (Auto-Parallelizer)                      Enables automatic parallelization of loops that are marked with the #pragma 
                                               loop() directive.                                                           
/Qvec-report (Auto-Vectorizer Reporting Level) Enables reporting levels for automatic vectorization.                       
/sdl                                           Enables additional compiler security checks.                                
/volatile                                      Selects how the volatile keyword is interpreted.                            
/ZW                                            Produces an output file to run on the Windows Runtime.                      

2012 to 2013


Option          Purpose                                                                                                    
------          -------                                                                                                    
/cgthreads      Specifies number of cl.exe threads to use for optimization and code generation.                            
/FS             Forces writes to the program database (PDB) file to be serialized through MSPDBSRV.EXE.                    
/Gv             Uses the __vectorcall calling convention. (x86 and x64 only)                                               
/Gw             Enables whole-program global data optimization.                                                            
/Qsafe_fp_loads Uses integer move instructions for floating-point values and disables certain floating point load          
                optimizations.                                                                                             
/Zo             Generate enhanced debugging information for optimized code in non-debug builds.                            

2013 to 2015


Option                  Purpose                                                                     
------                  -------                                                                     
/guard:cf               Adds control flow guard security checks.                                    
/W0, /W1, /W2, /W3, /W4 Sets which warning level to output.                                         
/w1, /w2, /w3, /w4      Sets the warning level for the specified warning.                           
/wd                     Disables the specified warning.                                             
/we                     Treats the specified warning as an error.                                   
/wo                     Displays the specified warning only once.                                   
/Wv                     Displays no warnings introduced after the specified version of the compiler.
/WX                     Treats all warnings as errors.                                              

PowerShell to Generate these:

$2003 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
$2005 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
$2008 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
$2010 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
$2012 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
$2013 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
$2015 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
Compare-Object $2003 $2005 -Property { $_.Option -replace '\s','' } -PassThru | ? SideIndicator -eq '=>' | Format-Table Option,Purpose -Wrap | clip
Compare-Object $2005 $2008 -Property { $_.Option -replace '\s','' } -PassThru | ? SideIndicator -eq '=>' | Format-Table Option,Purpose -Wrap | clip
Compare-Object $2008 $2010 -Property { $_.Option -replace '\s','' } -PassThru | ? SideIndicator -eq '=>' | Format-Table Option,Purpose -Wrap | clip
Compare-Object $2010 $2012 -Property { $_.Option -replace '\s','' } -PassThru | ? SideIndicator -eq '=>' | Format-Table Option,Purpose -Wrap | clip
Compare-Object $2012 $2013 -Property { $_.Option -replace '\s','' } -PassThru | ? SideIndicator -eq '=>' | Format-Table Option,Purpose -Wrap | clip
Compare-Object $2013 $2015 -Property { $_.Option -replace '\s','' } -PassThru | ? SideIndicator -eq '=>' | Format-Table Option,Purpose -Wrap | clip
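
The Compare-Object calls do the real work: each one compares two option tables on a whitespace-normalized Option value and keeps only the rows that appear on the newer side ('=>'). Here is a small self-contained sketch of that pattern, using hand-typed rows in place of the scraped tables:

```powershell
$older = @(
    [pscustomobject]@{ Option = '/MP'; Purpose = 'Compiles multiple source files by using multiple processes.' }
    [pscustomobject]@{ Option = '/GF'; Purpose = 'Enables string pooling.' }
)
$newer = @(
    [pscustomobject]@{ Option = '/MP';  Purpose = 'Compiles multiple source files by using multiple processes.' }
    [pscustomobject]@{ Option = '/GF '; Purpose = 'Enables string pooling.' }   # stray space, still the same option
    [pscustomobject]@{ Option = '/Fi';  Purpose = 'Sets the preprocessed output file name.' }
)

# Compare on the normalized option name; -PassThru returns the original rows.
$added = Compare-Object $older $newer -Property { $_.Option -replace '\s','' } -PassThru |
    Where-Object SideIndicator -eq '=>'

$added | Format-Table Option, Purpose -Wrap
```

Only the /Fi row survives, because the whitespace normalization keeps '/GF ' from being reported as a new option.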

TimeJournal: Time Profiling for Humans

Time Journal helps you analyze where you spend your time by infrequently asking the simple question: “What are you doing?”

[Download here: TimeJournal.zip]

 

 

How it Works

Time Journal follows the same principles as a traditional software sampling profiler, but instead samples humans. By randomly recording your current task, Time Journal lets you analyze your answers as a faithful proxy for how you actually spent your time. If 20% of your randomly sampled answers were “Status Meeting,” then you spent close to 20% of your time in status meetings.
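
To make the sampling idea concrete, here is a sketch that turns a handful of made-up answers into a time breakdown. The sample data is hypothetical; the point is only that each task's share of the samples estimates its share of your time.

```powershell
# Ten hypothetical answers collected at random moments during a day.
$samples = 'Coding', 'Status Meeting', 'Coding', 'Email', 'Coding',
           'Coding', 'Status Meeting', 'Code review', 'Coding', 'Email'

$breakdown = $samples | Group-Object | Sort-Object Count -Descending | ForEach-Object {
    [pscustomobject]@{
        Task    = $_.Name
        Samples = $_.Count
        Percent = [math]::Round(100 * $_.Count / $samples.Count, 1)
    }
}

$breakdown | Format-Table -AutoSize
```

With these ten samples, “Status Meeting” appears twice, so the estimate is 20% of the day.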

An alternative to the sampling approach is an instrumentation approach: faithfully recording your transition between tasks. Time Journal avoids this design, since asking humans to faithfully record transitions between tasks is enormously error-prone. For example, you might not log a transition to a task that you consider inconsequential (such as “Checking email”), when in fact that task may account for a significant portion of your day. Some software attempts to address the human element by tracking window titles, but the level of data captured by window titles often does not map well to the task they support.

Installing Time Journal

  • Extract TimeJournal.exe to a place on your computer (e.g., a Tools folder)
  • Start | Run | shell:startup
  • Create a shortcut to TimeJournal.exe in that Startup folder

Using Time Journal

Time Journal runs as a background application. Every once in a while (at a random interval between 5 and 25 minutes), it asks you the question, “What are you doing?” It stores your previous answers in a list until you exit the program, which lets you easily re-use your answers to previous questions.

When you press OK, it adds your answer (along with the current window title) to a date-appropriate CSV in your “My Documents\TimeJournal” folder. If you don’t answer within 4 minutes, it dismisses the dialog and records nothing. This lets you keep TimeJournal running when you go home for the day without polluting your journal file.

In addition, when Time Journal auto-dismisses the dialog, it checks Outlook to see if you are in a meeting. If you are, it records the meeting title instead as your activity.
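
As a sketch of those mechanics (not Time Journal's actual source), the snippet below picks a random delay in the 5 to 25 minute range and appends one answer to a per-day CSV. The folder, file naming, and column names are assumptions made for illustration.

```powershell
# Get-Random's -Maximum is exclusive for integers, so 26 yields 5..25.
$delayMinutes = Get-Random -Minimum 5 -Maximum 26

# Hypothetical journal location and per-day file name.
$journalDir  = Join-Path ([IO.Path]::GetTempPath()) 'TimeJournal'
$journalFile = Join-Path $journalDir ('{0:yyyy-MM-dd}.csv' -f (Get-Date))
New-Item -ItemType Directory -Path $journalDir -Force | Out-Null

# One journal row: when it was recorded, the answer, and the active window title.
$entry = [pscustomobject]@{
    Time        = (Get-Date).ToString('s')
    Answer      = 'TimeJournal fixes'
    WindowTitle = 'TimeJournal - Visual Studio'
}
$entry | Export-Csv -Path $journalFile -Append -NoTypeInformation
```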

Slicing and Dicing

Time Journal records all output into a CSV in your “My Documents\TimeJournal” directory. For example, use the included Get-TimeJournal PowerShell script to easily see the breakdown of your time:

PS> Get-TimeJournal.ps1

Count Name
----- ----
    2 Bug 127272: System bluescreens when I move the mouse
    1 MEETING: Chat about alignment
    1 TimeJournal fixes
    1 Hubble Space Telescope programming
    1 SCRUM meeting
    1 Security reviews
    1 Chat

Setting Visual Studio Code to Auto-Update in the Background

Visual Studio Code has a built-in feature to check for and install updates, but I've always been frustrated by having to acknowledge the update, allow the browser to restart, watch an installer, and then get back to what I was about to do anyways (which is edit some text).

As a solution, here's a quick little PowerShell script to run. It will create a background task that runs every night at 3:14 AM and updates VS Code for you automatically if an update is available. If you tend to leave your editor open without saving files, you might want to enable the VS Code settings files.autoSave and files.hotExit.

When VS Code comes back up after an update, you'll have to reopen any files that were previously open. Once VS Code gets more fully-featured crash recovery functionality, that annoyance will go away.

And the PowerShell:

Register-ScheduledJob -Name VSCode_Updater -Trigger (New-JobTrigger -Daily -At 3:14) -ScheduledJobOption @{ RunElevated = $true } -ScriptBlock {
    $tags = Invoke-RestMethod https://api.github.com/repos/Microsoft/vscode/tags
    $latest = $tags | Foreach-Object { try { [version] $_.Name } catch { } } | Sort-Object -Desc | Select-Object -First 1
    $current = [Version] ((Get-Content 'C:\Program Files (x86)\Microsoft VS Code\resources\app\package.json' | ConvertFrom-Json).Version)
    if($latest -gt $current)
    {
        $codeWasRunning = @(Get-Process -name Code -ErrorAction Ignore).Count -gt 0
        if($codeWasRunning)
        {
            Stop-Process -name Code
        }
    
        Invoke-WebRequest https://vscode-update.azurewebsites.net/latest/win32/stable -OutFile $env:TEMP/vscode_latest.exe
        & $env:TEMP/vscode_latest.exe /verysilent
        if(-not $codeWasRunning)
        {
            Stop-Process -Name Code
        }
    }
}
$task = Get-ScheduledTask -TaskName VSCode_Updater
$task.Principal.LogonType = "Interactive"
$task | Set-ScheduledTask
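
One detail in the script worth calling out is the [version] cast. Tag names are strings, and comparing them as strings ranks 1.9.0 above 1.10.0. Casting to [version] compares them numerically, and the try/catch silently drops tags that aren't plain version numbers. A small sketch with made-up tag names:

```powershell
# Hypothetical tag names; the non-numeric one fails the [version] cast and is skipped.
$tagNames = '1.8.1', '1.9.0', '1.10.0', 'translation/20170123.01'

$latest = $tagNames |
    ForEach-Object { try { [version] $_ } catch { } } |
    Sort-Object -Descending |
    Select-Object -First 1

$latest.ToString()                          # 1.10.0
[version] '1.10.0' -gt [version] '1.9.0'    # True  (numeric comparison)
'1.10.0' -gt '1.9.0'                        # False (string comparison)
```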

Interactive Rosetta Stone Explorer

In 1799, Napoleon's explorers discovered a 4-foot-tall stone slab weighing roughly 1,700 lb in Rosetta (Rashid), Egypt. It was carved with three sections of writing. Two were well known to archaeologists: Demotic script and ancient Greek. The most interesting part - the hieroglyphic symbols - was also very well known to the scientific community, but it was also a great mystery. All attempts to decipher hieroglyphic text had so far been unsuccessful.

 

The discovery of this stone, which we now call the Rosetta Stone, began to unravel the great mystery of Egyptian Hieroglyphs. Nearly immediately, scholars realized that these three sections of writing represented three translations of the exact same passage: a relatively uneventful proclamation of policy and tax changes under King Ptolemy V. Using the better-known Greek and Demotic sections as guides, they initially translated phonetic portions of the hieroglyphs. Then, over the next 20 years, they continued to use the Greek and Demotic sections to gradually expand their understanding and translation of the hieroglyphic inscription.

When you look at the Rosetta Stone, it's easy to wonder what a specific section means. To this end, I've created the Interactive Rosetta Stone Explorer. The translation of the hieroglyphic text comes from Sharpe, Samuel. (1871). The Rosetta Stone in Hieroglyphics and Greek. London, with the hieroglyphic transcription of the characters coming primarily from Jim Loy.

[Image: interactive_rosetta_stone]

Enjoy!

Downloading Plain-Text Wikipedia

If you've ever been interested in having all of Wikipedia in a plain-text format, you might have been disappointed to learn that Wikipedia doesn't actually make this format available.

Fortunately, they do offer an XML version of the entire database, so I've written a PowerShell script to convert that XML dump into individual plain-text articles. The script tries to remove as much of Wikipedia's additional markup as possible, and skips inconsequential articles.

This script demonstrates a unique way of processing XML in PowerShell that you rarely see - because it is rarely needed. In XML form, the Wikipedia database is nearly 60GB. This is FAR too large for PowerShell's [xml] cast, due to the memory overhead required for the XmlDocument format on which the [xml] cast is built. It's also far too large for most systems to even hold in memory at once. Instead, this script takes a streaming approach built on System.Xml.XmlReader. The XmlReader class lets you handle tags and elements as the reader sees them, rather than forcing you to wait for that final ill-fated </mediawiki> closing tag while everything buffers in memory.
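
To show the streaming pattern in miniature, here is a sketch that runs XmlReader over a tiny in-memory document instead of the real dump. It is not the Split-Wikipedia source: the sample XML keeps only the page, title, revision, and text elements, and the variable names are arbitrary.

```powershell
# A tiny stand-in for the real dump, trimmed to the elements this sketch reads.
$xml = @'
<mediawiki>
  <page><title>Alpha</title><revision><text>First article</text></revision></page>
  <page><title>Beta</title><revision><text>Second article</text></revision></page>
</mediawiki>
'@

$reader   = [System.Xml.XmlReader]::Create([System.IO.StringReader]::new($xml))
$articles = [System.Collections.Generic.List[object]]::new()
$title    = $null
try {
    # Read() advances one node at a time, so only the current node is held in memory.
    while ($reader.Read()) {
        if ($reader.NodeType -ne [System.Xml.XmlNodeType]::Element) { continue }

        if ($reader.Name -eq 'title') {
            # Reads the element's text and moves the reader past its end tag.
            $title = $reader.ReadElementContentAsString()
        }
        elseif ($reader.Name -eq 'text') {
            $articles.Add([pscustomobject]@{ Title = $title; Text = $reader.ReadElementContentAsString() })
        }
    }
}
finally {
    $reader.Dispose()
}

$articles | Format-Table -AutoSize
```

For a file on disk, passing a path to XmlReader.Create in place of the StringReader gives the same loop over a document of any size.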

  1. Install the 'Split-Wikipedia' helper script from a PowerShell prompt:
    1. Install-Script Split-Wikipedia -Scope CurrentUser
    2. The Install-Script command requires Windows 10 or PowerShell 5.0.
    3. If this is the first time you've used Install-Script, exit PowerShell and launch it again.
  2. Use PowerShell to navigate to a directory that you want to contain your Wikipedia articles
    1. mkdir ~/Documents/Wikipedia
    2. Set-Location ~/Documents/Wikipedia
  3. Download the latest English Wikipedia database (~ 13GB)
    1. Invoke-WebRequest https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 -Outfile enwiki-latest-pages-articles.xml.bz2
  4. Decompress the XML, using bzip2 (or another tool like 7zip if you wish):
    1. bzip2 -d enwiki-latest-pages-articles.xml.bz2
  5. Process the XML (~ 58GB). This will take about 7 hours:
    1. Split-Wikipedia -Path ./enwiki-latest-pages-articles.xml
  6. (Optional) Delete the source XML
    1. Remove-Item ./enwiki-latest-pages-articles.xml

All 4 million articles are now in your 'Wikipedia\Articles' directory. Within this directory, they are again split into subdirectories of 5,000 articles each - as most software (e.g., File | Open dialogs, browsing in Explorer) doesn't handle single directories with 4 million items very well.
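
The bucketing itself is simple integer arithmetic. As a sketch, assuming articles are numbered in the order they are written and that subdirectories are identified by number (the script's real naming scheme may differ):

```powershell
function Get-ArticleBucket {
    param(
        [Parameter(Mandatory)] [int] $ArticleIndex,   # zero-based position of the article
        [int] $BucketSize = 5000
    )
    # Articles 0..4999 land in bucket 0, 5000..9999 in bucket 1, and so on.
    [int] [math]::Floor($ArticleIndex / $BucketSize)
}

Get-ArticleBucket 4999     # 0
Get-ArticleBucket 12345    # 2
```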

Enjoy!

More Detecting Obfuscated PowerShell

In a recent post, we talked a little bit about detecting obfuscated PowerShell through the use of PowerShell's tokenizer - tackling, as an example, the highly irregular variable names generated by Metasploit's PowerShell encoder.

Obfuscation has been around as long as computer programs have, so the rise of obfuscated PowerShell scripts shouldn't be much of a surprise. Obfuscated VBScript, Perl, Ruby, Python, and of course assembly language are very common.

A great example comes from this blog: http://perl-users.jp/articles/advent-calendar/2010/sym/11, which relies heavily on dynamic evaluation through Invoke-Expression:

[Image: Symbolic PowerShell!]

Enabling ScriptBlock logging in PowerShell v5 is an incredibly effective way to gain insight into this style of technique:

[Image: scriptblock_logging_obfuscation]

Obfuscation through Invoke-Expression or basic things like variable names is one thing, but what happens when people really start obfuscating the mechanics of the scripts themselves? At DerbyCon this year, Daniel Bohannon (noted Red Teamer) gave a really great presentation: "Invoke-Obfuscation: PowerShell obFUsk8tion Techniques & How To (Try To) D""e`Tec`T 'Th'+'em' ".

In that presentation, he dropped gems like this that rely heavily on the Format operator:

[Image: Token Obfuscation]

That obfuscation doesn't rely on the Invoke-Expression command, so would show up basically the same in script block logging. Along with the crazy Invoke-Expression stuff, these examples demonstrate, without a doubt, that relying on string matching alone to detect evil is a fool's errand.

But here's the thing.

Obfuscated scripts aren't normal. Anybody looking at an obfuscated script knows that they're not normal. They stick out like a sore thumb. This alone can be used as an incredibly rich signal that somebody is trying to avoid getting caught, in the same way that all self-respecting SOCs look for the events that come from an attacker turning off Antivirus.

In the previous post, we looked at the letter frequency of variable names, but what if we expanded on that approach to look at all characters in a script? The Invoke-Expression based script above relied entirely on 16 characters. The script that relied on PowerShell's Format operator relied heavily on quoting and brace characters.

But how do you know what's abnormal across "all PowerShell scripts"?

On the PowerShell team, one thing we often use for questions like this is a corpus that we created by downloading everything we could get our grubby little hands on. So, let's take a look at character frequencies using Measure-CharacterFrequency.

PS C:\PowerShellCorpus\PoshCode> $globalFrequency = Measure-CharacterFrequency *.ps1
PS C:\PowerShellCorpus\PoshCode> $globalFrequency | Select -First 20

Name Percent
---- -------
E      9.912
T      7.414
A      5.512
R       5.43
S      5.303
I      5.041
N      5.025
O      4.944
L      3.509
M        3.3
C      3.191
$      3.076
P      2.914
D      2.753
U       2.29
-      1.955
.      1.917
"      1.822
F      1.626
G      1.526
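The script itself isn't listed here, so the following is only a minimal sketch of how a character-frequency measurement like this could work. The function name `Measure-CharFrequencySketch` is hypothetical, and the choices to fold letters to upper case and to ignore whitespace are assumptions inferred from the output above (upper-case letters only, no space in the top 20).

```powershell
function Measure-CharFrequencySketch {
    param([Parameter(Mandatory)] [string] $Text)

    # Assumption: count 'e' and 'E' together, and ignore whitespace entirely.
    $chars = $Text.ToUpperInvariant().ToCharArray() |
        Where-Object { -not [char]::IsWhiteSpace($_) }

    $total = @($chars).Count
    if ($total -eq 0) { return }

    # One group per distinct character, most frequent first.
    $chars | Group-Object | Sort-Object Count -Descending | ForEach-Object {
        [PSCustomObject] @{
            Name    = $_.Name
            Percent = [Math]::Round(100.0 * $_.Count / $total, 3)
        }
    }
}

# Usage: measure a short piece of script text.
Measure-CharFrequencySketch -Text 'Write-Host "Hello"' | Select-Object -First 5
```

To measure a file instead, the text could be read with `Get-Content -Raw` and passed to `-Text`.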

Now, compare that to the two obfuscated scripts from earlier:

## The Token-based obfuscation that relies on the Format operator
PS > Measure-CharacterFrequency C:\temp\tokenall.ps1 | Select -First 10

Name Percent
---- -------
'     20.175
{      7.456
}      7.456
,      5.702
E      3.947
T      3.509
N      3.509
"      3.509
(       3.07
)       3.07

## The one that relies on Invoke-Expression
PS > Measure-CharacterFrequency C:\temp\symbolic.ps1 | Select -First 10

Name Percent
---- -------
$     21.808
{     21.659
}     21.659
+     13.313
"      7.452
=      2.832
[      2.086
(      1.689
;       1.54
)      1.341

 

The difference is huge, and unmistakable. But how do we compare these sets in a robust and reliable way? We steal from the field of Information Retrieval, that's how!

The field of Information Retrieval has long used a technique called vector similarity / cosine similarity to compare two sets of things. For example, the similarity of two documents, or how closely a search matches a document.

[Image: Cosine Similarity]
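For reference, the standard definition of cosine similarity between two vectors A and B with components A_i and B_i is:

```latex
\[
\text{similarity}(A, B) = \cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}
\]
```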

That's a lot of complicated math-looking symbols - but it turns out it's very easy to calculate. Here's an example, using Measure-VectorSimilarity.

PS > Measure-VectorSimilarity @(1..10) @(4..15)
0.639
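The script behind that command isn't shown here, so this is just a small sketch of the calculation. It assumes each array element is treated as a key with a weight of 1, which is the interpretation that reproduces the 0.639 above: 7 shared elements divided by sqrt(10) * sqrt(12). The function name `Measure-CosineSimilaritySketch` and its hashtable parameters are hypothetical.

```powershell
function Measure-CosineSimilaritySketch {
    param([hashtable] $Left, [hashtable] $Right)

    $dot = 0.0; $leftSquares = 0.0; $rightSquares = 0.0

    foreach ($key in $Left.Keys) {
        $leftSquares += $Left[$key] * $Left[$key]
        # Only keys present in both vectors contribute to the dot product.
        if ($Right.ContainsKey($key)) { $dot += $Left[$key] * $Right[$key] }
    }
    foreach ($value in $Right.Values) { $rightSquares += $value * $value }

    # An empty (zero-length) vector has no direction, so call that "no similarity".
    if ($leftSquares -eq 0 -or $rightSquares -eq 0) { return 0.0 }

    [Math]::Round($dot / ([Math]::Sqrt($leftSquares) * [Math]::Sqrt($rightSquares)), 3)
}

# Assumption: a plain array becomes a vector with weight 1 for each element.
$first = @{};  1..10 | ForEach-Object { $first[$_] = 1 }
$second = @{}; 4..15 | ForEach-Object { $second[$_] = 1 }

Measure-CosineSimilaritySketch $first $second   # 0.639
```

For character frequencies, the keys would be the characters and the weights their percentages.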

So, let's automate a vector similarity comparison. Take a random selection of scripts from PoshCode, dump in some obfuscated ones, and see if anything sticks out based on the vector similarity score:

[C:\PowerShellCorpus\PoshCode]
PS > md c:\temp\randomscripts
PS > dir | Get-Random -Count 20 | Copy-Item -Destination C:\temp\randomscripts
PS > copy C:\temp\symbolic.ps1 C:\temp\randomscripts
PS > copy C:\temp\tokenall.ps1 C:\temp\randomscripts
PS > dir C:\temp\randomscripts\ | % {
>>>     $scriptFrequency = $_ | Measure-CharacterFrequency
>>>     $sim = Measure-VectorSimilarity $globalFrequency $scriptFrequency `
>>>             -KeyProperty Name -ValueProperty Percent
>>>     [PSCustomObject] @{ Name = $_.Name; Similarity = $sim }
>>> }

Name                                     Similarity
----                                     ----------
43a28a15-5023-4feb-a71f-abe95aa0f2a6.ps1      0.957
Export-PSCredential_4.ps1                     0.979
Get-BogonList_1.ps1                           0.925
Get-Netstat _1.9.ps1                           0.89
Get-Parameter_8.ps1                           0.959
group-byobject_4.ps1                          0.939
IADsDNWithBinary Cmdlet_1.ps1                 0.924
Import-ExcelToSQL_2.ps1                       0.961
Invoke-Sql_2.ps1                              0.979
List AddRemovePrograms.ps1                    0.961
Lock-WorkStation.ps1                          0.905
Monitor-FileSize_1.ps1                        0.974
symbolic.ps1                                  0.157
Reverse filename sequenc.ps1                  0.874
scriptable telnet client_2.ps1                0.967
Set Active Sync DeviceID.ps1                  0.955
SharePoint Large Lists_1.ps1                  0.944
Show-Sample_1.ps1                             0.919
Start-Verify.ps1                              0.923
tokenall.ps1                                  0.379

In fact, if you graph the similarity scores of the nearly 3500 scripts from PoshCode, only 2% of them have a similarity score below 0.8. And almost all of those really are obfuscated, written that way for fun.

[Image: similarity_graph]
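Turning that observation into a first-pass filter could look something like the sketch below. The sample rows are copied from the table above; the `$threshold` of 0.8 comes from the 2% observation and is a starting point to tune against your own corpus, not a recommended constant.

```powershell
# A few Name / Similarity results copied from the comparison above.
$results = @(
    [PSCustomObject] @{ Name = 'Export-PSCredential_4.ps1';    Similarity = 0.979 }
    [PSCustomObject] @{ Name = 'Reverse filename sequenc.ps1'; Similarity = 0.874 }
    [PSCustomObject] @{ Name = 'symbolic.ps1';                 Similarity = 0.157 }
    [PSCustomObject] @{ Name = 'tokenall.ps1';                 Similarity = 0.379 }
)

# Assumption: anything scoring below this is worth a closer look.
$threshold = 0.8

# Least similar (most suspicious) first.
$suspicious = $results |
    Where-Object { $_.Similarity -lt $threshold } |
    Sort-Object Similarity

$suspicious
```

A low score is a signal to investigate rather than proof of malice, since some scripts are obfuscated purely for fun.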

The difference is unmistakable. If you're currently trying to detect malicious script-based content, be sure to also look for indicators of script obfuscation. Reliable techniques exist, and you're likely running blind without them.