Archives for the ‘Uncategorized’ Category

Searching for Content in Base-64 Strings

You might have run into situations in the past where you’re looking for some specific text or binary sequence, but that content is encoded with Base-64. Base-64 is an incredibly common encoding format in malware and in all kinds of binary obfuscation tools.

The basic idea behind Base-64 is that it takes arbitrary binary data and encodes it using an alphabet of 64 (naturally) ASCII characters that can be transmitted safely over any normal transmission channel. Wikipedia's article on Base64 goes into the full details.

Some tooling supports decoding of Base-64 automatically, but that requires some pretty detailed knowledge of where the Base-64 starts and stops.

The Problem

Pretend you’re looking for the string, “Hello World” in a log file or SIEM system, but you know that it’s been Base-64 encoded. You might use PowerShell’s handy Base-64 classes to tell you what to search for:
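A minimal sketch of that lookup, assuming the text is ASCII-encoded (the variable names are only for illustration):

```powershell
## Convert the text to bytes, then Base-64 encode those bytes
$bytes = [System.Text.Encoding]::ASCII.GetBytes('Hello World')
$encoded = [Convert]::ToBase64String($bytes)
$encoded   # SGVsbG8gV29ybGQ=
```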


That seems useful. But what if “Hello World” is in the middle of a longer string? Can you still use ‘SGVsbG8gV29ybGQ=’? It turns out, no. Adding a single character to the beginning changes almost everything:


Now, we’ve got ‘IEhlbGxvIFdvcmxk’.
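Here is a small sketch of that comparison, again assuming ASCII text. It encodes the string with and without a leading space and checks whether the original encoding (minus its padding) survives anywhere in the shifted one:

```powershell
$plain   = [Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes('Hello World'))
$shifted = [Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes(' Hello World'))

$shifted                                              # IEhlbGxvIFdvcmxk
$stillFound = $shifted.Contains($plain.TrimEnd('='))  # case-sensitive substring check
$stillFound                                           # False
```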

The main problem here is the way that Base-64 works. When we’re encoding content, Base-64 takes 3 bytes (24 bits) and re-interprets them as 4 segments of 6 bits each. It then encodes each of those 6-bit segments as one of the 64 characters that you know and love. Here’s a graphical example from Wikipedia:


So when we add a character to the beginning, we shift the whole bit pattern to the right and change the encoding of everything that follows!
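To make the regrouping concrete, here is a rough sketch that encodes one three-byte group by hand and compares it to what .NET produces. The $alphabet string is the standard Base-64 alphabet; the rest of the names are made up for this example:

```powershell
$alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
$bytes = [System.Text.Encoding]::ASCII.GetBytes('Hel')

## Pack the three bytes into a single 24-bit number
$block = ([int] $bytes[0] -shl 16) -bor ([int] $bytes[1] -shl 8) -bor [int] $bytes[2]

## Peel off four 6-bit segments, most significant first, and look each one up
$chars = foreach($shift in 18, 12, 6, 0)
{
    $alphabet[($block -shr $shift) -band 0x3F]
}

$manual = -join $chars
$manual                              # SGVs
[Convert]::ToBase64String($bytes)    # SGVs
```

Because every output character is tied to a fixed position inside its 24-bit group, any extra byte in front moves all of the later bytes into different positions.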

Another feature of Base-64 is padding. If the length of your content isn’t a multiple of 3 bytes (24 bits), Base-64 encoding pads the final group with zero bits. It uses the “=” character to denote how many bytes of padding were needed:


When final padding is added, you can’t just remove those “=” characters. If additional content is added to the end of your string (e.g.: “Hello World!”), that additional content replaces the padding characters and can also change the character before them.
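A quick sketch of both effects, assuming ASCII input. The strings 'A', 'AB', 'ABC', and 'Hello Worlds' are only illustrative inputs:

```powershell
## One, two, and three bytes of input: two, one, and zero padding characters
$padding = foreach($text in 'A', 'AB', 'ABC')
{
    [PSCustomObject] @{
        Text   = $text
        Base64 = [Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($text))
    }
}
$padding | Format-Table

## Appending one more byte replaces the '=' and also changes the character before it
$before = [Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes('Hello World'))
$after  = [Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes('Hello Worlds'))
$before   # SGVsbG8gV29ybGQ=
$after    # SGVsbG8gV29ybGRz
```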

Another major challenge is when the content is Unicode rather than ASCII. All of these points still apply – but the bit patterns change. Unicode (in the Windows and .NET sense of the term, meaning UTF-16) usually represents characters as two bytes (16 bits). This is why the Base-64 encoding of Unicode content representing ASCII text has so many ‘A’ characters: ‘A’ is the Base-64 representation of six zero bits, and every second byte of that content is a NULL byte.
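As a small illustration, this sketch encodes the same text through .NET's Unicode (UTF-16LE) encoding:

```powershell
$unicodeBytes = [System.Text.Encoding]::Unicode.GetBytes('Hello World')
$unicodeBytes.Length    # 22 bytes: each character is followed by a NULL byte

$unicodeEncoded = [Convert]::ToBase64String($unicodeBytes)
$unicodeEncoded         # SABlAGwAbABvACAAVwBvAHIAbABkAA==
```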


The Solution

When you need to search for content that’s been Base-64 encoded, then, the solution is to generate the Base-64 text at each of the three possible byte offsets (0, 1, or 2 bytes into a three-byte group), and remove the characters that might be influenced by the context: either the content preceding what you are looking for, or the content that follows. Additionally, you should do this for both the ASCII and the Unicode representations of the string.

An Example

One example of Base-64 content is in PowerShell’s –EncodedCommand parameter. This shows up in Windows Event Logs if you have command-line logging enabled (and of course shows up directly if you have PowerShell logging enabled).

Here’s an example of an event log like that:


Here’s an example of launching a bunch of PowerShell instances with the –EncodedCommand parameter, as well as the magical Get-Base64RegularExpression command. That command will generate a regular expression that you can use to match against that content:


As you can see in this example, searching for the Base-64 content of “My voice is my” returned all four log entries, while the “My voice is my passport” search returned the single event log that contained the whole expression.

The Script

Get-Base64RegularExpression is a pretty simple script. You can use this in PowerShell, or any event log system that supports basic regular expression searches.


## Get-Base64RegularExpression.ps1
## Get a regular expression that can be used to search for content that has been
## Base-64 encoded

param(
    ## The value that we would like to search for in Base64 encoded content
    $Value
)

## Holds the various byte representations of what we're searching for
$byteRepresentations = @()

## If we got a string, look for the Unicode and ASCII representations of the string
if($Value -is [String])
{
    $byteRepresentations +=
        [System.Text.Encoding]::Unicode.GetBytes($Value),
        [System.Text.Encoding]::ASCII.GetBytes($Value)
}

## If it was a byte array directly, look for the byte representations
if($Value -is [byte[]])
{
    $byteRepresentations += ,$Value
}

## Find the safe searchable sequences for each Base64 representation of input bytes
$base64sequences = foreach($bytes in $byteRepresentations)
{
    ## Offset 0. Sits on a 3-byte boundary so we can trust the leading characters.
    $offset0 = [Convert]::ToBase64String($bytes)

    ## Offset 1. Has one byte from preceding content, so we need to throw away the
    ## first 2 leading characters
    $offset1 = [Convert]::ToBase64String( (New-Object 'Byte[]' 1) + $bytes ).Substring(2)

    ## Offset 2. Has two bytes from preceding content, so we need to throw away the
    ## first 4 leading characters
    $offset2 = [Convert]::ToBase64String( (New-Object 'Byte[]' 2) + $bytes ).Substring(4)

    ## If there is any terminating padding, we must remove the characters mixed with that padding. That
    ## ends up being the number of equals signs, plus one.
    $base64matches = $offset0,$offset1,$offset2 | % {
        if($_ -match '(=+)$')
        {
            $_.Substring(0, $_.Length - ($matches[0].Length + 1))
        }
        else
        {
            $_
        }
    }

    $base64matches | ? { $_ }
}

## Output a regular expression for these sequences. Each sequence is escaped, since
## '+' is a valid Base-64 character but also a regular expression metacharacter.
"(" + (($base64sequences | Sort-Object -Unique | % { [Regex]::Escape($_) }) -join "|") + ")"

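As a sanity check of the idea, here is a sketch that tests a pattern against the same text encoded behind prefixes of different lengths. The pattern below holds only the three ASCII alternatives for “Hello World”, worked out by hand from the trimming rules in the script (the Unicode alternatives are left out to keep it short), and the prefixes and trailing text are arbitrary:

```powershell
## ASCII-only alternatives for "Hello World" at byte offsets 0, 2, and 1
$pattern = '(SGVsbG8gV29ybG|ZWxsbyBXb3Js|hlbGxvIFdvcmxk)'

$results = foreach($prefix in '', 'a', 'ab', 'abc')
{
    ## Surround the text with other content, then Base-64 encode the whole thing
    $bytes = [System.Text.Encoding]::ASCII.GetBytes($prefix + 'Hello World and more')
    $encoded = [Convert]::ToBase64String($bytes)

    [PSCustomObject] @{
        Prefix  = $prefix
        Encoded = $encoded
        Found   = $encoded -cmatch $pattern   ## -cmatch, because Base-64 is case-sensitive
    }
}

$results | Format-Table
```

Every row should report Found as True, no matter how many bytes come before the text.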
TripleAgent: Even Zeroer-Tay Code Injection and Persistence Technique


We'd like to introduce a new Zero-Tay technique for injecting code and maintaining persistency against common advanced attacker toolkits dubbed TripleAgent. We discovered this by ourselves in our very advanced labs, and are in the process of registering a new vanity domain as we speak.

TripleAgent can exploit:

  • Every toolkit version
  • Every toolkit architecture (x86 and x64)
  • Every toolkit user (RED / PURPLE / APT / NATION STATE / etc.)
  • Every toolkit process (including PoC, GTFO, PoC||GTFO, METASPLOIT, UNICORN)

TripleAgent exploits a fundamental flaw in the design of commonly used advanced attacker toolkits, and therefore cannot be patched.

Code Injection

TripleAgent gives the defender the ability to inject any DLL into any attacker toolkit. The code injection occurs extremely early during the victim's process boot, giving the defender full control over the process and no way for the process to protect itself. The code injection technique is so unique that it's not detected or blocked by even the most advanced threaty threats.

Attack Vectors

  • Attacking persistence toolkits - Taking full control of ANY persistence toolkit by injecting code into it while bypassing all of its self-protection mechanisms. The attack has been verified and works on all bleeding-edge attacker toolkits including but not limited to: DoubleAgent.

Technical Deep, Deep, Deep, Dive

An example of an advanced attacker toolkit is known as DoubleAgent. This attacker toolkit exploits a fundamental issue in Windows, nay computing, NAY HUMANITY itself.

When this advanced toolkit runs, it is widely acknowledged to provide complete control over other unwitting applications. However, we can apply our new TripleAgent framework to this toolkit to completely neutralize it. Rather than have it infect target systems, we can write a few simple lines of code to make it instead launch the Windows Update settings dialog!

static BOOL main_DllMainProcessAttach(VOID)
{
    PROCESS_Create(L"c:\\windows\\system32\\cmd.exe"L"/c start ms-settings:windowsupdate");

    return TRUE;
}


Once run, we can see the significant impact of our new zero-tay technique. The first invocation installs our TripleAgent exploit, rendering the advanced "DoubleAgent" threat completely harmless during its second invocation.


Unfortunately, there are no mitigations or bypasses for this extremely advanced defensive technique. We do however offer highly-advanced next generation cyber threat intel cloud machine learning offensive services. Just putting that out there.

Adding a Let’s Encrypt Certificate to an Azure-Hosted Website

If you host your website in Azure, you might be interested in adding SSL support via Let's Encrypt. Azure doesn't offer any built-in functionality to automate this, but thankfully there are plenty of useful tools in the PowerShell community to make it easy.

  1. ACMESharp - A PowerShell module to interact with Let's Encrypt.
  2. Azure PowerShell - A set of PowerShell modules to interact with Azure.

What's been missing (until now!) is the glue. So now, here's the glue: Register-LetsEncryptCertificate.ps1.

So the steps:

  1. Install-Module AcmeSharp, Azure, AzureRM.Websites
  2. Install-Script Register-LetsEncryptCertificate.ps1
  3. Register-LetsEncryptCertificate -Domain -RegistrationEmail -ResourceGroup exampleResourceGroup -WebApp exampleWebApp
  4. Visit



Why is SeDebugPrivilege enabled in PowerShell?

We sometimes get the question: Why is the SeDebugPrivilege enabled by default in PowerShell?

This is enabled by .NET when PowerShell uses the System.Diagnostics.Process class in .NET, which it does for many reasons. One example is the Get-Process cmdlet. Another example is the method it invokes to get the current process ID for the $pid variable. Any .NET application that uses the System.Diagnostics.Process class also enables this privilege.


You can see the .NET code that enables this here:

NativeMethods.LUID luid = default(NativeMethods.LUID);
if (!NativeMethods.LookupPrivilegeValue(null, "SeDebugPrivilege", out luid))
{
    return;
}

IntPtr zero = IntPtr.Zero;
// 32 (0x20) is TOKEN_ADJUST_PRIVILEGES
if (NativeMethods.OpenProcessToken(new HandleRef(null, NativeMethods.GetCurrentProcess()), 32, out zero))
{
    NativeMethods.TokenPrivileges tokenPrivileges = new NativeMethods.TokenPrivileges();
    tokenPrivileges.PrivilegeCount = 1;
    tokenPrivileges.Luid = luid;
    // 2 is SE_PRIVILEGE_ENABLED
    tokenPrivileges.Attributes = 2;
    NativeMethods.AdjustTokenPrivileges(new HandleRef(null, zero), false, tokenPrivileges, 0, IntPtr.Zero, IntPtr.Zero);
}


Detecting and Preventing PowerShell Downgrade Attacks

With the advent of PowerShell v5’s awesome new security features, old versions of PowerShell have all of a sudden become much more attractive for attackers and Red Teams.

PowerShell Downgrade Attacks

There are two ways to do this:

Command Line Version Parameter

The simplest technique is: “PowerShell –Version 2 –Command <…>” (or of course any of the –Version abbreviations).

PowerShell.exe itself is just a simple native application that hosts the CLR, and the –Version switch tells PowerShell which version of the PowerShell assemblies to load.

Unfortunately, the PowerShell v5 enhancements did NOT include time travel, so the v2 binaries that were shipped in 2008 did NOT include the code we wrote in 2014. The 2.0 .NET Framework (which is required for PowerShell’s V2 engine) is not included by default in Windows 10 and later, but an attacker or Red Teamer could enable it or install it. On earlier versions of Windows, it is available by default, so they could just use it.


Hosting Applications Compiled using V2 Reference Assemblies

When somebody compiles a C# application to leverage the PowerShell engine, they link against PowerShell's reference assemblies. If they link against the PowerShell v2 reference assemblies during development, Windows will use the PowerShell v2 engine (if available) when the application runs. Otherwise, PowerShell's type forwarding will run the application using the currently installed PowerShell engine.

This is what happens when PowerShell Empire's "psinject" module attempts to load PowerShell into another process (such as notepad).


Detection and Prevention

You have several options to detect and prevent PowerShell Downgrade Attacks.

Event Log

As a detection mechanism, the “Windows PowerShell” classic event log has event ID 400. This is the “Engine Lifecycle” event, and includes the Engine Version. Here is an example query to find lower versions of the PowerShell engine being loaded:

Get-WinEvent -LogName "Windows PowerShell" |
    Where-Object Id -eq 400 |
    Foreach-Object {
        $version = [Version] ($_.Message -replace '(?s).*EngineVersion=([\d\.]+)*.*','$1')
        if($version -lt ([Version] "5.0")) { $_ }
    }
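If you want to see what the version extraction in that query does without touching a real event log, here is a sketch that runs the same regular expression over two made-up, heavily abbreviated message strings. Real event ID 400 messages carry more fields; the text below is a hypothetical stand-in that only keeps the EngineVersion field in place:

```powershell
## Hypothetical, abbreviated stand-ins for the Message property of event ID 400
$sampleMessages = @(
    "Engine state is changed from None to Available.`n`tHostName=ConsoleHost`n`tEngineVersion=2.0`n`tRunspaceId=example",
    "Engine state is changed from None to Available.`n`tHostName=ConsoleHost`n`tEngineVersion=5.1.14393.0`n`tRunspaceId=example"
)

$downgraded = foreach($message in $sampleMessages)
{
    ## Replace the entire message with just the captured version number
    $version = [Version] ($message -replace '(?s).*EngineVersion=([\d\.]+)*.*','$1')
    if($version -lt ([Version] "5.0")) { $version }
}

$downgraded
```

Only the first sample comes back, since its engine version is lower than 5.0.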


AppLocker / File Auditing

When the CLR loads PowerShell assemblies, it will first load the managed assemblies from the GAC (if they are available). It will also load the native images that contain pre-jitted code if the assemblies are NGEN’d (which they are). Here is what loading PowerShell v2 looks like:

These can either be an audit trigger, or can be blocked outright.

Be careful to not be too selective on the directories you monitor, as the CLR can also load assemblies from specific directories. For example, it is possible to use the CLR’s undocumented / unsupported DEVPATH environment variable to force the CLR to use a specified version of the assemblies rather than the GAC’d version. And if you don’t have a GAC’d version to override, PowerShell will do regular LoadLibrary() probing to find one – including its installation directory.

In addition, PowerShell can either be launched as a 32-bit process, or 64-bit process. A 64-bit system will load 64-bit PowerShell by default. A 32-bit system will load 32-bit PowerShell. On a 64-bit system, though, Windows will implicitly change the version of PowerShell that gets launched by looking at the bitness of the launching application: a 32-bit app will load other 32-bit apps. It is also possible for users or applications to do this explicitly by launching PowerShell from the WOW directory: c:\windows\syswow64\windowspowershell\v1.0\powershell.exe.

PS > dir *.dll -rec -ea ig | % FullName | ? { $_ -match 'System\.Management\.Automation\.(ni\.)?dll' }


If you’re going down the enforcement route via AppLocker or Device Guard, the most robust solution is to block earlier versions of the PowerShell engine by version. Be sure to block both the native image and MSIL assemblies:

C:\Users\leeholm>powershell -version 2 -noprofile -command "(Get-Item ([PSObject].Assembly.Location)).VersionInfo"

ProductVersion   FileVersion      FileName
--------------   -----------      --------
6.1.7600.16385   6.1.7600.16385   C:\WINDOWS\assembly\GAC_MSIL\System.Management.Automation\

C:\Users\leeholm>powershell -noprofile -command "(Get-Item ([PSObject].Assembly.Location)).VersionInfo"

ProductVersion   FileVersion      FileName
--------------   -----------      --------
10.0.14986.1000  10.0.14986.1000  C:\WINDOWS\Microsoft.Net\assembly\GAC_MSIL\System.Management.Automation\v4.0_3.0.0...

C:\Users\leeholm>powershell -version 2 -noprofile -command "(Get-Item (Get-Process -id $pid -mo | ? { $_.FileName -match '' } | % { $_.FileName })).VersionInfo"

ProductVersion   FileVersion      FileName
--------------   -----------      --------
6.1.7600.16385   6.1.7600.16385   C:\WINDOWS\assembly\NativeImages_v2.0.50727_64\System.Management.A#\8b1355a0339430...

C:\Users\leeholm>powershell -noprofile -command "(Get-Item (Get-Process -id $pid -mo | ? { $_.FileName -match '' } | % { $_.FileName })).VersionInfo"

ProductVersion   FileVersion      FileName
--------------   -----------      --------
10.0.14986.1000  10.0.14986.1000  C:\WINDOWS\assembly\NativeImages_v4.0.30319_64\System.Manaa57fc8cc#\4072bc1c91e324...

Differences between Visual Studio 2003, 2005, 2008, 2010, 2012, 2013, and 2015

If you're interested in knowing when specific Visual Studio compiler options have been introduced, here you go.

2003 to 2005

Option                 Purpose                                                                                             
------                 -------                                                                                             
/analyze               Enable code analysis.                                                                               
/bigobj                Increases the number of addressable sections in an .obj file.                                       
/doc                   Process documentation comments to an XML file.                                                      
/errorReport           Allows you to provide internal compiler error (ICE) information directly to the Visual C++ team.    
/favor                 Produces code that is optimized for a specific x64 architecture or for the specifics of             
                       micro-architectures in both the AMD64 and Extended Memory 64 Technology (EM64T) architectures.      
/FC                    Display full path of source code files passed to cl.exe in diagnostic text.                         
/Fp                    Specifies a precompiled header file name.                                                           
/G1                    Optimize for Itanium processor. Only available in the IPF cross compiler or IPF native compiler.    
/G2                    Optimize for Itanium2 processor (default between /G1 and /G2). Only available in the IPF cross      
                       compiler or IPF native compiler.                                                                    
/GF                    Enables string pooling.                                                                             
/homeparams            Forces parameters passed in registers to be written to their locations on the stack upon function   
                       entry. This compiler option is only for the x64 compilers (native and cross compile).               
/hotpatch              Creates a hotpatchable image.                                                                       
/LN                    Creates an MSIL module.                                                                             
/openmp                Enables #pragma omp in source code.                                                                 
/QIPF_B                Does not generate sequences of instructions that give unexpected results, according to the errata   
                       for the B CPU stepping. (IPF only).                                                                 
/QIPF_C                Does not generate sequences of instructions that give unexpected results, according to the errata   
                       for the C CPU stepping. (IPF only).                                                                 
/QIPF_fr32             Do not use upper 96 floating-point registers. (IPF only).                                           
/QIPF_noPIC            Generates an image with position dependent code (IPF only).                                         
/QIPF_restrict_plabels Enhances performance for programs that do not create functions at runtime. (IPF only).              
/Zx                    Generates debuggable optimized code. Only available in the IPF cross compiler or IPF native         
                       compiler.                                                                                           

2005 to 2008

Option                 Purpose                                                    
------                 -------                                                    
/MP                    Compiles multiple source files by using multiple processes.
/Qfast_transcendentals Generates fast transcendentals.                            
/Qimprecise_fwaits     Removes fwait commands inside try blocks.                  

2008 to 2010

Option Purpose                                
------ -------                                
/Fi    Sets the preprocessed output file name.

2010 to 2012

Option                                         Purpose                                                                     
------                                         -------                                                                     
/kernel                                        The compiler and linker will create a binary that can be executed in the    
                                               Windows kernel.                                                             
/Qpar (Auto-Parallelizer)                      Enables automatic parallelization of loops that are marked with the #pragma 
                                               loop() directive.                                                           
/Qvec-report (Auto-Vectorizer Reporting Level) Enables reporting levels for automatic vectorization.                       
/sdl                                           Enables additional compiler security checks.                                
/volatile                                      Selects how the volatile keyword is interpreted.                            
/ZW                                            Produces an output file to run on the Windows Runtime.                      

2012 to 2013

Option          Purpose                                                                                                    
------          -------                                                                                                    
/cgthreads      Specifies number of cl.exe threads to use for optimization and code generation.                            
/FS             Forces writes to the program database (PDB) file to be serialized through MSPDBSRV.EXE.                    
/Gv             Uses the __vectorcall calling convention. (x86 and x64 only)                                               
/Gw             Enables whole-program global data optimization.                                                            
/Qsafe_fp_loads Uses integer move instructions for floating-point values and disables certain floating point load          
                optimizations.                                                                                             
/Zo             Generate enhanced debugging information for optimized code in non-debug builds.                            

2013 to 2015

Option                  Purpose                                                                     
------                  -------                                                                     
/guard:cf               Adds control flow guard security checks.                                    
/W0, /W1, /W2, /W3, /W4 Sets which warning level to output.                                         
/w1, /w2, /w3, /w4      Sets the warning level for the specified warning.                           
/wd                     Disables the specified warning.                                             
/we                     Treats the specified warning as an error.                                   
/wo                     Displays the specified warning only once.                                   
/Wv                     Displays no warnings introduced after the specified version of the compiler.
/WX                     Treats all warnings as errors.                                              

PowerShell to Generate these:

$2003 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
$2005 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
$2008 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
$2010 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
$2012 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
$2013 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
$2015 = $($wr = Invoke-WebRequest (Get-Clipboard); Get-WebRequestTable.ps1 -WebRequest $wr -TableNumber 0)
Compare-Object $2003 $2005 -Property { $_.Option -replace '\s','' } -PassThru | ? SideIndicator -eq '=>' | Format-Table Option,Purpose -Wrap | clip
Compare-Object $2005 $2008 -Property { $_.Option -replace '\s','' } -PassThru | ? SideIndicator -eq '=>' | Format-Table Option,Purpose -Wrap | clip
Compare-Object $2008 $2010 -Property { $_.Option -replace '\s','' } -PassThru | ? SideIndicator -eq '=>' | Format-Table Option,Purpose -Wrap | clip
Compare-Object $2010 $2012 -Property { $_.Option -replace '\s','' } -PassThru | ? SideIndicator -eq '=>' | Format-Table Option,Purpose -Wrap | clip
Compare-Object $2012 $2013 -Property { $_.Option -replace '\s','' } -PassThru | ? SideIndicator -eq '=>' | Format-Table Option,Purpose -Wrap | clip
Compare-Object $2013 $2015 -Property { $_.Option -replace '\s','' } -PassThru | ? SideIndicator -eq '=>' | Format-Table Option,Purpose -Wrap | clip
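To show what the Compare-Object step is doing without scraping any pages, here is a sketch that uses two small hand-built lists in place of the scraped tables. The $vs2008 and $vs2010 variables and the stray whitespace in one option name are made up for the example:

```powershell
$vs2008 = @(
    [PSCustomObject] @{ Option = '/MP'; Purpose = 'Compiles multiple source files by using multiple processes.' }
    [PSCustomObject] @{ Option = '/Qfast_transcendentals'; Purpose = 'Generates fast transcendentals.' }
)

$vs2010 = @(
    [PSCustomObject] @{ Option = '/MP '; Purpose = 'Compiles multiple source files by using multiple processes.' }
    [PSCustomObject] @{ Option = '/Qfast_transcendentals'; Purpose = 'Generates fast transcendentals.' }
    [PSCustomObject] @{ Option = '/Fi'; Purpose = 'Sets the preprocessed output file name.' }
)

## Compare on the option name with whitespace removed, and keep only the
## entries that exist on the right-hand (newer) side
$added = Compare-Object $vs2008 $vs2010 -Property { $_.Option -replace '\s','' } -PassThru |
    Where-Object SideIndicator -eq '=>'

$added | Format-Table Option,Purpose -Wrap
```

Comparing on the whitespace-stripped name is what keeps '/MP ' from being reported as a new option.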

TimeJournal: Time Profiling for Humans

Time Journal helps you analyze where you spend your time by infrequently asking the simple question: “What are you doing?”

[Download here:]



How it Works

Time Journal follows the same principles as a traditional software sampling profiler, but instead samples humans. By randomly recording your current task, Time Journal lets you analyze your answers as a faithful proxy for how you actually spent your time. If 20% of your randomly sampled answers were “Status Meeting,” then you spent close to 20% of your time in status meetings.
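The arithmetic behind that estimate is just counting. As a rough sketch, with a made-up list of sampled answers standing in for a real journal:

```powershell
## Hypothetical answers, one per random sample
$answers = 'Status Meeting', 'Coding', 'Coding', 'Email', 'Coding'

## Each answer's share of the samples approximates its share of your time
$breakdown = $answers | Group-Object | Sort-Object Count -Descending | ForEach-Object {
    [PSCustomObject] @{
        Name    = $_.Name
        Count   = $_.Count
        Percent = [Math]::Round([double] (100 * $_.Count / $answers.Count), 1)
    }
}

$breakdown | Format-Table
```

With only five samples the estimate is very coarse; it gets closer to reality as the number of samples grows.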

An alternative to the sampling approach is an instrumentation approach: faithfully recording your transition between tasks. Time Journal avoids this design, since asking humans to faithfully record transitions between tasks is enormously error-prone. For example, you might not log a task transition for a task that you consider inconsequential (for example, “Checking email”), when in fact that task may account for a significant portion of your day. Some software attempts to address the human element by tracking window titles, but the level of data captured by window titles often does not map well to the task they support.

Installing Time Journal

  • Extract TimeJournal.exe to a place on your computer (e.g.: a Tools folder)
  • Start | Run | shell:startup
  • Create a shortcut to TimeJournal.exe in that Startup folder

Using Time Journal

Time Journal runs as a background application. Every once in a while (at a random interval between 5 and 25 minutes), it asks you the question, “What are you doing?” It stores your previous answers in a list until you exit the program, which lets you easily re-use your answers to previous questions.

When you press OK, it adds your answer (along with the current window title) to a date-appropriate CSV in your “My Documents\TimeJournal” folder. If you don’t answer within 4 minutes, it dismisses the dialog and records nothing. This lets you keep TimeJournal running when you go home for the day without polluting your journal file.

In addition, when Time Journal auto-dismisses the dialog, it checks Outlook to see if you are in a meeting. If you are, it records the meeting title instead as your activity.

Slicing and Dicing

Time Journal records all output into a CSV in your “My Documents\TimeJournal” directory. For example, use the included Get-TimeJournal PowerShell script to easily see the breakdown of your time:

PS> Get-TimeJournal.ps1

Count Name
----- ----
    2 Bug 127272: System bluescreens when I move the mouse
    1 MEETING: Chat about alignment
    1 TimeJournal fixes
    1 Hubble Space Telescope programming
    1 SCRUM meeting
    1 Security reviews
    1 Chat

Setting Visual Studio Code to Auto-Update in the Background

Visual Studio Code has a built-in feature to check for and install updates, but I've always been frustrated by having to acknowledge the update, allow the editor to restart, watch an installer, and then get back to what I was about to do anyway (which is edit some text).

As a solution, here's a quick little PowerShell script to run. It will create a background task that runs every night at 3:14 AM and updates VS Code for you automatically if an update is available. If you tend to leave your editor open without saving files, you might want to enable the VS Code settings files.autoSave and files.hotExit.

When VS Code comes back up after an update, you'll have to reopen any files that were previously open. Once VS Code gets more fully-featured crash recovery functionality, that annoyance will go away.

And the PowerShell:

Register-ScheduledJob -Name VSCode_Updater -Trigger (New-JobTrigger -Daily -At 3:14) -ScheduledJobOption @{ RunElevated = $true } -ScriptBlock {
    $tags = Invoke-RestMethod
    $latest = $tags | Foreach-Object { try { [version] $_.Name } catch { } } | Sort-Object -Desc | Select-Object -First 1
    $current = [Version] ((Get-Content 'C:\Program Files (x86)\Microsoft VS Code\resources\app\package.json' | ConvertFrom-Json).Version)
    if($latest -gt $current)
    {
        $codeWasRunning = @(Get-Process -name Code -ErrorAction Ignore).Count -gt 0
        if($codeWasRunning)
        {
            Stop-Process -name Code
        }

        Invoke-WebRequest -OutFile $env:TEMP/vscode_latest.exe
        & $env:TEMP/vscode_latest.exe /verysilent

        if(-not $codeWasRunning)
        {
            Stop-Process -Name Code
        }
    }
}

$task = Get-ScheduledTask -TaskName VSCode_Updater
$task.Principal.LogonType = "Interactive"
$task | Set-ScheduledTask

Interactive Rosetta Stone Explorer

In 1799, Napoleon's explorers discovered a 4-foot tall, 700 lb stone slab in Rosetta (Rashid), Egypt. It was carved with three sections of writing. Two were well known to archaeologists: Demotic script and ancient Greek. The most interesting part - the hieroglyphic symbols - was also very well known to the scientific community, but it was also a great mystery: all attempts to decipher hieroglyphic text had so far been unsuccessful.


The discovery of this stone, which we now call the Rosetta Stone, began to unravel the great mystery of Egyptian Hieroglyphs. Almost immediately, scholars realized that these three sections of writing represented three translations of the exact same passage: a relatively uneventful proclamation of policy and tax changes under King Ptolemy V. Using the more well-known Greek and Demotic sections as guides, they initially translated phonetic portions of the hieroglyphs. Then, over the next 20 years, they continued to use the Greek and Demotic sections to gradually expand their understanding and translation of the hieroglyphic inscription.

When you look at the Rosetta Stone, it's easy to wonder what a specific section means. To this end, I've created the Interactive Rosetta Stone Explorer. The translation of the hieroglyphic text comes from Sharpe, Samuel. (1871). The Rosetta Stone in Hieroglyphics and Greek. London, with the hieroglyphic transcription of the characters coming primarily from Jim Loy.



Downloading Plain-Text Wikipedia

If you've ever been interested in having all of Wikipedia in a plain-text format, you might have been disappointed to learn that Wikipedia doesn't actually make this format available.

Fortunately, they do offer an XML version of the entire database, so I've written a PowerShell script to convert that XML dump into individual plain-text articles. The script tries to remove as much of Wikipedia's additional markup as possible, and skips inconsequential articles.

This script demonstrates a unique way of processing XML in PowerShell that you rarely see - because it is rarely needed. In XML form, the Wikipedia database is nearly 60GB. This is FAR too large for PowerShell's [xml] cast, due to the memory overhead required for the XmlDocument format on which the [xml] cast is built. It's also far too large for most systems to even hold in memory at once. Instead, this script takes a streaming approach built on System.Xml.XmlReader. The XmlReader class lets you handle tags and elements as the reader sees them, rather than forcing you to wait for that final ill-fated </mediawiki> closing tag while everything buffers in memory.
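To give a feel for the streaming style, here is a minimal sketch that walks a tiny in-memory document with XmlReader. The sample XML is made up and only loosely shaped like a Wikipedia dump, and the code is an illustration of the approach rather than an excerpt from Split-Wikipedia:

```powershell
## A tiny, hypothetical stand-in for the real dump
$xml = @'
<mediawiki>
  <page>
    <title>First</title>
    <revision>
      <text>Alpha article text</text>
    </revision>
  </page>
  <page>
    <title>Second</title>
    <revision>
      <text>Beta</text>
    </revision>
  </page>
</mediawiki>
'@

$reader = [System.Xml.XmlReader]::Create([System.IO.StringReader]::new($xml))
try
{
    $articles = while($reader.ReadToFollowing('page'))
    {
        ## Pull out just the pieces we care about as the reader passes over them
        $null = $reader.ReadToFollowing('title')
        $title = $reader.ReadElementContentAsString()

        $null = $reader.ReadToFollowing('text')
        $text = $reader.ReadElementContentAsString()

        [PSCustomObject] @{ Title = $title; Length = $text.Length }
    }
}
finally
{
    $reader.Dispose()
}

$articles | Format-Table
```

Only the current node is ever held by the reader, so the same loop shape works whether the input is a few lines or tens of gigabytes (for a file, you would pass its path to XmlReader.Create instead of a StringReader).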

  1. Install the 'Split-Wikipedia' helper script from a PowerShell prompt:
    1. Install-Script Split-Wikipedia -Scope CurrentUser
    2. The Install-Script command requires Windows 10 or PowerShell 5.0.
    3. If this is the first time you've used Install-Script, exit PowerShell and launch it again.
  2. Use PowerShell to navigate to a directory that you want to contain your Wikipedia articles
    1. mkdir ~/Documents/Wikipedia
    2. Set-Location ~/Documents/Wikipedia
  3. Download the latest English Wikipedia database (~ 13GB)
    1. Invoke-WebRequest -Outfile enwiki-latest-pages-articles.xml.bz2
  4. Decompress the XML, using bzip2 (or another tool like 7zip if you wish):
    1. bzip2 -d enwiki-latest-pages-articles.xml.bz2
  5. Process the XML (~ 58GB). This will take about 7 hours:
    1. Split-Wikipedia -Path ./enwiki-latest-pages-articles.xml
  6. (Optional) Delete the source XML
    1. Remove-Item ./enwiki-latest-pages-articles.xml

All 4 million articles are now in your 'Wikipedia\Articles' directory. Within this directory, they are again split into subdirectories of 5,000 articles each - as most software (e.g.: File | Open dialogs, browsing in Explorer) doesn't handle single directories with 4 million items very well.