WebSockets from Scratch

Tue, Aug 13, 2019 6-minute read

Background

In the web application world – especially single-page applications – smooth and fluid interaction is key. For many years, these applications have been doing a pretty good job of getting this fluid interaction though AJAX techniques and browser support for XMLHttpRequest. One issue, however, is that XMLHttpRequest requires that all of your communication go through an text-based HTTP protocol. Another issue is that XMLHttpRequest doesn’t let a server initiate communication back to connected clients. Instead, clients need to continuously poll the server to find out if it has anything to say.

To solve that need, browser vendors introduced the WebSocket protocol. The WebSocket protocol is a persistent, two-way TCP connection between a WebSocket client (traditionally a browser) and a server.

DrawOnMyBadge.com

You might wonder: “That’s all well and good, but why should I care?” That’s a good question with a simple answer: http://www.drawonmybadge.com Smile This is a really cool project designed by Tim McGuffin (@NotMedic). Whatever you draw on the website goes onto his badge – via a 64x32 LED board, and a bunch of more cool software.

image

Tim carried this around DEF CON this year, and people had lots of fun Smile

image

But can we have some _real_ fun? Smile

Digging into the Implementation

When you dig into the code a little bit, you can see how the browser does the communication. For every pixel you draw on the canvas, it converts it into a JSON object with a command called “DRAW”, and “DATA” that represents an X, Y, Color combination. Then it sends that string data to a connected WebSocket (exampleSocket.send(…)).

image

Partial Automation – Code Generation

The first step to having fun with Tim’s badge was through code generation. It is really easy in PowerShell and .NET to iterate through the pixels of an image (bitmap, PNG, etc.). Each pixel you access gives you its color. From there, I took a picture I manually resized to the correct dimensions and had PowerShell generate JavaScript code that I could copy + paste into the Developer Tools console.

image

And from there, the Mona Lisa made her first appearance on Tim’s badge Smile

image

That was fun, but copying + pasting code that PowerShell generated into a browser still felt a little hacky. Why don’t I just talk to the WebSocket directly from PowerShell? There are a few C# libraries for doing this out there, but I thought it would be a fun / interesting project to implement the protocol from scratch. So that’s what we’re going to do Smile

WebSockets from Scratch

The WebSocket protocol is defined by RFC6455, which goes through the protocol in great detail.

Initial Upgrade Request

For the first phase of the connection, the browser makes a standard TCP connection to the remote server. Here’s how to do that in PowerShell:

image

In that TCP connection, it makes a standard HTTP request, requesting an upgrade to the websocket protocol. Part of this HTTP request includes a Sec-WebSocket-Key header, which is intended to ensure that random HTTP requests can’t be retargeted to WebSocket servers, and that WebSocket client requests can’t be targeted to arbitrary other TCP servers. Here’s an example, which hard-codes the key for demonstration purposes.

image

Once the server accepts the connection upgrade, a well-written client will verify that the key included in the server’s response was correctly derived from the Sec-WebSocket-Key you provided. This is what a server response looks like:

image

Data Frames

Once the connection has been upgraded, client applications send frames of data. This isn’t the raw data itself – there is a structure around the data to describe it appropriately to the server accepting the connection.

image

To hold the frame data in PowerShell, we start with defining a byte array. The first 8 bits are:

  • FIN: 1 bit describing whether this is the final frame or not. In my experimentation with Chrome, this was always set to ‘1’.
  • RSV1, RSV2, and RSV3: 1 bit each that should always be zero unless you’re implementing a special protocol on top of WebSockets.
  • OPCODE: 4 bits. The one used by Chrome by default is “TEXT FRAME”, which has the value of 0001.

Putting these 8 bits together gives you a single byte with the bits of 10000001. So that’s how we start each frame:
image

The next 8 bits (the next byte) are:

  • MASK: Whether the data is “masked” by a random obfuscation key. This is recommended for the web browser developers themselves so that malicious web applications can’t cause arbitrary content to be written to the underlying TCP connection itself.
  • Payload Length: If the content is less than 126 bytes, this represents the payload length directly. Otherwise it needs to be the value ‘126’ with the next two bytes representing the actual payload length.

Since we are always supposed to mask the data, we define our mask as ‘1. However, this is supposed to be a bit at the leftmost position in the byte, so we need to shift it left by 7 places.

image

Since the mask and the payload length need to share the same byte (MASK being the leftmost bit and Payload Length being the rest), we use the –bor (Binary Or) bitwise operator to combine them into a single byte and then add that byte to the frame.

image

In the situation where the message length is greater than 126, though, the next two bytes need to also say how long the payload is. In C# and PowerShell, the way to get the bytes of a number is through the BitConverter.GetBytes() API. This API returns the bytes as they are represented by your processor. On most processors, this is Little Endian, where the least significant digits (think the 1s and 10s columns in decimal) come first.

image

The WebSocket protocol, however, requires that these bytes are in “Network Order” (the opposite of Little Endian), so we need to reverse them.

image

After the frame lengh, we need to provide the actual masking key. This is 4 bytes. As mentioned earlier, these are values used to obfuscate the content being transmitted to the server so that malicious web applications can’t communicate at the TCP level directly to the websocket server. The obfuscation algorithm is a simple XOR, where the four bytes in the masking key are used in a round-robin fashion against bytes in the content. Because this is was just for fun and the security protection is irrelevant, I provided a static masking key of “0” since anything XOR’d by 0 doesn’t change anything. That way, I didn’t have to implement the masking algorithm and the server wouldn’t notice Smile

image

Next, we get to add our actual content to the frame – in this case, a message representing the JSON of pixels that we want to draw.

image

Finally, we write this frame to the TCP connection and flush it

image

When all is said and done, pretty pictures show up on Tim’s badge Smile

When I sent Rick Astley, Defender ATP alerted Microsoft IT that one of its computers had PowerShell making raw TCP connections to a newly-registered domain. Amazing!

image

I tried to WannaCry Tim’s badge, and he still hasn’t paid Sad smile

image

And with true raw WebSockets control, I was able to give Tim’s badge a demoscene Smile Smile
draw_badge_box

So that’s how to go about WebSockets from Scratch. Hope this helps!