HackerChai

Some musings on tech. Mostly pwn.

Analysis of CVE-2021-35211 (Part 1)

16 September 2021

Introduction

Around 13 July, I chanced upon this article warning users of Solarwinds Serv-U about a pre-auth SSH RCE bug being exploited in the wild. This was pretty interesting to me, as I didn’t think SSH RCE was still possible in 2021. What followed was a two-month, on-and-off exploration of the Serv-U SSH codebase. In the end, the original team at Microsoft released a (more authoritative) writeup a few days before I figured out the bug. Nonetheless, this two-part series will hopefully present a different perspective on the vulnerability analysis process. Part 1 (this post) goes through my process for identifying and triggering the vulnerability, while part 2 will demonstrate how I used the vulnerability to gain RCE. Do note that all of this research was done on the Windows version of Serv-U.

What is Serv-U?


Solarwinds Serv-U is a file-sharing server for both Windows and Linux. One of its features is to allow file transfers via SFTP, a file transfer protocol built on top of the SSH protocol. CVE-2021-35211 is a memory corruption vulnerability in the SFTP component of Serv-U which can take place even before a successful SSH login, and affects Serv-U 15.2.3 HotFix 1 on both OSes.

SSH Primer

To exploit an SSH server, it is imperative to understand how the SSH protocol works. This post here gives a wonderful technical explanation of how the server and client establish an encrypted communication channel using what is called an SSH handshake. The following graphic shows a simplified version of how the handshake works:

Figure 1. Simplified SSH Handshake

Initially, all packets are sent unencrypted. After exchanging banners, the server and client exchange SSH2_MSG_KEXINIT packets, and both parties decide which key exchange and symmetric encryption algorithms to use based on a fixed set of rules. The client then sends an SSH2_MSG_KEXDH_INIT packet and the server responds with an SSH2_MSG_KEXDH_REPLY packet to facilitate key exchange. Finally, with both parties sharing a common secret key, the client sends SSH2_MSG_NEWKEYS, which causes both parties to start sending encrypted packets to each other.
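To make the plaintext phase concrete, here is a minimal sketch of how an unencrypted SSH packet is framed, following the binary packet layout in RFC 4253 section 6. The helper names `frame_packet` and `parse_packet` are my own for illustration and are not taken from Serv-U or any SSH library.

```python
import os
import struct
from typing import Tuple

# Message numbers for the three packets this post cares about.
SSH2_MSG_KEXINIT = 20
SSH2_MSG_NEWKEYS = 21
SSH2_MSG_KEXDH_INIT = 30


def frame_packet(payload: bytes, block_size: int = 8) -> bytes:
    """Wrap a payload in the unencrypted SSH binary packet format."""
    # packet_length (4) + padding_length (1) + payload + padding must be a
    # multiple of the block size, and there must be at least 4 padding bytes.
    padding_length = block_size - ((4 + 1 + len(payload)) % block_size)
    if padding_length < 4:
        padding_length += block_size
    # packet_length counts everything after the length field itself.
    packet_length = 1 + len(payload) + padding_length
    header = struct.pack(">IB", packet_length, padding_length)
    return header + payload + os.urandom(padding_length)


def parse_packet(data: bytes) -> Tuple[int, bytes]:
    """Return (message type, message body) of one unencrypted packet."""
    packet_length, padding_length = struct.unpack(">IB", data[:5])
    payload = data[5:5 + packet_length - 1 - padding_length]
    return payload[0], payload[1:]
```

An SSH2_MSG_NEWKEYS packet has a one-byte payload, so `frame_packet(bytes([SSH2_MSG_NEWKEYS]))` produces a 16-byte packet. Once encryption is switched on, the same framing is encrypted and followed by a MAC.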

Initial analysis

The initial Microsoft article gave me little to go on; I knew only that Hotfix 2 patched the bug, and what error Serv-U logs when the exploit fails, as listed below:

EXCEPTION: C0000005; CSUSSHSocket::ProcessReceive(); Type: 30; puchPayLoad = 0x03e909f6; nPacketLength = 76; nBytesReceived = 80; nBytesUncompressed = 156; uchPaddingLength = 5

With a little bit of work, I managed to get my hands on both the Hotfix 1 and Hotfix 2 packages. Inside, the main file of interest was Serv-U.dll, which does the majority of the SSH handling. For my analysis, I used the DLL from Hotfix 1.

I was quickly able to locate the (UTF-16) error string mentioned in the article and associate it with the ProcessReceive function at 0x180144E90.

Figure 2. Error string in exception handler

From a cursory analysis of the switch statement inside ProcessReceive, I deduced that it reads SSH packets and dispatches them to their respective handling functions based on message type (SSH2_MSG_KEXINIT etc.).

Figure 3. Case switch in ProcessReceive

For this vulnerability, the most important functions would be packet_20_1801445D0, packet_30_180144420 and packet_21_180144870, which correspond to SSH2_MSG_KEXINIT, SSH2_MSG_KEXDH_INIT and SSH2_MSG_NEWKEYS.

Annotating the code

Without symbols, the code was pretty much unreadable, with unknown functions calling more unknown functions. My big break came when I realised that a bunch of functions seemed to share a similar pattern as follows:

__int64 __fastcall sub_18015FF40(__int64 a1)
{
  __int64 result; // rax

  if ( (*(_BYTE *)(a1 + 8) & 1) != 0 )
  {
    result = *(_QWORD *)(a1 + 2232);
    if ( result )
      result = ((__int64 (__fastcall *)())result)();
  }
  return result;
}

Each function dereferenced a different offset of its first argument and then used the result as a function pointer to call, a pattern that looked awfully like the usage of a virtual table. My theory was confirmed after I found sub_180160D34. Here is a small portion of the function:

  v1 = 0;
  if ( (*(_BYTE *)(a1 + 8) & 1) != 0 )
    return 1;
  v4 = LoadLibraryW(L"libeay32.DLL");
  *(_QWORD *)(a1 + 16) = v4;
  if ( !v4 )
    goto LABEL_331;
  BIO_new_fp = GetProcAddress(v4, "BIO_new_fp");
  *(_QWORD *)(a1 + 520) = BIO_new_fp;
  if ( !BIO_new_fp )
    return 0;
  BIO_free = GetProcAddress(*(HMODULE *)(a1 + 16), "BIO_free");
  *(_QWORD *)(a1 + 528) = BIO_free;
  if ( !BIO_free )
    return 0;
  BIO_new_socket = GetProcAddress(*(HMODULE *)(a1 + 16), "BIO_new_socket");
  *(_QWORD *)(a1 + 536) = BIO_new_socket;
  if ( !BIO_new_socket )
    return 0;
  BIO_s_mem = GetProcAddress(*(HMODULE *)(a1 + 16), "BIO_s_mem");
  *(_QWORD *)(a1 + 544) = BIO_s_mem;
  if ( !BIO_s_mem )
    return 0;
  BIO_s_file = GetProcAddress(*(HMODULE *)(a1 + 16), "BIO_s_file");
  *(_QWORD *)(a1 + 552) = BIO_s_file;

As can be seen, the function looks up OpenSSL functions from libeay32.dll and uses them to populate the virtual table at a1. With this understanding, I defined the structure for a1 and wrote an IDAPython script to find all the wrapper functions with the same pattern as sub_18015FF40 and rename them based on their API call. Here is what the code looks like after:

  v1 = 0;
  if ( (a1->pad[8] & 1) != 0 )
    return 1;
  v4 = LoadLibraryW(L"libeay32.DLL");
  a1->libeay32 = v4;
  if ( !v4 )
    goto LABEL_331;
  BIO_new_fp = GetProcAddress(v4, "BIO_new_fp");
  a1->BIO_new_fp = BIO_new_fp;
  if ( !BIO_new_fp )
    return 0;
  BIO_free = GetProcAddress(a1->libeay32, "BIO_free");
  a1->BIO_free = BIO_free;
  if ( !BIO_free )
    return 0;
  BIO_new_socket = GetProcAddress(a1->libeay32, "BIO_new_socket");
  a1->BIO_new_socket = BIO_new_socket;
  if ( !BIO_new_socket )
    return 0;
  BIO_s_mem = GetProcAddress(a1->libeay32, "BIO_s_mem");
  a1->BIO_s_mem = BIO_s_mem;
  if ( !BIO_s_mem )
    return 0;
  BIO_s_file = GetProcAddress(a1->libeay32, "BIO_s_file");
  a1->BIO_s_file = BIO_s_file;

The same function after renaming:

__int64 __fastcall DH_free_18015FF40(struct COpenSSL *a1)
{
  __int64 result; // rax

  if ( (a1->pad[8] & 1) != 0 )
  {
    result = (__int64)a1->DH_free;
    if ( result )
      result = ((__int64 (__fastcall *)())result)();
  }
  return result;
}

Now, with my own makeshift “symbols”, understanding the code became a lot easier.
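The renaming logic can be sketched outside of IDA as plain Python. This is not my actual IDAPython script and it calls no IDA API. It only shows the idea: pull the table offset out of a wrapper's pseudocode, look up which API was stored at that offset, and build the new name. The offsets are the ones visible in the snippets above; the regex and the function names `table_offset` and `wrapper_name` are illustrative assumptions.

```python
import re
from typing import Optional

# Offsets seen in the loader (sub_180160D34) and in the wrapper shown earlier.
# The real table is far larger; this is only the visible subset.
OFFSET_TO_API = {
    520: "BIO_new_fp",
    528: "BIO_free",
    536: "BIO_new_socket",
    544: "BIO_s_mem",
    552: "BIO_s_file",
    2232: "DH_free",
}

# Matches the "load function pointer from the table" line of a wrapper.
# The exact pseudocode shape is an assumption based on the one example.
TABLE_LOAD = re.compile(r"result = \*\(_QWORD \*\)\(a1 \+ (\d+)\);")


def table_offset(pseudocode: str) -> Optional[int]:
    """Return the table offset a wrapper calls through, if it fits the pattern."""
    match = TABLE_LOAD.search(pseudocode)
    return int(match.group(1)) if match else None


def wrapper_name(func_ea: int, offset: Optional[int]) -> str:
    """Build a name like DH_free_18015FF40, or keep the default sub_ name."""
    api = OFFSET_TO_API.get(offset)
    if api is None:
        return "sub_%X" % func_ea
    return "%s_%X" % (api, func_ea)
```

Inside IDA, the resulting string would then be applied to the function with the usual rename call, and the same offset table doubles as the field list for the COpenSSL structure.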

Key Exchange Init (SSH2_MSG_KEXINIT)

When parsing an SSH2_MSG_KEXINIT packet, we eventually reach a function I called parse_1801403C0.

Figure 4. Link between packet_20_1801445D0 and parse_1801403C0

It makes 6 function calls sequentially, 2 of which are to prep_symm_enc_18013F840:

Figure 5. Inside parse_1801403C0

If we look into prep_symm_enc_18013F840, we notice that it involves strings such as “aes128-cbc” and makes use of OpenSSL functions such as EVP_aes_128_cbc and EVP_CipherInit_ex (see Figure 6). This pattern of OpenSSL API usage is typical of symmetric encryption/decryption, and an example can be found here. It is therefore logical to assume that the calls to prep_symm_enc_18013F840 implement the symmetric encryption portion of the SSH handshake.

Figure 6. Inside prep_symm_enc_18013F840

Since there was a pair of calls to prep_symm_enc_18013F840, I deduced that they were responsible for creating OpenSSL EVP contexts for the client-to-server (a1 + 1112) and server-to-client (a1 + 1120) encrypted communication channels respectively (Note: client-to-server and server-to-client comms can use different encryption algorithms).

Recall that a key exchange has not taken place yet when the server and client are exchanging SSH2_MSG_KEXINIT packets. As such, the server does not possess shared keys and IVs with the client and can only call EVP_EncryptInit_ex with most fields as null, which is valid as stated here:

Figure 7. Documentation for EVP_EncryptInit_ex

Actual Key Exchange (SSH2_MSG_KEXDH_INIT)

Moving on to the parsing of SSH2_MSG_KEXDH_INIT, or packet 30, the bulk of the code logic is in a function I (aptly) named kex_shiz_180142950. Towards the end of the function, we see 6 very similar pieces of code:

Figure 8. 6 pieces of code (not all inside image)

The purpose behind this code can be found in RFC 4253, specifically in section 7.2. Essentially, after the key exchange has occurred, this code is used to generate shared keys and IVs with the client. Note that EVP_EncryptInit_ex is called a second time on a1 + 1120, but this time with the IV and key. With this, encryption / decryption can take place.
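For reference, the six derivations in RFC 4253 section 7.2 can be sketched as follows. Each output is HASH(K || H || letter || session_id), with the letters "A" to "F" selecting the two IVs, the two encryption keys and the two integrity keys. This is a standalone illustration of the RFC, not a reconstruction of Serv-U's code; the function names, the default SHA-1 hash and the default key lengths are assumptions.

```python
import hashlib
from typing import Dict


def mpint(n: int) -> bytes:
    """Encode a non-negative integer as an SSH mpint (RFC 4251 section 5)."""
    if n == 0:
        return b"\x00\x00\x00\x00"
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if raw[0] & 0x80:
        raw = b"\x00" + raw  # keep the value positive
    return len(raw).to_bytes(4, "big") + raw


def derive(k: int, h: bytes, letter: bytes, session_id: bytes,
           length: int, hash_name: str = "sha1") -> bytes:
    """Derive one key or IV: HASH(K || H || letter || session_id)."""
    k_enc = mpint(k)
    out = hashlib.new(hash_name, k_enc + h + letter + session_id).digest()
    # If more bytes are needed, extend with HASH(K || H || output so far).
    while len(out) < length:
        out += hashlib.new(hash_name, k_enc + h + out).digest()
    return out[:length]


def derive_all(k: int, h: bytes, session_id: bytes, key_len: int = 16,
               iv_len: int = 16, mac_len: int = 20) -> Dict[str, bytes]:
    """Derive all six values produced after the key exchange."""
    return {
        "iv_c2s": derive(k, h, b"A", session_id, iv_len),
        "iv_s2c": derive(k, h, b"B", session_id, iv_len),
        "key_c2s": derive(k, h, b"C", session_id, key_len),
        "key_s2c": derive(k, h, b"D", session_id, key_len),
        "mac_c2s": derive(k, h, b"E", session_id, mac_len),
        "mac_s2c": derive(k, h, b"F", session_id, mac_len),
    }
```

Here K is the shared secret and H is the exchange hash; for the first key exchange on a connection, the session identifier is H itself. The "B" and "D" outputs are the IV and key that would feed the second initialisation of the server-to-client context.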

Switch to encrypted communication (SSH2_MSG_NEWKEYS)

So far, all packets had been sent in the plain. After receiving the last plaintext packet, SSH2_MSG_NEWKEYS, the server calls packet_21_180144870, which assigns a1 + 1120 to a1 + 1104.

Figure 9. Now a1 + 1104 holds the AES context

From now on, when sending encrypted packets to the client, the server will use this EVP_CIPHER_CTX object stored in a1 + 1104.

Bindiffing

Now, on to the actual bindiffing. The patch was deceptively simple, which made me doubt my judgement several times throughout the analysis. As can be seen below, the patched version adds a check at the start of packet_30_180144420 to ensure the value of a1 + 408 is 3 before continuing:

Figure 10. Change to packet_30_180144420

When the updated kex_shiz_180142950 completes successfully, it then assigns 4 to a1 + 408…

Figure 11. Change to kex_shiz_180142950

…which will be checked for in the updated packet_21_180144870:

Figure 12. Change to packet_21_180144870

With the patch, the order of packets received is now restricted to the correct order specified by the RFC. For example, if the server receives SSH2_MSG_KEXDH_INIT instead of SSH2_MSG_KEXINIT as the first packet after banner exchange, the function parsing the packet will immediately bail as a1 + 408 is not set to the correct value of 3.
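The effect of the patch can be modelled with a small state machine. This is a toy sketch, not decompiled code: the values 3 and 4 come from the patched binary, but the state names, the starting value of 0, and the assumption that SSH2_MSG_KEXINIT handling is what sets the state to 3 are mine.

```python
SSH2_MSG_KEXINIT = 20
SSH2_MSG_NEWKEYS = 21
SSH2_MSG_KEXDH_INIT = 30

STATE_START = 0          # assumed initial value of a1 + 408
STATE_KEXINIT_DONE = 3   # required before SSH2_MSG_KEXDH_INIT is handled
STATE_KEXDH_DONE = 4     # required before SSH2_MSG_NEWKEYS is handled


class HandshakeModel:
    """Toy model of the packet ordering check, with and without the patch."""

    def __init__(self, enforce_order: bool):
        self.enforce_order = enforce_order  # True models Hotfix 2
        self.state = STATE_START
        self.handled = []

    def receive(self, msg_type: int) -> bool:
        """Return True if the packet is processed, False if the handler bails."""
        if msg_type == SSH2_MSG_KEXINIT:
            self.state = STATE_KEXINIT_DONE
        elif msg_type == SSH2_MSG_KEXDH_INIT:
            if self.enforce_order and self.state != STATE_KEXINIT_DONE:
                return False
            self.state = STATE_KEXDH_DONE
        elif msg_type == SSH2_MSG_NEWKEYS:
            if self.enforce_order and self.state != STATE_KEXDH_DONE:
                return False
        else:
            return False
        self.handled.append(msg_type)
        return True
```

With `enforce_order=False` the model accepts the three packets in any order, which is the behaviour the patch removes. With `enforce_order=True`, an SSH2_MSG_NEWKEYS that arrives before a completed key exchange is rejected.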

From here, I had 3 hypotheses of how the bug may be triggered:

  1. Send SSH2_MSG_KEXDH_INIT as the first packet
  2. Send an invalid SSH2_MSG_KEXINIT packet that causes parsing to fail and leave some fields uninitialised, then send an SSH2_MSG_KEXDH_INIT
  3. Rearrange SSH2_MSG_KEXINIT, SSH2_MSG_KEXDH_INIT and SSH2_MSG_NEWKEYS in an invalid order

In retrospect, the log entry provided by Microsoft became a red herring in my analysis, as it made me incorrectly assume that the root cause of the bug was somehow in kex_shiz_180142950 which parses SSH2_MSG_KEXDH_INIT. After the first idea was quickly proven wrong, I moved on to the second idea and tried to send all kinds of invalid SSH2_MSG_KEXINIT packets in hopes of triggering a crash, which did not work.

It was only after a month of aimless testing that I thought of the third possibility. By that time, the Microsoft blog post had already been out for 1-2 days, but I had no idea. After a little testing, I was amazed to find that just by sending a single SSH2_MSG_NEWKEYS packet, I managed to crash the server. It was that simple all along. However, looking at the logs and crash location, I quickly realised the crash was not the one reported by Microsoft.

Figure 13. The first crash

The crash happened instead in the packet parsing function at get_pkt_1801411E0, which I did not expect. After some brief analysis, I found the reason: after SSH2_MSG_NEWKEYS is received, the server treats all subsequent packets from the client as encrypted and tries to use the EVP context at a1 + 1096 to decrypt them; however, without a prior SSH2_MSG_KEXINIT, the field a1 + 1096 holding the decryption context is left as NULL, causing a null-pointer dereference.

Figure 14. Cause of crash

Remedying this was simple: I just had to tell the server that client-to-server packets would only be sent in the plain. To do this, I sent an SSH2_MSG_KEXINIT packet before SSH2_MSG_NEWKEYS specifying “none” as the only client-to-server encryption algorithm, which is supported (though not encouraged) by the RFC. After this, I sent an SSH2_MSG_KEXDH_INIT packet and boom, the server crashed, with the same log entry that Microsoft documented.

Figure 15. Crash reproduced!
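The SSH2_MSG_KEXINIT payload described above, with “none” offered for the client-to-server direction only, can be sketched as follows. The field order follows RFC 4253 section 7.1. The key exchange, host key and MAC algorithm names are placeholder assumptions, since I have not listed what I actually offered, and the helper names are illustrative. Only the payload is built here; no framing or networking is included.

```python
import os
import struct
from typing import List

SSH2_MSG_KEXINIT = 20


def name_list(names: List[str]) -> bytes:
    """Encode an SSH name-list: uint32 length, then comma-separated names."""
    data = ",".join(names).encode("ascii")
    return struct.pack(">I", len(data)) + data


def build_kexinit(enc_c2s: List[str], enc_s2c: List[str]) -> bytes:
    """Build an SSH2_MSG_KEXINIT payload with per-direction cipher lists."""
    fields = [
        ["diffie-hellman-group14-sha1"],  # kex_algorithms (assumed)
        ["ssh-rsa"],                      # server_host_key_algorithms (assumed)
        enc_c2s,                          # encryption, client to server
        enc_s2c,                          # encryption, server to client
        ["hmac-sha1"],                    # MAC, client to server (assumed)
        ["hmac-sha1"],                    # MAC, server to client (assumed)
        ["none"],                         # compression, client to server
        ["none"],                         # compression, server to client
        [],                               # languages, client to server
        [],                               # languages, server to client
    ]
    payload = bytes([SSH2_MSG_KEXINIT]) + os.urandom(16)  # type + cookie
    for field in fields:
        payload += name_list(field)
    payload += b"\x00"               # first_kex_packet_follows = FALSE
    payload += struct.pack(">I", 0)  # reserved, always zero
    return payload
```

Calling `build_kexinit(["none"], ["aes128-ctr"])` leaves the client-to-server direction in plaintext while still negotiating AES 128 CTR for the server-to-client direction, which is the direction whose context matters for the crash.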

The following is my analysis of the root cause of the vulnerability: as the server did not receive an SSH2_MSG_KEXDH_INIT packet, no key exchange took place and the previously mentioned second call to EVP_EncryptInit_ex would hence not occur.

However, as the server does not check if an SSH2_MSG_KEXDH_INIT packet had been received before using new keys, it accepts the subsequent SSH2_MSG_NEWKEYS packet as valid and treats the encryption context at a1 + 1120 as valid.

Figure 16. Process of triggering the vulnerability

When I then send an SSH2_MSG_KEXDH_INIT packet, or in fact any packet that elicits a server-side response, the server will attempt to encrypt its response packet, and it does this with EVP_EncryptUpdate. When the EVP_CIPHER_CTX context is initialised with AES 128 CTR, the EVP_EncryptUpdate call eventually leads to a call to aes_ctr_cipher:

static int aes_ctr_cipher(EVP_CIPHER_CTX *ctx, unsigned char *out,
                          const unsigned char *in, size_t len)
{
    int n = EVP_CIPHER_CTX_get_num(ctx);
    unsigned int num;
    EVP_AES_KEY *dat = EVP_C_DATA(EVP_AES_KEY,ctx);

    if (n < 0)
        return 0;
    num = (unsigned int)n;

    if (dat->stream.ctr)
        CRYPTO_ctr128_encrypt_ctr32(in, out, len, &dat->ks,
                                    ctx->iv,
                                    EVP_CIPHER_CTX_buf_noconst(ctx),
                                    &num, dat->stream.ctr);
    else
        CRYPTO_ctr128_encrypt(in, out, len, &dat->ks,
                              ctx->iv,
                              EVP_CIPHER_CTX_buf_noconst(ctx), &num,
                              dat->block);
    EVP_CIPHER_CTX_set_num(ctx, num);
    return 1;
}

In the else case, dat->block, or ctx->cipher_data->block, is used as a function pointer. ctx->cipher_data comes from an OPENSSL_zalloc here. The astute reader might be wondering how OPENSSL_zalloc can cause uninitialised memory use when it zeroes out its allocated heap buffer.

As it turns out, doing a git blame on line 294 shows us that the commit which introduced the use of OPENSSL_zalloc was made in 2006. Unfortunately, Serv-U appears to use a version of libeay32.dll that dates back to 2005. If only they had used a more up-to-date libeay32, this vulnerability would probably have been unexploitable.

Figure 17. libeay32.dll used by Serv-U

Moving on to cipher_data->block, we see that it is assigned in aes_init_key, which is called from here. As seen from line 369, if we do not call EVP_CipherInit_ex with a non-NULL key, this chain of calls never happens, leaving cipher_data->block uninitialised. This results in the use of an uninitialised function pointer.
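The two-stage initialisation that makes this possible can be illustrated with a toy model. This is not OpenSSL and performs no real encryption: the class, the sentinel and the XOR "cipher" are all invented for illustration. It only mirrors the shape of the problem: the first init call allocates cipher data without setting the function pointer, and only an init call with a key fills it in.

```python
# Stand-in for whatever bytes a non-zeroing allocator happened to return.
UNINITIALISED = object()


class ToyCipherCtx:
    """Toy model of a cipher context that is initialised in two stages."""

    def __init__(self):
        self.cipher = None
        self.cipher_data = None

    def cipher_init(self, cipher=None, key=None):
        if cipher is not None:
            # Stage 1: pick the cipher and allocate its data, uninitialised.
            self.cipher = cipher
            self.cipher_data = {"ks": UNINITIALISED, "block": UNINITIALISED}
        if key is not None:
            if self.cipher_data is None:
                raise ValueError("no cipher selected")
            # Stage 2: only now is the block function pointer assigned.
            self.cipher_data["ks"] = key
            self.cipher_data["block"] = self._toy_block

    def _toy_block(self, data: bytes) -> bytes:
        # Placeholder transform (repeating-key XOR), NOT a real cipher.
        ks = self.cipher_data["ks"]
        return bytes(b ^ ks[i % len(ks)] for i, b in enumerate(data))

    def encrypt_update(self, data: bytes) -> bytes:
        block = self.cipher_data["block"]
        if block is UNINITIALISED:
            # In native code there is no such check: the garbage is called.
            raise RuntimeError("call through uninitialised function pointer")
        return block(data)
```

Calling `cipher_init(cipher="aes-128-ctr")` followed directly by `encrypt_update` corresponds to the server receiving SSH2_MSG_KEXINIT and SSH2_MSG_NEWKEYS and then trying to encrypt a response without ever having derived keys.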

Conclusion

Where do we go from here? Stay tuned for part 2, a detailed writeup on how I exploited this vulnerability to gain RCE.