Analysis of CVE-2021-35211 (Part 2)

23 September 2021

Introduction

In my last post, I provided an analysis of the vulnerability in Solarwinds Serv-U, CVE-2021-35211. Picking up from where I left off, this post will discuss my approach to achieving (a super unstable) RCE in Serv-U on Windows 10. As usual, for those who are here for the exploit, it can be found here.

Exploitation

Execution flow control

As we now know, by sending packets in an invalid order, we can trigger Serv-U to dereference and call function pointer dat->block from an uninitialised heap buffer here. How then can we control the value of the function pointer?

From the source here, we see that the function pointer is dereferenced from the structure EVP_AES_KEY. As dat->block compiles to offset 0xf8 in this structure and stream is 8 bytes, I deduced that the structure has a size of 0xf8 + 8 (stream) + 8 (block) = 0x108 bytes.

typedef struct {
    union {
        OSSL_UNION_ALIGN;
        AES_KEY ks;
    } ks;
    block128_f block;   // +0x0f8
    union {
        cbc128_f cbc;
        ctr128_f ctr;
    } stream;           // +0x100
} EVP_AES_KEY;          // +0x108

On Windows 10, blocks of this size will be allocated from the Low-Fragmentation Heap (LFH). To my knowledge, the LFH works by allocating a large chunk of memory called a subsegment and splitting it into identically sized blocks. To service a request, it returns a randomly selected unused block from the subsegment. When a subsegment is exhausted, a new one is allocated and the process repeats again.

Figure 1. Windows 10 LFH

Theoretically, it is possible for me to make enough allocations to exhaust the current subsegment, then exhaust the new subsegment and subsequently free all but one block of the new subsegment, which would ensure the next block that I allocate will always contain content I control.

Figure 2. Theoretical manipulation of LFH

However, due to the lack of information on the remaining space of the current subsegment as well as a bit of laziness, I decided to stick to the approach by Microsoft.

By allocating a couple user blocks and freeing them, there was a good chance that the next allocation will return one of the previous freed blocks. For the allocation and freeing, I simply sent packets with SSH2_MSG_DEBUG with size 0x108. The packets are basically no-ops, but a buffer of size 0x108 will still have to be allocated to hold the data within it, allowing us to allocate user blocks.

Figure 3. Packet to allocate block

Now, I can control the function pointer. What should I set it to then?

With some incredible luck, the Serv-U DLL itself didn’t have ASLR, which meant I can jump to any location in it without an information leak. Following Microsoft’s approach, I decided to jump to 0x1800E19EC, which consists of the following instructions:

loc_1800E19EC: mov     rdx, [rbx+58h]
loc_1800E19F0: mov     r9d, esi
loc_1800E19F3: mov     rcx, [rbx+38h]
loc_1800E19F7: mov     r8, rbp
loc_1800E19FA: call    qword ptr [rbx+10h]

As rbx happened to be the base of the heap buffer, this meant that I could control both rcx and rdx, or the first 2 parameters. However, I felt that this primitive was still pretty limiting without an information leak. I could probably write to somewhere in the Serv-U.dll’s memory space and then use it as a parameter, but that required more analysis.

Info leak

Hence, I decided to take a closer look at the context of the vulnerability. dat->block is called from CRYPTO_ctr128_encrypt, a function which does AES CTR encryption. Here’s a diagram of how this mode of AES encryption works.

Figure 4. AES CTR diagram

In the diagram above, dat->block is the AES block cipher function which acts as a pseudorandom function (PRF), providing a stream of pseudorandom bytes to XOR with the plaintext stream. In the code for CRYPTO_ctr128_encrypt, we see that dat->block is called the following way:

while (len >= 16) {
    (*block) (ivec, ecount_buf, key); // block is dat->block, key is &dat->ks
    ctr128_inc_aligned(ivec);
    for (n = 0; n < 16; n += sizeof(size_t))
        *(size_t_aX *)(out + n) =
            *(size_t_aX *)(in + n)
            ^ *(size_t_aX *)(ecount_buf + n);
    len -= 16;
    out += 16;
    in += 16;
    n = 0;
}

As the input plaintext is XORed with ecount_buf after calling the block function, it is easy to infer that ecount_buf holds the output bytes for the block function. Meanwhile, key comes from &dat->ks here, and happens to be pointing to bottom of our EVP_AES_KEY structure. Knowing these, I replaced the block function with the following gadget:

loc_18004E170: mov     [rdx], r8
loc_18004E173: mov     rax, rdx
loc_18004E176: retn

By moving the third argument (r8) into the memory location of the second argument (rdx), I essentially did *(void**)ecount_buf = key, which changes the normal AES CTR procedure to this:

Figure 5. “AES CTR” diagram

The plaintext to encrypt was the server’s response packet and its first 8 bytes never changes. I simply had to XOR the known plaintext with the first 8 data bytes of the “encrypted” packet to leak the pointer to the key, which was the heap buffer location.

Now equipped with the heap leak, I made a second connection to the server and repeated the same attack, while keeping the previous connection open. Now, I have RIP control as well as a heap leak. At this stage, it was probably possible to do a single call to run a command of my choosing. However, I want remote code execution, so I decided to dig a little deeper.

JOP

Since I did not have stack control at this stage, I decided to instead make use of jump oriented programming (JOP) instead of ROP to kickstart the attack. The following is the setup:

Figure 6. JOP chain

With this chain of 4 gadgets, I can now pivot the stack to the heap buffer which I controlled. For my ROP chain, the general idea was to use VirtualProtect to change a heap buffer containing shellcode to rwx, and then jump to it. The Serv-U DLL already imports GetProcAddress and LoadLibraryA, so the ROP chain would have to achieve the same effect of following (pseudo)code:

kernel32 = LoadLibraryA("kernel32.dll");
virtualprotect = GetProcAddress(kernel32, "VirtualProtect");
(virtualprotect)(buffer_with_shellcode, // lpAddress
                 0x100,                 // dwSize
                 0x40,                  // flNewProtect
                 random_writeable_addr);// lpflOldProtect

The ROP chain was quite complicated so I would not be going through how it works in this post, but for those who are interested, feel free to contact me.

Egg hunter

By now, this should have been the point where I can pop calc with my shellcode and call it a day. However, I soon came to realise that the fixed size of ~0x100 of the heap buffer holding my shellcode became a limiting factor. For Metasploit, even a basic windows/x64/exec shellcode was at least 200+ bytes. To resolve the issue, I decided to make use of an age-old technique, the egg hunter.

On advice of my friend, I referenced code here and wrote a 200+ bytes shellcode that functioned similarly and looked for the egg 0x1337beef.

All that was left was to send a SSH2_MSG_DEBUG packet of arbitrary size containing the egg and shellcode. For my PoC video, I had to turn off the setting for Serv-U to run as a service, as services did not have GUIs. From there, I used a Metasploit windows/x64/exec shellcode with CMD set to calc.exe (EXITFUNC as thread) and popped calc.

Figure 7. Win

Reliability

Till now, I haven’t really mentioned the reliability of my exploit. Just to make it clear, this exploit is extremely unstable, namely for these reasons:

The triggering of the vulnerability in the first place may fail as the allocated heap block may not be one I had written to. (Refer to start) This will cause a crash that will be caught by Serv-U and leads to a log entry
As I stack pivoted to a heap buffer and called WinAPI functions such as LoadLibrary, the use of the “stack” may underflow and/or overflow the heap buffer with the ROP chain. This can often cause random crashes that Serv-U do not catch, causing server restarts.

If someone is determined enough, these problems can definitely be eliminated.

Conclusion

This concludes my two-month journey of exploring Serv-U and successfully exploiting a rather straightforward n-day vulnerability. Hopefully, in the future, I can work on finding some bugs of my own. :)