Bypassing SMEP
Background
I’m currently taking Offensive Security’s EXP-401 course and studying for their OSEE exam. One concept I’ve been learning about is Supervisor Mode Execution Prevention (SMEP). I found it to be one of the more confusing topics to learn, so I thought I’d try to explain how it works to fill my own knowledge gaps and solidify my understanding.
Supervisor Mode Execution Prevention (SMEP)
Supervisor mode execution prevention is an exploit mitigation feature built into some CPUs. It was introduced by Intel in 2011 and supported in Windows since Windows 8.
When developing an exploit, if you manage to gain code execution in supervisor-mode (kernel-mode), you’ll eventually want to execute your own payload. It is often easiest to store your payload in a user-mode buffer. If you try to transfer execution to user-mode memory while running from kernel-mode, SMEP will kick in and prevent execution. This causes a blue screen of death on Windows and your exploit will fail.
There are ways this mitigation can be bypassed, but first it’s important to understand the concept of virtual memory and how virtual memory is mapped to physical memory.
Virtual and Physical Memory
Physical memory refers to the actual physical RAM hardware attached to a computer. Running programs are loaded off of the hard drive and into RAM for faster execution, but RAM is volatile. When the system reboots, the RAM is erased. A computer system has a limited amount of RAM based on whatever is installed at the time and is limited by what is supported by the computer’s CPU, motherboard, operating system, etc. Since RAM is limited and only certain pieces of programs are running at a given time, the operating system can copy chunks of a program into physical memory when needed. If a chunk is not actively being used, it can be copied back to disk to free up physical memory for components that need it immediately. This is done using a memory management strategy known as virtual memory, with the CPU’s Memory Management Unit (MMU) handling the address translation.
Virtual memory allows programs to address and utilize more memory than actually exists physically in the system. With virtual memory, data can be moved back and forth between the physical hard drive and RAM as needed through a process known as paging. The entirety of a program can be mapped to a virtual memory address range, but only chunks of that program must actually exist in the physical memory at a given time. When a program attempts to access a virtual memory address that has been paged out to the hard disk, the access triggers a page fault and the operating system copies that data back into physical memory, paging out some other data that is not currently in use if necessary. This is all transparent to the process.
Memory Paging
On the x86_64 architecture, the CPU uses a series of linked tables called paging structures to translate a virtual memory address to a physical memory address. These structures are:
- Page Map Level 4 Table (PML4)
- Page Directory Pointer Table (PDPT)
- Page Directory Table (PDT)
- Page Table (PT)
Each entry of the PML4 contains a memory address of a PDPT. Each entry of the PDPT contains the address of a PDT. Each entry of the PDT contains the address of a PT. The PT entry contains the physical address of each memory page.
When translating an address, the system first checks the CPU’s CR3 register. This register will contain a physical memory address which is the base address of the PML4 table for the current process. As a simple example, let’s say the CR3 register contains the value 0x0100:
CR3 | 0x0100 |
---|---|
We now know that the PML4 table’s physical base address is 0x0100 for the current process. Each entry of the PML4 table contains a QWORD. The QWORD holds several pieces of information, but the most important thing to know for now is that it includes the physical base memory address of a PDPT. Here is a simplified example:
PML4 | PML4E |
---|---|
0x0100 | 0x0800 |
0x0108 | 0x0900 |
0x0110 | 0x0a00 |
… | … |
The next question is, which entry is the correct entry? We need to know the offset into the PML4 table to choose the correct entry. The offset is obtained from the virtual memory address that we are trying to translate. We’ll go into more detail on this in a bit, but for now just assume that the PML4 offset is 0x01 for the sake of this example and just know that the offset is stored inside the virtual memory address that we are translating. Each PML4 entry (PML4E) is one QWORD in size (or eight bytes). Since our offset is 1, we must multiply that by 8 bytes (0x08 * 1 = 0x08).
PML4 | PML4E |
---|---|
0x0100 | 0x0800 |
0x0108 | 0x0900 |
0x0110 | 0x0a00 |
… | … |
Checking the PML4 table at this offset reveals that the physical base address of the PDPT is at 0x0900. We now need to know the offset into the PDPT. This is also obtained from the original virtual memory address. In this case, let’s assume the offset is 0x00.
PDPT | PDPTE |
---|---|
0x0900 | 0x1000 |
0x0908 | 0x1100 |
0x0910 | 0x1200 |
… | … |
The physical base address of the PDPT was 0x0900 and the offset was 0x00. Therefore, we read the PDPTE at 0x0900 and find that the PDT physical base address is 0x1000. We once again find the offset for the PDT inside our virtual memory address. Let’s assume the offset was 2. We again multiply our offset by 8 bytes (0x02 * 0x08 = 0x10):
PDT | PDTE |
---|---|
0x1000 | 0x1800 |
0x1008 | 0x1900 |
0x1010 | 0x1a00 |
… | … |
We check the PDT at this offset and find that the PT physical base address is 0x1a00. We obtain our final PT offset from our virtual memory address. We’ll assume the offset in this case was 1. Multiplying 1 by 8 bytes gives a byte offset of 0x08.
PT | PTE |
---|---|
0x1a00 | 0x2000 |
0x1a08 | 0x2100 |
0x1a10 | 0x2200 |
… | … |
The result of the lookup was 0x2100. This is the physical base address for the virtual address we are translating. All we need to do at this point is extract a final offset value from our original virtual address and add it to the physical base address to obtain the final physical address. Assume the offset value in this case was 0x28. In this case, we don’t need to multiply the offset by 8 bytes because we are no longer reading a paging structure entry.
0x2100 + 0x28 = 0x2128
In our hypothetical example, the physical address is 0x2128. In reality it’s more complex than this, but this simple example helps to illustrate how the paging structures are all linked together. Next we’ll go through an actual example with real memory addresses using WinDbg. We’ll also see how the paging structure entries contain more than just the base addresses and this therefore requires some extra steps to extract the base addresses.
Memory Paging Example
For this example I wrote a simple program that uses VirtualAlloc to create a buffer at address 0x44440000. The buffer was then filled with NOP instructions (0x90). The address 0x44440000 is a virtual address and will be translated to a physical address by the MMU. The virtual address can be broken down into five main components. The first four components act as an offset into the four paging structures mentioned above. The final component acts as an offset from the physical memory base address.
Taking the virtual address example 0x44440000, the first step is to convert the hexadecimal address to binary form.
00000000 00000000 00000000 00000000 01000100 01000100 00000000 00000000
Next, we break down the bits into chunks.
- Bits 39 - 47: Index into the PML4 table
- Bits 30 - 38: Index into the PDPT
- Bits 21 - 29: Index into the PDT
- Bits 12 - 20: Index into the PT
- Bits 0 - 11: Offset from the physical address base
Let’s break down our example address in the same way:
- 0000000000000000 (0x00) - Ignored
- 000000000 (0x00) - Index into the PML4
- 000000001 (0x01) - Index into PDPT
- 000100010 (0x22) - Index into the PDT
- 001000000 (0x40) - Index into the PT
- 000000000000 (0x00) - Offset from the physical address base
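The same breakdown can be done with shifts and masks instead of by hand. The sketch below is a minimal illustration; the function name `split_virtual_address` and the dictionary keys are my own labels, not anything defined by Windows or Intel.

```python
def split_virtual_address(va):
    """Split a 64-bit virtual address into its four 9-bit indexes and 12-bit offset."""
    return {
        "pml4_index": (va >> 39) & 0x1FF,  # bits 39 - 47
        "pdpt_index": (va >> 30) & 0x1FF,  # bits 30 - 38
        "pdt_index": (va >> 21) & 0x1FF,   # bits 21 - 29
        "pt_index": (va >> 12) & 0x1FF,    # bits 12 - 20
        "page_offset": va & 0xFFF,         # bits 0 - 11
    }


parts = split_virtual_address(0x44440000)
for name, value in parts.items():
    print(f"{name}: {value:#x}")
```

For 0x44440000 this yields the indexes 0x0, 0x1, 0x22, and 0x40 with a page offset of 0x0, matching the manual breakdown.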
- CR3 Register -
We need to start with the PML4 table. We now know that we need the PML4 Entry (PML4E) at offset 0x00 inside the PML4 table. But where is the PML4 table stored? The PML4 table physical base address is stored in the CR3 CPU register. The value of this register will change with each process. In the case of this example, the CR3 register was set to 0x134cbf000 as seen with WinDbg:
1: kd> r cr3
cr3=0000000134cbf000
- PML4 Entry (PML4E) -
We now have enough information to begin translating the virtual memory address of 0x44440000 to a physical address. The PML4 table begins at address 0x134cbf000. We need the entry at offset 0x00. We can obtain this with WinDbg:
1: kd> !dq 0000000134cbf000 + (0x00 * 8) L1
#134cbf000 0a000000`be1d3867
The PML4 entry (PML4E) in this case was 0x0a000000`be1d3867. The PML4E contains the physical base address of the PDPT but it also contains some extra information. We need to extract the physical address of the PDPT from this PML4E. The first step is to convert the PML4E to binary:
00001010 00000000 00000000 00000000 10111110 00011101 00111000 01100111
The Intel manual states that for every paging structure entry:
- Bits 12:(M-1) are the physical address for a 4kB aligned paging structure
- Bits M:51 are reserved and should be set to 0
- Bits 63:52 are ignored for the purposes of the address translation
In this case, ‘M’ stands for MAXPHYADDR, the number of physical address bits the CPU supports. The exact value varies between processors (it can be queried with the CPUID instruction) and can be at most 52. This example uses M = 45. Therefore:
- Bits 12:(45-1) are the physical address for a 4kB aligned paging structure
- Bits 45:51 are reserved and should be set to 0
- Bits 63:52 are ignored for the purposes of the address translation
Practically, this means we must ignore bits 45 - 63 and just set them to zero:
00000000 00000000 00000000 00000000 10111110 00011101 00111000 01100111
Because each paging structure is 4kB aligned, the lowest 12 bits are not needed. The paging structures take advantage of this and use those 12 bits to store control information about each page, but that information is not used for the address translation. The control information will become relevant later, but for now it means we should zero out the least significant 12 bits:
00000000 00000000 00000000 00000000 10111110 00011101 00110000 00000000
Now we can convert this value back into a hexadecimal value:
1: kd> ?0y0000000000000000000000000000000010111110000111010011000000000000
Evaluate expression: 3189583872 = 00000000`be1d3000
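The two zeroing steps amount to a single bitmask. Here is a small sketch, assuming M = 45 as in this post; the `entry_base` helper is a hypothetical name used only for this example.

```python
M = 45  # physical address width assumed in this post


def entry_base(entry, m=M):
    """Keep bits 12..(m-1) of a paging structure entry and zero everything else."""
    address_mask = ((1 << m) - 1) & ~0xFFF  # drop bits m..63 and bits 0..11
    return entry & address_mask


print(hex(entry_base(0x0A000000BE1D3867)))  # 0xbe1d3000
```

The same helper works for every level, since the PML4E, PDPTE, PDTE, and PTE all keep their base address in the same bit range when 4kB pages are used.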
- PDPT Entry (PDPTE) -
We now have the base address of the PDPT: 0xbe1d3000. Let’s refer back to our page structure offsets to see what’s next:
- 000000001 (0x01) - Index into PDPT
Our PDPT offset is 0x01. We must now repeat the process, this time starting with the PDPT base address instead of the PML4 base address.
1: kd> !dq 00000000`be1d3000 + (0x01 * 0x08) L1
#be1d3008 0a000000`284e6867
The PDPTE is 0x0a000000`284e6867. We now must convert to binary and zero out bits 0 - 11 and bits 45 - 63:
0000000000000000000000000000000000101000010011100110000000000000
We convert this back to hexadecimal to get our PDT base address.
1: kd> ? 0y0000000000000000000000000000000000101000010011100110000000000000
Evaluate expression: 676225024 = 00000000`284e6000
- PDT Entry (PDTE) -
The PDT base address is 0x284e6000. Our PDT offset is 0x22, so we repeat the process again, starting at the PDT base.
1: kd> !dq 284e6000 + (0x22 * 0x08) L1
#284e6110 0a000000`bbb2d867
The PDTE is 0x0a000000`bbb2d867. Convert it to binary and zero out bits 0 - 11 and bits 45 - 63:
0000000000000000000000000000000010111011101100101101000000000000
Then convert it back to hexadecimal to get the PT base address.
1: kd> ? 0y0000000000000000000000000000000010111011101100101101000000000000
Evaluate expression: 3149058048 = 00000000`bbb2d000
- PT Entry (PTE) -
The PT base address is 0xbbb2d000. The PT offset in this case is 0x40, so here we go again.
1: kd> !dq 00000000`bbb2d000 + (0x40 * 0x08) L1
#bbb2d200 01000000`bbd2e867
The PTE is 0x01000000`bbd2e867. We once again zero out bits 0-11 and 45-63:
0000000000000000000000000000000010111011110100101110000000000000
Then convert back to hexadecimal to get the physical base address.
1: kd> ? 0y0000000000000000000000000000000010111011110100101110000000000000
Evaluate expression: 3151159296 = 00000000`bbd2e000
- Physical Address -
The physical base address is 0xbbd2e000. Finally, our offset from the physical base address is 0x00. So we can do the final translation to get the physical address associated with our virtual address.
1: kd> ? bbd2e000 + 0x00
Evaluate expression: 3151159296 = 00000000`bbd2e000
The physical address that corresponds to 0x44440000 in this instance is 0x00000000`bbd2e000. If I check this physical address in WinDbg, I find my NOP instructions stored there.
1: kd> !dq bbd2e000 + 0x00 L1
#bbd2e000 90909090`90909090
All of this translation happens automatically in the background whenever a memory address is accessed.
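To tie the whole walk together, here is a sketch that replays it against the four entries captured from WinDbg above. It is a simplified model under stated assumptions: it only handles 4kB pages, it does not check the present bit or large-page bits, and the `captured` dictionary and `translate` function are illustrative names standing in for real physical memory reads.

```python
ENTRY_SIZE = 8
ADDRESS_MASK = ((1 << 45) - 1) & ~0xFFF  # bits 12..44, assuming M = 45

# Physical address of each entry -> QWORD read with !dq in the walkthrough
captured = {
    0x134CBF000: 0x0A000000BE1D3867,  # PML4E
    0x0BE1D3008: 0x0A000000284E6867,  # PDPTE
    0x0284E6110: 0x0A000000BBB2D867,  # PDTE
    0x0BBB2D200: 0x01000000BBD2E867,  # PTE
}


def translate(cr3, va, read_qword):
    """Walk PML4 -> PDPT -> PDT -> PT and return the physical address."""
    base = cr3 & ADDRESS_MASK
    for shift in (39, 30, 21, 12):
        index = (va >> shift) & 0x1FF
        entry = read_qword(base + index * ENTRY_SIZE)
        base = entry & ADDRESS_MASK
    return base + (va & 0xFFF)


physical = translate(0x134CBF000, 0x44440000, captured.__getitem__)
print(hex(physical))  # 0xbbd2e000
```

Passing the reader as a function keeps the walk logic separate from how the memory is obtained, so the same loop could be fed from a debugger dump or any other source of physical reads.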
Memory Page Protections
During the translation process, we were zeroing out the lowest 12 bits of each paging structure entry because they were not needed. Those 12 bits contain control information about the referenced memory page. In the previous example, the final Page Table Entry (PTE) was 0x01000000`bbd2e867. If we convert that value to binary, we get:
00000001 00000000 00000000 00000000 10111011 11010010 11101000 01100111
Let’s look at the lowest 12 bits only:
100001100111
Each of the twelve bits contains a piece of information about the memory page. For SMEP, the interesting bits are bits 1 and 2:
- Bit 1 (R/W)
- Controls the read/write permission of the page
- Bit 2 (U/S)
- Determines if the page is mapped for use in user-mode or supervisor-mode (kernel-mode)
Bit 63 of the PTE also contains an interesting bit of control information.
- Bit 63 (Execute-Disable)
- Controls if a page is executable or not
In the case of this example, bit 1 is set to 1, which indicates that it is readable and writable. Bit 2 is set to 1, which indicates that it is a user-mode page. The kernel cannot execute code from user-mode pages due to SMEP. Bit 63 is set to 0, indicating that the page has execution enabled.
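Reading these three bits is a matter of testing each position. A minimal sketch follows; `describe_pte` and its key names are labels I chose for illustration.

```python
def describe_pte(pte):
    """Report the three control bits discussed above for a page table entry."""
    return {
        "writable": bool(pte & (1 << 1)),     # bit 1 (R/W)
        "user_mode": bool(pte & (1 << 2)),    # bit 2 (U/S)
        "no_execute": bool(pte & (1 << 63)),  # bit 63 (Execute-Disable)
    }


print(describe_pte(0x01000000BBD2E867))
# {'writable': True, 'user_mode': True, 'no_execute': False}
```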
If we want to bypass SMEP we could try to modify these control bits depending on the situation. In this example, we have memory that is readable, writable, and executable (I configured it this way in my call to VirtualAlloc). The problem is it’s been flagged as a user-mode page. If we could modify bit 2 and set the value to 0, the CPU would interpret the page as a kernel-mode page and allow us to execute code stored there from kernel-mode.
Alternatively, if we had a way to store shellcode in a kernel-mode page that was not executable, we could try to flip bit 63 to convert the page to an executable page.
One last idea would be to identify an executable kernel-mode page containing a code cave. We could then flip the R/W bit to make the page writable and copy our shellcode to that page before execution.
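Each of the three ideas above comes down to changing one bit of the PTE value. The sketch below only shows the arithmetic on the example PTE from this post. The helper names are hypothetical, and it does not write anything to memory.

```python
RW_BIT = 1 << 1   # bit 1: read/write
US_BIT = 1 << 2   # bit 2: user/supervisor
NX_BIT = 1 << 63  # bit 63: execute-disable


def make_supervisor(pte):
    """Clear U/S so the page is treated as a kernel-mode page."""
    return pte & ~US_BIT


def make_executable(pte):
    """Clear Execute-Disable so the page becomes executable."""
    return pte & ~NX_BIT


def make_writable(pte):
    """Set R/W so the page becomes writable."""
    return pte | RW_BIT


pte = 0x01000000BBD2E867
print(f"{make_supervisor(pte):#018x}")  # 0x01000000bbd2e863
```

For the example page, only the low byte changes (0x67 becomes 0x63), which is why a single-byte or single-bit write is enough once the PTE can be reached.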
That sounds simple enough, but the problem is the PTE address we collected through the whole translation process is a physical address. We can’t directly access physical addresses to modify the information stored there. We can only access virtual addresses. However, the operating system needs to be able to manipulate the table entries when allocating memory and also to enforce page-level memory protections. So there must be a way to access the paging structure entries from virtual memory. The operating system maps the paging structures into virtual memory. We just need a way to find them.
PML4 Self-Reference Entry
The PML4 table contains a list of entries leading to PDPT tables. However, there is one special PML4 entry that points back to the PML4 table itself: it holds the physical base address of the PML4 table rather than that of a PDPT. If we use a virtual address whose PML4 index selects this self-reference entry, the PML4 is walked twice and the translation ends one level early, landing on a paging structure entry instead of the data page.
The first step is to convert our example virtual address (0x44440000) to binary.
00000000 00000000 00000000 00000000 01000100 01000100 00000000 00000000
Then we shift it to the right by nine bits.
00000000 00000000 00000000 00000000 00000000 00100010 00100010 00000000
We’ve just shifted the PML4 offset over to the PDPT offset. The PDPT offset was shifted to the PDT offset. The PDT offset was shifted to the PT offset, and the physical address offset was shifted mostly away. Three bits of it remain, but we won’t need them, so we can null them out. In this case, they happen to already be set to zero.
00000000 00000000 00000000 00000000 00000000 00100010 00100010 00000000
Let’s see what this looks like when it’s split into its paging structure offset components.
0000000000000000 (0x00) - Ignored
000000000 (0x00) - ???
000000000 (0x00) - Index into PML4
000000001 (0x01) - Index into the PDPT
000100010 (0x22) - Index into the PDT
001000000000 (0x200) - Offset from the PT base
What used to be the PML4 offset is now nulled out. We need to stick the PML4 self-reference index into that spot. Before Windows 10 1607, the self-reference entry was always at offset 0x1ed. We’ll use that for the example. We now must convert the PML4 self-reference offset (0x1ed) to binary:
00000000 00000000 00000000 00000000 00000000 00000000 00000001 11101101
We just need the lower nine bits.
111101101
These bits get inserted into our virtual address as the PML4 entry.
00000000 00000000 *11110110 1*0000000 00000000 00100010 00100010 00000000
Finally, we need to ensure the virtual address is canonical. I won’t get into the details of what canonical means. Suffice it to say that bits 48 - 63 must all match bit 47: if bit 47 is a one, they must all be ones, and if it is a zero, they must all be zeros. In this case, bit 47 is a one, so we must set bits 48 - 63 to one.
11111111 11111111 11110110 10000000 00000000 00100010 00100010 00000000
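The canonical rule is a sign extension of bit 47. A short sketch, with `make_canonical` as an illustrative helper name:

```python
def make_canonical(address):
    """Copy bit 47 into bits 48 - 63 of a 64-bit address."""
    address &= (1 << 48) - 1       # keep bits 0 - 47 only
    if address & (1 << 47):        # bit 47 set: upper 16 bits must be ones
        address |= 0xFFFF << 48
    return address


print(hex(make_canonical(0x0000F68000222200)))  # 0xfffff68000222200
print(hex(make_canonical(0x0000000044440000)))  # 0x44440000
```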
Let’s once again see how this looks when broken down into its components.
1111111111111111 (0xffff) - Ignored
111101101 (0x1ed) - Index to the PML4 Self-Reference Entry
000000000 (0x00) - Index into PML4
000000001 (0x01) - Index into the PDPT
000100010 (0x22) - Index into the PDT
001000000000 (0x200) - Offset from the PT base
Finally, we can convert this back into a hexadecimal address to obtain the virtual memory address of the PTE associated with our original virtual memory address (0x44440000).
fffff680`00222200
In summary, we can obtain the PTE virtual address from a process virtual address using the following steps:
- Shift the process virtual address to the right 9 bits
- Replace the PML4 entry within the process virtual address with the self-reference index
- Set the bits 48:63 to 1 to keep the address canonical
- Set bits 0:2 to zero
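The four steps above can be sketched as one function. This assumes the fixed pre-1607 self-reference index of 0x1ed used in this example. The function name and the initial masking of the upper 16 bits (so that kernel addresses also work) are my own additions.

```python
def pte_va_via_self_ref(va, self_ref_index=0x1ED):
    """Compute the virtual address of the PTE that maps `va`."""
    # Assumption: drop bits 48 - 63 first so canonical kernel addresses work too.
    addr = (va & ((1 << 48) - 1)) >> 9   # 1. shift right 9 bits
    addr &= ~(0x1FF << 39)               # 2. clear the PML4 slot...
    addr |= self_ref_index << 39         #    ...and insert the self-reference index
    if addr & (1 << 47):                 # 3. keep the address canonical
        addr |= 0xFFFF << 48
    addr &= ~0x7                         # 4. clear bits 0:2 (PTEs are 8-byte aligned)
    return addr


print(hex(pte_va_via_self_ref(0x44440000)))  # 0xfffff68000222200
```

Because the self-reference index is a parameter, the same sketch applies to any other index value, provided that value is known.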
PML4 Self-Reference Randomization
Since Windows 10 1607, the self-reference entry offset is randomized when the system boots. Microsoft therefore introduced a kernel-mode function called nt!MiGetPteAddress to allow the OS to access the self-reference entry. It turns out that at nt!MiGetPteAddress+0x13 you can find the base virtual address of the PT structure.
Note: The below addresses will not match what was discussed above because these commands were run in a different process context.
1: kd> dq nt!MiGetPteAddress + 0x13 L1
fffff804`1853273b ffffbf80`00000000
Supposedly the PT base address is 0xffffbf80`00000000. We can check in WinDbg using the !pte command to get the PTE address of the NULL page. The NULL page is at offset 0x00 in the PT structure, so its PTE address is the same as the PT base address.
1: kd> !pte 0x0
VA 0000000000000000
PXE at FFFFBFDFEFF7F000 PPE at FFFFBFDFEFE00000 PDE at FFFFBFDFC0000000 PTE at FFFFBF8000000000
contains 0A00000085F1F867 contains 0000000000000000
pfn 85f1f ---DA--UWEV contains 0000000000000000
not valid
Notice “PTE at FFFFBF8000000000” in the response. This confirms that the numbers match. If you have a read primitive, and are able to use it to read the PT base address from nt!MiGetPteAddress, you can then calculate the virtual address for the PTE of any address of your choosing. We know that there are 0x1000 bytes per page. Each page table entry is 0x8 bytes in size. The PTE for the first page in this example would be stored at 0xFFFFBF80`00000000. The second page PTE would be at 0xFFFFBF80`00000008. We can follow this process to calculate the PTE for a given virtual address:
- Divide the virtual address by 0x1000 (Size of a memory page)
- Multiply the result by 0x08 (Size of a PTE)
- Add the PTE base address
We can build a simple formula to calculate the PTE virtual address from a given process virtual address:
0x00000000`44440000 / 0x1000 * 0x08 + 0xFFFFBF80`00000000
In this case, the PTE virtual address would be 0xFFFFBF80`00222200.
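The formula can be sketched as a small function. The name `get_pte_address` and the 0x7FFFFFFFF8 mask are my own choices: for user-mode addresses the mask gives the same result as dividing by 0x1000 and multiplying by 0x08, and it is included here on the assumption that it is needed to discard the upper bits of kernel addresses.

```python
def get_pte_address(va, pte_base):
    """Return the virtual address of the PTE for `va`, given the PTE base."""
    # (va >> 9) equals va / 0x1000 * 0x08 once the low three bits are cleared.
    # Assumption: the mask keeps only the 36-bit page number scaled by 8.
    offset = (va >> 9) & 0x7FFFFFFFF8
    return pte_base + offset


PTE_BASE = 0xFFFFBF8000000000  # value read at nt!MiGetPteAddress+0x13 above
print(hex(get_pte_address(0x44440000, PTE_BASE)))  # 0xffffbf8000222200
```

As a sanity check against the !pte output above, applying the function to the PTE address of the NULL page (0xFFFFBF8000000000) gives 0xFFFFBFDFC0000000, which is the "PDE at" value. Repeating it twice more gives the "PPE at" and "PXE at" values.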
Conclusion
Bypassing SMEP is fairly simple. You only need to flip a single bit in memory. It’s simple enough to take a known formula and apply it in an exploit to accomplish the task. Understanding how and why this bypass works is much more complex. I needed to re-read my course material a few times and walk through the process myself with WinDbg to really solidify my understanding. I also found that writing this post was extremely helpful to ensure I really understood the concepts.