# On the clock: Escaping VMware Workstation at Pwn2Own Berlin 2025


## Introduction

At Pwn2Own Berlin 2025, we successfully exploited VMware Workstation using a single heap-overflow vulnerability in the PVSCSI controller implementation. While we were initially confident in the bug’s potential, reality soon hit hard as we confronted the latest Windows 11 Low Fragmentation Heap (LFH) mitigations.

This post details our journey from discovery to exploitation. We start by analyzing the PVSCSI vulnerability and the specific challenges posed by the LFH environment. We then describe a peculiar LFH state that enables deterministic behavior, alongside the two key objects used for spraying and corruption. Building on this setup, we demonstrate how to leverage the vulnerability to craft powerful memory manipulation primitives, ultimately achieving arbitrary Read, Write, and Code Execution.

Finally, we reveal how—just two days before the contest—we exploited a timing side-channel within the LFH to fully defeat its randomization, ensuring a first-try success during the live demonstration.

## The PVSCSI Bug

The vulnerability lives in the emulation of the PVSCSI paravirtual SCSI controller inside the `vmware-vmx` host process. The guest communicates with the device through scatter/gather lists; the Linux guest driver (`linux/drivers/scsi/vmw_pvscsi.c`) defines each list element as follows:

```
struct PVSCSISGElement {
    u64 addr;
    u32 length;
    u32 flags;
} __packed;
```

```
bool __fastcall PVSCSI_FillSGI(pvscsi_vmx *pvscsi_vmx, ScsiCmd *scsi_cmd, sgi *sgi)
{
  // […]
  while ( 1 )
  {
    next_i = i + 1;
    if ( 0x10 * (unsigned __int64)(i + 1) > leftInPage )
    {
      Log("PVSCSI: Invalid s/g segment. Segment crosses a page boundary.\n");
      goto return_invalid;
    }
    idx = i;
    pInPage = &page[idx];
    if ( (page[idx].flags & 1) == 0 )
    {
      seg_len = page[idx].length;
      if ( sg_table_len_1 < seg_len )
        seg_len = sg_table_len_1;
      seg_count = sgi_1->seg_count;
      if ( seg_count > 0x1FF )
      {
        v13 = page;
        pEntries = (SGI_Entry *)UtilSafeRealloc1(
                     sgi_2->entries_buffer,
                     0x4000uLL,
                     0xFFFFFFFF,
                     "bora/devices/pvscsi/pvscsi.c",
                     0xC5A);
        page = v13;
        sgi_1 = sgi_2;
        sgi_2->entries_buffer = pEntries;
        sgi_2->pEntries = pEntries;
        seg_count = sgi_2->seg_count;
      }
      else
      {
        pEntries = sgi_1->pEntries;
      }
      seg_idx = (int)seg_count;
      pEntries[seg_idx].addr = pInPage->addr;
      pEntries[seg_idx].entry_size = seg_len;
      sg_table_len_1 -= seg_len;
      sgi_1->seg_count = seg_count + 1;
      goto loop_over;
    }
    // […]
  }
```

Notice the call to `UtilSafeRealloc1()`: once `seg_count` exceeds `0x1FF`, the entries buffer is reallocated, but always with the same hardcoded size of `0x4000` bytes. The buffer never actually grows, no matter how many segments the guest keeps submitting.

Both the `addr` and `length` fields of each scatter/gather element are read straight from guest memory and are therefore fully attacker-controlled.

## A Heap of Trouble

```
p2 = malloc(0x4000);           // Allocate the new chunk
memcpy(p2, p1, 0x4000);
free(p1);                      // Free the current chunk
memcpy(p2 + 0x4000, elem, 16); // Write 16 bytes past the end, corrupting the new chunk's metadata
```

## A Tale of Two Objects

### Shaders

### URBs

```
Offset    +0x00         +0x04          +0x08        +0x0C
        +----------------------------------------------------------+
   0x00 | refcount    | urb_datalen  | size        | actual_len    |
        +----------------------------------------------------------+
   0x10 | stream      | endpt        | pipe                        |
        +----------------------------------------------------------+
   0x20 | pipe_urb_queue_next        | pipe_urb_queue_prev         |
        +--------------------///////-------------------------------+
   0x70 | data_ptr                   | unk                         |
        +----------------------------------------------------------+
   0x80 | pDataCur                   | pad                         |
        +----------------------------------------------------------+
        |                     Variable-size                        |
   0x90 |                      User Data                           |
        +----------------------------------------------------------+
```

## Winning a Ping-Pong Game

The whole setup hinged on one parameter we could not observe: **the number of available chunks in the 0x4000 buckets**. At that point, we had no way of retrieving this information. Lacking a better alternative, we decided to investigate further, acting as if the initial state was already known.

### The Reap Oracle

To implement the rest of our primitives, we needed four contiguous chunks in a known order in **B1**. This is where the Reap Oracle comes into play. As previously mentioned, allocated URBs are stored in a FIFO queue. By repeatedly calling the UHCI controller’s `reap` method, we can retrieve the content of the next URB in the queue and free it. This allows us to detect which URB was corrupted.

The 16-byte overwrite affects the following four fields of the URB structure:

```
struct Urb {
    int refcount;
    uint32_t urb_datalen;
    uint32_t size;
    uint32_t actual_len;
    […]
}
```

The critical field here is `actual_len`. Recall that for the 16-byte overflow, we control the first 12 bytes, but the last 4 bytes are always forced to zero. Consequently, the overflow invariably sets `actual_len` to zero. This corruption acts as a marker, allowing us to identify the affected URB.

The Reap Oracle Strategy:

1. **Allocation**: We allocate 15 URBs to fill the **B1** bucket.
2. **Corruption**: We trigger the vulnerability (Ping-Pong) to zero out the `actual_len` field of the URB located immediately after **Hole0**. Then, we allocate two placeholder shaders to reuse **Hole0** and **PONG**.
3. **Inspection & Replacement**: We iterate through the URB queue. For each URB, we `reap` it and check its length. We immediately allocate a placeholder shader in its place.
4. **Identification**: When we retrieve a URB with a modified `actual_len`, we know that the shader we just allocated to replace it resides in the slot following **Hole0**. We label this new slot **Hole1**.

We repeat the process to locate **Hole2** and **Hole3**. For each iteration, we free the non-essential placeholders (keeping **Hole0**, **Hole1**, etc.), refill the bucket with URBs, and use the previous Hole as the **PING** buffer. Once the heap is prepared, we re-execute the corruption and identification steps to pinpoint the next contiguous slot. Ultimately, we obtain a sequence of four contiguous chunks ( **Hole0**– **Hole3**), currently occupied by shaders, which can then be freed to enforce adjacency for subsequent allocations.

## Coalescing Is All You Need

```
// […]
res = ((__int64 (__fastcall *)(pvscsi_vmx *, ScsiCmd *, sgi *))scsi->pvscsi->fillSGI)(// PVSCSI_FillSGI
        scsi->pvscsi,
        scsi_cmd,
        &scsi_cmd->sgi);
LODWORD(real_seg_count) = 0;
if ( !res )
  return 0;
seg_count = (unsigned int)scsi_cmd->sgi.seg_count;
if ( (int)seg_count <= 0 )
{
  numBytes_1 = 0LL;
}
else
{
  pEntries = scsi_cmd->sgi.pEntries;
  numBytes_1 = pEntries->entry_size;
  if ( (_DWORD)seg_count != 1 )
  {
    next_entry = (SGI_Entry *)((char *)pEntries + 0x18);
    LODWORD(real_seg_count) = 0;
    for ( i = 1LL; i != seg_count; ++i )
    {
      idx = (int)real_seg_count;
      entry_size = pEntries[idx].entry_size;
      addr = ADJ(next_entry)->addr;
      if ( entry_size + pEntries[idx].addr == addr )
      {
        pEntries[idx].entry_size = ADJ(next_entry)->entry_size + entry_size;
      }
      else
      {
        real_seg_count = (unsigned int)(real_seg_count + 1);
        if ( i != real_seg_count )
        {
          real_seg_idx = (int)real_seg_count;
          pEntries[real_seg_idx].addr = addr;
          pEntries[real_seg_idx].entry_size = ADJ(next_entry)->entry_size;
        }
      }
      numBytes_1 += ADJ(next_entry++)->entry_size;
    }
  }
}
// […]
```

```
Entry 1: {.addr = 0x11000, .size = 0x4000}
Entry 2: {.addr = 0x15000, .size = 0x2000}
Entry 3: {.addr = 0x30000, .size = 0x1000}
```

```
Entry 1': {.addr = 0x11000, .size = 0x6000}
Entry 2': {.addr = 0x30000, .size = 0x1000}
```

`0xFFFFFFFF`

### Building a controlled overflow

1. `{.addr=0, .len=0}` `entry[1023]`

2. `entry[2048]` `entry[1024]`

Recall the merge condition: two consecutive entries are coalesced when `AddrA+LenA==AddrB`. With `LenA == 0`, this degenerates to `AddrA == AddrB`. So, to write the two arbitrary 8-byte pairs `0x4141414141414141 0x4242424242424242` and `0x4343434343434343 0x4444444444444444`, we craft pairs of entries sharing the same address:

**Pair 1:**

```
entry[i]   = { .addr = 0x4141414141414141, .size = 0 }
entry[i+1] = { .addr = 0x4141414141414141, .size = 0x4242424242424242 }
```

**Pair 2:**

```
entry[i+2] = { .addr = 0x4343434343434343, .size = 0 }
entry[i+3] = { .addr = 0x4343434343434343, .size = 0x4444444444444444 }
```

Note that **even-indexed** entries (with the size set to zero) are written by the heap-overflow, while **odd-indexed** entries are the ones that were already present in **Shader2**.

Each pair of entries is merged into a single one due to their matching addresses and the zero size:

```
entry[i]   = { .addr = 0x4141414141414141, .size = 0x4242424242424242 }
entry[i+1] = { .addr = 0x4343434343434343, .size = 0x4444444444444444 }
```

## Leaking a URB

`actual_len` [4]

### Step 1: The Setup

– `actual_len` `0x0`

### Step 2: The Overflow

– `0xFFFFFFFF`

Just like in the previous section, we trigger the vulnerability twice in order to overwrite both the **odd-indexed** and **even-indexed** entries of **URB1**, while corrupting only half of **URB2** and leaving the rest intact. We end up with the following state:

### Step 3: The Coalescing

1. `0xFFFFFFFF*0x401`. `actual_len` `0x400`

– `actual_len` `0x400`

`actual_len`

## Arbitrary Read, Write and Call Primitives

We reuse the memory layout from our leak phase: `[Hole0, URB1, URB2, Shader3]`

At this stage, **URB1** and **URB2** have corrupted heap metadata and can no longer be safely freed. However, **Shader3** (the former **URB3** location) remains uncorrupted and can be freely reallocated at will.

We gain full control over the structure of **URB1** by using **Shader3** as our source buffer. By placing a forged URB structure inside **Shader3** and triggering the “move up” primitive, we shift our data directly into **URB1**’s memory space, effectively replacing its contents with arbitrary data. Having previously leaked a URB header, we already possess all the pointers necessary to forge a perfectly valid one.

#### A Persistent Arbitrary URB

To ensure full stability, we aim to create a persistent fake URB that can be controlled through simple heap reallocations, bypassing the need to trigger the vulnerability ever again. The trick is to change the `URB1.next` pointer to point to **Hole0**. We also increment the reference count of **URB1** to ensure it stays in memory despite its corrupted metadata.

When VMware “reaps” **URB1**, it sets `URB1.next` as the new head of the URBs FIFO queue. This effectively places our fake URB in **Hole0** at the top of the FIFO. We can now control this fake structure at will by reallocating **Hole0** with a new shader whenever needed, removing any further need to trigger the PVSCSI vulnerability.

#### Read & Write Operations

– `URB.actual_len` `URB.data_ptr`

– `URB.pipe`

#### Call primitive

`RCX+0x90`

To ensure our exploit is portable across Windows versions, we avoid hardcoded offsets. Instead, we use our read primitive to parse `Kernel32` in memory and dynamically resolve the address of `WinExec`.

The last hurdle is bypassing Control Flow Guard (CFG). We cannot jump directly to `WinExec`, so we use a CFG-whitelisted gadget within `vmware-vmx`. This gadget pivots data from `RCX+0x100` into a fully controlled argument before jumping to a second arbitrary function pointer, in this case, `WinExec(“calc.exe”)`.

## On the Clock

Two days before the competition—and three days after registering—we finally had a working exploit. The only minor issue was that it relied on the assumption that we knew the initial LFH state—but we didn’t. The number of free LFH chunks was a moving target. Right after booting the guest OS, the value was almost always the same, but as soon as a graphical session was launched, it started changing in unpredictable ways. To make things worse, our various testing setups all had close but distinct initial LFH states. Basically, we needed to pick one number out of 16, knowing only that the odds were somewhat skewed in favor of certain values. At this point, our best strategy was rolling a slightly loaded 16-sided die.

We had previously envisaged a solution based on a simple hypothesis: when a chunk is allocated from the LFH, if all the current buckets are full, the LFH needs to create a new bucket, a process that should take additional time. By allocating multiple 0x4000 buffers and precisely measuring the duration of each allocation, we should be able to detect a slightly longer delay each time a new bucket is created. We hoped this would provide a reliable timing-channel to uncover the initial LFH state.

The catch was that we needed a synchronous allocation primitive. In VMware, almost all allocations are performed asynchronously. For instance, to allocate shaders, we push commands into the SVGA FIFO, which are then processed in the background, leaving us no way to precisely time the allocation.

By chance, VMware exposes one feature that is perfectly synchronous: the backdoor channel. This channel is used to implement VMware Tools features, such as copy-and-paste. It is implemented via “magic” assembly instructions, which return only after the command has been fully processed. Here is an excerpt from Open VM Tools, which provides an open-source implementation of the VMware Tools:

```
/** VMware backdoor magic instruction */
#define VMW_BACKDOOR "inl %%dx, %%eax"

static inline __attribute__ (( always_inline )) uint32_t
vmware_cmd_guestrpc ( int channel, uint16_t subcommand, uint32_t parameter,
                      uint16_t *edxhi, uint32_t *ebx ) {
    uint32_t discard_a;
    uint32_t status;
    uint32_t edx;

    /* Perform backdoor call */
    __asm__ __volatile__ ( VMW_BACKDOOR
                           : "=a" ( discard_a ), "=b" ( *ebx ),
                             "=c" ( status ), "=d" ( edx )
                           : "0" ( VMW_MAGIC ), "1" ( parameter ),
                             "2" ( VMW_CMD_GUESTRPC | ( subcommand << 16 ) ),
                             "3" ( VMW_PORT | ( channel << 16 ) ) );
    *edxhi = ( edx >> 16 );
    return status;
}
```

To trigger controlled allocations using the VMware Tools, we leveraged the `vmx.capability.unified_loop` command [5]. By providing a string argument of 0x4000 bytes, we could force the host to allocate exactly two buffers of that size.

Since a bucket for the 0x4000 size class contains exactly 16 chunks, triggering this command 8 times (for a total of 16 allocations) guaranteed that we would cross a bucket boundary and observe a bucket creation event.

To time the allocations, we relied on the `gettimeofday` system call. While the timing signal was subtle, it was definitely noticeable. To clean up the noise, we implemented some “poor man’s signal processing”:

1. We triggered the command 8 times to collect a batch of measurements.
2. We discarded any batch containing significant outliers (typically much longer measurements caused by host context-switches).
3. We summed multiple valid batches to obtain a smoother, more reliable estimation.

When tuned correctly, the results were clear: among the 8 measurements, one was consistently longer, indicating that a new bucket had been created during that specific call. We could then allocate a single 0x4000 buffer and repeat the process to determine precisely which of the two allocations within the command had triggered the new bucket’s creation.

This second pass allowed us to deduce the exact LFH offset: if the timing spike appeared at the same index as before, it meant the LFH offset was odd; if the spike shifted to the next index, the offset was even. Any other result was flagged as incoherent—usually due to background heap activity or, more commonly, excessively noisy measurements—meaning we had to restart the process from scratch.

### Racing Against Noise

In theory, we could have used a very large number of batches to arbitrarily increase the signal-to-noise ratio (SNR). In practice, however, we hit a significant bottleneck in the `vmx.capability.unified_loop` command: the strings allocated by this command are added to a global list and cannot be freed.

Furthermore, these strings must be unique. This means that every time we issued the command, the host would first compare our string argument against every string already present in the list, only performing a new allocation if it found no match.

This posed a major challenge. Initially, when the list was empty, the string comparison was instantaneous. But after a few hundred allocations, the command had to perform hundreds of comparisons before even reaching the allocation logic. This $O(n)$ search overhead meant that as we were collecting more batches to improve our SNR, the baseline noise and latency were actually increasing.

This created a race against time: every batch of measurements we collected to increase our precision paradoxically raised the noise floor for the next one.

We knew that if the algorithm didn’t converge quickly enough, the VM state would become too “polluted” to ever yield a clear reading again. Luckily, after testing and tuning the algorithm on multiple computers, we found a working balance. During the competition, the exploit worked on the first attempt, much to our relief.

## Demonstration

Here is a video of the exploit in action:

## Conclusion

This research was conducted over three months of evenings and weekends. The first month was dedicated to reverse-engineering VMware and discovering the vulnerability. Convinced that exploitation would be straightforward, we spent the second month procrastinating.

The third month was a reality check. While we developed the drivers necessary to allocate interesting objects, we hit a wall with Windows 11 LFH mitigations, exhausting countless bypass strategies that ultimately failed. Consequently, the core of the exploit—including the leak, the Read/Write/Execute primitives, and the timing-channel—was developed in the final five days leading up to the competition.

While this last-minute sprint ultimately paid off, we strongly advise against replicating our timeline—unless you particularly enjoy sleep deprivation.

## References

[1] Saar Amar, _Low Fragmentation Heap (LFH) Exploitation – Windows 10 Userspace_

[2] Zisis Sialveras, _Straight outta VMware: Modern exploitation of the SVGA device for guest-to-host escape exploits_

[3] Corentin Bayet, Bruno Pujos, _SpeedPwning VMware Workstation: Failing at Pwn2Own, but doing it fast_

[4] Yuhao Jiang, Xinlei Ying, _URB Excalibur: The New VMware All-Platform VM Escapes_

[5] nafod, _Pwning VMware, Part 2: ZDI-19-421, a UHCI bug_