0x01 Research Overview
A critical vulnerability has been discovered in Hyper-V’s virtual network switch driver (vmswitch.sys).
The vulnerability was discovered using a fuzzer we named hAFL1, which we have open-sourced (https://github.com/SB-GC-Labs/hAFL1).
hAFL1 is a modified version of kAFL that enables fuzzing of Hyper-V paravirtualized devices and adds structure awareness, detailed crash monitoring, and coverage guidance.
Hyper-V is the underlying virtualization technology for Azure (Microsoft’s public cloud).
The vulnerability allows remote code execution (RCE) and denial of service (DoS). With it, an attacker who controls an Azure virtual machine could take down entire regions of the public cloud and run arbitrary code on Hyper-V hosts.
The vulnerability first appeared in the August 2019 build of vmswitch, which means it was in production for more than a year and a half before it was patched.
In May 2021, Microsoft assigned a CVSS score of 9.9 to vulnerability CVE-2021-28476 and released a patch for it.
Why target Hyper-V? More and more companies are migrating a major portion of their workloads to public clouds such as AWS, GCP and Azure. The public cloud offers users the flexibility to not have to manage their own bare metal servers. However, these clouds are inherently based on shared infrastructure—shared storage, networking, and CPU power. This means that any vulnerability in the hypervisor will have a wider impact; it will not only affect one virtual machine, but potentially many of them.
Hyper-V is Azure’s underlying virtualization technology, and we decided to target its virtual switch (vmswitch.sys) because it is a core component of the cloud’s functionality.
Why choose fuzzing? Between developing a fuzzing tool and statically analyzing Hyper-V’s network driver, vmswitch.sys, we chose the former. The reason is simple: code size. Manually auditing that much code for bugs is tedious, and we hoped that a good fuzzer would find more than one crash.
In Hyper-V terminology, the host OS runs in the root partition and the guest OS runs in a child partition. Hyper-V makes extensive use of paravirtualized devices to provide child partitions with interfaces to hardware devices. With paravirtualization, both the VM and the host use a modified hardware interface, resulting in better performance. One such paravirtualized device is the network switch, which is the target of our research.
Each paravirtualized device in Hyper-V consists of two components:
1. A Virtualization Service Consumer (VSC) running in the child partition. netvsc.sys is the network VSC.
2. A Virtualization Service Provider (VSP) running in the root partition. vmswitch.sys is the network VSP.
The two components communicate with each other via VMBus, a hypercall-based inter-partition communication protocol. VMBus uses two ring buffers, a send buffer and a receive buffer, to transfer data between the guest and the host.
The paravirtualized network in Hyper-V consists of netvsc (consumer) and vmswitch (provider).
Fuzzing is an automated software testing technique that involves providing invalid or random data as input to a computer program. A fuzzer generates inputs, sends them to its target and monitors the target for crashes or unexpected behavior. The core component of any fuzzer is the harness, which is responsible for sending inputs directly to the target. The harness is tightly coupled to the target; it must send inputs through the communication channel that the target normally uses. To make the fuzzing process efficient, modern fuzzers implement several additional features. The first is coverage guidance: the ability to track exactly which code flows are executed and to mutate new inputs accordingly, with the goal of increasing the amount of code reached in the target program. Another important feature is crash monitoring: the ability to get detailed information about any crash that occurs during fuzzing, such as a stack trace or the line of code that triggered it. Finally, there is structure awareness: instead of sending completely arbitrary inputs, the fuzzer generates inputs that conform to a specific structure (e.g. a network protocol or a file format). This increases the chance that an input is processed in depth rather than discarded early by basic validation. We use the term fuzzing infrastructure to refer to any software project that includes the above components to perform the fuzzing process.
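To make the idea of coverage guidance concrete, here is a minimal sketch in Python. It is not hAFL1 or kAFL code: the toy target, the block numbering and the single-byte mutator are all assumptions made for illustration. The loop keeps a mutated input in the corpus only if it reaches a basic block that no earlier input reached.

```python
import random

def target(data: bytes) -> set:
    """Toy target: returns the ids of the 'basic blocks' the input reached."""
    blocks = {0}
    if len(data) >= 1 and data[0] == ord("R"):
        blocks.add(1)
        if len(data) >= 2 and data[1] == ord("N"):
            blocks.add(2)
            if len(data) >= 3 and data[2] == ord("D"):
                blocks.add(3)
    return blocks

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(seed: bytes, iterations: int, rng: random.Random):
    """Coverage-guided loop: inputs that reach new blocks join the corpus."""
    corpus = [seed]
    seen = set(target(seed))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        coverage = target(candidate)
        if not coverage <= seen:  # at least one new block was reached
            seen |= coverage
            corpus.append(candidate)
    return corpus, seen
```

Because inputs that unlock a new branch are kept and mutated further, the fuzzer can get through nested checks one byte at a time instead of having to guess the whole prefix at once.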
Our goal is to have a fuzzing infrastructure capable of sending input to vmswitch. We also want our fuzzer to be coverage-guided and to provide a detailed crash report pinpointing exactly why each crash occurred. Finally, structure awareness is important so that we send input in a format accepted by vmswitch instead of wasting time and resources on arbitrary input.
The first stage of developing a fuzzer is designing the harness. We took inspiration from an MSRC blog post detailing fuzzing of VPCI (Hyper-V’s paravirtualized PCI bus). Since that target is similar to ours, we started with the same steps.
The idea presented in the Microsoft post is simple: find the VMBus channel used by the VSC and use this channel to send data to the VSP using a known, documented API. We applied these steps to our own target: find the VMBus channel used by netvsc and use this channel to send data to vmswitch using VmbPacketAllocate and VmbPacketSend.
1. Find the VMBus channel
netvsc is an NDIS driver that runs in the guest operating system of a Hyper-V child partition and exposes virtualized network adapters. As part of the virtual adapter initialization process, netvsc allocates a structure called MiniportAdapterContext (this happens as part of the function NvscMicroportInit). Offset 0x18 of MiniportAdapterContext holds our VMBus channel pointer.
As part of the initialization process in netvsc, the VMBus Channel pointer is written to the MiniportAdapterContext structure.
Armed with this new knowledge, we wrote a dedicated driver (harness.sys) that runs in the child partition. It iterates over all NDIS miniport adapters, finds the adapter we want to fuzz (by string-matching its name) and reads the VMBus channel pointer from the adapter context structure. With the VMBus channel used by netvsc in hand, the driver lets us send data to vmswitch.
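The lookup logic can be modeled in user-mode Python as a rough sketch. The real harness is a kernel driver that walks NDIS structures. Here the adapters are represented by hypothetical (name, context bytes) pairs, and the pointer is assumed to be a 64-bit little-endian value; only the 0x18 offset comes from the text above.

```python
import struct

VMBUS_CHANNEL_OFFSET = 0x18  # offset within MiniportAdapterContext, per the text

def find_channel_pointer(adapters, wanted_name):
    """adapters: iterable of (name, context_bytes) pairs, a stand-in for
    walking the NDIS miniport list. Returns the pointer or None."""
    for name, context in adapters:
        if wanted_name in name:  # string match on the adapter name
            (pointer,) = struct.unpack_from("<Q", context, VMBUS_CHANNEL_OFFSET)
            return pointer
    return None

# Hypothetical adapter context with a made-up pointer value at offset 0x18.
fake_context = bytes(0x18) + struct.pack("<Q", 0xFFFFA00012345678) + bytes(8)
adapters = [("Loopback", bytes(0x28)), ("Hyper-V Network Adapter", fake_context)]
channel = find_channel_pointer(adapters, "Hyper-V")
```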
The process of finding the VMBus channel through the ndis.sys driver
Every VSP in Hyper-V must implement and register the packet processing callback EvtVmbChannelProcessPacket. This function is called whenever a new packet arrives at the VSP. In vmswitch, this callback function is VmsVmNicPvtKmclProcessPacket.
vmswitch expects packets of type NVSP, a proprietary format for packets transmitted over Hyper-V’s VMBus. There are many NVSP packet types; some are responsible for setting up the VMBus send and receive buffers, some perform the handshake between VSP and VSC (e.g. exchanging NDIS and NVSP versions), and some are used to send RNDIS messages between the guest and the host.
RNDIS defines a message protocol between a host and a remote NDIS device by abstracting control and data channels. In a Hyper-V setup, the “host” is the guest VM, the “RNDIS device” is the vmswitch or external network adapter, and the “abstract communication channel” is the VMBus.
We decided to focus our fuzzing efforts on the code flow that handles RNDIS messages for two reasons:
1. There is a lot of code that handles RNDIS messages.
2. Quite a few vulnerabilities found in vmswitch are in RNDIS packet processing.
The function that handles RNDIS messages is VmsVmNicPvtVersion1HandleRndisSendMessage, which is called directly from VmsVmNicPvtKmclProcessPacket.
To fuzz vmswitch with RNDIS messages, our harness must call one of these functions and pass it an RNDIS message.
3. Send RNDIS message
VmsVmNicPvtKmclProcessPacket accepts five parameters: VMBus Channel pointer, packet object, buffer and its length, and flags. The buffer parameter is used to send packet metadata to vmswitch. It consists of 4 fields:
1. msg_type – NVSP message type
2. channel_type – 0 means data, 1 means control
3. send_buf_section_index – the index of the send buffer section the data was written to. Recall that VMBus transfers data through two ring buffers; this field specifies the exact location of the data
4. send_buf_section_size – the size of the data in the send buffer
Different fields in the Buffer parameter of the packet processing callback
At first, it seemed that the data had to be sent through the VMBus send buffer. But after some research, we found another way to send RNDIS messages that does not involve the send buffer at all: allocate memory, copy the data into it, and then create a memory descriptor list (MDL) that points to the allocated buffer. We found this approach more convenient because it saves us from copying the RNDIS message into the send buffer.
To send an RNDIS message using an MDL, set the following values in the buffer described above:
● msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT
● send_buf_section_index = -1 (indicates the use of MDL)
● send_buf_section_size = 0 (this parameter is ignored when using MDL)
The MDL itself is attached to the packet object.
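The metadata buffer for the MDL case can be sketched as four packed fields. The field widths (four unsigned 32-bit little-endian integers) and the numeric value 107 for NVSP_MSG1_TYPE_SEND_RNDIS_PKT are assumptions based on the open-source Linux hv_netvsc headers, not something stated in this article, so treat them as placeholders.

```python
import struct

# Assumed value, as recalled from the Linux hv_netvsc header; verify before use.
NVSP_MSG1_TYPE_SEND_RNDIS_PKT = 107
CHANNEL_TYPE_DATA = 0
CHANNEL_TYPE_CONTROL = 1
USE_MDL = 0xFFFFFFFF  # -1 as an unsigned 32-bit value: "data is in the MDL"

def build_send_rndis_metadata(channel_type: int) -> bytes:
    """Pack msg_type, channel_type, send_buf_section_index, send_buf_section_size."""
    return struct.pack(
        "<IIII",
        NVSP_MSG1_TYPE_SEND_RNDIS_PKT,
        channel_type,
        USE_MDL,  # send_buf_section_index = -1
        0,        # send_buf_section_size, ignored when an MDL is used
    )

metadata = build_send_rndis_metadata(CHANNEL_TYPE_CONTROL)
```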
At this point, we could not only send arbitrary inputs to vmswitch, but we also knew exactly which packets to send, and how to send them, in order to exercise the RNDIS code flow. With this capability, our harness was able to trigger a known (n-day) vulnerability in vmswitch: CVE-2019-0717.
0x04 Connecting the Harness to the Fuzzer
The next step was to integrate our harness into a fuzzing framework, so that we would not have to implement everything ourselves: writing hypercalls, designing mutation engines, decoding coverage traces, and so on. There were several options available, but we chose kAFL, which seemed to best suit our need: fuzzing a kernel-mode driver.
Our fuzzer has three levels of virtualization (“LN” stands for level N). L0, the bare-metal server, runs kAFL on Linux’s built-in hypervisor, KVM. We then create our Hyper-V host (L1), a Hyper-V-enabled virtual machine running Windows 10. On top of our Hyper-V host run two machines (L2): the root partition, where vmswitch executes, and a child partition, where we run our harness and fuzz vmswitch.
hAFL1 setup as originally planned: vmswitch runs inside the root partition (L2) and the harness runs inside the child partition (L2).
The problem is that kAFL does not support nested virtualization, and our setup is based on nested virtualization: we have a guest OS on a Hyper-V host on top of KVM. With such a setup, kAFL cannot communicate directly with components running in L2. More precisely, we would get no coverage information from vmswitch and could not send fuzzing payloads (inputs) from kAFL to our harness.
Therefore, to accommodate kAFL, we had to rethink the setup. If we can’t fuzz from L2, then let’s try to fuzz from L1. In practice, this meant finding a way to run vmswitch from within L1 rather than from within the root partition. Then we could simply run our harness at the same virtualization level as vmswitch.
Fortunately, we found a neat workaround. It turns out that when the Hyper-V feature is enabled and Intel VT-x is disabled, Windows boots in a fallback mode where Hyper-V cannot run, but vmswitch is still loaded into kernel memory! The root and child partitions do not exist because Hyper-V is not running, so we are left with L1 only. This is exactly what we wanted: we can now run the harness on a single Windows VM and call our target function, VmsVmNicPvtVersion1HandleRndisSendMessage.
The next problem encountered was the lack of a VMBus channel. When fully operational, vmswitch uses the VMBus channel to communicate with its consumers (netvsc instances). But since Hyper-V is inactive and there are no running VMs, vmswitch has no such VMBus channel to use. We need to find a way to provide vmswitch with a VMBus channel, or have it initialize one by itself.
After some reverse engineering, we found a function in vmswitch called VmsVmNicMorph that does exactly that: it initializes a new VMBus channel for vmswitch. However, simply calling this function results in a blue screen, because it tries to call VMBus-related functions while VMBus is not running. We decided to patch out all the VMBus logic. This works because VMBus is a separate communication layer that does not interfere with the data being sent. An analogy is the OSI network model: VMBus acts as the transport layer and is independent of vmswitch, which acts as the application layer. In other words, we can skip the VMBus logic and still obtain a VMBus channel object suitable for vmswitch to use.
There is one more problem to be solved. A Windows feature called PatchGuard blocks changes to signed kernel-mode code. So if we want to modify the instructions in vmswitch, PatchGuard must be disabled. To do this, we use an open source tool called EfiGuard, which provides us with the relevant functionality: it disables kernel patch protection and driver signing enforcement, allowing us to run our unsigned harness driver on the machine.
Our problems and solutions in building hAFL1
The resulting setup is nothing like what we originally envisioned. vmswitch runs directly on the Windows 10 host (not inside the root partition) and our harness driver (harness.sys) runs at the same level instead of inside a child partition. A user-mode harness process receives fuzzing payloads from kAFL via hypercalls and uses IOCTLs to pass them to our harness driver. To recap: Hyper-V itself cannot run because VT-x is disabled, but our fuzzer runs, sending fuzzing input to vmswitch and getting coverage information to drive the fuzzing process forward.
hAFL1 setup: vmswitch runs inside L1 and our harness also runs inside L1.
0x05 Fuzzing improvements
Below are the additional logic and features we added to the fuzzing framework.
1. Coverage guidance
kAFL uses Intel PT to track the value of the instruction pointer throughout each fuzzing iteration, and mutates the input to increase the number of basic blocks it hits. In order to trace execution only in the context of the harness process, kAFL uses CR3 filtering: it records the execution trace only if the CR3 register value matches the CR3 filter value.
https://software.intel.com/content/www/us/en/develop/blogs/processor-tracing.html
https://software.intel.com/content/www/us/en/develop/documentation/debug-extensions-windbg-pt-user-guide/top/commands/commands-for-configuration/ip-filter-configuration.html
However, the number of basic blocks reported was suspiciously low: even a single packet should pass through more basic blocks than the fuzzer UI showed.
Analysis showed that vmswitch processes packets in an asynchronous, multi-threaded manner. Packets first go through a short synchronous stage and are then pushed to a queue as work items, waiting to be processed by dedicated system worker threads. These threads have a different CR3 value than our harness process, which is why the fuzzer did not trace execution at all once it moved to a worker thread. To overcome this problem, we disabled CR3 filtering. This does not pollute the trace results, because we are the only ones triggering code in vmswitch.
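The effect of CR3 filtering on asynchronous processing can be illustrated with a small simulation. This is a sketch, not kAFL code: the event tuples and the CR3 and instruction pointer values are made up for the example.

```python
from collections import namedtuple

TraceEvent = namedtuple("TraceEvent", ["cr3", "ip"])

def traced_ips(events, cr3_filter=None):
    """Return the instruction pointers that survive CR3 filtering.
    Passing None disables the filter, so every event is kept."""
    return [e.ip for e in events if cr3_filter is None or e.cr3 == cr3_filter]

HARNESS_CR3 = 0x1AA000  # hypothetical page-table base of the harness process
WORKER_CR3 = 0x1AD000   # hypothetical page-table base seen on a worker thread

events = [
    TraceEvent(HARNESS_CR3, 0x1000),  # harness submits the packet
    TraceEvent(WORKER_CR3, 0x2000),   # worker thread handles the work item
    TraceEvent(WORKER_CR3, 0x2040),
]

filtered = traced_ips(events, HARNESS_CR3)  # only the submission is visible
unfiltered = traced_ips(events)             # the worker-thread blocks appear too
```

With the filter enabled, the blocks executed on the worker thread never reach the fuzzer, which matches the unexpectedly low coverage described above.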
Finally, to monitor vmswitch coverage, we wrote a Python script to convert Intel-PT data from kAFL format to IDA’s Lighthouse plugin format.
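Our conversion script is not reproduced here. The sketch below only illustrates the general idea, under the assumption that the trace has already been decoded into absolute instruction addresses and that the output uses Lighthouse's plain-text "module+offset" style, with one hexadecimal offset per line. The module base and size in the example are hypothetical.

```python
def to_modoff(addresses, module_name, base, size):
    """Convert absolute addresses into unique 'module+offset' lines,
    dropping addresses that fall outside the module."""
    lines = []
    seen = set()
    for address in addresses:
        offset = address - base
        if 0 <= offset < size and offset not in seen:
            seen.add(offset)
            lines.append(f"{module_name}+{offset:x}")
    return lines

# Hypothetical trace: two hits inside the module (one repeated), one outside.
lines = to_modoff(
    [0x10001234, 0x10001234, 0x20000000, 0x10000010],
    "vmswitch",
    base=0x10000000,
    size=0x100000,
)
```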
vmswitch coverage visualized using IDA’s Lighthouse plugin
2. Crash monitoring
To monitor and analyze crashes effectively, the fuzzer must generate detailed crash reports. However, kAFL does not provide much crash information for Windows targets. For example, it does not output the stack trace or the exact offset in the target code that triggered the crash, so we had to implement this logic ourselves.
We used part of the Xen codebase to obtain stack traces and module information. We then wrote two KVM hypercalls to send this information from L1 back to kAFL. Finally, we implemented and registered a special BugCheck callback that invokes these KVM hypercalls.
With these pieces in place, we were able to get detailed information about each crash that occurred in vmswitch: a full stack trace with function names and offsets, as shown in the screenshot below.
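Turning a raw return address into a "function name plus offset" entry can be sketched as follows. This is an illustration only: the symbol table, the function names and the module base are hypothetical, and the real implementation runs inside a BugCheck callback rather than in Python.

```python
import bisect

def symbolize(address, module_name, base, symbols):
    """symbols: list of (rva, name) pairs sorted by rva.
    Returns 'module!name+0xoffset', or the raw address if no symbol precedes it."""
    rva = address - base
    rvas = [entry[0] for entry in symbols]
    index = bisect.bisect_right(rvas, rva) - 1  # last symbol starting at or before rva
    if rva < 0 or index < 0:
        return f"{address:#x}"
    start, name = symbols[index]
    return f"{module_name}!{name}+{rva - start:#x}"

# Hypothetical symbol table and module base.
symbols = [(0x1000, "FuncA"), (0x2000, "FuncB")]
frame = symbolize(0x140002010, "vmswitch", 0x140000000, symbols)
```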
Detailed crash report from hAFL1 showing stack trace, function name and offset within it.
3. Structure awareness
For faster fuzzing, we want the fuzzer to generate input that matches the format expected by the target. In our case, these inputs are RNDIS messages.
We defined RNDIS messages using protocol buffers and mutated them using libprotobuf-mutator. To integrate our custom, protocol-buffer-based mutation strategy into kAFL, we had to create a new state and add it to kAFL’s state machine, which works as a pipeline. Every fuzzing payload passes through this pipeline and is mutated by kAFL’s built-in mutators.
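The benefit of mutating a structured representation rather than raw bytes can be shown with a simplified sketch. It does not use protocol buffers or libprotobuf-mutator. A small dataclass stands in for the message definition, and only the two leading fields shared by RNDIS messages (message type and total length) are modeled. Mutations act on fields, and the length is recomputed at serialization time, so every generated input passes the most basic length check.

```python
import random
import struct
from dataclasses import dataclass, replace

# RNDIS message type codes for initialize, halt, query, set and reset.
MESSAGE_TYPES = (2, 3, 4, 5, 6)

@dataclass(frozen=True)
class Message:
    msg_type: int
    payload: bytes

def serialize(msg: Message) -> bytes:
    """Header: u32 type, u32 total length. The length is always recomputed."""
    return struct.pack("<II", msg.msg_type, 8 + len(msg.payload)) + msg.payload

def mutate(msg: Message, rng: random.Random) -> Message:
    """Mutate one field of the structured message, never the raw bytes."""
    if rng.random() < 0.5:
        return replace(msg, msg_type=rng.choice(MESSAGE_TYPES))
    payload = bytearray(msg.payload or b"\x00")
    payload[rng.randrange(len(payload))] = rng.randrange(256)
    return replace(msg, payload=bytes(payload))
```

A byte-level mutator would corrupt the length field in many of its outputs, and those inputs would be rejected before reaching any interesting code.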
0x06 Vulnerability Discovery
After running hAFL1 for two hours, a critical CVSS 9.9 RCE vulnerability was discovered.
hAFL1 GUI, the interface is the same as kAFL, but can be extended by adding new protocol buffer based mutation strategies.
The vulnerability exists in vmswitch.sys – Hyper-V’s network switch driver. It is triggered by sending specially crafted packets from the guest virtual machine to the Hyper-V host and can be exploited for DoS and RCE.
The vulnerability first appeared in the August 2019 release, indicating that the vulnerability has been in production for over a year and a half. It affects Windows 7, 8.1 and 10 as well as Windows Server 2008, 2012, 2016 and 2019.
Hyper-V is the hypervisor for Azure; therefore, a vulnerability in Hyper-V can lead to a vulnerability in Azure and potentially affect entire regions of the public cloud. Triggering a denial of service from an Azure VM will crash a major part of the Azure infrastructure and shut down all virtual machines sharing the same host.
Through a more complex exploit chain, the vulnerability could grant an attacker remote code execution capabilities, and by taking control of the host and all virtual machines running on it, the attacker could access personal information stored on those machines, run malware, and more.
0x07 Background knowledge
In Hyper-V terminology, the host OS runs in the “root partition” and the guest OS runs in a “child partition”. Hyper-V makes extensive use of paravirtualized devices to provide child partitions with interfaces to hardware devices. With paravirtualization, the VM knows it is virtual; both the VM and the host use a modified hardware interface, resulting in better performance. One such paravirtualized device is the network switch, which is the target of our research.
Each paravirtualized device consists of two components:
1. A Virtualization Service Consumer (VSC) running in the child partition. netvsc.sys is the network VSC.
2. A Virtualization Service Provider (VSP) running in the root partition. vmswitch.sys is the network VSP.
The two components communicate with each other via VMBus, a hypercall-based inter-partition communication protocol.
The VSP and the VSC run in the root partition (Hyper-V host) and the child partition (guest VM), respectively.
2. Communication protocol
netvsc (network consumer) communicates with vmswitch (provider) over VMBus using NVSP type packets. These packets serve a variety of purposes: to initialize and establish a VMBus channel between the two components, to configure various communication parameters, and to send data to a Hyper-V host or other VM. NVSP includes many different packet types; one of them is NVSP_MSG1_TYPE_SEND_RNDIS_PKT for sending RNDIS packets.
3. RNDIS and OID
RNDIS defines a message protocol between a host and a remote NDIS device by abstracting control and data channels. In a Hyper-V setup, the “host” is the guest VM, the “remote NDIS device” is the vmswitch or external network adapter, and the “abstract communication channel” is the VMBus.
RNDIS also has various message types: init, set, query, reset, halt, etc. When a VM wishes to set or query certain parameters of its network adapter, it sends an OID request to vmswitch: a message containing the relevant object identifier (OID) and its parameters. Two examples of such OIDs are OID_GEN_MAC_ADDRESS, which sets the adapter’s MAC address, and OID_802_3_MULTICAST_LIST, which sets the adapter’s current multicast address list.
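As a rough illustration of what such a request looks like on the wire, the sketch below packs an RNDIS set message for OID_802_3_MULTICAST_LIST. The seven-field header layout and the rule that the information buffer offset is counted from the RequestId field follow the public RNDIS specification as we understand it. The helper name and the example multicast address are our own and are not taken from the research code.

```python
import struct

RNDIS_MSG_SET = 0x00000005
OID_802_3_MULTICAST_LIST = 0x01010103

# MessageType, MessageLength, RequestId, Oid,
# InformationBufferLength, InformationBufferOffset, Reserved
SET_HEADER = struct.Struct("<7I")

def build_set_message(request_id: int, oid: int, info: bytes) -> bytes:
    """Pack a set message with the information buffer right after the header."""
    info_offset = SET_HEADER.size - 8  # measured from the RequestId field
    header = SET_HEADER.pack(
        RNDIS_MSG_SET,
        SET_HEADER.size + len(info),
        request_id,
        oid,
        len(info),
        info_offset,
        0,
    )
    return header + info

# Example: a multicast list holding a single address, 01:00:5e:00:00:01.
mac = bytes.fromhex("01005e000001")
message = build_set_message(1, OID_802_3_MULTICAST_LIST, mac)
```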
RNDIS set message structure, from the RNDIS specification. The OID is one of the required fields of the packet.
4. Virtual switch extensions
vmswitch, Hyper-V’s virtual switch, is also known as the “Hyper-V Extensible Switch”. Its extensions are NDIS filter drivers or Windows Filtering Platform (WFP) drivers, which run inside the switch and can capture, filter, or forward the packets it processes. The Hyper-V Extensible Switch has a control path for OID requests, as shown in the following diagram:
Hyper-V extensible switch extensions as part of the switch control path
0x08 Vulnerability Analysis
1. Notorious OIDs
Some OID requests go to external network adapters, or other network adapters connected to the vmswitch. Such OID requests include, for example, Hardware Offload, Internet Protocol Security (IPsec), and Single Root I/O Virtualization (SR-IOV) requests.
When these requests arrive at the vmswitch interface, they are encapsulated and forwarded down the extensible switch control path using a special OID, OID_SWITCH_NIC_REQUEST. The new OID request is formed as an NDIS_SWITCH_NIC_OID_REQUEST structure whose OidRequest member points to the original OID request. The resulting message travels along vmswitch’s control path until it reaches its target driver. The process is shown in the figure below.
OID request encapsulation in the Hyper-V extensible switch control path.
The NDIS_SWITCH_NIC_OID_REQUEST structure, as documented by Microsoft
2. Vulnerable code
When processing an OID request, vmswitch traces its contents for logging and debugging purposes; this also applies to OID_SWITCH_NIC_REQUEST. However, because of its encapsulated structure, this request needs special handling: vmswitch dereferences OidRequest in order to trace the inner request as well. The flaw is that vmswitch never validates the value of OidRequest, and may therefore dereference an invalid pointer.
The following steps lead to the vulnerable function in vmswitch:
1. The message is first processed by RndisDevHostControlMessageWorkerRoutine – a general RNDIS message processing function.
2. vmswitch recognizes the set request and passes the message to a more specific handler – RndisDevHostHandleSetMessage.
3. Later, the message is passed to VmsIfrInfoParamsNdisOidRequestBuffer. This function is responsible for tracing the message parameters using IFR (the in-flight recorder), a Windows tracing feature that logs binary messages in real time.
4. Finally, the packet arrives at VmsIfrInfoParams_OID_SWITCH_NIC_REQUEST, which specifically traces requests of type OID_SWITCH_NIC_REQUEST and their corresponding structure, NDIS_SWITCH_NIC_OID_REQUEST.
The chain of function calls that leads to the bug, which handles the tracing of RNDIS request messages for a specific OID.
3. Exploitation
netvsc, the network virtualization service consumer (VSC), never sends OID requests with OID_SWITCH_NIC_REQUEST. Still, a design flaw causes vmswitch to accept and process such a request even when it comes from a guest VM. This allows us to trigger the arbitrary pointer dereference in the tracing mechanism by sending an RNDIS set message with OID_SWITCH_NIC_REQUEST directly from the guest VM.
This vulnerability can serve as the basis for two exploitation scenarios. If the OidRequest member contains an invalid pointer, the Hyper-V host will simply crash. Another option is to have the host’s kernel read from memory-mapped device registers, further enabling code execution. RCE on a Hyper-V host would enable attackers to read sensitive information at will, run malicious payloads with elevated privileges, and more.
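The interaction between the two flaws can be modeled with a toy simulation. Everything in it is hypothetical: the memory map, the function names and the guard. In particular, the `reject_guest_encapsulation` switch is just one plausible mitigation that follows from the design flaw described above. It is not a description of Microsoft's actual patch.

```python
class BugCheck(Exception):
    """Stands in for a host crash (blue screen)."""

# Hypothetical map of valid host addresses to the inner requests stored there.
MAPPED = {0xFFFF800000001000: {"Oid": "inner request"}}

def dereference(pointer):
    """Model of a kernel read: touching unmapped memory crashes the host."""
    if pointer not in MAPPED:
        raise BugCheck(f"invalid read at {pointer:#x}")
    return MAPPED[pointer]

def trace_oid_request(oid, oid_request_pointer, from_guest,
                      reject_guest_encapsulation):
    """Trace an OID request, following OidRequest for the encapsulating OID."""
    if oid == "OID_SWITCH_NIC_REQUEST":
        if from_guest and reject_guest_encapsulation:
            return "rejected"  # hypothetical guard: guests may not send this OID
        dereference(oid_request_pointer)  # the unvalidated dereference
    return "traced"
```

Without the guard, a guest-supplied pointer such as 0x4141414141414141 reaches `dereference` and raises `BugCheck`, which corresponds to the DoS scenario. With the guard, the request is dropped before the pointer is ever touched.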
0x09 Research Summary
The vulnerability results from the combination of an arbitrary pointer dereference in the hypervisor’s virtual switch and a design flaw that allows an overly permissive communication channel between the guest and the host.
Vulnerabilities such as CVE-2021-28476 demonstrate the risks posed by shared resource models such as public cloud. In fact, with shared infrastructure, even simple mistakes can lead to devastating results such as denial of service and remote code execution.
Vulnerabilities in software are inevitable, and this saying also applies to public cloud infrastructure. This reinforces the importance of a hybrid cloud strategy that doesn’t put all your eggs in one basket or all instances in one region. This approach will help with quick recovery from DoS attacks, and proper segmentation will prevent the cluster from being taken over after some machines are compromised.