Both Intel and AMD have since added sets of instructions that make virtualization considerably easier on x86. AMD introduced AMD-V, formerly known as Pacifica, whereas Intel’s extensions are known simply as (Intel) Virtualization Technology (IVT or VT). The idea behind both is to extend the x86 ISA to make up for the shortcomings of the existing instruction set. Conceptually, they can be thought of as adding a “ring -1” above ring 0 (that is, more privileged than it), allowing the OS to stay where it expects to be while attempts to access the hardware directly are caught. In implementation, more than one ring is added, but the important thing is that there is an extra privilege mode in which a hypervisor can trap and emulate operations that would previously have failed silently.
IVT adds a new mode of operation to the processor, called VMX. A hypervisor can run in VMX root mode and be invisible to the operating system, which runs in ring 0 as usual. When the CPU is in VMX non-root mode, it looks normal from the perspective of an unmodified OS. All instructions do what the guest expects them to do, and there are no unexpected failures as long as the hypervisor performs the emulation correctly. A set of extra instructions is added that can be used by software running in VMX root mode. These instructions do things like allocate a memory page on which to store a full copy of the CPU state, and start and stop a VM. Finally, a set of bitmaps is defined, indicating whether a particular interrupt, instruction, or exception should be passed to the virtual machine’s OS running in ring 0 or handled by the hypervisor running in VMX root mode.
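The bitmap idea can be illustrated with a small model. The sketch below is not the VMX interface itself. It assumes a single integer used as a bitmap with one bit per exception vector, where a set bit means "hand this event to the hypervisor" and a clear bit means "deliver it to the guest". The function names `make_bitmap` and `route` are hypothetical and exist only for this illustration. The vector numbers are the standard x86 ones.

```python
# Toy model of an exception bitmap: one bit per x86 exception vector.
# Assumption of this sketch: set bit = hypervisor handles it, clear bit = guest handles it.

BREAKPOINT = 3            # x86 vector for #BP
GENERAL_PROTECTION = 13   # x86 vector for #GP
PAGE_FAULT = 14           # x86 vector for #PF


def make_bitmap(trapped_vectors):
    """Build a bitmap with one bit set for every vector the hypervisor wants."""
    bitmap = 0
    for vector in trapped_vectors:
        bitmap |= 1 << vector
    return bitmap


def route(bitmap, vector):
    """Decide who receives an exception, based only on its bit in the bitmap."""
    return "hypervisor" if (bitmap >> vector) & 1 else "guest"


# The hypervisor asks to see page faults and general protection faults only.
exit_bitmap = make_bitmap([PAGE_FAULT, GENERAL_PROTECTION])

print(route(exit_bitmap, PAGE_FAULT))   # handled in the hypervisor
print(route(exit_bitmap, BREAKPOINT))   # delivered straight to the guest OS
```

The appeal of a bitmap is that the decision costs the processor a single bit test per event, so events the hypervisor does not care about reach the guest without any software involvement.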
In addition to the features of Intel’s VT, AMD’s Pacifica provides a few extra things linked to the x86-64 extensions and to the Opteron architecture. Current Opterons have an on-die memory controller. Because of the tight integration between the memory controller and the CPU, it is possible for the hypervisor to delegate some of the partitioning to the memory controller.
AMD-V provides two modes in which the hypervisor can handle memory partitioning. The first, Shadow Page Tables, allows the hypervisor to trap whenever the guest OS attempts to modify its page tables, and to change the mapping itself. This is done, in simple terms, by marking the page tables as read only and delivering the resulting fault to the hypervisor instead of to the guest operating system kernel. The second mode, Nested Page Tables, is a little more complicated, and allows a lot of this work to be done in hardware.
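The shadow page table mechanism can be sketched in a few lines. The model below is a deliberate simplification and not how any particular hypervisor implements it. It assumes page tables are flat dictionaries, that a guest write to its (read-only) page table arrives at the hypervisor as a call to `guest_write_pte`, and that the hypervisor keeps its own guest-frame-to-machine-frame map. The class and method names are hypothetical.

```python
class ShadowPager:
    """Toy shadow paging: the guest edits one table, the MMU uses another."""

    def __init__(self, guest_to_machine):
        # Hypervisor-owned map: guest "physical" frame -> real machine frame.
        self.guest_to_machine = dict(guest_to_machine)
        # What the guest believes: virtual page -> guest frame.
        self.guest_table = {}
        # What the hardware actually walks: virtual page -> machine frame.
        self.shadow_table = {}
        self.traps = 0

    def guest_write_pte(self, virtual_page, guest_frame):
        """Stands in for the fault raised when the guest writes its page table."""
        self.traps += 1
        if guest_frame not in self.guest_to_machine:
            # The guest asked for memory it does not own; refuse the mapping.
            raise PermissionError(f"guest frame {guest_frame} not owned by this guest")
        # Keep the guest's view intact and install the real mapping in the shadow.
        self.guest_table[virtual_page] = guest_frame
        self.shadow_table[virtual_page] = self.guest_to_machine[guest_frame]


# This guest owns guest frames 0 and 1, backed by machine frames 7 and 9.
pager = ShadowPager({0: 7, 1: 9})
pager.guest_write_pte(0x10, 1)

print(pager.guest_table[0x10])   # the guest sees its own frame number, 1
print(pager.shadow_table[0x10])  # the MMU uses machine frame 9
```

Every page table update costs a trap into the hypervisor in this scheme, which is the overhead that doing the work in hardware is meant to remove.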
Nested page tables do exactly what their name implies: they add another layer of indirection to virtual memory. The MMU already handles virtual-to-physical translations as defined by the OS. Now, these “physical” addresses are translated to real physical addresses using another set of page tables defined by the hypervisor. Because the translation is done in hardware, it is almost as fast as normal virtual memory lookups.
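The two-stage lookup can be made concrete with a short sketch. It assumes 4 KB pages and models each set of page tables as a flat dictionary from page number to frame number, which ignores the multi-level layout of real x86 page tables and the fact that real hardware also translates the guest's page table accesses through the second stage. The function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages


def translate(page_table, address):
    """One stage of translation: swap the page number, keep the offset."""
    page, offset = divmod(address, PAGE_SIZE)
    frame = page_table[page]  # a KeyError here stands in for a page fault
    return frame * PAGE_SIZE + offset


def nested_translate(guest_table, nested_table, guest_virtual):
    """Guest virtual -> guest "physical" -> real physical."""
    guest_physical = translate(guest_table, guest_virtual)  # defined by the guest OS
    return translate(nested_table, guest_physical)          # defined by the hypervisor


guest_table = {2: 5}    # guest OS maps virtual page 2 to guest frame 5
nested_table = {5: 40}  # hypervisor backs guest frame 5 with machine frame 40

machine_address = nested_translate(guest_table, nested_table, 0x2123)
print(hex(machine_address))  # 0x28123
```

Note that the guest's table is never modified or inspected by the hypervisor here. The guest is free to change its own mappings without trapping, because the second stage confines it to the frames the hypervisor has granted.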
The other additional feature of Pacifica is that it specifies a Device Exclusion Vector interface. This masks the addresses that a device is allowed to write to, so that a device assigned to a guest can write only to that guest’s address space.
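A minimal sketch of the masking idea follows. It assumes one bit per 4 KB page, with a set bit meaning that device writes to the page are excluded. The bit polarity, the layout, and the class and method names are assumptions made for this illustration and are not taken from AMD's specification.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages


class ExclusionVector:
    """Toy device exclusion vector: one bit per page, set bit = write blocked."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        # Start with every page excluded, then open up only what the guest owns.
        self.bits = (1 << total_pages) - 1

    def allow_range(self, first_page, page_count):
        """Clear the exclusion bit for each page in the range."""
        for page in range(first_page, first_page + page_count):
            self.bits &= ~(1 << page)

    def write_permitted(self, address):
        """Check a device write against the vector before it reaches memory."""
        page = address // PAGE_SIZE
        if page < 0 or page >= self.total_pages:
            return False
        return ((self.bits >> page) & 1) == 0


# A machine with 16 pages, where the guest using this device owns pages 4 and 5.
dev = ExclusionVector(16)
dev.allow_range(4, 2)

print(dev.write_permitted(4 * PAGE_SIZE))  # inside the guest: allowed
print(dev.write_permitted(6 * PAGE_SIZE))  # another guest's page: blocked
```

Starting from "everything excluded" is a choice made for the sketch: a forgotten entry then blocks a legitimate write rather than exposing another guest's memory.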