diff --git a/docs/kbase.html.in b/docs/kbase.html.in
index c586e0f676..e663ca525f 100644
--- a/docs/kbase.html.in
+++ b/docs/kbase.html.in
@@ -36,6 +36,9 @@
       <dt><a href="kbase/virtiofs.html">Virtio-FS</a></dt>
       <dd>Share a filesystem between the guest and the host</dd>
+
+      <dt><a href="kbase/kvm-realtime.html">KVM real time</a></dt>
+      <dd>Run real time workloads in guests on a KVM hypervisor</dd>
diff --git a/docs/kbase/kvm-realtime.rst b/docs/kbase/kvm-realtime.rst
new file mode 100644
index 0000000000..2e1541cb4d
--- /dev/null
+++ b/docs/kbase/kvm-realtime.rst
@@ -0,0 +1,214 @@

==========================
KVM Real Time Guest Config
==========================

.. contents::

The KVM hypervisor is capable of running real time guest workloads. This page
describes the key pieces of configuration required in the domain XML to achieve
the low latency needs of real time workloads.

For the most part, configuration of the host OS is out of scope of this
documentation. Refer to the operating system vendor's guidance on configuring
the host OS and hardware for real time. Note in particular that the default
kernel used by most Linux distros is not suitable for low latency real time
work and must be replaced by a special kernel build.


Host partitioning plan
======================

Running real time workloads requires carefully partitioning the host OS
resources, such that the KVM / QEMU processes are strictly separated from any
other workload running on the host, both userspace processes and kernel
threads.

As such, some subset of host CPUs needs to be reserved exclusively for running
KVM guests. This requires that the host kernel be booted using the ``isolcpus``
kernel command line parameter. This parameter removes a set of CPUs from the
scheduler, such that no kernel threads or userspace processes will ever get
placed on those CPUs automatically. KVM guests are then manually placed onto
these CPUs.

Deciding which host CPUs to reserve for real time requires an understanding of
the guest workload needs, balanced against the needs of the host OS. The trade
off will also vary based on the physical hardware available.

For the sake of illustration, this guide will assume a physical machine with
two NUMA nodes, each with 2 sockets and 4 cores per socket, giving a total of
16 CPUs on the host.
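Since CPU ID numbers are not always allocated sequentially, it can help to
derive the ``isolcpus`` value used later in this guide from the per-node CPU
ID lists rather than writing it by hand. A minimal Python sketch, not part of
libvirt; the function name and the "reserve the lowest IDs per node" policy
are illustrative assumptions:

```python
def isolcpus_arg(node_cpus, reserved_per_node=2):
    """Build the isolcpus= kernel argument value, keeping the lowest
    reserved_per_node CPU IDs of each NUMA node for the host OS."""
    isolated = sorted(
        cpu
        for cpus in node_cpus.values()
        for cpu in sorted(cpus)[reserved_per_node:]
    )
    # Collapse the sorted IDs into ranges such as "2-7,10-15".
    ranges, start = [], None
    for i, cpu in enumerate(isolated):
        if start is None:
            start = cpu
        if i + 1 == len(isolated) or isolated[i + 1] != cpu + 1:
            ranges.append(str(start) if start == cpu else f"{start}-{cpu}")
            start = None
    return ",".join(ranges)

# The topology assumed in this guide: two nodes, 8 sequential CPU IDs each.
print(isolcpus_arg({0: list(range(0, 8)), 1: list(range(8, 16))}))  # 2-7,10-15
```

The same helper copes with interleaved ID layouts, since it works from the
per-node lists rather than assuming contiguous ranges.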
Furthermore, it is assumed that hyperthreading is either not supported or has
been disabled in the BIOS, since it is incompatible with real time. Each NUMA
node is assumed to have 32 GB of RAM, giving 64 GB total for the host.

It is assumed that 2 CPUs in each NUMA node are reserved for the host OS, with
the remaining 6 CPUs per node available for KVM real time. With this in mind,
the host kernel should be booted with ``isolcpus=2-7,10-15`` to reserve CPUs.

To maximise efficiency of page table lookups for the guest, the host needs to
be configured with most RAM exposed as huge pages, ideally 1 GB sized. 6 GB of
RAM in each NUMA node will be reserved for general host OS usage as normal
sized pages, leaving 26 GB per node for KVM usage as huge pages.

Once huge pages are reserved on the hypothetical machine, the ``virsh
capabilities`` command output is expected to look approximately like:

::

  <topology>
    <cells num='2'>
      <cell id='0'>
        <memory unit='KiB'>33554432</memory>
        <pages unit='KiB' size='4'>1572864</pages>
        <pages unit='KiB' size='2048'>0</pages>
        <pages unit='KiB' size='1048576'>26</pages>
        <distances>
          <sibling id='0' value='10'/>
          <sibling id='1' value='21'/>
        </distances>
        <cpus num='8'>
          <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
          <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
          <cpu id='2' socket_id='0' core_id='2' siblings='2'/>
          <cpu id='3' socket_id='0' core_id='3' siblings='3'/>
          <cpu id='4' socket_id='1' core_id='0' siblings='4'/>
          <cpu id='5' socket_id='1' core_id='1' siblings='5'/>
          <cpu id='6' socket_id='1' core_id='2' siblings='6'/>
          <cpu id='7' socket_id='1' core_id='3' siblings='7'/>
        </cpus>
      </cell>
      <cell id='1'>
        <memory unit='KiB'>33554432</memory>
        <pages unit='KiB' size='4'>1572864</pages>
        <pages unit='KiB' size='2048'>0</pages>
        <pages unit='KiB' size='1048576'>26</pages>
        <distances>
          <sibling id='0' value='21'/>
          <sibling id='1' value='10'/>
        </distances>
        <cpus num='8'>
          <cpu id='8' socket_id='2' core_id='0' siblings='8'/>
          <cpu id='9' socket_id='2' core_id='1' siblings='9'/>
          <cpu id='10' socket_id='2' core_id='2' siblings='10'/>
          <cpu id='11' socket_id='2' core_id='3' siblings='11'/>
          <cpu id='12' socket_id='3' core_id='0' siblings='12'/>
          <cpu id='13' socket_id='3' core_id='1' siblings='13'/>
          <cpu id='14' socket_id='3' core_id='2' siblings='14'/>
          <cpu id='15' socket_id='3' core_id='3' siblings='15'/>
        </cpus>
      </cell>
    </cells>
  </topology>

Be aware that CPU ID numbers are not always allocated sequentially as shown
here. It is not unusual to see IDs interleaved between sockets on the two NUMA
nodes, such that ``0-3,8-11`` are on the first node and ``4-7,12-15`` are on
the second node. Carefully check the ``virsh capabilities`` output to determine
the CPU ID numbers when configuring both ``isolcpus`` and the guest ``cpuset``
values.

Guest configuration
===================

What follows is an overview of the key parts of the domain XML that need to be
configured to achieve low latency for real time workloads. The following
example will assume a 4 CPU guest, requiring 16 GB of RAM. It is intended to
be placed on the second host NUMA node.

CPU configuration
-----------------

Real time KVM guests intended to run Linux should have a minimum of 2 CPUs.
One vCPU is for running non-real time processes and performing I/O.
The other vCPUs will run real time applications. Some non-Linux OSes may not
require a special non-real time CPU to be available, in which case the 2 CPU
minimum would not apply.

Each guest CPU, even the non-real time one, needs to be pinned to a dedicated
host core that is in the ``isolcpus`` reserved set. The QEMU emulator threads
need to be pinned to host CPUs that are not in the ``isolcpus`` reserved set.
The vCPUs need to be given a real time CPU scheduler policy.

When configuring the `guest CPU count <../formatdomain.html#elementsCPUAllocation>`_,
do not include any CPU affinity at this stage:

::

  <vcpu placement='static'>4</vcpu>

The guest CPUs now need to be placed individually. In this case, they will all
be put within the same host socket, such that they can be exposed as core
siblings. This is achieved using the `CPU tuning config <../formatdomain.html#elementsCPUTuning>`_:

::

  <cputune>
    <emulatorpin cpuset='8-9'/>
    <vcpupin vcpu='0' cpuset='12'/>
    <vcpupin vcpu='1' cpuset='13'/>
    <vcpupin vcpu='2' cpuset='14'/>
    <vcpupin vcpu='3' cpuset='15'/>
    <vcpusched vcpus='0-3' scheduler='fifo' priority='1'/>
  </cputune>

The `guest CPU model <../formatdomain.html#elementsCPU>`_ now needs to be
configured to pass through the host model unchanged, with topology matching the
placement:

::

  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='4' threads='1'/>
  </cpu>

The performance monitoring unit virtualization needs to be disabled
via the `hypervisor features <../formatdomain.html#elementsFeatures>`_:

::

  <features>
    ...
    <pmu state='off'/>
  </features>

Memory configuration
--------------------

The host memory used for guest RAM needs to be allocated from huge pages on the
second NUMA node, and all other memory allocation needs to be locked into RAM
with memory page sharing disabled.
This is achieved using the `memory backing config <../formatdomain.html#elementsMemoryBacking>`_
together with the `NUMA tuning config <../formatdomain.html#elementsNUMATuning>`_:

::

  <memoryBacking>
    <hugepages>
      <page size='1' unit='GiB'/>
    </hugepages>
    <locked/>
    <nosharepages/>
  </memoryBacking>

  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>

Device configuration
--------------------

Libvirt adds a few devices by default to maintain historical QEMU configuration
behaviour. It is unlikely that these devices are required by real time guests,
so it is wise to disable them.
Remove all USB controllers that may exist in the XML config and replace them
with:

::

  <controller type='usb' model='none'/>

Similarly the memory balloon config should be changed to:

::

  <memballoon model='none'/>

If the guest had a graphical console enabled at installation time, this can
also be disabled, with remote access provided over SSH and a minimal serial
console kept for emergencies.
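The pinning constraints described in this guide (every vCPU inside the
``isolcpus`` reserved set, the emulator threads outside it) can be checked
mechanically before defining the guest. A minimal Python sketch, not part of
libvirt; the helper names and the inlined ``<cputune>`` fragment are
illustrative:

```python
import xml.etree.ElementTree as ET

def expand(cpuset):
    """Expand a libvirt cpuset string such as "2-7,10-15" into a set of ints."""
    cpus = set()
    for part in cpuset.split(","):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

def check_pinning(cputune_xml, isolated):
    """Assert vCPUs are pinned inside the isolated set and the QEMU
    emulator threads are pinned outside it."""
    root = ET.fromstring(cputune_xml)
    for pin in root.findall("vcpupin"):
        assert expand(pin.get("cpuset")) <= isolated
    emu = root.find("emulatorpin")
    assert expand(emu.get("cpuset")).isdisjoint(isolated)

CPUTUNE = """
<cputune>
  <emulatorpin cpuset="8-9"/>
  <vcpupin vcpu="0" cpuset="12"/>
  <vcpupin vcpu="1" cpuset="13"/>
  <vcpupin vcpu="2" cpuset="14"/>
  <vcpupin vcpu="3" cpuset="15"/>
</cputune>
"""
check_pinning(CPUTUNE, expand("2-7,10-15"))
print("pinning ok")
```

A mismatch (for example pinning the emulator threads into the reserved set)
raises an ``AssertionError`` instead of silently producing a guest that steals
time from the real time vCPUs.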