.. SPDX-License-Identifier: GPL-2.0+

======================================================
IBM Virtual Management Channel Kernel Driver (IBMVMC)
======================================================

:Authors:
	Dave Engebretsen <engebret@us.ibm.com>,
	Adam Reznechek <adreznec@linux.vnet.ibm.com>,
	Steven Royer <seroyer@linux.vnet.ibm.com>,
	Bryant G. Ly <bryantly@linux.vnet.ibm.com>,

Introduction
============

Note: Knowledge of virtualization technology is required to understand
this document.

A good reference document is:

https://openpowerfoundation.org/wp-content/uploads/2016/05/LoPAPR_DRAFT_v11_24March2016_cmt1.pdf

The Virtual Management Channel (VMC) is a logical device which provides an
interface between the hypervisor and a management partition. This interface
works like a message-passing interface. The management partition is intended
to provide an alternative to Hardware Management Console (HMC)-based
system management.

The primary hardware management solution developed by IBM relies
on an appliance server named the Hardware Management Console (HMC),
packaged as an external tower or rack-mounted personal computer. In a
Power Systems environment, a single HMC can manage multiple POWER
processor-based systems.

Management Application
----------------------

In the management partition, a management application exists which enables
a system administrator to configure the system’s partitioning
characteristics via a command line interface (CLI) or Representational
State Transfer application programming interfaces (REST APIs).

The management application runs on a Linux logical partition on a
POWER8 or newer processor-based server that is virtualized by PowerVM.
System configuration, maintenance, and control functions which
traditionally require an HMC can be implemented in the management
application using a combination of HMC to hypervisor interfaces and
existing operating system methods. This tool provides a subset of the
functions implemented by the HMC and enables basic partition configuration.
The set of HMC to hypervisor messages supported by the management
application component is passed to the hypervisor over a VMC interface,
which is defined below.

The VMC enables the management partition to provide basic partitioning
functions:

- Logical Partitioning Configuration
- Start and stop actions for individual partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual Storage
- Basic system management

Virtual Management Channel (VMC)
--------------------------------

A logical device, called the Virtual Management Channel (VMC), is defined
for communicating between the management application and the hypervisor. It
creates the pipes that enable virtualization management software. This
device is presented to a designated management partition as a virtual
device.

This communication device uses the Command/Response Queue (CRQ) and the
Remote Direct Memory Access (RDMA) interfaces. A three-way handshake is
defined that must take place to establish that both the hypervisor and
management partition sides of the channel are running before any
protocol messages are sent or received.

This driver also utilizes Transport Event CRQs. These CRQ messages are sent
when the hypervisor detects that one of the peer partitions has terminated
abnormally, or when one side has called H_FREE_CRQ to close its CRQ.

Two new classes of CRQ messages are introduced for the VMC device. VMC
Administrative messages are used by each partition using the VMC to
communicate capabilities to its partner. HMC Interface messages are used
for the actual flow of HMC messages between the management partition and
the hypervisor. As most HMC messages far exceed the size of a CRQ buffer,
a virtual DMA (RDMA) of the HMC message data is done prior to each HMC
Interface CRQ message. Only the management partition drives RDMA
operations; the hypervisor never directly causes the movement of message
data.

Terminology
-----------

RDMA
        Remote Direct Memory Access is a DMA transfer from the server to its
        client or from the server to its partner partition. DMA refers
        both to physical I/O to and from memory and to memory-to-memory
        move operations.
CRQ
        Command/Response Queue is a facility which is used to communicate
        between partner partitions. Transport events which are signaled
        from the hypervisor to a partition are also reported in this queue.

Example Management Partition VMC Driver Interface
=================================================

This section provides an example for the management application
implementation where a device driver is used to interface to the VMC
device. This driver consists of a new device, for example /dev/ibmvmc,
which provides interfaces to open, close, read, write, and perform
ioctls against the VMC device.

VMC Interface Initialization
----------------------------

The device driver is responsible for initializing the VMC when the driver
is loaded. It first creates and initializes the CRQ. Next, an exchange of
VMC capabilities is performed to indicate the code version and number of
resources available in both the management partition and the hypervisor.
Finally, the hypervisor requests that the management partition create an
initial pool of VMC buffers, one buffer for each possible HMC connection,
which will be used for management application session initialization.
Prior to completion of this initialization sequence, the device returns
EBUSY to open() calls. EIO is returned for all other open() failures.

::

    Management Partition                Hypervisor
                    CRQ INIT
    ---------------------------------------->
               CRQ INIT COMPLETE
    <----------------------------------------
                  CAPABILITIES
    ---------------------------------------->
             CAPABILITIES RESPONSE
    <----------------------------------------
          ADD BUFFER (HMC IDX=0,1,..)          _
    <----------------------------------------  |
              ADD BUFFER RESPONSE              | - Perform # HMCs Iterations
    ---------------------------------------->  -

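The initialization sequence above can be read as a small state machine. The
following C sketch is illustrative only: the state and event names are
hypothetical and are not taken from the driver source. It shows how each
message in the diagram advances the state and why open() reports EBUSY
until the last step completes. Mapping a failed state to EIO is an
assumption based on the error description above.

```c
#include <errno.h>

/* Hypothetical names; this document does not define these identifiers. */
enum vmc_state {
    VMC_STATE_FAILED,       /* unrecoverable error */
    VMC_STATE_CRQ_INIT,     /* CRQ INIT sent, waiting for CRQ INIT COMPLETE */
    VMC_STATE_CAPABILITIES, /* CAPABILITIES sent, waiting for the response */
    VMC_STATE_ADD_BUFFERS,  /* hypervisor is adding the initial buffers */
    VMC_STATE_READY         /* initialization sequence finished */
};

enum vmc_event {
    VMC_EV_CRQ_INIT_COMPLETE,
    VMC_EV_CAPABILITIES_RESPONSE,
    VMC_EV_BUFFERS_ADDED,   /* one ADD BUFFER handled per possible HMC */
    VMC_EV_ERROR
};

/* Advance the state for one received event. */
enum vmc_state vmc_next_state(enum vmc_state s, enum vmc_event ev)
{
    if (ev == VMC_EV_ERROR)
        return VMC_STATE_FAILED;
    if (s == VMC_STATE_CRQ_INIT && ev == VMC_EV_CRQ_INIT_COMPLETE)
        return VMC_STATE_CAPABILITIES;
    if (s == VMC_STATE_CAPABILITIES && ev == VMC_EV_CAPABILITIES_RESPONSE)
        return VMC_STATE_ADD_BUFFERS;
    if (s == VMC_STATE_ADD_BUFFERS && ev == VMC_EV_BUFFERS_ADDED)
        return VMC_STATE_READY;
    return s; /* ignore events that do not apply to the current state */
}

/* errno value an open() would report in state s, or 0 on success. */
int vmc_open_errno(enum vmc_state s)
{
    if (s == VMC_STATE_FAILED)
        return EIO;    /* assumption: failure is reported as EIO */
    if (s != VMC_STATE_READY)
        return EBUSY;  /* initialization has not completed yet */
    return 0;
}
```

Starting from VMC_STATE_CRQ_INIT, feeding the three non-error events in
diagram order reaches VMC_STATE_READY, which is the first state in which
vmc_open_errno() returns 0.
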
VMC Interface Open
------------------

After the basic VMC channel has been initialized, an HMC session level
connection can be established. The application layer performs an open() to
the VMC device and executes an ioctl() against it, indicating the HMC ID
(32 bytes of data) for this session. If the VMC device is in an invalid
state, EIO will be returned for the ioctl(). The device driver creates a
new HMC session value (ranging from 1 to 255) and HMC index value (starting
at index 0 and ranging to 254) for this HMC ID. The driver then does an
RDMA of the HMC ID to the hypervisor, and then sends an Interface Open
message to the hypervisor to establish the session over the VMC. After the
hypervisor receives this information, it sends Add Buffer messages to the
management partition to seed an initial pool of buffers for the new HMC
connection. Finally, the hypervisor sends an Interface Open Response
message to indicate that it is ready for normal runtime messaging. The
following illustrates this VMC flow:

::

    Management Partition                Hypervisor
                  RDMA HMC ID
    ---------------------------------------->
                Interface Open
    ---------------------------------------->
                  Add Buffer                   _
    <----------------------------------------  |
              Add Buffer Response              | - Perform N Iterations
    ---------------------------------------->  -
            Interface Open Response
    <----------------------------------------

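The bookkeeping described for the open step (a 32-byte HMC ID, an index in
the range 0 to 254, and a session value in the range 1 to 255) can be
sketched in C as follows. This is not the driver's code: every identifier
is hypothetical, zero-padding a short ID is an assumption, and choosing the
lowest free index is just one possible allocation policy.

```c
#include <stddef.h>
#include <string.h>

#define HMC_ID_LEN   32   /* from the text: the HMC ID is 32 bytes */
#define MAX_HMC_IDX  254  /* from the text: indexes run from 0 to 254 */

struct hmc_slot {
    int in_use;
    unsigned char session;          /* 1..255, never 0 */
    unsigned char id[HMC_ID_LEN];
};

struct hmc_table {
    struct hmc_slot slot[MAX_HMC_IDX + 1];
    unsigned char last_session;     /* 0 until the first session is made */
};

/* Copy a caller-supplied ID into a zero-padded 32-byte buffer.
 * Returns 0 on success, -1 if the ID is too long. */
int hmc_id_pack(unsigned char out[HMC_ID_LEN], const void *id, size_t len)
{
    if (len > HMC_ID_LEN)
        return -1;
    memset(out, 0, HMC_ID_LEN);     /* assumption: short IDs are zero-padded */
    memcpy(out, id, len);
    return 0;
}

/* Claim the lowest free index for this HMC ID.
 * Returns the index, or -1 if every index is in use. */
int hmc_open(struct hmc_table *t, const unsigned char id[HMC_ID_LEN])
{
    for (int i = 0; i <= MAX_HMC_IDX; i++) {
        struct hmc_slot *s = &t->slot[i];

        if (s->in_use)
            continue;
        /* Advance the session counter, wrapping 255 -> 1 so that 0 is
         * never handed out. A real driver would also have to check that
         * a wrapped value is not still active; this sketch does not. */
        t->last_session = (t->last_session == 255)
                              ? 1
                              : (unsigned char)(t->last_session + 1);
        s->in_use = 1;
        s->session = t->last_session;
        memcpy(s->id, id, HMC_ID_LEN);
        return i;
    }
    return -1;
}
```

With a zero-initialized table, the first two calls to hmc_open() return
indexes 0 and 1 with session values 1 and 2. In the flow above, the packed
ID is what would be transferred to the hypervisor by RDMA before the
Interface Open message is sent.
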
VMC Interface Runtime
---------------------

During normal runtime, the management application and the hypervisor
exchange HMC messages via the Signal VMC message and RDMA operations. When
sending data to the hypervisor, the management application performs a
write() to the VMC device, and the driver transfers the data to the
hypervisor via RDMA and then sends a Signal Message. If a write() is
attempted before VMC device buffers have been made available by the
hypervisor, or no buffers are currently available, EBUSY is returned in
response to the write(). A write() will return EIO for all other errors,
such as an invalid device state. When the hypervisor sends a message to the
management partition, the data is put into a VMC buffer and a Signal
Message is sent to the VMC driver in the management partition. The driver
transfers the buffer into the partition via RDMA and passes the data up to
the appropriate management application via a read() to the VMC device. The
read() request blocks if there is no buffer available to read. The
management application may use select() to wait for the VMC device to
become ready with data to read.

::

    Management Partition                Hypervisor
                    MSG RDMA
    ---------------------------------------->
                    SIGNAL MSG
    ---------------------------------------->
                    SIGNAL MSG
    <----------------------------------------
                    MSG RDMA
    <----------------------------------------

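From the management application's side, the runtime behaviour described
above maps onto ordinary POSIX calls. The sketch below is a user-space
illustration, not part of the driver: the helper names, the retry limit,
and the 10 ms pause are arbitrary choices. It waits for readable data with
select() and retries a write() only while the error is EBUSY.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait until fd has data to read.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno is set). */
int vmc_wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return rc > 0 && FD_ISSET(fd, &rfds);
}

/* Write one message, retrying only while the device reports EBUSY.
 * Returns the byte count from write(), or -1 with errno set. */
ssize_t vmc_send(int fd, const void *msg, size_t len, int max_retries)
{
    for (int attempt = 0; ; attempt++) {
        ssize_t n = write(fd, msg, len);

        if (n >= 0)
            return n;
        if (errno != EBUSY || attempt >= max_retries)
            return -1;  /* EIO and other errors are not retried */

        /* Short, arbitrary pause before retrying; select() with no
         * descriptors is used here purely as a portable sleep. */
        struct timeval pause = { .tv_sec = 0, .tv_usec = 10000 };
        select(0, NULL, NULL, NULL, &pause);
    }
}
```

An application would obtain fd by opening the VMC device (for example
/dev/ibmvmc), call vmc_send() to pass a message down, and call
vmc_wait_readable() before read() when it does not want to block
indefinitely.
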
VMC Interface Close
-------------------

HMC session level connections are closed by the management partition when
the application layer performs a close() against the device. This action
results in an Interface Close message flowing to the hypervisor, which
causes the session to be terminated. The device driver must free any
storage allocated for buffers for this HMC connection.

::

    Management Partition                Hypervisor
                INTERFACE CLOSE
    ---------------------------------------->
            INTERFACE CLOSE RESPONSE
    <----------------------------------------

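The cleanup requirement in the close step can be illustrated with a short
C sketch. The structure, the function name, and the pool size of four
buffers are hypothetical; the text does not state how many buffers a
connection holds or how the driver tracks them.

```c
#include <stdlib.h>

#define BUFS_PER_HMC 4  /* hypothetical pool size; the text gives none */

struct hmc_conn {
    int open;
    void *buf[BUFS_PER_HMC];  /* NULL marks an unused entry */
};

/* Release every buffer held for the connection and mark it closed.
 * Returns the number of buffers that were freed. */
int hmc_conn_close(struct hmc_conn *c)
{
    int freed = 0;

    for (int i = 0; i < BUFS_PER_HMC; i++) {
        if (c->buf[i] != NULL) {
            free(c->buf[i]);
            c->buf[i] = NULL;  /* guards against a double free */
            freed++;
        }
    }
    c->open = 0;
    return freed;
}
```

Clearing each pointer after freeing it makes a second close on the same
connection harmless, which is a convenient property if close handling can
be reached from more than one path.
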
Additional Information
======================

For more information on CRQ Messages, VMC Messages, HMC Interface Buffers,
and Signal Messages, please refer to the Linux on Power Architecture
Platform Reference, Section F.