mirror of https://gitee.com/openkylin/linux.git
IB/opa-vnic: Virtual Network Interface Controller (VNIC) documentation
Add OPA VNIC design document explaining the VNIC architecture and the
driver design.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
parent 23790ba2d7
commit c73690ca16

@@ -0,0 +1,153 @@
The Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
feature supports Ethernet functionality over the Omni-Path fabric by
encapsulating Ethernet packets between HFI nodes.

Architecture
=============
The pattern of exchange of Omni-Path encapsulated Ethernet packets
involves one or more virtual Ethernet switches overlaid on the Omni-Path
fabric topology. A subset of HFI nodes on the Omni-Path fabric are
permitted to exchange encapsulated Ethernet packets across a particular
virtual Ethernet switch. The virtual Ethernet switches are logical
abstractions achieved by configuring the HFI nodes on the fabric for
header generation and processing. In the simplest configuration, all HFI
nodes across the fabric exchange encapsulated Ethernet packets over a
single virtual Ethernet switch. A virtual Ethernet switch is effectively
an independent Ethernet network. The configuration is performed by an
Ethernet Manager (EM), which is part of the trusted Fabric Manager (FM)
application. HFI nodes can have multiple VNICs, each connected to a
different virtual Ethernet switch. The diagram below presents a case
of two virtual Ethernet switches with two HFI nodes.

                          +-------------------+
                          |      Subnet/      |
                          |     Ethernet      |
                          |      Manager      |
                          +-------------------+
                             /          /
                           /           /
                         /            /
                       /             /
+-----------------------------+  +------------------------------+
|  Virtual Ethernet Switch    |  |  Virtual Ethernet Switch     |
|  +---------+    +---------+ |  | +---------+    +---------+   |
|  | VPORT   |    | VPORT   | |  | | VPORT   |    | VPORT   |   |
+--+---------+----+---------+-+  +-+---------+----+---------+---+
         |                 \        /                 |
         |                   \    /                   |
         |                     \/                     |
         |                    /  \                    |
         |                  /      \                  |
     +-----------+------------+  +-----------+------------+
     |   VNIC    |    VNIC    |  |    VNIC   |    VNIC    |
     +-----------+------------+  +-----------+------------+
     |          HFI           |  |          HFI           |
     +------------------------+  +------------------------+

The Omni-Path encapsulated Ethernet packet format is as described below.

Bits          Field
------------------------------------
Quad Word 0:
0-19          SLID (lower 20 bits)
20-30         Length (in Quad Words)
31            BECN bit
32-51         DLID (lower 20 bits)
52-56         SC (Service Class)
57-59         RC (Routing Control)
60            FECN bit
61-62         L2 (=10, 16B format)
63            LT (=1, Link Transfer Head Flit)

Quad Word 1:
0-7           L4 type (=0x78 ETHERNET)
8-11          SLID[23:20]
12-15         DLID[23:20]
16-31         PKEY
32-47         Entropy
48-63         Reserved

Quad Word 2:
0-15          Reserved
16-31         L4 header
32-63         Ethernet Packet

Quad Words 3 to N-1:
0-63          Ethernet packet (pad extended)

Quad Word N (last):
0-23          Ethernet packet (pad extended)
24-55         ICRC
56-61         Tail
62-63         LT (=01, Link Transfer Tail Flit)

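As an illustration of the bit layout above, the following minimal Python
sketch packs Quad Words 0 and 1 from their fields. It is not taken from the
driver: the function name pack_qw0_qw1 and its parameters are hypothetical,
and it assumes that bit 0 is the least significant bit of each 64-bit quad
word.

```python
# Hypothetical helper, for illustration only.
# Assumption: bit 0 is the least significant bit of each 64-bit quad word.

L2_16B = 0b10        # from the table: L2 (=10, 16B format)
L4_ETHERNET = 0x78   # from the table: L4 type (=0x78 ETHERNET)


def pack_qw0_qw1(slid, dlid, length_qw, sc, pkey, entropy,
                 rc=0, becn=0, fecn=0):
    """Return (qw0, qw1) as integers for 24-bit SLID and DLID values."""
    qw0 = slid & 0xFFFFF                  # bits 0-19: SLID (lower 20 bits)
    qw0 |= (length_qw & 0x7FF) << 20      # bits 20-30: Length in quad words
    qw0 |= (becn & 0x1) << 31             # bit 31: BECN
    qw0 |= (dlid & 0xFFFFF) << 32         # bits 32-51: DLID (lower 20 bits)
    qw0 |= (sc & 0x1F) << 52              # bits 52-56: SC
    qw0 |= (rc & 0x7) << 57               # bits 57-59: RC
    qw0 |= (fecn & 0x1) << 60             # bit 60: FECN
    qw0 |= L2_16B << 61                   # bits 61-62: L2
    qw0 |= 1 << 63                        # bit 63: LT, head flit

    qw1 = L4_ETHERNET                     # bits 0-7: L4 type
    qw1 |= ((slid >> 20) & 0xF) << 8      # bits 8-11: SLID[23:20]
    qw1 |= ((dlid >> 20) & 0xF) << 12     # bits 12-15: DLID[23:20]
    qw1 |= (pkey & 0xFFFF) << 16          # bits 16-31: PKEY
    qw1 |= (entropy & 0xFFFF) << 32       # bits 32-47: Entropy
    return qw0, qw1                       # bits 48-63 of qw1 stay reserved (0)
```

Note how the 24-bit SLID and DLID are split: the lower 20 bits go into Quad
Word 0 and the upper 4 bits into Quad Word 1.
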
The Ethernet packet is padded on the transmit side to ensure that the VNIC
OPA packet is quad word aligned. The 'Tail' field contains the number of
bytes padded. On the receive side, the 'Tail' field is read and the padding
is removed (along with the ICRC, Tail and OPA header) before passing the
packet up the network stack.

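The padding rule can be sketched as follows. This is a simplified model and
not the driver code: the names pad_len, encapsulate and decapsulate are
hypothetical, the header and ICRC are all-zero placeholders, and the sketch
assumes that the final byte of the packet carries Tail in its low 6 bits and
LT in its top 2 bits. The byte counts are derived from the format table
(20 bytes of header before the Ethernet packet, 4 bytes of ICRC, 1 byte of
Tail plus LT).

```python
HDR_LEN = 20    # Quad Words 0 and 1 plus the first 32 bits of Quad Word 2
ICRC_LEN = 4    # ICRC occupies bits 24-55 of the last quad word
TAIL_LEN = 1    # Tail (6 bits) and LT (2 bits) fill the final byte
LT_TAIL = 0b01  # LT value for a tail flit, from the table


def pad_len(eth_len):
    """Padding bytes needed to make the whole packet a multiple of 8."""
    return -(HDR_LEN + eth_len + ICRC_LEN + TAIL_LEN) % 8


def encapsulate(eth_frame):
    """Build a packet with placeholder (all-zero) header and ICRC."""
    pad = pad_len(len(eth_frame))
    tail_byte = (LT_TAIL << 6) | pad   # assumed layout of the final byte
    return (bytes(HDR_LEN) + eth_frame + bytes(pad)
            + bytes(ICRC_LEN) + bytes([tail_byte]))


def decapsulate(packet):
    """Read Tail, then drop the header, padding, ICRC and tail byte."""
    pad = packet[-1] & 0x3F
    end = len(packet) - TAIL_LEN - ICRC_LEN - pad
    return packet[HDR_LEN:end]
```

For a 60-byte Ethernet frame this model adds 3 bytes of padding, giving an
88-byte (11 quad word) packet, and decapsulate() returns the original frame.
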
The L4 header field contains the ID of the virtual Ethernet switch that the
VNIC port belongs to. On the receive side, this field is used to
de-multiplex the received VNIC packets to different VNIC ports.

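The receive-side de-multiplexing described above can be pictured as a table
lookup keyed by the virtual Ethernet switch ID. The sketch below is only a
model: the Demux class, its methods and the port names are hypothetical, and
it assumes that quad words are laid out little-endian in the packet buffer,
which the format description does not state.

```python
import struct


def vesw_id(packet):
    """Extract the L4 header (bits 16-31 of Quad Word 2) from a packet."""
    # Assumption: quad words are stored little-endian; Quad Word 2 starts
    # at byte offset 16.
    (qw2,) = struct.unpack_from("<Q", packet, 16)
    return (qw2 >> 16) & 0xFFFF


class Demux:
    """Hypothetical table mapping virtual Ethernet switch IDs to ports."""

    def __init__(self):
        self.ports = {}

    def add_port(self, vesw, port):
        self.ports[vesw] = port

    def lookup(self, packet):
        # Returns None when no VNIC port is registered for the switch ID.
        return self.ports.get(vesw_id(packet))
```

A packet whose L4 header does not match any configured VNIC port yields
None here; what a real receiver does with such a packet is not covered by
this sketch.
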
Driver Design
==============
The Intel OPA VNIC software design is presented in the diagram below.
OPA VNIC functionality has a HW dependent component and a HW
independent component.

Support has been added for an IB device to allocate and free RDMA
netdev devices. The RDMA netdev interfaces with the network stack,
thus creating standard network interfaces. OPA_VNIC is an RDMA
netdev device type.

The HW dependent VNIC functionality is part of the HFI1 driver. It
implements the verbs to allocate and free the OPA_VNIC RDMA netdev.
It involves HW resource allocation/management for VNIC functionality.
It interfaces with the network stack and implements the required
net_device_ops functions. It expects Omni-Path encapsulated Ethernet
packets in the transmit path and provides HW access to them. It strips
the Omni-Path header from the received packets before passing them up
the network stack. It also implements the RDMA netdev control operations.

The OPA VNIC module implements the HW independent VNIC functionality.
It consists of two parts. The VNIC Ethernet Management Agent (VEMA)
registers itself with IB core as an IB client and interfaces with the
IB MAD stack. It exchanges the management information with the Ethernet
Manager (EM) and the VNIC netdev. The VNIC netdev part allocates and frees
the OPA_VNIC RDMA netdev devices. It overrides the net_device_ops functions
set by the HW dependent VNIC driver where required to accommodate any
control operation. It also handles the encapsulation of Ethernet packets
with an Omni-Path header in the transmit path. For each VNIC interface, the
information required for encapsulation is configured by the EM via the VEMA
MAD interface. It also passes any control information to the HW dependent
driver by invoking the RDMA netdev control operations.

        +-------------------+ +----------------------+
        |                   | |       Linux          |
        |     IB MAD        | |      Network         |
        |                   | |       Stack          |
        +-------------------+ +----------------------+
                 |               |          |
                 |               |          |
        +----------------------------+      |
        |                            |      |
        |      OPA VNIC Module       |      |
        |  (OPA VNIC RDMA Netdev     |      |
        |     & EMA functions)       |      |
        |                            |      |
        +----------------------------+      |
                    |                       |
                    |                       |
           +------------------+             |
           |     IB core      |             |
           +------------------+             |
                    |                       |
                    |                       |
        +--------------------------------------------+
        |                                            |
        |      HFI1 Driver with VNIC support         |
        |                                            |
        +--------------------------------------------+