mirror of https://gitee.com/openkylin/linux.git
docs: infiniband: convert docs to ReST and rename to *.rst
The InfiniBand docs are plain text with no markup, so all we needed to do was add the title markup and some markup sequences so that tables, lists and literal blocks are parsed properly.

In the new index.rst, add :orphan: for as long as it is not linked from the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
parent b417c0879d
commit 97162a1ee8
|
@ -1,4 +1,6 @@
|
||||||
INFINIBAND MIDLAYER LOCKING
|
===========================
|
||||||
|
InfiniBand Midlayer Locking
|
||||||
|
===========================
|
||||||
|
|
||||||
This guide is an attempt to make explicit the locking assumptions
|
This guide is an attempt to make explicit the locking assumptions
|
||||||
made by the InfiniBand midlayer. It describes the requirements on
|
made by the InfiniBand midlayer. It describes the requirements on
|
||||||
|
@ -6,45 +8,47 @@ INFINIBAND MIDLAYER LOCKING
|
||||||
protocols that use the midlayer.
|
protocols that use the midlayer.
|
||||||
|
|
||||||
Sleeping and interrupt context
|
Sleeping and interrupt context
|
||||||
|
==============================
|
||||||
|
|
||||||
With the following exceptions, a low-level driver implementation of
|
With the following exceptions, a low-level driver implementation of
|
||||||
all of the methods in struct ib_device may sleep. The exceptions
|
all of the methods in struct ib_device may sleep. The exceptions
|
||||||
are any methods from the list:
|
are any methods from the list:
|
||||||
|
|
||||||
create_ah
|
- create_ah
|
||||||
modify_ah
|
- modify_ah
|
||||||
query_ah
|
- query_ah
|
||||||
destroy_ah
|
- destroy_ah
|
||||||
post_send
|
- post_send
|
||||||
post_recv
|
- post_recv
|
||||||
poll_cq
|
- poll_cq
|
||||||
req_notify_cq
|
- req_notify_cq
|
||||||
map_phys_fmr
|
- map_phys_fmr
|
||||||
|
|
||||||
which may not sleep and must be callable from any context.
|
which may not sleep and must be callable from any context.
|
||||||
|
|
||||||
The corresponding functions exported to upper level protocol
|
The corresponding functions exported to upper level protocol
|
||||||
consumers:
|
consumers:
|
||||||
|
|
||||||
ib_create_ah
|
- ib_create_ah
|
||||||
ib_modify_ah
|
- ib_modify_ah
|
||||||
ib_query_ah
|
- ib_query_ah
|
||||||
ib_destroy_ah
|
- ib_destroy_ah
|
||||||
ib_post_send
|
- ib_post_send
|
||||||
ib_post_recv
|
- ib_post_recv
|
||||||
ib_req_notify_cq
|
- ib_req_notify_cq
|
||||||
ib_map_phys_fmr
|
- ib_map_phys_fmr
|
||||||
|
|
||||||
are therefore safe to call from any context.
|
are therefore safe to call from any context.
|
||||||
|
|
||||||
In addition, the function
|
In addition, the function
|
||||||
|
|
||||||
ib_dispatch_event
|
- ib_dispatch_event
|
||||||
|
|
||||||
used by low-level drivers to dispatch asynchronous events through
|
used by low-level drivers to dispatch asynchronous events through
|
||||||
the midlayer is also safe to call from any context.
|
the midlayer is also safe to call from any context.
|
||||||
|
|
||||||
Reentrancy
|
Reentrancy
|
||||||
|
----------
|
||||||
|
|
||||||
All of the methods in struct ib_device exported by a low-level
|
All of the methods in struct ib_device exported by a low-level
|
||||||
driver must be fully reentrant. The low-level driver is required to
|
driver must be fully reentrant. The low-level driver is required to
|
||||||
|
@ -62,6 +66,7 @@ Reentrancy
|
||||||
information between different calls of ib_poll_cq() is not defined.
|
information between different calls of ib_poll_cq() is not defined.
|
||||||
|
|
||||||
Callbacks
|
Callbacks
|
||||||
|
---------
|
||||||
|
|
||||||
A low-level driver must not perform a callback directly from the
|
A low-level driver must not perform a callback directly from the
|
||||||
same callchain as an ib_device method call. For example, it is not
|
same callchain as an ib_device method call. For example, it is not
|
||||||
|
@ -74,18 +79,18 @@ Callbacks
|
||||||
completion event handlers for the same CQ are not called
|
completion event handlers for the same CQ are not called
|
||||||
simultaneously. The driver must guarantee that only one CQ event
|
simultaneously. The driver must guarantee that only one CQ event
|
||||||
handler for a given CQ is running at a time. In other words, the
|
handler for a given CQ is running at a time. In other words, the
|
||||||
following situation is not allowed:
|
following situation is not allowed::
|
||||||
|
|
||||||
CPU1 CPU2
|
CPU1 CPU2
|
||||||
|
|
||||||
low-level driver ->
|
low-level driver ->
|
||||||
consumer CQ event callback:
|
consumer CQ event callback:
|
||||||
/* ... */
|
/* ... */
|
||||||
ib_req_notify_cq(cq, ...);
|
ib_req_notify_cq(cq, ...);
|
||||||
low-level driver ->
|
low-level driver ->
|
||||||
/* ... */ consumer CQ event callback:
|
/* ... */ consumer CQ event callback:
|
||||||
/* ... */
|
/* ... */
|
||||||
return from CQ event handler
|
return from CQ event handler
|
||||||
|
|
||||||
The context in which completion event and asynchronous event
|
The context in which completion event and asynchronous event
|
||||||
callbacks run is not defined. Depending on the low-level driver, it
|
callbacks run is not defined. Depending on the low-level driver, it
|
||||||
|
@ -93,6 +98,7 @@ Callbacks
|
||||||
Upper level protocol consumers may not sleep in a callback.
|
Upper level protocol consumers may not sleep in a callback.
|
||||||
|
|
||||||
Hot-plug
|
Hot-plug
|
||||||
|
--------
|
||||||
|
|
||||||
A low-level driver announces that a device is ready for use by
|
A low-level driver announces that a device is ready for use by
|
||||||
consumers when it calls ib_register_device(), all initialization
|
consumers when it calls ib_register_device(), all initialization
|
|
@ -0,0 +1,23 @@
|
||||||
|
:orphan:
|
||||||
|
|
||||||
|
==========
|
||||||
|
InfiniBand
|
||||||
|
==========
|
||||||
|
|
||||||
|
.. toctree::
|
||||||
|
:maxdepth: 1
|
||||||
|
|
||||||
|
core_locking
|
||||||
|
ipoib
|
||||||
|
opa_vnic
|
||||||
|
sysfs
|
||||||
|
tag_matching
|
||||||
|
user_mad
|
||||||
|
user_verbs
|
||||||
|
|
||||||
|
.. only:: subproject and html
|
||||||
|
|
||||||
|
Indices
|
||||||
|
=======
|
||||||
|
|
||||||
|
* :ref:`genindex`
|
|
@ -1,4 +1,6 @@
|
||||||
IP OVER INFINIBAND
|
==================
|
||||||
|
IP over InfiniBand
|
||||||
|
==================
|
||||||
|
|
||||||
The ib_ipoib driver is an implementation of the IP over InfiniBand
|
The ib_ipoib driver is an implementation of the IP over InfiniBand
|
||||||
protocol as specified by RFC 4391 and 4392, issued by the IETF ipoib
|
protocol as specified by RFC 4391 and 4392, issued by the IETF ipoib
|
||||||
|
@ -8,16 +10,17 @@ IP OVER INFINIBAND
|
||||||
masqueraded to the kernel as ethernet interfaces).
|
masqueraded to the kernel as ethernet interfaces).
|
||||||
|
|
||||||
Partitions and P_Keys
|
Partitions and P_Keys
|
||||||
|
=====================
|
||||||
|
|
||||||
When the IPoIB driver is loaded, it creates one interface for each
|
When the IPoIB driver is loaded, it creates one interface for each
|
||||||
port using the P_Key at index 0. To create an interface with a
|
port using the P_Key at index 0. To create an interface with a
|
||||||
different P_Key, write the desired P_Key into the main interface's
|
different P_Key, write the desired P_Key into the main interface's
|
||||||
/sys/class/net/<intf name>/create_child file. For example:
|
/sys/class/net/<intf name>/create_child file. For example::
|
||||||
|
|
||||||
echo 0x8001 > /sys/class/net/ib0/create_child
|
echo 0x8001 > /sys/class/net/ib0/create_child
|
||||||
|
|
||||||
This will create an interface named ib0.8001 with P_Key 0x8001. To
|
This will create an interface named ib0.8001 with P_Key 0x8001. To
|
||||||
remove a subinterface, use the "delete_child" file:
|
remove a subinterface, use the "delete_child" file::
|
||||||
|
|
||||||
echo 0x8001 > /sys/class/net/ib0/delete_child
|
echo 0x8001 > /sys/class/net/ib0/delete_child
|
||||||
|
|
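The create_child and delete_child writes shown in this hunk can also be issued from a program. The following is a minimal C sketch, not part of the driver or of this patch: the helper names (ipoib_child_ctl_path, ipoib_child_name, ipoib_write_pkey) are hypothetical, and the four-digit lowercase hex suffix in the child name is an assumption generalized from the single ib0.8001 example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "/sys/class/net/<parent>/create_child"
 * (create != 0) or ".../delete_child" (create == 0).
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int ipoib_child_ctl_path(char *buf, size_t len,
                                const char *parent, int create)
{
    int n = snprintf(buf, len, "/sys/class/net/%s/%s", parent,
                     create ? "create_child" : "delete_child");

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: expected child interface name, e.g. "ib0.8001".
 * The "%04x" format is an assumption based on the documented example.
 */
static int ipoib_child_name(char *buf, size_t len,
                            const char *parent, uint16_t pkey)
{
    int n = snprintf(buf, len, "%s.%04x", parent, (unsigned)pkey);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write the P_Key in the same textual form as the echo example
 * ("0x8001\n"). Returns 0 on success, -1 on any I/O error.
 */
static int ipoib_write_pkey(const char *path, uint16_t pkey)
{
    int ok;
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    ok = fprintf(f, "0x%04x\n", (unsigned)pkey) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would build the create_child path for ib0, pass it to ipoib_write_pkey() with 0x8001, and then look for the interface named by ipoib_child_name(). Writing to these sysfs files normally requires root privileges.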
||||||
|
@ -28,6 +31,7 @@ Partitions and P_Keys
|
||||||
rtnl_link_ops, where children created using either way behave the same.
|
rtnl_link_ops, where children created using either way behave the same.
|
||||||
|
|
||||||
Datagram vs Connected modes
|
Datagram vs Connected modes
|
||||||
|
===========================
|
||||||
|
|
||||||
The IPoIB driver supports two modes of operation: datagram and
|
The IPoIB driver supports two modes of operation: datagram and
|
||||||
connected. The mode is set and read through an interface's
|
connected. The mode is set and read through an interface's
|
||||||
|
@ -51,6 +55,7 @@ Datagram vs Connected modes
|
||||||
networking stack to use the smaller UD MTU for these neighbours.
|
networking stack to use the smaller UD MTU for these neighbours.
|
||||||
|
|
||||||
Stateless offloads
|
Stateless offloads
|
||||||
|
==================
|
||||||
|
|
||||||
If the IB HW supports IPoIB stateless offloads, IPoIB advertises
|
If the IB HW supports IPoIB stateless offloads, IPoIB advertises
|
||||||
TCP/IP checksum and/or Large Send (LSO) offloading capability to the
|
TCP/IP checksum and/or Large Send (LSO) offloading capability to the
|
||||||
|
@ -60,9 +65,10 @@ Stateless offloads
|
||||||
on/off using ethtool calls. Currently LRO is supported only for
|
on/off using ethtool calls. Currently LRO is supported only for
|
||||||
checksum offload capable devices.
|
checksum offload capable devices.
|
||||||
|
|
||||||
Stateless offloads are supported only in datagram mode.
|
Stateless offloads are supported only in datagram mode.
|
||||||
|
|
||||||
Interrupt moderation
|
Interrupt moderation
|
||||||
|
====================
|
||||||
|
|
||||||
If the underlying IB device supports CQ event moderation, one can
|
If the underlying IB device supports CQ event moderation, one can
|
||||||
use ethtool to set interrupt mitigation parameters and thus reduce
|
use ethtool to set interrupt mitigation parameters and thus reduce
|
||||||
|
@ -71,6 +77,7 @@ Interrupt moderation
|
||||||
moderation is supported.
|
moderation is supported.
|
||||||
|
|
||||||
Debugging Information
|
Debugging Information
|
||||||
|
=====================
|
||||||
|
|
||||||
By compiling the IPoIB driver with CONFIG_INFINIBAND_IPOIB_DEBUG set
|
By compiling the IPoIB driver with CONFIG_INFINIBAND_IPOIB_DEBUG set
|
||||||
to 'y', tracing messages are compiled into the driver. They are
|
to 'y', tracing messages are compiled into the driver. They are
|
||||||
|
@ -79,7 +86,7 @@ Debugging Information
|
||||||
runtime through files in /sys/module/ib_ipoib/.
|
runtime through files in /sys/module/ib_ipoib/.
|
||||||
|
|
||||||
CONFIG_INFINIBAND_IPOIB_DEBUG also enables files in the debugfs
|
CONFIG_INFINIBAND_IPOIB_DEBUG also enables files in the debugfs
|
||||||
virtual filesystem. By mounting this filesystem, for example with
|
virtual filesystem. By mounting this filesystem, for example with::
|
||||||
|
|
||||||
mount -t debugfs none /sys/kernel/debug
|
mount -t debugfs none /sys/kernel/debug
|
||||||
|
|
||||||
|
@ -96,10 +103,13 @@ Debugging Information
|
||||||
performance, because it adds tests to the fast path.
|
performance, because it adds tests to the fast path.
|
||||||
|
|
||||||
References
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
Transmission of IP over InfiniBand (IPoIB) (RFC 4391)
|
Transmission of IP over InfiniBand (IPoIB) (RFC 4391)
|
||||||
http://ietf.org/rfc/rfc4391.txt
|
http://ietf.org/rfc/rfc4391.txt
|
||||||
|
|
||||||
IP over InfiniBand (IPoIB) Architecture (RFC 4392)
|
IP over InfiniBand (IPoIB) Architecture (RFC 4392)
|
||||||
http://ietf.org/rfc/rfc4392.txt
|
http://ietf.org/rfc/rfc4392.txt
|
||||||
|
|
||||||
IP over InfiniBand: Connected Mode (RFC 4755)
|
IP over InfiniBand: Connected Mode (RFC 4755)
|
||||||
http://ietf.org/rfc/rfc4755.txt
|
http://ietf.org/rfc/rfc4755.txt
|
|
@ -1,3 +1,7 @@
|
||||||
|
=================================================================
|
||||||
|
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
|
||||||
|
=================================================================
|
||||||
|
|
||||||
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC) feature
|
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC) feature
|
||||||
supports Ethernet functionality over Omni-Path fabric by encapsulating
|
supports Ethernet functionality over Omni-Path fabric by encapsulating
|
||||||
the Ethernet packets between HFI nodes.
|
the Ethernet packets between HFI nodes.
|
||||||
|
@ -17,70 +21,72 @@ an independent Ethernet network. The configuration is performed by an
|
||||||
Ethernet Manager (EM) which is part of the trusted Fabric Manager (FM)
|
Ethernet Manager (EM) which is part of the trusted Fabric Manager (FM)
|
||||||
application. HFI nodes can have multiple VNICs each connected to a
|
application. HFI nodes can have multiple VNICs each connected to a
|
||||||
different virtual Ethernet switch. The below diagram presents a case
|
different virtual Ethernet switch. The below diagram presents a case
|
||||||
of two virtual Ethernet switches with two HFI nodes.
|
of two virtual Ethernet switches with two HFI nodes::
|
||||||
|
|
||||||
+-------------------+
|
+-------------------+
|
||||||
| Subnet/ |
|
| Subnet/ |
|
||||||
| Ethernet |
|
| Ethernet |
|
||||||
| Manager |
|
| Manager |
|
||||||
+-------------------+
|
+-------------------+
|
||||||
/ /
|
/ /
|
||||||
/ /
|
/ /
|
||||||
/ /
|
/ /
|
||||||
/ /
|
/ /
|
||||||
+-----------------------------+ +------------------------------+
|
+-----------------------------+ +------------------------------+
|
||||||
| Virtual Ethernet Switch | | Virtual Ethernet Switch |
|
| Virtual Ethernet Switch | | Virtual Ethernet Switch |
|
||||||
| +---------+ +---------+ | | +---------+ +---------+ |
|
| +---------+ +---------+ | | +---------+ +---------+ |
|
||||||
| | VPORT | | VPORT | | | | VPORT | | VPORT | |
|
| | VPORT | | VPORT | | | | VPORT | | VPORT | |
|
||||||
+--+---------+----+---------+-+ +-+---------+----+---------+---+
|
+--+---------+----+---------+-+ +-+---------+----+---------+---+
|
||||||
| \ / |
|
| \ / |
|
||||||
| \ / |
|
| \ / |
|
||||||
| \/ |
|
| \/ |
|
||||||
| / \ |
|
| / \ |
|
||||||
| / \ |
|
| / \ |
|
||||||
+-----------+------------+ +-----------+------------+
|
+-----------+------------+ +-----------+------------+
|
||||||
| VNIC | VNIC | | VNIC | VNIC |
|
| VNIC | VNIC | | VNIC | VNIC |
|
||||||
+-----------+------------+ +-----------+------------+
|
+-----------+------------+ +-----------+------------+
|
||||||
| HFI | | HFI |
|
| HFI | | HFI |
|
||||||
+------------------------+ +------------------------+
|
+------------------------+ +------------------------+
|
||||||
|
|
||||||
|
|
||||||
The Omni-Path encapsulated Ethernet packet format is as described below.
|
The Omni-Path encapsulated Ethernet packet format is as described below.
|
||||||
|
|
||||||
Bits Field
|
==================== ================================
|
||||||
------------------------------------
|
Bits Field
|
||||||
|
==================== ================================
|
||||||
Quad Word 0:
|
Quad Word 0:
|
||||||
0-19 SLID (lower 20 bits)
|
0-19 SLID (lower 20 bits)
|
||||||
20-30 Length (in Quad Words)
|
20-30 Length (in Quad Words)
|
||||||
31 BECN bit
|
31 BECN bit
|
||||||
32-51 DLID (lower 20 bits)
|
32-51 DLID (lower 20 bits)
|
||||||
52-56 SC (Service Class)
|
52-56 SC (Service Class)
|
||||||
57-59 RC (Routing Control)
|
57-59 RC (Routing Control)
|
||||||
60 FECN bit
|
60 FECN bit
|
||||||
61-62 L2 (=10, 16B format)
|
61-62 L2 (=10, 16B format)
|
||||||
63 LT (=1, Link Transfer Head Flit)
|
63 LT (=1, Link Transfer Head Flit)
|
||||||
|
|
||||||
Quad Word 1:
|
Quad Word 1:
|
||||||
0-7 L4 type (=0x78 ETHERNET)
|
0-7 L4 type (=0x78 ETHERNET)
|
||||||
8-11 SLID[23:20]
|
8-11 SLID[23:20]
|
||||||
12-15 DLID[23:20]
|
12-15 DLID[23:20]
|
||||||
16-31 PKEY
|
16-31 PKEY
|
||||||
32-47 Entropy
|
32-47 Entropy
|
||||||
48-63 Reserved
|
48-63 Reserved
|
||||||
|
|
||||||
Quad Word 2:
|
Quad Word 2:
|
||||||
0-15 Reserved
|
0-15 Reserved
|
||||||
16-31 L4 header
|
16-31 L4 header
|
||||||
32-63 Ethernet Packet
|
32-63 Ethernet Packet
|
||||||
|
|
||||||
Quad Words 3 to N-1:
|
Quad Words 3 to N-1:
|
||||||
0-63 Ethernet packet (pad extended)
|
0-63 Ethernet packet (pad extended)
|
||||||
|
|
||||||
Quad Word N (last):
|
Quad Word N (last):
|
||||||
0-23 Ethernet packet (pad extended)
|
0-23 Ethernet packet (pad extended)
|
||||||
24-55 ICRC
|
24-55 ICRC
|
||||||
56-61 Tail
|
56-61 Tail
|
||||||
62-63 LT (=01, Link Transfer Tail Flit)
|
62-63 LT (=01, Link Transfer Tail Flit)
|
||||||
|
==================== ================================
|
||||||
|
|
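The Quad Word 0 layout in the table can be expressed as shift-and-mask operations. The sketch below is illustrative only and is not taken from the opa_vnic driver: put_bits, get_bits and opa_vnic_qw0 are hypothetical names, and it assumes that bit 0 in the table is the least significant bit of a 64-bit host-order value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Store the low `width` bits of `value` at bit offset `lo` of `word`.
 * Assumption: bit 0 is the least significant bit of the quad word.
 */
static uint64_t put_bits(uint64_t word, unsigned lo, unsigned width,
                         uint64_t value)
{
    uint64_t mask;

    assert(width >= 1 && width < 64 && lo + width <= 64);
    mask = (UINT64_C(1) << width) - 1;
    return (word & ~(mask << lo)) | ((value & mask) << lo);
}

/* Read `width` bits starting at bit offset `lo` of `word`. */
static uint64_t get_bits(uint64_t word, unsigned lo, unsigned width)
{
    assert(width >= 1 && width < 64 && lo + width <= 64);
    return (word >> lo) & ((UINT64_C(1) << width) - 1);
}

/*
 * Hypothetical helper: assemble Quad Word 0 from the fields listed
 * in the table. Values wider than their field are truncated.
 */
static uint64_t opa_vnic_qw0(uint32_t slid, uint32_t dlid, unsigned len_qw,
                             unsigned sc, unsigned rc, int becn, int fecn)
{
    uint64_t qw = 0;

    qw = put_bits(qw, 0, 20, slid);       /* 0-19  SLID (lower 20 bits) */
    qw = put_bits(qw, 20, 11, len_qw);    /* 20-30 Length (in Quad Words) */
    qw = put_bits(qw, 31, 1, becn != 0);  /* 31    BECN bit */
    qw = put_bits(qw, 32, 20, dlid);      /* 32-51 DLID (lower 20 bits) */
    qw = put_bits(qw, 52, 5, sc);         /* 52-56 SC (Service Class) */
    qw = put_bits(qw, 57, 3, rc);         /* 57-59 RC (Routing Control) */
    qw = put_bits(qw, 60, 1, fecn != 0);  /* 60    FECN bit */
    qw = put_bits(qw, 61, 2, 0x2);        /* 61-62 L2 (=10, 16B format) */
    qw = put_bits(qw, 63, 1, 1);          /* 63    LT (=1, head flit) */
    return qw;
}
```

The same put_bits() calls extend to Quad Word 1 (L4 type at bits 0-7, SLID[23:20] at 8-11, DLID[23:20] at 12-15, PKEY at 16-31, Entropy at 32-47). How the quad word is serialized onto the wire is outside the scope of this sketch.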
||||||
Ethernet packet is padded on the transmit side to ensure that the VNIC OPA
|
Ethernet packet is padded on the transmit side to ensure that the VNIC OPA
|
||||||
packet is quad word aligned. The 'Tail' field contains the number of bytes
|
packet is quad word aligned. The 'Tail' field contains the number of bytes
|
||||||
|
@ -123,7 +129,7 @@ operation. It also handles the encapsulation of Ethernet packets with an
|
||||||
Omni-Path header in the transmit path. For each VNIC interface, the
|
Omni-Path header in the transmit path. For each VNIC interface, the
|
||||||
information required for encapsulation is configured by the EM via VEMA MAD
|
information required for encapsulation is configured by the EM via VEMA MAD
|
||||||
interface. It also passes any control information to the HW dependent driver
|
interface. It also passes any control information to the HW dependent driver
|
||||||
by invoking the RDMA netdev control operations.
|
by invoking the RDMA netdev control operations::
|
||||||
|
|
||||||
+-------------------+ +----------------------+
|
+-------------------+ +----------------------+
|
||||||
| | | Linux |
|
| | | Linux |
|
|
@ -1,4 +1,6 @@
|
||||||
SYSFS FILES
|
===========
|
||||||
|
Sysfs files
|
||||||
|
===========
|
||||||
|
|
||||||
The sysfs interface has moved to
|
The sysfs interface has moved to
|
||||||
Documentation/ABI/stable/sysfs-class-infiniband.
|
Documentation/ABI/stable/sysfs-class-infiniband.
|
|
@ -1,12 +1,16 @@
|
||||||
|
==================
|
||||||
Tag matching logic
|
Tag matching logic
|
||||||
|
==================
|
||||||
|
|
||||||
The MPI standard defines a set of rules, known as tag-matching, for matching
|
The MPI standard defines a set of rules, known as tag-matching, for matching
|
||||||
source send operations to destination receives. The following parameters must
|
source send operations to destination receives. The following parameters must
|
||||||
match the following source and destination parameters:
|
match the following source and destination parameters:
|
||||||
|
|
||||||
* Communicator
|
* Communicator
|
||||||
* User tag - wild card may be specified by the receiver
|
* User tag - wild card may be specified by the receiver
|
||||||
* Source rank – wild card may be specified by the receiver
|
* Source rank – wild card may be specified by the receiver
|
||||||
* Destination rank – wild
|
* Destination rank – wild
|
||||||
|
|
||||||
The ordering rules require that when more than one pair of send and receive
|
The ordering rules require that when more than one pair of send and receive
|
||||||
message envelopes may match, the pair that includes the earliest posted-send
|
message envelopes may match, the pair that includes the earliest posted-send
|
||||||
and the earliest posted-receive is the pair that must be used to satisfy the
|
and the earliest posted-receive is the pair that must be used to satisfy the
|
||||||
|
@ -35,6 +39,7 @@ the header to initiate an RDMA READ operation directly to the matching buffer.
|
||||||
A fin message needs to be received in order for the buffer to be reused.
|
A fin message needs to be received in order for the buffer to be reused.
|
||||||
|
|
||||||
Tag matching implementation
|
Tag matching implementation
|
||||||
|
===========================
|
||||||
|
|
||||||
There are two types of matching objects used, the posted receive list and the
|
There are two types of matching objects used, the posted receive list and the
|
||||||
unexpected message list. The application posts receive buffers through calls
|
unexpected message list. The application posts receive buffers through calls
|
|
@ -1,6 +1,9 @@
|
||||||
USERSPACE MAD ACCESS
|
====================
|
||||||
|
Userspace MAD access
|
||||||
|
====================
|
||||||
|
|
||||||
Device files
|
Device files
|
||||||
|
============
|
||||||
|
|
||||||
Each port of each InfiniBand device has a "umad" device and an
|
Each port of each InfiniBand device has a "umad" device and an
|
||||||
"issm" device attached. For example, a two-port HCA will have two
|
"issm" device attached. For example, a two-port HCA will have two
|
||||||
|
@ -8,12 +11,13 @@ Device files
|
||||||
device of each type (for switch port 0).
|
device of each type (for switch port 0).
|
||||||
|
|
||||||
Creating MAD agents
|
Creating MAD agents
|
||||||
|
===================
|
||||||
|
|
||||||
A MAD agent can be created by filling in a struct ib_user_mad_reg_req
|
A MAD agent can be created by filling in a struct ib_user_mad_reg_req
|
||||||
and then calling the IB_USER_MAD_REGISTER_AGENT ioctl on a file
|
and then calling the IB_USER_MAD_REGISTER_AGENT ioctl on a file
|
||||||
descriptor for the appropriate device file. If the registration
|
descriptor for the appropriate device file. If the registration
|
||||||
request succeeds, a 32-bit id will be returned in the structure.
|
request succeeds, a 32-bit id will be returned in the structure.
|
||||||
For example:
|
For example::
|
||||||
|
|
||||||
struct ib_user_mad_reg_req req = { /* ... */ };
|
struct ib_user_mad_reg_req req = { /* ... */ };
|
||||||
ret = ioctl(fd, IB_USER_MAD_REGISTER_AGENT, (char *) &req);
|
ret = ioctl(fd, IB_USER_MAD_REGISTER_AGENT, (char *) &req);
|
||||||
|
@ -26,12 +30,14 @@ Creating MAD agents
|
||||||
ioctl. Also, all agents registered through a file descriptor will
|
ioctl. Also, all agents registered through a file descriptor will
|
||||||
be unregistered when the descriptor is closed.
|
be unregistered when the descriptor is closed.
|
||||||
|
|
||||||
2014 -- a new registration ioctl is now provided which allows additional
|
2014
|
||||||
|
a new registration ioctl is now provided which allows additional
|
||||||
fields to be provided during registration.
|
fields to be provided during registration.
|
||||||
Users of this registration call are implicitly setting the use of
|
Users of this registration call are implicitly setting the use of
|
||||||
pkey_index (see below).
|
pkey_index (see below).
|
||||||
|
|
||||||
Receiving MADs
|
Receiving MADs
|
||||||
|
==============
|
||||||
|
|
||||||
MADs are received using read(). The receive side now supports
|
MADs are received using read(). The receive side now supports
|
||||||
RMPP. The buffer passed to read() must be at least one
|
RMPP. The buffer passed to read() must be at least one
|
||||||
|
@ -41,7 +47,8 @@ Receiving MADs
|
||||||
MAD (RMPP), the errno is set to ENOSPC and the length of the
|
MAD (RMPP), the errno is set to ENOSPC and the length of the
|
||||||
buffer needed is set in mad.length.
|
buffer needed is set in mad.length.
|
||||||
|
|
||||||
Example for normal MAD (non RMPP) reads:
|
Example for normal MAD (non RMPP) reads::
|
||||||
|
|
||||||
struct ib_user_mad *mad;
|
struct ib_user_mad *mad;
|
||||||
mad = malloc(sizeof *mad + 256);
|
mad = malloc(sizeof *mad + 256);
|
||||||
ret = read(fd, mad, sizeof *mad + 256);
|
ret = read(fd, mad, sizeof *mad + 256);
|
||||||
|
@ -50,7 +57,8 @@ Receiving MADs
|
||||||
free(mad);
|
free(mad);
|
||||||
}
|
}
|
||||||
|
|
||||||
Example for RMPP reads:
|
Example for RMPP reads::
|
||||||
|
|
||||||
struct ib_user_mad *mad;
|
struct ib_user_mad *mad;
|
||||||
mad = malloc(sizeof *mad + 256);
|
mad = malloc(sizeof *mad + 256);
|
||||||
ret = read(fd, mad, sizeof *mad + 256);
|
ret = read(fd, mad, sizeof *mad + 256);
|
||||||
|
@ -76,11 +84,12 @@ Receiving MADs
|
||||||
poll()/select() may be used to wait until a MAD can be read.
|
poll()/select() may be used to wait until a MAD can be read.
|
||||||
|
|
||||||
Sending MADs
|
Sending MADs
|
||||||
|
============
|
||||||
|
|
||||||
MADs are sent using write(). The agent ID for sending should be
|
MADs are sent using write(). The agent ID for sending should be
|
||||||
filled into the id field of the MAD, the destination LID should be
|
filled into the id field of the MAD, the destination LID should be
|
||||||
filled into the lid field, and so on. The send side does support
|
filled into the lid field, and so on. The send side does support
|
||||||
RMPP so arbitrary length MAD can be sent. For example:
|
RMPP so arbitrary length MAD can be sent. For example::
|
||||||
|
|
||||||
struct ib_user_mad *mad;
|
struct ib_user_mad *mad;
|
||||||
|
|
||||||
|
@ -97,6 +106,7 @@ Sending MADs
|
||||||
perror("write");
|
perror("write");
|
||||||
|
|
||||||
Transaction IDs
|
Transaction IDs
|
||||||
|
===============
|
||||||
|
|
||||||
Users of the umad devices can use the lower 32 bits of the
|
Users of the umad devices can use the lower 32 bits of the
|
||||||
transaction ID field (that is, the least significant half of the
|
transaction ID field (that is, the least significant half of the
|
||||||
|
@ -105,6 +115,7 @@ Transaction IDs
|
||||||
the kernel and will be overwritten before a MAD is sent.
|
the kernel and will be overwritten before a MAD is sent.
|
||||||
|
|
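The split of the transaction ID into a user-owned half and a kernel-owned half can be sketched in C as follows. This is a hedged illustration, not code from libibumad or the kernel: the helper names are hypothetical, and it assumes the TID is carried as a 64-bit big-endian field whose upper 32 bits are the part the kernel overwrites.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: place the user's 32-bit tag in the low half. */
static uint64_t umad_make_tid(uint32_t user_tid)
{
    /* The upper 32 bits are left zero; the kernel overwrites them. */
    return (uint64_t)user_tid;
}

/* Hypothetical helper: recover the user's tag, ignoring the upper half. */
static uint32_t umad_user_tid(uint64_t tid)
{
    return (uint32_t)(tid & UINT64_C(0xffffffff));
}

/* Serialize a TID as 8 big-endian (network order) bytes. */
static void umad_tid_to_wire(uint64_t tid, uint8_t out[8])
{
    int i;

    for (i = 0; i < 8; i++)
        out[i] = (uint8_t)(tid >> (56 - 8 * i));
}

/* Parse 8 big-endian bytes back into a host-order TID. */
static uint64_t umad_tid_from_wire(const uint8_t in[8])
{
    uint64_t tid = 0;
    int i;

    for (i = 0; i < 8; i++)
        tid = (tid << 8) | in[i];
    return tid;
}
```

When matching a response to a request, compare only umad_user_tid() of the two TIDs, since the upper half of the value seen in the response is not under the sender's control.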
||||||
P_Key Index Handling
|
P_Key Index Handling
|
||||||
|
====================
|
||||||
|
|
||||||
The old ib_umad interface did not allow setting the P_Key index for
|
The old ib_umad interface did not allow setting the P_Key index for
|
||||||
MADs that are sent and did not provide a way for obtaining the P_Key
|
MADs that are sent and did not provide a way for obtaining the P_Key
|
||||||
|
@ -119,6 +130,7 @@ P_Key Index Handling
|
||||||
default, and the IB_USER_MAD_ENABLE_PKEY ioctl will be removed.
|
default, and the IB_USER_MAD_ENABLE_PKEY ioctl will be removed.
|
||||||
|
|
||||||
Setting IsSM Capability Bit
|
Setting IsSM Capability Bit
|
||||||
|
===========================
|
||||||
|
|
||||||
To set the IsSM capability bit for a port, simply open the
|
To set the IsSM capability bit for a port, simply open the
|
||||||
corresponding issm device file. If the IsSM bit is already set,
|
corresponding issm device file. If the IsSM bit is already set,
|
||||||
|
@ -129,25 +141,26 @@ Setting IsSM Capability Bit
|
||||||
the issm file.
|
the issm file.
|
||||||
|
|
||||||
/dev files
|
/dev files
|
||||||
|
==========
|
||||||
|
|
||||||
To create the appropriate character device files automatically with
|
To create the appropriate character device files automatically with
|
||||||
udev, a rule like
|
udev, a rule like::
|
||||||
|
|
||||||
KERNEL=="umad*", NAME="infiniband/%k"
|
KERNEL=="umad*", NAME="infiniband/%k"
|
||||||
KERNEL=="issm*", NAME="infiniband/%k"
|
KERNEL=="issm*", NAME="infiniband/%k"
|
||||||
|
|
||||||
can be used. This will create device nodes named
|
can be used. This will create device nodes named::
|
||||||
|
|
||||||
/dev/infiniband/umad0
|
/dev/infiniband/umad0
|
||||||
/dev/infiniband/issm0
|
/dev/infiniband/issm0
|
||||||
|
|
||||||
for the first port, and so on. The InfiniBand device and port
|
for the first port, and so on. The InfiniBand device and port
|
||||||
associated with these devices can be determined from the files
|
associated with these devices can be determined from the files::
|
||||||
|
|
||||||
/sys/class/infiniband_mad/umad0/ibdev
|
/sys/class/infiniband_mad/umad0/ibdev
|
||||||
/sys/class/infiniband_mad/umad0/port
|
/sys/class/infiniband_mad/umad0/port
|
||||||
|
|
||||||
and
|
and::
|
||||||
|
|
||||||
/sys/class/infiniband_mad/issm0/ibdev
|
/sys/class/infiniband_mad/issm0/ibdev
|
||||||
/sys/class/infiniband_mad/issm0/port
|
/sys/class/infiniband_mad/issm0/port
|
|
@ -1,4 +1,6 @@
|
||||||
USERSPACE VERBS ACCESS
|
======================
|
||||||
|
Userspace verbs access
|
||||||
|
======================
|
||||||
|
|
||||||
The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
|
The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
|
||||||
enables direct userspace access to IB hardware via "verbs," as
|
enables direct userspace access to IB hardware via "verbs," as
|
||||||
|
@ -13,6 +15,7 @@ USERSPACE VERBS ACCESS
|
||||||
libmthca userspace driver be installed.
|
libmthca userspace driver be installed.
|
||||||
|
|
||||||
User-kernel communication
|
User-kernel communication
|
||||||
|
=========================
|
||||||
|
|
||||||
Userspace communicates with the kernel for slow path, resource
|
Userspace communicates with the kernel for slow path, resource
|
||||||
management operations via the /dev/infiniband/uverbsN character
|
management operations via the /dev/infiniband/uverbsN character
|
||||||
|
@ -28,6 +31,7 @@ User-kernel communication
|
||||||
system call.
|
system call.
|
||||||
|
|
||||||
Resource management
|
Resource management
|
||||||
|
===================
|
||||||
|
|
||||||
Since creation and destruction of all IB resources is done by
|
Since creation and destruction of all IB resources is done by
|
||||||
commands passed through a file descriptor, the kernel can keep track
|
commands passed through a file descriptor, the kernel can keep track
|
||||||
|
@ -41,6 +45,7 @@ Resource management
|
||||||
prevent one process from touching another process's resources.
|
prevent one process from touching another process's resources.
|
||||||
|
|
||||||
Memory pinning
|
Memory pinning
|
||||||
|
==============
|
||||||
|
|
||||||
Direct userspace I/O requires that memory regions that are potential
|
Direct userspace I/O requires that memory regions that are potential
|
||||||
I/O targets be kept resident at the same physical address. The
|
I/O targets be kept resident at the same physical address. The
|
||||||
|
@ -54,13 +59,14 @@ Memory pinning
|
||||||
number of pages pinned by a process.
|
number of pages pinned by a process.
|
||||||
|
|
||||||
/dev files
|
/dev files
|
||||||
|
==========
|
||||||
|
|
||||||
To create the appropriate character device files automatically with
|
To create the appropriate character device files automatically with
|
||||||
udev, a rule like
|
udev, a rule like::
|
||||||
|
|
||||||
KERNEL=="uverbs*", NAME="infiniband/%k"
|
KERNEL=="uverbs*", NAME="infiniband/%k"
|
||||||
|
|
||||||
can be used. This will create device nodes named
|
can be used. This will create device nodes named::
|
||||||
|
|
||||||
/dev/infiniband/uverbs0
|
/dev/infiniband/uverbs0
|
||||||
|
|
|
@ -745,7 +745,7 @@ static int ib_umad_reg_agent(struct ib_umad_file *file, void __user *arg,
|
||||||
"process %s did not enable P_Key index support.\n",
|
"process %s did not enable P_Key index support.\n",
|
||||||
current->comm);
|
current->comm);
|
||||||
dev_warn(&file->port->dev,
|
dev_warn(&file->port->dev,
|
||||||
" Documentation/infiniband/user_mad.txt has info on the new ABI.\n");
|
" Documentation/infiniband/user_mad.rst has info on the new ABI.\n");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
@ -6,7 +6,7 @@ config INFINIBAND_IPOIB
|
||||||
transports IP packets over InfiniBand so you can use your IB
|
transports IP packets over InfiniBand so you can use your IB
|
||||||
device as a fancy NIC.
|
device as a fancy NIC.
|
||||||
|
|
||||||
See Documentation/infiniband/ipoib.txt for more information
|
See Documentation/infiniband/ipoib.rst for more information
|
||||||
|
|
||||||
config INFINIBAND_IPOIB_CM
|
config INFINIBAND_IPOIB_CM
|
||||||
bool "IP-over-InfiniBand Connected Mode support"
|
bool "IP-over-InfiniBand Connected Mode support"
|
||||||
|
|