doc: networking: prepare offload documents for conversion into RST
Add a small number of markups which are sufficient for conversion into reStructuredText.

Unfortunately, it was necessary to restructure all sections in the checksum-offloads.txt file and create paragraphs separated by a newline. There also must not be a space at the beginning of a paragraph.

There are no semantic changes.

Signed-off-by: Otto Sabart <ottosabart@seberm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent 9f63df26be
commit 1b23f5e997
|
@@ -1,122 +1,143 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
===============================================
|
||||||
Checksum Offloads in the Linux Networking Stack
|
Checksum Offloads in the Linux Networking Stack
|
||||||
|
===============================================
|
||||||
|
|
||||||
|
|
||||||
Introduction
|
Introduction
|
||||||
============
|
============
|
||||||
|
|
||||||
This document describes a set of techniques in the Linux networking stack
|
This document describes a set of techniques in the Linux networking stack to
|
||||||
to take advantage of checksum offload capabilities of various NICs.
|
take advantage of checksum offload capabilities of various NICs.
|
||||||
|
|
||||||
The following technologies are described:
|
The following technologies are described:
|
||||||
* TX Checksum Offload
|
|
||||||
* LCO: Local Checksum Offload
|
* TX Checksum Offload
|
||||||
* RCO: Remote Checksum Offload
|
* LCO: Local Checksum Offload
|
||||||
|
* RCO: Remote Checksum Offload
|
||||||
|
|
||||||
Things that should be documented here but aren't yet:
|
Things that should be documented here but aren't yet:
|
||||||
* RX Checksum Offload
|
|
||||||
* CHECKSUM_UNNECESSARY conversion
|
* RX Checksum Offload
|
||||||
|
* CHECKSUM_UNNECESSARY conversion
|
||||||
|
|
||||||
|
|
||||||
TX Checksum Offload
|
TX Checksum Offload
|
||||||
===================
|
===================
|
||||||
|
|
||||||
The interface for offloading a transmit checksum to a device is explained
|
The interface for offloading a transmit checksum to a device is explained in
|
||||||
in detail in comments near the top of include/linux/skbuff.h.
|
detail in comments near the top of include/linux/skbuff.h.
|
||||||
|
|
||||||
In brief, it allows to request the device fill in a single ones-complement
|
In brief, it allows to request the device fill in a single ones-complement
|
||||||
checksum defined by the sk_buff fields skb->csum_start and
|
checksum defined by the sk_buff fields skb->csum_start and skb->csum_offset.
|
||||||
skb->csum_offset. The device should compute the 16-bit ones-complement
|
The device should compute the 16-bit ones-complement checksum (i.e. the
|
||||||
checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the
|
'IP-style' checksum) from csum_start to the end of the packet, and fill in the
|
||||||
packet, and fill in the result at (csum_start + csum_offset).
|
result at (csum_start + csum_offset).
|
||||||
Because csum_offset cannot be negative, this ensures that the previous
|
|
||||||
value of the checksum field is included in the checksum computation, thus
|
Because csum_offset cannot be negative, this ensures that the previous value of
|
||||||
it can be used to supply any needed corrections to the checksum (such as
|
the checksum field is included in the checksum computation, thus it can be used
|
||||||
the sum of the pseudo-header for UDP or TCP).
|
to supply any needed corrections to the checksum (such as the sum of the
|
||||||
|
pseudo-header for UDP or TCP).
|
||||||
|
|
||||||
This interface only allows a single checksum to be offloaded. Where
|
This interface only allows a single checksum to be offloaded. Where
|
||||||
encapsulation is used, the packet may have multiple checksum fields in
|
encapsulation is used, the packet may have multiple checksum fields in
|
||||||
different header layers, and the rest will have to be handled by another
|
different header layers, and the rest will have to be handled by another
|
||||||
mechanism such as LCO or RCO.
|
mechanism such as LCO or RCO.
|
||||||
|
|
||||||
CRC32c can also be offloaded using this interface, by means of filling
|
CRC32c can also be offloaded using this interface, by means of filling
|
||||||
skb->csum_start and skb->csum_offset as described above, and setting
|
skb->csum_start and skb->csum_offset as described above, and setting
|
||||||
skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.
|
skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.
|
||||||
|
|
||||||
No offloading of the IP header checksum is performed; it is always done in
|
No offloading of the IP header checksum is performed; it is always done in
|
||||||
software. This is OK because when we build the IP header, we obviously
|
software. This is OK because when we build the IP header, we obviously have it
|
||||||
have it in cache, so summing it isn't expensive. It's also rather short.
|
in cache, so summing it isn't expensive. It's also rather short.
|
||||||
|
|
||||||
The requirements for GSO are more complicated, because when segmenting an
|
The requirements for GSO are more complicated, because when segmenting an
|
||||||
encapsulated packet both the inner and outer checksums may need to be
|
encapsulated packet both the inner and outer checksums may need to be edited or
|
||||||
edited or recomputed for each resulting segment. See the skbuff.h comment
|
recomputed for each resulting segment. See the skbuff.h comment (section 'E')
|
||||||
(section 'E') for more details.
|
for more details.
|
||||||
|
|
||||||
A driver declares its offload capabilities in netdev->hw_features; see
|
A driver declares its offload capabilities in netdev->hw_features; see
|
||||||
Documentation/networking/netdev-features.txt for more. Note that a device
|
Documentation/networking/netdev-features.txt for more. Note that a device
|
||||||
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start
|
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
|
||||||
and csum_offset given in the SKB; if it tries to deduce these itself in
|
csum_offset given in the SKB; if it tries to deduce these itself in hardware
|
||||||
hardware (as some NICs do) the driver should check that the values in the
|
(as some NICs do) the driver should check that the values in the SKB match
|
||||||
SKB match those which the hardware will deduce, and if not, fall back to
|
those which the hardware will deduce, and if not, fall back to checksumming in
|
||||||
checksumming in software instead (with skb_csum_hwoffload_help() or one of
|
software instead (with skb_csum_hwoffload_help() or one of the
|
||||||
the skb_checksum_help() / skb_crc32c_csum_help functions, as mentioned in
|
skb_checksum_help() / skb_crc32c_csum_help functions, as mentioned in
|
||||||
include/linux/skbuff.h).
|
include/linux/skbuff.h).
|
||||||
|
|
||||||
The stack should, for the most part, assume that checksum offload is
|
The stack should, for the most part, assume that checksum offload is supported
|
||||||
supported by the underlying device. The only place that should check is
|
by the underlying device. The only place that should check is
|
||||||
validate_xmit_skb(), and the functions it calls directly or indirectly.
|
validate_xmit_skb(), and the functions it calls directly or indirectly. That
|
||||||
That function compares the offload features requested by the SKB (which
|
function compares the offload features requested by the SKB (which may include
|
||||||
may include other offloads besides TX Checksum Offload) and, if they are
|
other offloads besides TX Checksum Offload) and, if they are not supported or
|
||||||
not supported or enabled on the device (determined by netdev->features),
|
enabled on the device (determined by netdev->features), performs the
|
||||||
performs the corresponding offload in software. In the case of TX
|
corresponding offload in software. In the case of TX Checksum Offload, that
|
||||||
Checksum Offload, that means calling skb_csum_hwoffload_help(skb, features).
|
means calling skb_csum_hwoffload_help(skb, features).
|
||||||
|
|
||||||
|
|
||||||
LCO: Local Checksum Offload
|
LCO: Local Checksum Offload
|
||||||
===========================
|
===========================
|
||||||
|
|
||||||
LCO is a technique for efficiently computing the outer checksum of an
|
LCO is a technique for efficiently computing the outer checksum of an
|
||||||
encapsulated datagram when the inner checksum is due to be offloaded.
|
encapsulated datagram when the inner checksum is due to be offloaded.
|
||||||
The ones-complement sum of a correctly checksummed TCP or UDP packet is
|
|
||||||
equal to the complement of the sum of the pseudo header, because everything
|
The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
|
||||||
else gets 'cancelled out' by the checksum field. This is because the sum was
|
to the complement of the sum of the pseudo header, because everything else gets
|
||||||
complemented before being written to the checksum field.
|
'cancelled out' by the checksum field. This is because the sum was
|
||||||
|
complemented before being written to the checksum field.
|
||||||
|
|
||||||
More generally, this holds in any case where the 'IP-style' ones complement
|
More generally, this holds in any case where the 'IP-style' ones complement
|
||||||
checksum is used, and thus any checksum that TX Checksum Offload supports.
|
checksum is used, and thus any checksum that TX Checksum Offload supports.
|
||||||
|
|
||||||
That is, if we have set up TX Checksum Offload with a start/offset pair, we
|
That is, if we have set up TX Checksum Offload with a start/offset pair, we
|
||||||
know that after the device has filled in that checksum, the ones
|
know that after the device has filled in that checksum, the ones complement sum
|
||||||
complement sum from csum_start to the end of the packet will be equal to
|
from csum_start to the end of the packet will be equal to the complement of
|
||||||
the complement of whatever value we put in the checksum field beforehand.
|
whatever value we put in the checksum field beforehand. This allows us to
|
||||||
This allows us to compute the outer checksum without looking at the payload:
|
compute the outer checksum without looking at the payload: we simply stop
|
||||||
we simply stop summing when we get to csum_start, then add the complement of
|
summing when we get to csum_start, then add the complement of the 16-bit word
|
||||||
the 16-bit word at (csum_start + csum_offset).
|
at (csum_start + csum_offset).
|
||||||
|
|
||||||
Then, when the true inner checksum is filled in (either by hardware or by
|
Then, when the true inner checksum is filled in (either by hardware or by
|
||||||
skb_checksum_help()), the outer checksum will become correct by virtue of
|
skb_checksum_help()), the outer checksum will become correct by virtue of the
|
||||||
the arithmetic.
|
arithmetic.
|
||||||
|
|
||||||
LCO is performed by the stack when constructing an outer UDP header for an
|
LCO is performed by the stack when constructing an outer UDP header for an
|
||||||
encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for
|
encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for the
|
||||||
the IPv6 equivalents, in udp6_set_csum().
|
IPv6 equivalents, in udp6_set_csum().
|
||||||
|
|
||||||
It is also performed when constructing an IPv4 GRE header, in
|
It is also performed when constructing an IPv4 GRE header, in
|
||||||
net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when
|
net/ipv4/ip_gre.c:build_header(). It is *not* currently performed when
|
||||||
constructing an IPv6 GRE header; the GRE checksum is computed over the
|
constructing an IPv6 GRE header; the GRE checksum is computed over the whole
|
||||||
whole packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be
|
packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
|
||||||
possible to use LCO here as IPv6 GRE still uses an IP-style checksum.
|
LCO here as IPv6 GRE still uses an IP-style checksum.
|
||||||
|
|
||||||
All of the LCO implementations use a helper function lco_csum(), in
|
All of the LCO implementations use a helper function lco_csum(), in
|
||||||
include/linux/skbuff.h.
|
include/linux/skbuff.h.
|
||||||
|
|
||||||
LCO can safely be used for nested encapsulations; in this case, the outer
|
LCO can safely be used for nested encapsulations; in this case, the outer
|
||||||
encapsulation layer will sum over both its own header and the 'middle'
|
encapsulation layer will sum over both its own header and the 'middle' header.
|
||||||
header. This does mean that the 'middle' header will get summed multiple
|
This does mean that the 'middle' header will get summed multiple times, but
|
||||||
times, but there doesn't seem to be a way to avoid that without incurring
|
there doesn't seem to be a way to avoid that without incurring bigger costs
|
||||||
bigger costs (e.g. in SKB bloat).
|
(e.g. in SKB bloat).
|
||||||
|
|
||||||
|
|
||||||
RCO: Remote Checksum Offload
|
RCO: Remote Checksum Offload
|
||||||
============================
|
============================
|
||||||
|
|
||||||
RCO is a technique for eliding the inner checksum of an encapsulated
|
RCO is a technique for eliding the inner checksum of an encapsulated datagram,
|
||||||
datagram, allowing the outer checksum to be offloaded. It does, however,
|
allowing the outer checksum to be offloaded. It does, however, involve a
|
||||||
involve a change to the encapsulation protocols, which the receiver must
|
change to the encapsulation protocols, which the receiver must also support.
|
||||||
also support. For this reason, it is disabled by default.
|
For this reason, it is disabled by default.
|
||||||
|
|
||||||
RCO is detailed in the following Internet-Drafts:
|
RCO is detailed in the following Internet-Drafts:
|
||||||
https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
|
|
||||||
https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
|
* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
|
||||||
In Linux, RCO is implemented individually in each encapsulation protocol,
|
* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
|
||||||
and most tunnel types have flags controlling its use. For instance, VXLAN
|
|
||||||
has the flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that
|
In Linux, RCO is implemented individually in each encapsulation protocol, and
|
||||||
RCO should be used when transmitting to a given remote destination.
|
most tunnel types have flags controlling its use. For instance, VXLAN has the
|
||||||
|
flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
|
||||||
|
used when transmitting to a given remote destination.
|
||||||
|
|
|
@@ -1,4 +1,9 @@
|
||||||
|
.. SPDX-License-Identifier: GPL-2.0
|
||||||
|
|
||||||
|
===================================================
|
||||||
Segmentation Offloads in the Linux Networking Stack
|
Segmentation Offloads in the Linux Networking Stack
|
||||||
|
===================================================
|
||||||
|
|
||||||
|
|
||||||
Introduction
|
Introduction
|
||||||
============
|
============
|
||||||
|
@@ -15,6 +20,7 @@ The following technologies are described:
|
||||||
* Partial Generic Segmentation Offload - GSO_PARTIAL
|
* Partial Generic Segmentation Offload - GSO_PARTIAL
|
||||||
* SCTP accelleration with GSO - GSO_BY_FRAGS
|
* SCTP accelleration with GSO - GSO_BY_FRAGS
|
||||||
|
|
||||||
|
|
||||||
TCP Segmentation Offload
|
TCP Segmentation Offload
|
||||||
========================
|
========================
|
||||||
|
|
||||||
|
@@ -42,6 +48,7 @@ NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
|
||||||
and we will either increment the IP ID for all frames, or leave it at a
|
and we will either increment the IP ID for all frames, or leave it at a
|
||||||
static value based on driver preference.
|
static value based on driver preference.
|
||||||
|
|
||||||
|
|
||||||
UDP Fragmentation Offload
|
UDP Fragmentation Offload
|
||||||
=========================
|
=========================
|
||||||
|
|
||||||
|
@@ -54,6 +61,7 @@ UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
|
||||||
still receive them from tuntap and similar devices. Offload of UDP-based
|
still receive them from tuntap and similar devices. Offload of UDP-based
|
||||||
tunnel protocols is still supported.
|
tunnel protocols is still supported.
|
||||||
|
|
||||||
|
|
||||||
IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
|
IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
|
||||||
========================================================
|
========================================================
|
||||||
|
|
||||||
|
@@ -71,17 +79,19 @@ refer to the tunnel headers as the outer headers, while the encapsulated
|
||||||
data is normally referred to as the inner headers. Below is the list of
|
data is normally referred to as the inner headers. Below is the list of
|
||||||
calls to access the given headers:
|
calls to access the given headers:
|
||||||
|
|
||||||
IPIP/SIT Tunnel:
|
IPIP/SIT Tunnel::
|
||||||
             Outer                  Inner
|
|
||||||
MAC          skb_mac_header
|
|
||||||
Network      skb_network_header     skb_inner_network_header
|
|
||||||
Transport    skb_transport_header
|
|
||||||
|
|
||||||
UDP/GRE Tunnel:
|
|
||||||
             Outer                  Inner
|
             Outer                  Inner
|
||||||
MAC          skb_mac_header         skb_inner_mac_header
|
  MAC        skb_mac_header
|
||||||
Network      skb_network_header     skb_inner_network_header
|
  Network    skb_network_header     skb_inner_network_header
|
||||||
Transport    skb_transport_header   skb_inner_transport_header
|
  Transport  skb_transport_header
|
||||||
|
|
||||||
|
UDP/GRE Tunnel::
|
||||||
|
|
||||||
|
             Outer                  Inner
|
||||||
|
  MAC        skb_mac_header         skb_inner_mac_header
|
||||||
|
  Network    skb_network_header     skb_inner_network_header
|
||||||
|
  Transport  skb_transport_header   skb_inner_transport_header
|
||||||
|
|
||||||
In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
|
In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
|
||||||
SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
|
SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the
|
||||||
|
@@ -93,6 +103,7 @@ header has requested a remote checksum offload. In this case the inner
|
||||||
headers will be left with a partial checksum and only the outer header
|
headers will be left with a partial checksum and only the outer header
|
||||||
checksum will be computed.
|
checksum will be computed.
|
||||||
|
|
||||||
|
|
||||||
Generic Segmentation Offload
|
Generic Segmentation Offload
|
||||||
============================
|
============================
|
||||||
|
|
||||||
|
@@ -106,6 +117,7 @@ Before enabling any hardware segmentation offload a corresponding software
|
||||||
offload is required in GSO. Otherwise it becomes possible for a frame to
|
offload is required in GSO. Otherwise it becomes possible for a frame to
|
||||||
be re-routed between devices and end up being unable to be transmitted.
|
be re-routed between devices and end up being unable to be transmitted.
|
||||||
|
|
||||||
|
|
||||||
Generic Receive Offload
|
Generic Receive Offload
|
||||||
=======================
|
=======================
|
||||||
|
|
||||||
|
@@ -117,6 +129,7 @@ this is IPv4 ID in the case that the DF bit is set for a given IP header.
|
||||||
If the value of the IPv4 ID is not sequentially incrementing it will be
|
If the value of the IPv4 ID is not sequentially incrementing it will be
|
||||||
altered so that it is when a frame assembled via GRO is segmented via GSO.
|
altered so that it is when a frame assembled via GRO is segmented via GSO.
|
||||||
|
|
||||||
|
|
||||||
Partial Generic Segmentation Offload
|
Partial Generic Segmentation Offload
|
||||||
====================================
|
====================================
|
||||||
|
|
||||||
|
@@ -134,6 +147,7 @@ is the outer IPv4 ID field. It is up to the device drivers to guarantee
|
||||||
that the IPv4 ID field is incremented in the case that a given header does
|
that the IPv4 ID field is incremented in the case that a given header does
|
||||||
not have the DF bit set.
|
not have the DF bit set.
|
||||||
|
|
||||||
|
|
||||||
SCTP accelleration with GSO
|
SCTP accelleration with GSO
|
||||||
===========================
|
===========================
|
||||||
|
|
||||||
|
@@ -157,13 +171,13 @@ appropriately.
|
||||||
|
|
||||||
There are some helpers to make this easier:
|
There are some helpers to make this easier:
|
||||||
|
|
||||||
- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
|
- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
|
||||||
an skb is an SCTP GSO skb.
|
an skb is an SCTP GSO skb.
|
||||||
|
|
||||||
- For size checks, the skb_gso_validate_*_len family of helpers correctly
|
- For size checks, the skb_gso_validate_*_len family of helpers correctly
|
||||||
considers GSO_BY_FRAGS.
|
considers GSO_BY_FRAGS.
|
||||||
|
|
||||||
- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
|
- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
|
||||||
will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.
|
will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.
|
||||||
|
|
||||||
This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits
|
This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits
|
||||||
|
|