mirror of https://gitee.com/openkylin/linux.git
Merge branch 'Documentation-tls--add-offload-documentation'
Jakub Kicinski says: ==================== Documentation: tls: add offload documentation This set adds documentation for TLS offload. It starts by making the networking documentation a little easier to navigate by hiding driver docs a little deeper. It then RSTifys the existing Kernel TLS documentation. Last but not least TLS offload documentation is added. This should help vendors navigate the TLS offload, and help ensure different implementations stay aligned from user perspective. v2: - address Alexei's and Boris'es commands on patch 3. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
This commit is contained in:
commit
0d18c7bd93
|
@ -0,0 +1,30 @@
|
|||
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
|
||||
Vendor Device Drivers
|
||||
=====================
|
||||
|
||||
Contents:
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
freescale/dpaa2/index
|
||||
intel/e100
|
||||
intel/e1000
|
||||
intel/e1000e
|
||||
intel/fm10k
|
||||
intel/igb
|
||||
intel/igbvf
|
||||
intel/ixgb
|
||||
intel/ixgbe
|
||||
intel/ixgbevf
|
||||
intel/i40e
|
||||
intel/iavf
|
||||
intel/ice
|
||||
|
||||
.. only:: subproject
|
||||
|
||||
Indices
|
||||
=======
|
||||
|
||||
* :ref:`genindex`
|
|
@ -11,19 +11,7 @@ Contents:
|
|||
batman-adv
|
||||
can
|
||||
can_ucan_protocol
|
||||
device_drivers/freescale/dpaa2/index
|
||||
device_drivers/intel/e100
|
||||
device_drivers/intel/e1000
|
||||
device_drivers/intel/e1000e
|
||||
device_drivers/intel/fm10k
|
||||
device_drivers/intel/igb
|
||||
device_drivers/intel/igbvf
|
||||
device_drivers/intel/ixgb
|
||||
device_drivers/intel/ixgbe
|
||||
device_drivers/intel/ixgbevf
|
||||
device_drivers/intel/i40e
|
||||
device_drivers/intel/iavf
|
||||
device_drivers/intel/ice
|
||||
device_drivers/index
|
||||
dsa/index
|
||||
devlink-info-versions
|
||||
ieee802154
|
||||
|
@ -40,6 +28,8 @@ Contents:
|
|||
checksum-offloads
|
||||
segmentation-offloads
|
||||
scaling
|
||||
tls
|
||||
tls-offload
|
||||
|
||||
.. only:: subproject
|
||||
|
||||
|
|
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 49 KiB |
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 6.4 KiB |
File diff suppressed because one or more lines are too long
After Width: | Height: | Size: 6.4 KiB |
|
@ -0,0 +1,482 @@
|
|||
.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
|
||||
|
||||
==================
|
||||
Kernel TLS offload
|
||||
==================
|
||||
|
||||
Kernel TLS operation
|
||||
====================
|
||||
|
||||
Linux kernel provides TLS connection offload infrastructure. Once a TCP
|
||||
connection is in ``ESTABLISHED`` state user space can enable the TLS Upper
|
||||
Layer Protocol (ULP) and install the cryptographic connection state.
|
||||
For details regarding the user-facing interface refer to the TLS
|
||||
documentation in :ref:`Documentation/networking/tls.rst <kernel_tls>`.
|
||||
|
||||
``ktls`` can operate in three modes:
|
||||
|
||||
* Software crypto mode (``TLS_SW``) - CPU handles the cryptography.
|
||||
In most basic cases only crypto operations synchronous with the CPU
|
||||
can be used, but depending on calling context CPU may utilize
|
||||
asynchronous crypto accelerators. The use of accelerators introduces extra
|
||||
latency on socket reads (decryption only starts when a read syscall
|
||||
is made) and additional I/O load on the system.
|
||||
* Packet-based NIC offload mode (``TLS_HW``) - the NIC handles crypto
|
||||
on a packet by packet basis, provided the packets arrive in order.
|
||||
This mode integrates best with the kernel stack and is described in detail
|
||||
in the remaining part of this document
|
||||
(``ethtool`` flags ``tls-hw-tx-offload`` and ``tls-hw-rx-offload``).
|
||||
* Full TCP NIC offload mode (``TLS_HW_RECORD``) - mode of operation where
|
||||
NIC driver and firmware replace the kernel networking stack
|
||||
with its own TCP handling, it is not usable in production environments
|
||||
making use of the Linux networking stack for example any firewalling
|
||||
abilities or QoS and packet scheduling (``ethtool`` flag ``tls-hw-record``).
|
||||
|
||||
The operation mode is selected automatically based on device configuration,
|
||||
offload opt-in or opt-out on per-connection basis is not currently supported.
|
||||
|
||||
TX
|
||||
--
|
||||
|
||||
At a high level user write requests are turned into a scatter list, the TLS ULP
|
||||
intercepts them, inserts record framing, performs encryption (in ``TLS_SW``
|
||||
mode) and then hands the modified scatter list to the TCP layer. From this
|
||||
point on the TCP stack proceeds as normal.
|
||||
|
||||
In ``TLS_HW`` mode the encryption is not performed in the TLS ULP.
|
||||
Instead packets reach a device driver, the driver will mark the packets
|
||||
for crypto offload based on the socket the packet is attached to,
|
||||
and send them to the device for encryption and transmission.
|
||||
|
||||
RX
|
||||
--
|
||||
|
||||
On the receive side if the device handled decryption and authentication
|
||||
successfully, the driver will set the decrypted bit in the associated
|
||||
:c:type:`struct sk_buff <sk_buff>`. The packets reach the TCP stack and
|
||||
are handled normally. ``ktls`` is informed when data is queued to the socket
|
||||
and the ``strparser`` mechanism is used to delineate the records. Upon read
|
||||
request, records are retrieved from the socket and passed to decryption routine.
|
||||
If device decrypted all the segments of the record the decryption is skipped,
|
||||
otherwise software path handles decryption.
|
||||
|
||||
.. kernel-figure:: tls-offload-layers.svg
|
||||
:alt: TLS offload layers
|
||||
:align: center
|
||||
:figwidth: 28em
|
||||
|
||||
Layers of Kernel TLS stack
|
||||
|
||||
Device configuration
|
||||
====================
|
||||
|
||||
During driver initialization device sets the ``NETIF_F_HW_TLS_RX`` and
|
||||
``NETIF_F_HW_TLS_TX`` features and installs its
|
||||
:c:type:`struct tlsdev_ops <tlsdev_ops>`
|
||||
pointer in the :c:member:`tlsdev_ops` member of the
|
||||
:c:type:`struct net_device <net_device>`.
|
||||
|
||||
When TLS cryptographic connection state is installed on a ``ktls`` socket
|
||||
(note that it is done twice, once for RX and once for TX direction,
|
||||
and the two are completely independent), the kernel checks if the underlying
|
||||
network device is offload-capable and attempts the offload. In case offload
|
||||
fails the connection is handled entirely in software using the same mechanism
|
||||
as if the offload was never tried.
|
||||
|
||||
Offload request is performed via the :c:member:`tls_dev_add` callback of
|
||||
:c:type:`struct tlsdev_ops <tlsdev_ops>`:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
int (*tls_dev_add)(struct net_device *netdev, struct sock *sk,
|
||||
enum tls_offload_ctx_dir direction,
|
||||
struct tls_crypto_info *crypto_info,
|
||||
u32 start_offload_tcp_sn);
|
||||
|
||||
``direction`` indicates whether the cryptographic information is for
|
||||
the received or transmitted packets. Driver uses the ``sk`` parameter
|
||||
to retrieve the connection 5-tuple and socket family (IPv4 vs IPv6).
|
||||
Cryptographic information in ``crypto_info`` includes the key, iv, salt
|
||||
as well as TLS record sequence number. ``start_offload_tcp_sn`` indicates
|
||||
which TCP sequence number corresponds to the beginning of the record with
|
||||
sequence number from ``crypto_info``. The driver can add its state
|
||||
at the end of kernel structures (see :c:member:`driver_state` members
|
||||
in ``include/net/tls.h``) to avoid additional allocations and pointer
|
||||
dereferences.
|
||||
|
||||
TX
|
||||
--
|
||||
|
||||
After TX state is installed, the stack guarantees that the first segment
|
||||
of the stream will start exactly at the ``start_offload_tcp_sn`` sequence
|
||||
number, simplifying TCP sequence number matching.
|
||||
|
||||
TX offload being fully initialized does not imply that all segments passing
|
||||
through the driver and which belong to the offloaded socket will be after
|
||||
the expected sequence number and will have kernel record information.
|
||||
In particular, already encrypted data may have been queued to the socket
|
||||
before installing the connection state in the kernel.
|
||||
|
||||
RX
|
||||
--
|
||||
|
||||
In RX direction local networking stack has little control over the segmentation,
|
||||
so the initial records' TCP sequence number may be anywhere inside the segment.
|
||||
|
||||
Normal operation
|
||||
================
|
||||
|
||||
At the minimum the device maintains the following state for each connection, in
|
||||
each direction:
|
||||
|
||||
* crypto secrets (key, iv, salt)
|
||||
* crypto processing state (partial blocks, partial authentication tag, etc.)
|
||||
* record metadata (sequence number, processing offset and length)
|
||||
* expected TCP sequence number
|
||||
|
||||
There are no guarantees on record length or record segmentation. In particular
|
||||
segments may start at any point of a record and contain any number of records.
|
||||
Assuming segments are received in order, the device should be able to perform
|
||||
crypto operations and authentication regardless of segmentation. For this
|
||||
to be possible device has to keep small amount of segment-to-segment state.
|
||||
This includes at least:
|
||||
|
||||
* partial headers (if a segment carried only a part of the TLS header)
|
||||
* partial data block
|
||||
* partial authentication tag (all data had been seen but part of the
|
||||
authentication tag has to be written or read from the subsequent segment)
|
||||
|
||||
Record reassembly is not necessary for TLS offload. If the packets arrive
|
||||
in order the device should be able to handle them separately and make
|
||||
forward progress.
|
||||
|
||||
TX
|
||||
--
|
||||
|
||||
The kernel stack performs record framing reserving space for the authentication
|
||||
tag and populating all other TLS header and tailer fields.
|
||||
|
||||
Both the device and the driver maintain expected TCP sequence numbers
|
||||
due to the possibility of retransmissions and the lack of software fallback
|
||||
once the packet reaches the device.
|
||||
For segments passed in order, the driver marks the packets with
|
||||
a connection identifier (note that a 5-tuple lookup is insufficient to identify
|
||||
packets requiring HW offload, see the :ref:`5tuple_problems` section)
|
||||
and hands them to the device. The device identifies the packet as requiring
|
||||
TLS handling and confirms the sequence number matches its expectation.
|
||||
The device performs encryption and authentication of the record data.
|
||||
It replaces the authentication tag and TCP checksum with correct values.
|
||||
|
||||
RX
|
||||
--
|
||||
|
||||
Before a packet is DMAed to the host (but after NIC's embedded switching
|
||||
and packet transformation functions) the device validates the Layer 4
|
||||
checksum and performs a 5-tuple lookup to find any TLS connection the packet
|
||||
may belong to (technically a 4-tuple
|
||||
lookup is sufficient - IP addresses and TCP port numbers, as the protocol
|
||||
is always TCP). If connection is matched device confirms if the TCP sequence
|
||||
number is the expected one and proceeds to TLS handling (record delineation,
|
||||
decryption, authentication for each record in the packet). The device leaves
|
||||
the record framing unmodified, the stack takes care of record decapsulation.
|
||||
Device indicates successful handling of TLS offload in the per-packet context
|
||||
(descriptor) passed to the host.
|
||||
|
||||
Upon reception of a TLS offloaded packet, the driver sets
|
||||
the :c:member:`decrypted` mark in :c:type:`struct sk_buff <sk_buff>`
|
||||
corresponding to the segment. Networking stack makes sure decrypted
|
||||
and non-decrypted segments do not get coalesced (e.g. by GRO or socket layer)
|
||||
and takes care of partial decryption.
|
||||
|
||||
Resync handling
|
||||
===============
|
||||
|
||||
In presence of packet drops or network packet reordering, the device may lose
|
||||
synchronization with the TLS stream, and require a resync with the kernel's
|
||||
TCP stack.
|
||||
|
||||
Note that resync is only attempted for connections which were successfully
|
||||
added to the device table and are in TLS_HW mode. For example,
|
||||
if the table was full when cryptographic state was installed in the kernel,
|
||||
such connection will never get offloaded. Therefore the resync request
|
||||
does not carry any cryptographic connection state.
|
||||
|
||||
TX
|
||||
--
|
||||
|
||||
Segments transmitted from an offloaded socket can get out of sync
|
||||
in similar ways to the receive side-retransmissions - local drops
|
||||
are possible, though network reorders are not.
|
||||
|
||||
Whenever an out of order segment is transmitted the driver provides
|
||||
the device with enough information to perform cryptographic operations.
|
||||
This means most likely that the part of the record preceding the current
|
||||
segment has to be passed to the device as part of the packet context,
|
||||
together with its TCP sequence number and TLS record number. The device
|
||||
can then initialize its crypto state, process and discard the preceding
|
||||
data (to be able to insert the authentication tag) and move onto handling
|
||||
the actual packet.
|
||||
|
||||
In this mode depending on the implementation the driver can either ask
|
||||
for a continuation with the crypto state and the new sequence number
|
||||
(next expected segment is the one after the out of order one), or continue
|
||||
with the previous stream state - assuming that the out of order segment
|
||||
was just a retransmission. The former is simpler, and does not require
|
||||
retransmission detection therefore it is the recommended method until
|
||||
such time it is proven inefficient.
|
||||
|
||||
RX
|
||||
--
|
||||
|
||||
A small amount of RX reorder events may not require a full resynchronization.
|
||||
In particular the device should not lose synchronization
|
||||
when record boundary can be recovered:
|
||||
|
||||
.. kernel-figure:: tls-offload-reorder-good.svg
|
||||
:alt: reorder of non-header segment
|
||||
:align: center
|
||||
|
||||
Reorder of non-header segment
|
||||
|
||||
Green segments are successfully decrypted, blue ones are passed
|
||||
as received on wire, red stripes mark start of new records.
|
||||
|
||||
In above case segment 1 is received and decrypted successfully.
|
||||
Segment 2 was dropped so 3 arrives out of order. The device knows
|
||||
the next record starts inside 3, based on record length in segment 1.
|
||||
Segment 3 is passed untouched, because due to lack of data from segment 2
|
||||
the remainder of the previous record inside segment 3 cannot be handled.
|
||||
The device can, however, collect the authentication algorithm's state
|
||||
and partial block from the new record in segment 3 and when 4 and 5
|
||||
arrive continue decryption. Finally when 2 arrives it's completely outside
|
||||
of expected window of the device so it's passed as is without special
|
||||
handling. ``ktls`` software fallback handles the decryption of record
|
||||
spanning segments 1, 2 and 3. The device did not get out of sync,
|
||||
even though two segments did not get decrypted.
|
||||
|
||||
Kernel synchronization may be necessary if the lost segment contained
|
||||
a record header and arrived after the next record header has already passed:
|
||||
|
||||
.. kernel-figure:: tls-offload-reorder-bad.svg
|
||||
:alt: reorder of header segment
|
||||
:align: center
|
||||
|
||||
Reorder of segment with a TLS header
|
||||
|
||||
In this example segment 2 gets dropped, and it contains a record header.
|
||||
Device can only detect that segment 4 also contains a TLS header
|
||||
if it knows the length of the previous record from segment 2. In this case
|
||||
the device will lose synchronization with the stream.
|
||||
|
||||
When the device gets out of sync and the stream reaches TCP sequence
|
||||
numbers more than a max size record past the expected TCP sequence number,
|
||||
the device starts scanning for a known header pattern. For example
|
||||
for TLS 1.2 and TLS 1.3 subsequent bytes of value ``0x03 0x03`` occur
|
||||
in the SSL/TLS version field of the header. Once pattern is matched
|
||||
the device continues attempting parsing headers at expected locations
|
||||
(based on the length fields at guessed locations).
|
||||
Whenever the expected location does not contain a valid header the scan
|
||||
is restarted.
|
||||
|
||||
When the header is matched the device sends a confirmation request
|
||||
to the kernel, asking if the guessed location is correct (if a TLS record
|
||||
really starts there), and which record sequence number the given header had.
|
||||
The kernel confirms the guessed location was correct and tells the device
|
||||
the record sequence number. Meanwhile, the device had been parsing
|
||||
and counting all records since the just-confirmed one, it adds the number
|
||||
of records it had seen to the record number provided by the kernel.
|
||||
At this point the device is in sync and can resume decryption at next
|
||||
segment boundary.
|
||||
|
||||
In a pathological case the device may latch onto a sequence of matching
|
||||
headers and never hear back from the kernel (there is no negative
|
||||
confirmation from the kernel). The implementation may choose to periodically
|
||||
restart scan. Given how unlikely falsely-matching stream is, however,
|
||||
periodic restart is not deemed necessary.
|
||||
|
||||
Special care has to be taken if the confirmation request is passed
|
||||
asynchronously to the packet stream and record may get processed
|
||||
by the kernel before the confirmation request.
|
||||
|
||||
Error handling
|
||||
==============
|
||||
|
||||
TX
|
||||
--
|
||||
|
||||
Packets may be redirected or rerouted by the stack to a different
|
||||
device than the selected TLS offload device. The stack will handle
|
||||
such condition using the :c:func:`sk_validate_xmit_skb` helper
|
||||
(TLS offload code installs :c:func:`tls_validate_xmit_skb` at this hook).
|
||||
Offload maintains information about all records until the data is
|
||||
fully acknowledged, so if skbs reach the wrong device they can be handled
|
||||
by software fallback.
|
||||
|
||||
Any device TLS offload handling error on the transmission side must result
|
||||
in the packet being dropped. For example if a packet got out of order
|
||||
due to a bug in the stack or the device, reached the device and can't
|
||||
be encrypted such packet must be dropped.
|
||||
|
||||
RX
|
||||
--
|
||||
|
||||
If the device encounters any problems with TLS offload on the receive
|
||||
side it should pass the packet to the host's networking stack as it was
|
||||
received on the wire.
|
||||
|
||||
For example authentication failure for any record in the segment should
|
||||
result in passing the unmodified packet to the software fallback. This means
|
||||
packets should not be modified "in place". Splitting segments to handle partial
|
||||
decryption is not advised. In other words either all records in the packet
|
||||
had been handled successfully and authenticated or the packet has to be passed
|
||||
to the host's stack as it was on the wire (recovering original packet in the
|
||||
driver if device provides precise error is sufficient).
|
||||
|
||||
The Linux networking stack does not provide a way of reporting per-packet
|
||||
decryption and authentication errors, packets with errors must simply not
|
||||
have the :c:member:`decrypted` mark set.
|
||||
|
||||
A packet should also not be handled by the TLS offload if it contains
|
||||
incorrect checksums.
|
||||
|
||||
Performance metrics
|
||||
===================
|
||||
|
||||
TLS offload can be characterized by the following basic metrics:
|
||||
|
||||
* max connection count
|
||||
* connection installation rate
|
||||
* connection installation latency
|
||||
* total cryptographic performance
|
||||
|
||||
Note that each TCP connection requires a TLS session in both directions,
|
||||
the performance may be reported treating each direction separately.
|
||||
|
||||
Max connection count
|
||||
--------------------
|
||||
|
||||
The number of connections device can support can be exposed via
|
||||
``devlink resource`` API.
|
||||
|
||||
Total cryptographic performance
|
||||
-------------------------------
|
||||
|
||||
Offload performance may depend on segment and record size.
|
||||
|
||||
Overload of the cryptographic subsystem of the device should not have
|
||||
significant performance impact on non-offloaded streams.
|
||||
|
||||
Statistics
|
||||
==========
|
||||
|
||||
Following minimum set of TLS-related statistics should be reported
|
||||
by the driver:
|
||||
|
||||
* ``rx_tls_decrypted`` - number of successfully decrypted TLS segments
|
||||
* ``tx_tls_encrypted`` - number of in-order TLS segments passed to device
|
||||
for encryption
|
||||
* ``tx_tls_ooo`` - number of TX packets which were part of a TLS stream
|
||||
but did not arrive in the expected order
|
||||
* ``tx_tls_drop_no_sync_data`` - number of TX packets dropped because
|
||||
they arrived out of order and associated record could not be found
|
||||
(see also :ref:`pre_tls_data`)
|
||||
|
||||
Notable corner cases, exceptions and additional requirements
|
||||
============================================================
|
||||
|
||||
.. _5tuple_problems:
|
||||
|
||||
5-tuple matching limitations
|
||||
----------------------------
|
||||
|
||||
The device can only recognize received packets based on the 5-tuple
|
||||
of the socket. Current ``ktls`` implementation will not offload sockets
|
||||
routed through software interfaces such as those used for tunneling
|
||||
or virtual networking. However, many packet transformations performed
|
||||
by the networking stack (most notably any BPF logic) do not require
|
||||
any intermediate software device, therefore a 5-tuple match may
|
||||
consistently miss at the device level. In such cases the device
|
||||
should still be able to perform TX offload (encryption) and should
|
||||
fallback cleanly to software decryption (RX).
|
||||
|
||||
Out of order
|
||||
------------
|
||||
|
||||
Introducing extra processing in NICs should not cause packets to be
|
||||
transmitted or received out of order, for example pure ACK packets
|
||||
should not be reordered with respect to data segments.
|
||||
|
||||
Ingress reorder
|
||||
---------------
|
||||
|
||||
A device is permitted to perform packet reordering for consecutive
|
||||
TCP segments (i.e. placing packets in the correct order) but any form
|
||||
of additional buffering is disallowed.
|
||||
|
||||
Coexistence with standard networking offload features
|
||||
-----------------------------------------------------
|
||||
|
||||
Offloaded ``ktls`` sockets should support standard TCP stack features
|
||||
transparently. Enabling device TLS offload should not cause any difference
|
||||
in packets as seen on the wire.
|
||||
|
||||
Transport layer transparency
|
||||
----------------------------
|
||||
|
||||
The device should not modify any packet headers for the purpose
|
||||
of the simplifying TLS offload.
|
||||
|
||||
The device should not depend on any packet headers beyond what is strictly
|
||||
necessary for TLS offload.
|
||||
|
||||
Segment drops
|
||||
-------------
|
||||
|
||||
Dropping packets is acceptable only in the event of catastrophic
|
||||
system errors and should never be used as an error handling mechanism
|
||||
in cases arising from normal operation. In other words, reliance
|
||||
on TCP retransmissions to handle corner cases is not acceptable.
|
||||
|
||||
TLS device features
|
||||
-------------------
|
||||
|
||||
Drivers should ignore the changes to TLS the device feature flags.
|
||||
These flags will be acted upon accordingly by the core ``ktls`` code.
|
||||
TLS device feature flags only control adding of new TLS connection
|
||||
offloads, old connections will remain active after flags are cleared.
|
||||
|
||||
Known bugs
|
||||
==========
|
||||
|
||||
skb_orphan() leaks clear text
|
||||
-----------------------------
|
||||
|
||||
Currently drivers depend on the :c:member:`sk` member of
|
||||
:c:type:`struct sk_buff <sk_buff>` to identify segments requiring
|
||||
encryption. Any operation which removes or does not preserve the socket
|
||||
association such as :c:func:`skb_orphan` or :c:func:`skb_clone`
|
||||
will cause the driver to miss the packets and lead to clear text leaks.
|
||||
|
||||
Redirects leak clear text
|
||||
-------------------------
|
||||
|
||||
In the RX direction, if segment has already been decrypted by the device
|
||||
and it gets redirected or mirrored - clear text will be transmitted out.
|
||||
|
||||
.. _pre_tls_data:
|
||||
|
||||
Transmission of pre-TLS data
|
||||
----------------------------
|
||||
|
||||
User can enqueue some already encrypted and framed records before enabling
|
||||
``ktls`` on the socket. Those records have to get sent as they are. This is
|
||||
perfectly easy to handle in the software case - such data will be waiting
|
||||
in the TCP layer, TLS ULP won't see it. In the offloaded case when pre-queued
|
||||
segment reaches transmission point it appears to be out of order (before the
|
||||
expected TCP sequence number) and the stack does not have a record information
|
||||
associated.
|
||||
|
||||
All segments without record information cannot, however, be assumed to be
|
||||
pre-queued data, because a race condition exists between TCP stack queuing
|
||||
a retransmission, the driver seeing the retransmission and TCP ACK arriving
|
||||
for the retransmitted data.
|
|
@ -1,3 +1,9 @@
|
|||
.. _kernel_tls:
|
||||
|
||||
==========
|
||||
Kernel TLS
|
||||
==========
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
|
@ -12,6 +18,8 @@ Creating a TLS connection
|
|||
|
||||
First create a new TCP socket and set the TLS ULP.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
sock = socket(AF_INET, SOCK_STREAM, 0);
|
||||
setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
|
||||
|
||||
|
@ -21,6 +29,8 @@ handshake is complete, we have all the parameters required to move the
|
|||
data-path to the kernel. There is a separate socket option for moving
|
||||
the transmit and the receive into the kernel.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
/* From linux/tls.h */
|
||||
struct tls_crypto_info {
|
||||
unsigned short version;
|
||||
|
@ -58,6 +68,8 @@ After setting the TLS_TX socket option all application data sent over this
|
|||
socket is encrypted using TLS and the parameters provided in the socket option.
|
||||
For example, we can send an encrypted hello world record as follows:
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
const char *msg = "hello world\n";
|
||||
send(sock, msg, strlen(msg));
|
||||
|
||||
|
@ -67,6 +79,8 @@ to the encrypted kernel send buffer if possible.
|
|||
The sendfile system call will send the file's data over TLS records of maximum
|
||||
length (2^14).
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
file = open(filename, O_RDONLY);
|
||||
fstat(file, &stat);
|
||||
sendfile(sock, file, &offset, stat.st_size);
|
||||
|
@ -89,6 +103,8 @@ After setting the TLS_RX socket option, all recv family socket calls
|
|||
are decrypted using TLS parameters provided. A full TLS record must
|
||||
be received before decryption can happen.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
char buffer[16384];
|
||||
recv(sock, buffer, 16384);
|
||||
|
||||
|
@ -97,12 +113,12 @@ large enough, and no additional allocations occur. If the userspace
|
|||
buffer is too small, data is decrypted in the kernel and copied to
|
||||
userspace.
|
||||
|
||||
EINVAL is returned if the TLS version in the received message does not
|
||||
``EINVAL`` is returned if the TLS version in the received message does not
|
||||
match the version passed in setsockopt.
|
||||
|
||||
EMSGSIZE is returned if the received message is too big.
|
||||
``EMSGSIZE`` is returned if the received message is too big.
|
||||
|
||||
EBADMSG is returned if decryption failed for any other reason.
|
||||
``EBADMSG`` is returned if decryption failed for any other reason.
|
||||
|
||||
Send TLS control messages
|
||||
-------------------------
|
||||
|
@ -113,9 +129,11 @@ These messages can be sent over the socket by providing the TLS record type
|
|||
via a CMSG. For example the following function sends @data of @length bytes
|
||||
using a record of type @record_type.
|
||||
|
||||
/* send TLS control message using record_type */
|
||||
.. code-block:: c
|
||||
|
||||
/* send TLS control message using record_type */
|
||||
static int klts_send_ctrl_message(int sock, unsigned char record_type,
|
||||
void *data, size_t length)
|
||||
void *data, size_t length)
|
||||
{
|
||||
struct msghdr msg = {0};
|
||||
int cmsg_len = sizeof(record_type);
|
||||
|
@ -151,6 +169,8 @@ type passed via cmsg. If no cmsg buffer is provided, an error is
|
|||
returned if a control message is received. Data messages may be
|
||||
received without a cmsg buffer set.
|
||||
|
||||
.. code-block:: c
|
||||
|
||||
char buffer[16384];
|
||||
char cmsg[CMSG_SPACE(sizeof(unsigned char))];
|
||||
struct msghdr msg = {0};
|
||||
|
@ -186,12 +206,10 @@ Integrating in to userspace TLS library
|
|||
At a high level, the kernel TLS ULP is a replacement for the record
|
||||
layer of a userspace TLS library.
|
||||
|
||||
A patchset to OpenSSL to use ktls as the record layer is here:
|
||||
A patchset to OpenSSL to use ktls as the record layer is
|
||||
`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_.
|
||||
|
||||
https://github.com/Mellanox/openssl/commits/tls_rx2
|
||||
|
||||
An example of calling send directly after a handshake using
|
||||
gnutls. Since it doesn't implement a full record layer, control
|
||||
messages are not supported:
|
||||
|
||||
https://github.com/ktls/af_ktls-tool/commits/RX
|
||||
`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_
|
||||
of calling send directly after a handshake using gnutls.
|
||||
Since it doesn't implement a full record layer, control
|
||||
messages are not supported.
|
Loading…
Reference in New Issue