mirror of https://gitee.com/openkylin/linux.git
add document for TCP OFO, PAWS and skip ACK counters
add document and examples for below counters:
TcpExtTCPOFOQueue
TcpExtTCPOFODrop
TcpExtTCPOFOMerge
TcpExtPAWSActive
TcpExtPAWSEstab
TcpExtTCPACKSkippedSynRecv
TcpExtTCPACKSkippedPAWS
TcpExtTCPACKSkippedSeq
TcpExtTCPACKSkippedFinWait2
TcpExtTCPACKSkippedTimeWait
TcpExtTCPACKSkippedChallenge

Signed-off-by: yupeng <yupeng0921@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
parent 3a0ed3e961
commit 2b96547223
@ -571,7 +571,97 @@ duplicate packet is received.
* TcpExtTCPDSACKOfoRecv
The TCP stack receives a DSACK, which indicates that an out-of-order
duplicate packet was received.

TCP out of order
================
* TcpExtTCPOFOQueue
The TCP layer receives an out-of-order packet and has enough memory
to queue it.

* TcpExtTCPOFODrop
The TCP layer receives an out-of-order packet but doesn't have enough
memory, so it drops the packet. Such packets are not counted in
TcpExtTCPOFOQueue.

* TcpExtTCPOFOMerge
The received out-of-order packet overlaps with the previous
packet. The overlapping part is dropped. All TcpExtTCPOFOMerge
packets are also counted in TcpExtTCPOFOQueue.

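The relationship between the three out-of-order counters can be
sketched with a small Python model. This is a simplified illustration,
not the kernel implementation: the ``OfoQueue`` class, its ``capacity``
limit (a stand-in for the socket memory check) and the representation
of segments as byte ranges are all assumptions made for this example.

```python
from collections import Counter


class OfoQueue:
    """Toy out-of-order queue; segments are (seq, end_seq) byte ranges."""

    def __init__(self, capacity):
        self.capacity = capacity   # hypothetical stand-in for the memory check
        self.segments = []         # queued ranges, kept non-overlapping
        self.counters = Counter()

    def receive(self, seq, end_seq):
        if len(self.segments) >= self.capacity:
            # Not enough memory: dropped and NOT counted in TCPOFOQueue.
            self.counters["TcpExtTCPOFODrop"] += 1
            return
        self.counters["TcpExtTCPOFOQueue"] += 1
        overlapping = [s for s in self.segments
                       if seq < s[1] and s[0] < end_seq]
        if overlapping:
            # Overlap with queued data: count one merge and keep only the
            # union, so the overlapping bytes are stored once.
            self.counters["TcpExtTCPOFOMerge"] += 1
            for start, end in overlapping:
                self.segments.remove((start, end))
                seq, end_seq = min(seq, start), max(end_seq, end)
        self.segments.append((seq, end_seq))
        self.segments.sort()


q = OfoQueue(capacity=2)
q.receive(100, 200)   # queued
q.receive(150, 250)   # overlaps the first range: merged, still counted as queued
q.receive(400, 500)   # queued
q.receive(600, 700)   # queue is full: dropped
print(dict(q.counters))
```

In this model every merged packet also increments the queue counter,
while a dropped packet increments only the drop counter, which mirrors
the relationships described for the three counters.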
TCP PAWS
========
PAWS (Protection Against Wrapped Sequence numbers) is an algorithm
used to drop old packets. It depends on TCP timestamps. For detailed
information, please refer to the `timestamp wiki`_ and the
`RFC of PAWS`_.

.. _RFC of PAWS: https://tools.ietf.org/html/rfc1323#page-17
.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps

* TcpExtPAWSActive
Packets are dropped by PAWS in the Syn-Sent state.

* TcpExtPAWSEstab
Packets are dropped by PAWS in any state other than Syn-Sent.

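The core of the PAWS test is a wrap-aware comparison of two 32-bit
timestamps. The sketch below shows only that comparison. The function
name ``paws_reject`` is hypothetical, and the logic is deliberately
simplified: it does not model any tolerance window or the handling of
connections that have been idle for a long time, which a real stack
may apply.

```python
def paws_reject(ts_val, ts_recent):
    """Return True if ts_val is older than ts_recent.

    TCP timestamps are 32-bit values that wrap around, so the comparison
    uses the signed 32-bit difference instead of the raw values.
    """
    delta = (ts_val - ts_recent) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000   # reinterpret as a negative 32-bit value
    return delta < 0


print(paws_reject(100, 200))          # older timestamp
print(paws_reject(200, 100))          # newer timestamp
print(paws_reject(5, 0xFFFFFFF0))     # newer, after the counter wrapped
```

The last call illustrates why a plain ``<`` comparison is not enough:
the value 5 is numerically smaller than 0xFFFFFFF0 but was generated
later, after the timestamp counter wrapped.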
TCP ACK skip
============
In some scenarios, the kernel avoids sending duplicate ACKs too
frequently. Please find more details in the tcp_invalid_ratelimit
section of the `sysctl document`_. When the kernel decides to skip an
ACK due to tcp_invalid_ratelimit, it updates one of the counters below
to indicate the scenario in which the ACK was skipped. The ACK is only
skipped if the received packet is either a SYN packet or has no data.

.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

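The rate limiting described above can be sketched as follows. This is
a minimal model rather than the kernel code: the class name
``OowRateLimiter`` and its method are hypothetical, time is passed in
explicitly as milliseconds, and the 500 ms value is assumed here as
the tcp_invalid_ratelimit setting (check the sysctl document for the
value on your system).

```python
class OowRateLimiter:
    """Decide whether a duplicate ACK may be sent or must be skipped."""

    def __init__(self, ratelimit_ms=500):
        self.ratelimit_ms = ratelimit_ms   # assumed tcp_invalid_ratelimit value
        self.last_ack_ms = None            # time of the last duplicate ACK sent
        self.skipped = 0                   # stands in for a TcpExtTCPACKSkipped* counter

    def should_send_ack(self, now_ms, has_data=False, is_syn=False):
        # Packets carrying data are never rate limited, unless they are SYNs.
        if has_data and not is_syn:
            return True
        if (self.last_ack_ms is not None
                and 0 <= now_ms - self.last_ack_ms < self.ratelimit_ms):
            self.skipped += 1
            return False
        self.last_ack_ms = now_ms
        return True


limiter = OowRateLimiter()
print(limiter.should_send_ack(0, is_syn=True))      # first duplicate: ACK sent
print(limiter.should_send_ack(10, is_syn=True))     # too soon: ACK skipped
print(limiter.should_send_ack(20, has_data=True))   # data packet: never skipped
print(limiter.should_send_ack(600, is_syn=True))    # limit expired: ACK sent
print(limiter.skipped)
```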
* TcpExtTCPACKSkippedSynRecv
The ACK is skipped in the Syn-Recv state. The Syn-Recv state means the
TCP stack has received a SYN and replied with a SYN+ACK, and is now
waiting for an ACK. Generally, the TCP stack doesn't need to send an
ACK in the Syn-Recv state. But in several scenarios it does, e.g. when
the TCP stack receives the same SYN packet repeatedly, the received
packet does not pass the PAWS check, or the received packet's sequence
number is out of window. If the ACK sending frequency is higher than
tcp_invalid_ratelimit allows, the TCP stack skips sending the ACK and
increases TcpExtTCPACKSkippedSynRecv.

* TcpExtTCPACKSkippedPAWS
The ACK is skipped because the PAWS (Protection Against Wrapped
Sequence numbers) check fails. If the PAWS check fails in the
Syn-Recv, Fin-Wait-2 or Time-Wait state, the skipped ACK is counted in
TcpExtTCPACKSkippedSynRecv, TcpExtTCPACKSkippedFinWait2 or
TcpExtTCPACKSkippedTimeWait respectively. In all other states, the
skipped ACK is counted in TcpExtTCPACKSkippedPAWS.

* TcpExtTCPACKSkippedSeq
The sequence number is out of window, the timestamp passes the PAWS
check, and the TCP state is not Syn-Recv, Fin-Wait-2 or Time-Wait.

* TcpExtTCPACKSkippedFinWait2
The ACK is skipped in the Fin-Wait-2 state, either because the PAWS
check fails or because the received sequence number is out of window.

* TcpExtTCPACKSkippedTimeWait
The ACK is skipped in the Time-Wait state, either because the PAWS
check fails or because the received sequence number is out of window.

* TcpExtTCPACKSkippedChallenge
The ACK is skipped if it is a challenge ACK. RFC 5961 defines three
kinds of challenge ACK; please refer to `RFC 5961 section 3.2`_,
`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these
three scenarios, in some TCP states the Linux TCP stack also sends
challenge ACKs if the ACK number is before the first unacknowledged
number (stricter than `RFC 5961 section 5.2`_).

.. _RFC 5961 section 3.2: https://tools.ietf.org/html/rfc5961#page-7
.. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9
.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11

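The way a skipped ACK is attributed to one of the TcpExtTCPACKSkipped
counters can be summarized as a lookup on the TCP state and the reason
for the ACK. The function below is only a reading aid: its name, the
state strings and the reason keywords are hypothetical, and treating a
challenge ACK as independent of the TCP state is an assumption of this
sketch.

```python
def skipped_ack_counter(state, reason):
    """Map (TCP state, reason) to the counter that would be increased.

    reason is one of "paws", "seq" or "challenge".
    """
    per_state = {
        "Syn-Recv": "TcpExtTCPACKSkippedSynRecv",
        "Fin-Wait-2": "TcpExtTCPACKSkippedFinWait2",
        "Time-Wait": "TcpExtTCPACKSkippedTimeWait",
    }
    per_reason = {
        "paws": "TcpExtTCPACKSkippedPAWS",
        "seq": "TcpExtTCPACKSkippedSeq",
        "challenge": "TcpExtTCPACKSkippedChallenge",
    }
    # Assumption: challenge ACKs are counted separately in every state.
    if reason != "challenge" and state in per_state:
        return per_state[state]
    return per_reason[reason]


print(skipped_ack_counter("Syn-Recv", "paws"))
print(skipped_ack_counter("Established", "paws"))
print(skipped_ack_counter("Established", "seq"))
```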
examples
========

@ -1188,3 +1278,151 @@ Run nstat on server B::
We have deleted the default route on server B. Server B couldn't find
a route for the 8.8.8.8 IP address, so server B increased
IpOutNoRoutes.

TcpExtTCPACKSkippedSynRecv
--------------------------
In this test, we send the same SYN packet three times from the client
to the server. The first SYN makes the server create a socket, set it
to the Syn-Recv state, and reply with a SYN/ACK. The second SYN makes
the server reply with the SYN/ACK again and record the reply time (the
duplicate ACK reply time). The third SYN makes the server check the
previous duplicate ACK reply time and decide to skip the duplicate
ACK, then increase the TcpExtTCPACKSkippedSynRecv counter.

Run tcpdump to capture a SYN packet::

  nstatuser@nstat-a:~$ sudo tcpdump -c 1 -w /tmp/syn.pcap port 9000
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

Open another terminal and run the nc command::

  nstatuser@nstat-a:~$ nc nstat-b 9000

As nstat-b wasn't listening on port 9000, it replied with a RST and
the nc command exited immediately. That was enough for the tcpdump
command to capture a SYN packet. A Linux server might use hardware
offload for the TCP checksum, so the checksum in /tmp/syn.pcap
might not be correct. We call tcprewrite to fix it::

  nstatuser@nstat-a:~$ tcprewrite --infile=/tmp/syn.pcap --outfile=/tmp/syn_fixcsum.pcap --fixcsum

On nstat-b, we run nc to listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On nstat-a, we block packets from port 9000; otherwise nstat-a would
send a RST to nstat-b::

  nstatuser@nstat-a:~$ sudo iptables -A INPUT -p tcp --sport 9000 -j DROP

Send the SYN three times to nstat-b::

  nstatuser@nstat-a:~$ for i in {1..3}; do sudo tcpreplay -i ens3 /tmp/syn_fixcsum.pcap; done

Check the SNMP counter on nstat-b::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedSynRecv 1 0.0

As we expected, TcpExtTCPACKSkippedSynRecv is 1.

TcpExtTCPACKSkippedPAWS
-----------------------
To trigger PAWS, we could send an old SYN.

On nstat-b, let nc listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On nstat-a, run tcpdump to capture a SYN::

  nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/paws_pre.pcap -c 1 port 9000
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

On nstat-a, run nc as a client to connect to nstat-b::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

Now tcpdump has captured the SYN and exited. We should fix the
checksum::

  nstatuser@nstat-a:~$ tcprewrite --infile /tmp/paws_pre.pcap --outfile /tmp/paws.pcap --fixcsum

Send the SYN packet twice::

  nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/paws.pcap; done

On nstat-b, check the SNMP counter::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedPAWS 1 0.0

We sent two SYNs via tcpreplay, and both of them failed the PAWS
check. nstat-b replied with an ACK for the first SYN, skipped the ACK
for the second SYN, and updated TcpExtTCPACKSkippedPAWS.

TcpExtTCPACKSkippedSeq
----------------------
To trigger TcpExtTCPACKSkippedSeq, we send packets which have a valid
timestamp (to pass the PAWS check) but whose sequence number is out of
window. The Linux TCP stack does not skip the ACK if the packet has
data, so we need a pure ACK packet. To generate such a packet, we
create two sockets: one on port 9000, another on port 9001. Then we
capture an ACK on port 9001 and change the source/destination port
numbers to match the port 9000 socket. We can then trigger
TcpExtTCPACKSkippedSeq with this packet.

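Why an ACK borrowed from another connection is almost certainly out of
window can be seen from a simplified acceptability test. The helper
names below are hypothetical, and the left edge of the window is
approximated by ``rcv_nxt``; the kernel's actual check may use a
slightly different left edge, so treat this as a sketch only.

```python
def seq_before(a, b):
    """True if 32-bit sequence number a is before b (wrap-aware)."""
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000


def seq_in_window(seq, end_seq, rcv_nxt, rcv_wnd):
    """Simplified check that a segment falls inside the receive window."""
    right_edge = (rcv_nxt + rcv_wnd) & 0xFFFFFFFF
    starts_too_late = seq_before(right_edge, seq)   # seq is after the right edge
    ends_too_early = seq_before(end_seq, rcv_nxt)   # segment is entirely old
    return not starts_too_late and not ends_too_early


# A pure ACK has seq == end_seq because it carries no data.
print(seq_in_window(1000, 1000, rcv_nxt=1000, rcv_wnd=65535))        # expected seq
print(seq_in_window(5000000, 5000000, rcv_nxt=1000, rcv_wnd=65535))  # foreign seq
```

Because the two connections choose their initial sequence numbers
independently, the sequence number captured on port 9001 is very
unlikely to land inside the window of the port 9000 socket.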
On nstat-b, open two terminals and run two nc commands to listen on
both port 9000 and port 9001::

  nstatuser@nstat-b:~$ nc -lkv 9000
  Listening on [0.0.0.0] (family 0, port 9000)

  nstatuser@nstat-b:~$ nc -lkv 9001
  Listening on [0.0.0.0] (family 0, port 9001)

On nstat-a, run two nc clients::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

  nstatuser@nstat-a:~$ nc -v nstat-b 9001
  Connection to nstat-b 9001 port [tcp/*] succeeded!

On nstat-a, run tcpdump to capture an ACK::

  nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/seq_pre.pcap -c 1 dst port 9001
  tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

On nstat-b, send a packet via the port 9001 socket. In our example, we
sent the string 'foo'::

  nstatuser@nstat-b:~$ nc -lkv 9001
  Listening on [0.0.0.0] (family 0, port 9001)
  Connection from nstat-a 42132 received!
  foo

On nstat-a, tcpdump should have captured the ACK. We should check
the source port numbers of the two nc clients::

  nstatuser@nstat-a:~$ ss -ta '( dport = :9000 || dport = :9001 )' | tee
  State  Recv-Q   Send-Q         Local Address:Port           Peer Address:Port
  ESTAB  0        0            192.168.122.250:50208       192.168.122.251:9000
  ESTAB  0        0            192.168.122.250:42132       192.168.122.251:9001

Run tcprewrite to change port 9001 to port 9000 and port 42132 to
port 50208::

  nstatuser@nstat-a:~$ tcprewrite --infile /tmp/seq_pre.pcap --outfile /tmp/seq.pcap -r 9001:9000 -r 42132:50208 --fixcsum

Now /tmp/seq.pcap is the packet we need. Send it to nstat-b::

  nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/seq.pcap; done

Check TcpExtTCPACKSkippedSeq on nstat-b::

  nstatuser@nstat-b:~$ nstat | grep -i skip
  TcpExtTCPACKSkippedSeq 1 0.0

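When repeating these experiments, it can be handy to read the nstat
output programmatically instead of with grep. The helper below is a
small sketch with a hypothetical name, ``parse_nstat``; it assumes each
counter line has the form "name value rate", as in the outputs shown
in the examples, and ignores any line that does not match.

```python
def parse_nstat(output):
    """Parse nstat text output into a {counter_name: value} dict."""
    counters = {}
    for line in output.splitlines():
        fields = line.split()
        # Header or malformed lines lack a numeric second field.
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


sample = """#kernel
TcpExtTCPACKSkippedSeq 1 0.0
"""
counters = parse_nstat(sample)
skipped = {k: v for k, v in counters.items() if "Skipped" in k}
print(skipped)
```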