linux_old1/net
Neal Cardwell df92c8394e tcp: fix xmit timer to only be reset if data ACKed/SACKed
Fix a TCP loss recovery performance bug raised recently on the netdev
list, in two threads:

(i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
(ii) July 26, 2017: netdev thread:
     "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
     outstanding TLP retransmission"

The basic problem is that incoming TCP packets that did not indicate
forward progress could cause the xmit timer (TLP or RTO) to be rearmed
and pushed back in time. In certain corner cases this could result in
the following problems noted in these threads:

 - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
   could cause TCP to repeatedly schedule TLPs forever. We kept
   sending TLPs after every ~200ms, which elicited bogus SACKs, which
   caused more TLPs, ad infinitum; we never fired an RTO to fill in
   the holes.

 - Incoming data segments could, in some cases, cause us to reschedule
   our RTO or TLP timer further out in time, for no good reason. This
   could cause repeated inbound data to result in stalls in outbound
   data, in the presence of packet loss.

This commit fixes these bugs by changing the TLP and RTO ACK
processing to:

 (a) Only reschedule the xmit timer once per ACK.

 (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
     ACK indicates sufficient forward progress (a packet was
     cumulatively ACKed, or we got a SACK for a packet that was sent
     before the most recent retransmit of the write queue head).

This brings us back into closer compliance with the RFCs, since, as
the comment for tcp_rearm_rto() notes, we should only restart the RTO
timer after forward progress on the connection. Previously we were
restarting the xmit timer even in these cases where there was no
forward progress.

As a side benefit, this commit simplifies and speeds up the TCP timer
arming logic. We had been calling inet_csk_reset_xmit_timer() three
times on normal ACKs that cumulatively acknowledged some data:

1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
        if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
               tcp_rearm_rto(sk);

2) Once in tcp_clean_rtx_queue(), to update the RTO:
        if (flag & FLAG_ACKED) {
               tcp_rearm_rto(sk);

3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
   to TLP:
        if (icsk->icsk_pending == ICSK_TIME_RETRANS)
               tcp_schedule_loss_probe(sk);

This commit, by only rescheduling the xmit timer once per ACK,
simplifies the code and reduces CPU overhead.

This commit was tested in an A/B test with Google web server
traffic. SNMP stats and request latency metrics were within noise
levels, substantiating that for normal web traffic patterns this is a
rare issue. This commit was also tested with packetdrill tests to
verify that it fixes the timer behavior in the corner cases discussed
in the netdev threads mentioned above.

This patch is a bug fix patch intended to be queued for -stable
relases.

Fixes: 6ba8a3b19e ("tcp: Tail loss probe (TLP)")
Reported-by: Klavs Klavsen <kl@vsen.dk>
Reported-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03 15:38:31 -07:00
..
6lowpan 6lowpan: Don't set IFF_NO_QUEUE 2017-04-12 22:02:40 +02:00
9p Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 12:00:42 -07:00
802 net: introduce __skb_put_[zero, data, u8] 2017-06-20 13:30:14 -04:00
8021q net: add netlink_ext_ack argument to rtnl_link_ops.validate 2017-06-26 23:13:22 -04:00
appletalk networking: make skb_push & __skb_push return void pointers 2017-06-16 11:48:40 -04:00
atm net, atm: convert eg_cache_entry.use from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
ax25 net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t 2017-07-04 22:35:19 +01:00
batman-adv batman-adv: fix TT sync flag inconsistencies 2017-07-31 11:17:38 +02:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-07-05 12:31:59 -07:00
bpf bpf: Align packet data properly in program testing framework. 2017-05-02 11:46:28 -04:00
bridge net: bridge: fix dest lookup when vlan proto doesn't match 2017-07-14 08:19:23 -07:00
caif net: convert sock.sk_wmem_alloc from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
can networking: introduce and use skb_put_data() 2017-06-16 11:48:37 -04:00
ceph libceph: potential NULL dereference in ceph_msg_data_create() 2017-07-17 14:54:59 +02:00
core net: check dev->addr_len for dev_set_mac_address() 2017-07-29 11:25:05 -07:00
dcb dcb: enforce minimum length on IEEE_APPS attribute 2017-05-21 13:42:33 -04:00
dccp dccp: fix a memleak for dccp_feat_init err process 2017-07-27 00:01:05 -07:00
decnet net, decnet: convert dn_fib_info.fib_clntref from atomic_t to refcount_t 2017-07-04 22:35:15 +01:00
dns_resolver Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-03 10:16:38 -08:00
dsa net: dsa: Initialize ds->cpu_port_mask earlier 2017-07-24 17:36:27 -07:00
ethernet networking: make skb_push & __skb_push return void pointers 2017-06-16 11:48:40 -04:00
hsr net: add netlink_ext_ack argument to rtnl_link_ops.newlink 2017-06-26 23:13:21 -04:00
ieee802154 net: add netlink_ext_ack argument to rtnl_link_ops.validate 2017-06-26 23:13:22 -04:00
ife
ipv4 tcp: fix xmit timer to only be reset if data ACKed/SACKed 2017-08-03 15:38:31 -07:00
ipv6 ipv6: set rt6i_protocol properly in the route when it is installed 2017-08-03 15:10:18 -07:00
ipx net, ipx: convert ipx_route.refcnt from atomic_t to refcount_t 2017-07-04 22:35:17 +01:00
irda Merge branch 'work.memdup_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-05 16:05:24 -07:00
iucv iucv: Convert sk_wmem_alloc accesses to refcount_t. 2017-07-03 02:31:22 -07:00
kcm net: convert sock.sk_wmem_alloc from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
key net, xfrm: convert xfrm_policy.refcnt from atomic_t to refcount_t 2017-07-04 22:35:18 +01:00
l2tp net, l2tp: convert l2tp_session.ref_count from atomic_t to refcount_t 2017-07-04 22:35:15 +01:00
l3mdev
lapb net, lapb: convert lapb_cb.refcnt from atomic_t to refcount_t 2017-07-04 22:35:16 +01:00
llc net, llc: convert llc_sap.refcnt from atomic_t to refcount_t 2017-07-04 22:35:15 +01:00
mac80211 net: manual clean code which call skb_put_[data:zero] 2017-06-20 13:30:15 -04:00
mac802154 net: Fix inconsistent teardown and release of private netdev state. 2017-06-07 15:53:24 -04:00
mpls mpls: fix uninitialized in_label var warning in mpls_getroute 2017-07-08 11:26:41 +01:00
ncsi networking: make skb_push & __skb_push return void pointers 2017-06-16 11:48:40 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-07-20 16:33:39 -07:00
netlabel netlink: pass extended ACK struct to parsing functions 2017-04-13 13:58:22 -04:00
netlink net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
netrom net, netrom: convert nr_node.refcount from atomic_t to refcount_t 2017-07-04 22:35:17 +01:00
nfc NFC: Add sockaddr length checks before accessing sa_family in bind handlers 2017-06-23 00:38:31 +02:00
openvswitch openvswitch: fix potential out of bound access in parse_ct 2017-07-24 16:25:06 -07:00
packet packet: fix use-after-free in prb_retire_rx_blk_timer_expired() 2017-07-24 17:33:19 -07:00
phonet net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
psample networking: make skb_put & friends return void pointers 2017-06-16 11:48:39 -04:00
qrtr networking: make skb_put & friends return void pointers 2017-06-16 11:48:39 -04:00
rds rds: Make sure updates to cp_send_gen can be observed 2017-07-20 15:33:01 -07:00
rfkill net: rfkill: gpio: Switch to devm_acpi_dev_add_driver_gpios() 2017-06-13 11:07:51 +02:00
rose net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
rxrpc net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-07-20 16:33:39 -07:00
sctp sctp: fix an array overflow when all ext chunks are set 2017-07-14 09:05:10 -07:00
smc net/smc: Add warning about remote memory exposure 2017-05-16 14:49:43 -04:00
strparser strparser: destroy workqueue on module exit 2017-03-03 20:43:26 -08:00
sunrpc NFS client bugfixes for 4.13 2017-07-21 16:26:01 -07:00
switchdev net: switchdev: Change notifier chain to be atomic 2017-06-08 14:16:24 -04:00
tipc net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
tls TLS: Fix length check in do_tls_getsockopt_tx() 2017-07-06 10:58:19 +01:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-07-05 12:31:59 -07:00
vmw_vsock net: manual clean code which call skb_put_[data:zero] 2017-06-20 13:30:15 -04:00
wimax
wireless netlink validation fixes for nl80211 2017-07-07 11:35:55 +01:00
x25 net, x25: convert x25_neigh.refcnt from atomic_t to refcount_t 2017-07-04 22:35:18 +01:00
xfrm net, xfrm: convert sec_path.refcnt from atomic_t to refcount_t 2017-07-04 22:35:18 +01:00
Kconfig tls: kernel TLS support 2017-06-15 12:12:40 -04:00
Makefile tls: kernel TLS support 2017-06-15 12:12:40 -04:00
compat.c get_compat_bpf_fprog(): don't copyin field-by-field 2017-07-04 13:14:34 -04:00
socket.c net/socket: fix type in assignment and trim long line 2017-07-24 14:17:01 -07:00
sysctl_net.c sysctl: Remove dead register_sysctl_root 2017-04-16 23:42:49 -05:00