linux_old1

Neal Cardwell df92c8394e tcp: fix xmit timer to only be reset if data ACKed/SACKed

Fix a TCP loss recovery performance bug raised recently on the netdev list, in two threads:

(i) July 26, 2017: netdev thread "TCP fast retransmit issues"
(ii) July 26, 2017: netdev thread: "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one outstanding TLP retransmission"

The basic problem is that incoming TCP packets that did not indicate forward progress could cause the xmit timer (TLP or RTO) to be rearmed and pushed back in time. In certain corner cases this could result in the following problems noted in these threads:

- Repeated ACKs coming in with bogus SACKs corrupted by middleboxes could cause TCP to repeatedly schedule TLPs forever. We kept sending TLPs after every ~200ms, which elicited bogus SACKs, which caused more TLPs, ad infinitum; we never fired an RTO to fill in the holes.

- Incoming data segments could, in some cases, cause us to reschedule our RTO or TLP timer further out in time, for no good reason. This could cause repeated inbound data to result in stalls in outbound data, in the presence of packet loss.

This commit fixes these bugs by changing the TLP and RTO ACK processing to:

(a) Only reschedule the xmit timer once per ACK.

(b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the ACK indicates sufficient forward progress (a packet was cumulatively ACKed, or we got a SACK for a packet that was sent before the most recent retransmit of the write queue head).

This brings us back into closer compliance with the RFCs, since, as the comment for tcp_rearm_rto() notes, we should only restart the RTO timer after forward progress on the connection. Previously we were restarting the xmit timer even in these cases where there was no forward progress.

As a side benefit, this commit simplifies and speeds up the TCP timer arming logic. We had been calling inet_csk_reset_xmit_timer() three times on normal ACKs that cumulatively acknowledged some data:

1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
        if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
                tcp_rearm_rto(sk);

2) Once in tcp_clean_rtx_queue(), to update the RTO:
        if (flag & FLAG_ACKED) {
                tcp_rearm_rto(sk);

3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO to TLP:
        if (icsk->icsk_pending == ICSK_TIME_RETRANS)
                tcp_schedule_loss_probe(sk);

This commit, by only rescheduling the xmit timer once per ACK, simplifies the code and reduces CPU overhead.

This commit was tested in an A/B test with Google web server traffic. SNMP stats and request latency metrics were within noise levels, substantiating that for normal web traffic patterns this is a rare issue. This commit was also tested with packetdrill tests to verify that it fixes the timer behavior in the corner cases discussed in the netdev threads mentioned above.

This patch is a bug fix patch intended to be queued for -stable releases.

Fixes: `6ba8a3b19e` ("tcp: Tail loss probe (TLP)")
Reported-by: Klavs Klavsen <kl@vsen.dk>
Reported-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2017-08-03 15:38:31 -07:00
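The rule described in the commit message (rearm the xmit timer at most once per ACK, and only when the ACK shows forward progress) can be sketched as a small standalone model. This is not the kernel code: every name below (toy_conn, toy_clean_rtx_queue, toy_ack, the TOY_FLAG_* bits) is hypothetical and exists only for illustration, and the single fixed timeout stands in for the real TLP/RTO computation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; not the kernel's values. */
enum {
	TOY_FLAG_ACKED          = 1u << 0, /* ACK cumulatively acknowledged a packet */
	TOY_FLAG_SACK_PROGRESS  = 1u << 1, /* SACK for a packet sent before the most
					    * recent retransmit of the queue head */
	TOY_FLAG_SET_XMIT_TIMER = 1u << 2, /* ACK justifies rearming the timer */
};

/* Toy connection state: one timer with one fixed timeout (an assumption). */
struct toy_conn {
	uint64_t timeout_ms;        /* stand-in for the TLP/RTO interval */
	uint64_t timer_deadline_ms; /* when the xmit timer currently fires */
	unsigned int rearm_count;   /* how many times the timer was rearmed */
};

/* Decide whether this ACK indicates forward progress. */
static unsigned int toy_clean_rtx_queue(bool cum_acked, bool sacked_old_pkt)
{
	unsigned int flag = 0;

	if (cum_acked)
		flag |= TOY_FLAG_ACKED;
	if (sacked_old_pkt)
		flag |= TOY_FLAG_SACK_PROGRESS;
	if (flag & (TOY_FLAG_ACKED | TOY_FLAG_SACK_PROGRESS))
		flag |= TOY_FLAG_SET_XMIT_TIMER;
	return flag;
}

/* Process one incoming ACK: the timer is touched in exactly one place. */
static void toy_ack(struct toy_conn *c, uint64_t now_ms,
		    bool cum_acked, bool sacked_old_pkt)
{
	unsigned int flag;

	assert(c != NULL);
	flag = toy_clean_rtx_queue(cum_acked, sacked_old_pkt);

	/* No forward progress: leave the pending deadline untouched. */
	if (!(flag & TOY_FLAG_SET_XMIT_TIMER))
		return;

	c->timer_deadline_ms = now_ms + c->timeout_ms;
	c->rearm_count++;
}
```

In this model an ACK that neither cumulatively acknowledges data nor carries a qualifying SACK leaves timer_deadline_ms unchanged, so a stream of such ACKs cannot push the timeout back indefinitely.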
6lowpan	6lowpan: Don't set IFF_NO_QUEUE	2017-04-12 22:02:40 +02:00
9p	Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-15 12:00:42 -07:00
802	net: introduce __skb_put_[zero, data, u8]	2017-06-20 13:30:14 -04:00
8021q	net: add netlink_ext_ack argument to rtnl_link_ops.validate	2017-06-26 23:13:22 -04:00
appletalk	networking: make skb_push & __skb_push return void pointers	2017-06-16 11:48:40 -04:00
atm	net, atm: convert eg_cache_entry.use from atomic_t to refcount_t	2017-07-04 22:35:16 +01:00
ax25	net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t	2017-07-04 22:35:19 +01:00
batman-adv	batman-adv: fix TT sync flag inconsistencies	2017-07-31 11:17:38 +02:00
bluetooth	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next	2017-07-05 12:31:59 -07:00
bpf	bpf: Align packet data properly in program testing framework.	2017-05-02 11:46:28 -04:00
bridge	net: bridge: fix dest lookup when vlan proto doesn't match	2017-07-14 08:19:23 -07:00
caif	net: convert sock.sk_wmem_alloc from atomic_t to refcount_t	2017-07-01 07:39:08 -07:00
can	networking: introduce and use skb_put_data()	2017-06-16 11:48:37 -04:00
ceph	libceph: potential NULL dereference in ceph_msg_data_create()	2017-07-17 14:54:59 +02:00
core	net: check dev->addr_len for dev_set_mac_address()	2017-07-29 11:25:05 -07:00
dcb	dcb: enforce minimum length on IEEE_APPS attribute	2017-05-21 13:42:33 -04:00
dccp	dccp: fix a memleak for dccp_feat_init err process	2017-07-27 00:01:05 -07:00
decnet	net, decnet: convert dn_fib_info.fib_clntref from atomic_t to refcount_t	2017-07-04 22:35:15 +01:00
dns_resolver	Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2017-03-03 10:16:38 -08:00
dsa	net: dsa: Initialize ds->cpu_port_mask earlier	2017-07-24 17:36:27 -07:00
ethernet	networking: make skb_push & __skb_push return void pointers	2017-06-16 11:48:40 -04:00
hsr	net: add netlink_ext_ack argument to rtnl_link_ops.newlink	2017-06-26 23:13:21 -04:00
ieee802154	net: add netlink_ext_ack argument to rtnl_link_ops.validate	2017-06-26 23:13:22 -04:00
ife	…
ipv4	tcp: fix xmit timer to only be reset if data ACKed/SACKed	2017-08-03 15:38:31 -07:00
ipv6	ipv6: set rt6i_protocol properly in the route when it is installed	2017-08-03 15:10:18 -07:00
ipx	net, ipx: convert ipx_route.refcnt from atomic_t to refcount_t	2017-07-04 22:35:17 +01:00
irda	Merge branch 'work.memdup_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2017-07-05 16:05:24 -07:00
iucv	iucv: Convert sk_wmem_alloc accesses to refcount_t.	2017-07-03 02:31:22 -07:00
kcm	net: convert sock.sk_wmem_alloc from atomic_t to refcount_t	2017-07-01 07:39:08 -07:00
key	net, xfrm: convert xfrm_policy.refcnt from atomic_t to refcount_t	2017-07-04 22:35:18 +01:00
l2tp	net, l2tp: convert l2tp_session.ref_count from atomic_t to refcount_t	2017-07-04 22:35:15 +01:00
l3mdev	…
lapb	net, lapb: convert lapb_cb.refcnt from atomic_t to refcount_t	2017-07-04 22:35:16 +01:00
llc	net, llc: convert llc_sap.refcnt from atomic_t to refcount_t	2017-07-04 22:35:15 +01:00
mac80211	net: manual clean code which call skb_put_[data:zero]	2017-06-20 13:30:15 -04:00
mac802154	net: Fix inconsistent teardown and release of private netdev state.	2017-06-07 15:53:24 -04:00
mpls	mpls: fix uninitialized in_label var warning in mpls_getroute	2017-07-08 11:26:41 +01:00
ncsi	networking: make skb_push & __skb_push return void pointers	2017-06-16 11:48:40 -04:00
netfilter	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2017-07-20 16:33:39 -07:00
netlabel	netlink: pass extended ACK struct to parsing functions	2017-04-13 13:58:22 -04:00
netlink	net: convert sock.sk_refcnt from atomic_t to refcount_t	2017-07-01 07:39:08 -07:00
netrom	net, netrom: convert nr_node.refcount from atomic_t to refcount_t	2017-07-04 22:35:17 +01:00
nfc	NFC: Add sockaddr length checks before accessing sa_family in bind handlers	2017-06-23 00:38:31 +02:00
openvswitch	openvswitch: fix potential out of bound access in parse_ct	2017-07-24 16:25:06 -07:00
packet	packet: fix use-after-free in prb_retire_rx_blk_timer_expired()	2017-07-24 17:33:19 -07:00
phonet	net: convert sock.sk_refcnt from atomic_t to refcount_t	2017-07-01 07:39:08 -07:00
psample	networking: make skb_put & friends return void pointers	2017-06-16 11:48:39 -04:00
qrtr	networking: make skb_put & friends return void pointers	2017-06-16 11:48:39 -04:00
rds	rds: Make sure updates to cp_send_gen can be observed	2017-07-20 15:33:01 -07:00
rfkill	net: rfkill: gpio: Switch to devm_acpi_dev_add_driver_gpios()	2017-06-13 11:07:51 +02:00
rose	net: Work around lockdep limitation in sockets that use sockets	2017-03-09 18:23:27 -08:00
rxrpc	net: convert sock.sk_refcnt from atomic_t to refcount_t	2017-07-01 07:39:08 -07:00
sched	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2017-07-20 16:33:39 -07:00
sctp	sctp: fix an array overflow when all ext chunks are set	2017-07-14 09:05:10 -07:00
smc	net/smc: Add warning about remote memory exposure	2017-05-16 14:49:43 -04:00
strparser	strparser: destroy workqueue on module exit	2017-03-03 20:43:26 -08:00
sunrpc	NFS client bugfixes for 4.13	2017-07-21 16:26:01 -07:00
switchdev	net: switchdev: Change notifier chain to be atomic	2017-06-08 14:16:24 -04:00
tipc	net: convert sock.sk_refcnt from atomic_t to refcount_t	2017-07-01 07:39:08 -07:00
tls	TLS: Fix length check in do_tls_getsockopt_tx()	2017-07-06 10:58:19 +01:00
unix	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next	2017-07-05 12:31:59 -07:00
vmw_vsock	net: manual clean code which call skb_put_[data:zero]	2017-06-20 13:30:15 -04:00
wimax	…
wireless	netlink validation fixes for nl80211	2017-07-07 11:35:55 +01:00
x25	net, x25: convert x25_neigh.refcnt from atomic_t to refcount_t	2017-07-04 22:35:18 +01:00
xfrm	net, xfrm: convert sec_path.refcnt from atomic_t to refcount_t	2017-07-04 22:35:18 +01:00
Kconfig	tls: kernel TLS support	2017-06-15 12:12:40 -04:00
Makefile	tls: kernel TLS support	2017-06-15 12:12:40 -04:00
compat.c	get_compat_bpf_fprog(): don't copyin field-by-field	2017-07-04 13:14:34 -04:00
socket.c	net/socket: fix type in assignment and trim long line	2017-07-24 14:17:01 -07:00
sysctl_net.c	sysctl: Remove dead register_sysctl_root	2017-04-16 23:42:49 -05:00