linux/net
Eric Dumazet 3b47d30396 net: gro: add a per device gro flush timer
Tuning coalescing parameters on a NIC can be really hard.

Servers can handle both bulk and RPC-like traffic, with conflicting
goals: bulk flows want GRO packets as big as possible, while RPC flows
want minimal latency.

To reach big GRO packets on a 10GbE NIC, one can use:

ethtool -C eth0 rx-usecs 4 rx-frames 44

But this penalizes RPC sessions, increasing latencies by up to 50% in
some cases, as NICs generally do not force an interrupt when a packet
with the TCP Push flag is received.

Some NICs do not have an absolute timer, only a timer rearmed for every
incoming packet.

This patch uses a different strategy: let the GRO stack decide what to
do, based on the traffic pattern.

Packets with the Push flag won't be delayed.
Packets without the Push flag might be held in the GRO engine, if we
keep receiving data.
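
The hold-or-flush rule described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the kernel code: it tracks a single flow, treats segments that share a timestamp as one NAPI poll batch, ignores the maximum GRO packet size, and flushes held segments once the gap since the last arrival exceeds the timeout. The names Segment and aggregate are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    t_ns: int   # arrival time in nanoseconds
    psh: bool   # True if the TCP Push flag is set


def aggregate(segments, flush_timeout_ns):
    """Return the number of segments in each delivered GRO packet.

    Toy model only: one flow, no size limit, and an idle gap longer
    than flush_timeout_ns stands in for the flush timer expiring.
    """
    delivered = []
    held = 0        # segments currently held in the (modelled) GRO engine
    last_t = None   # arrival time of the most recent segment
    for seg in segments:
        # No data arrived within the timeout: held segments were flushed.
        if held and seg.t_ns - last_t > flush_timeout_ns:
            delivered.append(held)
            held = 0
        held += 1
        last_t = seg.t_ns
        # A segment carrying Push is never delayed: flush immediately.
        if seg.psh:
            delivered.append(held)
            held = 0
    if held:
        delivered.append(held)  # final flush at end of input
    return delivered


flow = [
    Segment(0, False), Segment(1000, False), Segment(2000, False),
    Segment(3000, True), Segment(10000, False), Segment(11000, False),
]
print(aggregate(flow, 0))      # timeout disabled
print(aggregate(flow, 2000))   # 2000 ns timeout
```

With the timeout at 0, every segment in this trace is delivered on its own; with 2000 ns, the first four segments are merged and delivered as soon as the Push flag arrives.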

This new mechanism is off by default, and is enabled by setting
/sys/class/net/ethX/gro_flush_timeout to a value in nanoseconds.
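
As a convenience, the sysfs attribute can also be read and written from a script. The sketch below assumes the attribute path given above; the helper names and the sysfs_root parameter are hypothetical (the parameter exists only so the helpers can be exercised against a scratch directory), and writing the real file normally requires root.

```python
from pathlib import Path


def gro_flush_timeout_path(dev, sysfs_root="/sys/class/net"):
    """Build the path of the per-device gro_flush_timeout attribute."""
    return Path(sysfs_root) / dev / "gro_flush_timeout"


def set_gro_flush_timeout(dev, timeout_ns, sysfs_root="/sys/class/net"):
    """Write a timeout in nanoseconds; 0 leaves the mechanism disabled."""
    if timeout_ns < 0:
        raise ValueError("timeout must be a non-negative number of nanoseconds")
    gro_flush_timeout_path(dev, sysfs_root).write_text(f"{timeout_ns}\n")


def get_gro_flush_timeout(dev, sysfs_root="/sys/class/net"):
    """Read back the current timeout in nanoseconds."""
    return int(gro_flush_timeout_path(dev, sysfs_root).read_text().strip())
```

For example, set_gro_flush_timeout("eth0", 2000) mirrors the echo command used in the test below.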

To fully enable this mechanism, drivers should use napi_complete_done()
instead of napi_complete().

Tested:
 Ran 200 netperf TCP_STREAM flows from A to B (10GbE mlx4 link, 8 RX queues)

Without this feature, we send back about 305,000 ACKs per second.

GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)

Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce number of ACK packets. (811/19.2 = 42)
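
The two ratios can be re-derived from the averaged sar figures reported in this message. The sketch follows the message's own approximation that received packets per transmitted ACK stands in for segments per GRO packet; the function name is hypothetical.

```python
def segments_per_ack(rx_pps, tx_pps):
    """Received packets per transmitted packet (ACKs, in this test)."""
    return rx_pps / tx_pps


# rxpck/s and txpck/s on B, without and with the 2000 ns timer.
baseline = segments_per_ack(811269.80, 305732.30)
with_timer = segments_per_ack(811577.30, 19230.80)

# Factor by which the ACK rate drops once the timer is set.
ack_reduction = 305732.30 / 19230.80

print(f"baseline:   {baseline:.2f} segments per ACK")
print(f"with timer: {with_timer:.2f} segments per ACK")
print(f"ACK rate reduced by a factor of {ack_reduction:.1f}")
```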

The receiver performs fewer calls to upper stacks and fewer wakeups.
This also reduces CPU usage on the sender, as it receives fewer ACK
packets.

Note that reducing the number of wakeups increases CPU efficiency, but
can decrease QPS, as applications won't have the chance to warm up CPU
caches by doing a partial read of RPC requests/answers if they fit in
one skb.

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00      0.00      0.50

B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00      0.00      0.50
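
To read these sar lines, it helps to split them into named columns. The sketch below assumes the usual `sar -n DEV` column order (rxpck/s, txpck/s, rxkB/s, txkB/s, rxcmp/s, txcmp/s, rxmcst/s) and that sar's kB means 1024 bytes; both are assumptions about sysstat, not something stated in this message. The helper names are hypothetical.

```python
SAR_FIELDS = ("rxpck/s", "txpck/s", "rxkB/s", "txkB/s",
              "rxcmp/s", "txcmp/s", "rxmcst/s")


def parse_sar_dev_line(line):
    """Split an 'Average:' line of `sar -n DEV` into (iface, stats dict)."""
    parts = line.split()
    iface = parts[1]  # parts[0] is the "Average:" label
    values = [float(p) for p in parts[2:2 + len(SAR_FIELDS)]]
    return iface, dict(zip(SAR_FIELDS, values))


def avg_bytes_per_packet(stats, direction):
    """Average packet size in bytes; direction is 'rx' or 'tx'."""
    return stats[f"{direction}kB/s"] * 1024 / stats[f"{direction}pck/s"]


before_line = ("Average:         eth0 811269.80 305732.30 1199462.57"
               "  19705.72      0.00      0.00      0.50")
after_line = ("Average:         eth0 811577.30  19230.80 1199916.51"
              "   1239.80      0.00      0.00      0.50")

for label, line in (("before", before_line), ("after", after_line)):
    iface, stats = parse_sar_dev_line(line)
    print(label, iface,
          round(avg_bytes_per_packet(stats, "rx")),
          round(avg_bytes_per_packet(stats, "tx")))
```

Under those assumptions, received packets average about 1514 bytes and transmitted packets about 66 bytes in both runs, which is consistent with full-size data frames coming in and bare ACKs going out.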

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 12:05:59 -05:00
6lowpan 6lowpan: Allow 6LoWPAN to be modular 2014-08-07 11:44:18 -07:00
9p 9p/trans_virtio: enable VQs early 2014-10-15 10:25:04 +10:30
802 net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
8021q net: better IFF_XMIT_DST_RELEASE support 2014-10-07 13:22:11 -04:00
appletalk net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
atm net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
ax25 net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
batman-adv batman-adv: replace strnicmp with strncasecmp 2014-10-14 02:18:24 +02:00
bluetooth net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-06 22:01:18 -05:00
caif net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
can can: add hash based access to single EFF frame filters 2014-05-19 09:38:24 +02:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2014-10-15 06:46:01 +02:00
core net: gro: add a per device gro flush timer 2014-11-10 12:05:59 -05:00
dcb dcbnl : Fix misleading dcb_app->priority explanation 2014-07-30 17:21:05 -07:00
dccp dccp: Convert DCCP_WARN to net_warn_ratelimited 2014-11-08 21:22:54 -05:00
decnet af_decnet: Use time_after_eq 2014-08-22 12:23:11 -07:00
dns_resolver Merge commit 'v3.16' into next 2014-10-01 00:44:04 +10:00
dsa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-06 22:01:18 -05:00
ethernet net: Add function for parsing the header length out of linear ethernet frames 2014-09-05 17:47:02 -07:00
hsr net/hsr: Remove left-over never-true conditional code. 2014-07-11 15:04:40 -07:00
ieee802154 net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
ipv4 udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts 2014-11-07 15:45:50 -05:00
ipv6 udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts 2014-11-07 15:45:50 -05:00
ipx net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
irda net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
iucv net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
key net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
l2tp net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
lapb lapb: move EXPORT_SYMBOL after functions. 2014-10-24 15:51:42 -04:00
llc net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
mac80211 mac80211: minstrels: fix buffer overflow in HT debugfs rc_stats 2014-10-20 16:37:01 +02:00
mac802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-10-08 21:40:54 -04:00
mpls net: Remove MPLS GSO feature. 2014-11-05 23:52:33 -08:00
netfilter ipvs: Avoid null-pointer deref in debug code 2014-10-28 09:48:31 +09:00
netlabel netlabel: kernel-doc warning fix 2014-10-09 01:40:05 -04:00
netlink net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
netrom net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
nfc net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
openvswitch openvswitch: Avoid NULL mask check while building mask 2014-11-05 23:52:35 -08:00
packet net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
phonet net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-10-18 09:31:37 -07:00
rfkill net: rfkill: kernel-doc warning fixes 2014-10-09 11:16:15 +02:00
rose net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
rxrpc net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
sched sched: fix act file names in header comment 2014-11-06 15:04:41 -05:00
sctp net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
sunrpc Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux 2014-10-08 12:51:44 -04:00
tipc net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
unix net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
vmw_vsock net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
wimax wimax: convert printk to pr_foo() 2014-10-07 20:28:44 -04:00
wireless Here are a few fixes for the wireless stack: one fixes the 2014-10-27 13:38:15 -04:00
x25 net: Add and use skb_copy_datagram_msg() helper. 2014-11-05 16:46:40 -05:00
xfrm net: skb_fclone_busy() needs to detect orphaned skb 2014-10-30 19:58:30 -04:00
Kconfig bpf: split eBPF out of NET 2014-10-27 19:09:59 -04:00
Makefile 6lowpan: introduce new net/6lowpan directory 2014-07-12 01:53:30 +02:00
compat.c net: sendmsg: fix NULL pointer dereference 2014-07-29 12:20:22 -07:00
nonet.c
socket.c File locking related changes for v3.18 (pile #1) 2014-10-11 13:21:34 -04:00
sysctl_net.c