linux/net
Nik Unger 5080f39e8c netem: apply correct delay when rate throttling
I recently reported on the netem list that iperf network benchmarks
show unexpected results when a bandwidth throttling rate has been
configured for netem. Specifically:

1) The measured link bandwidth *increases* when a higher delay is added
2) The measured link bandwidth appears higher than the specified limit
3) The measured link bandwidth for the same very slow settings varies significantly across
  machines

The issue can be reproduced by using tc to configure netem with a
512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a
veth pair between network namespaces, and then using iperf (or any
other network benchmarking tool) to test throughput. Complete detailed
instructions are in the original email chain here:
https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html

There appear to be two underlying bugs causing these effects:

- The first issue causes long delays when the rate is slow and no
  delay is configured (e.g., "rate 512kbit"). This is because SKBs are
  not orphaned when no delay is configured, so orphaning does not
  occur until *after* the rate-induced delay has been applied. For
  this reason, adding a tiny delay (e.g., "rate 512kbit delay 1us")
  dramatically increases the measured bandwidth.

- The second issue is that rate-induced delays are not correctly
  applied, allowing SKB delays to occur in parallel. The indended
  approach is to compute the delay for an SKB and to add this delay to
  the end of the current queue. However, the code does not detect
  existing SKBs in the queue due to improperly testing sch->q.qlen,
  which is nonzero even when packets exist only in the
  rbtree. Consequently, new SKBs do not wait for the current queue to
  empty. When packet delays vary significantly (e.g., if packet sizes
  are different), then this also causes unintended reordering.

I modified the code to expect a delay (and orphan the SKB) when a rate
is configured. I also added some defensive tests that correctly find
the latest scheduled delivery time, even if it is (unexpectedly) for a
packet in sch->q. I have tested these changes on the latest kernel
(4.11.0-rc1+) and the iperf / ping test results are as expected.

Signed-off-by: Nik Unger <njunger@uwaterloo.ca>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 20:14:06 -07:00
..
6lowpan 6lowpan: use rb_entry() 2017-01-22 16:46:13 -05:00
9p Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-03-03 21:44:35 -08:00
802 Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
8021q net: remove ndo_neigh_{construct, destroy} from stacked devices 2017-02-06 11:25:57 -05:00
appletalk lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-03-15 11:59:10 -07:00
ax25 net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
batman-adv This contains just the average.h change in order to get it 2017-03-02 14:39:17 -08:00
bluetooth net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
bridge bridge: drop netfilter fake rtable unconditionally 2017-03-13 13:01:10 -07:00
caif sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> 2017-03-02 08:42:29 +01:00
can can: bcm: fix hrtimer/tasklet termination in bcm op removal 2017-01-30 11:05:04 +01:00
ceph libceph: osd_request_timeout option 2017-03-07 14:30:38 +01:00
core ipv4: fib_rules: Check if rule is a default rule 2017-03-16 10:18:33 -07:00
dcb net: dcb: set error code on failures 2016-12-03 23:54:25 -05:00
dccp dccp: fix memory leak during tear-down of unsuccessful connection request 2017-03-13 22:00:42 -07:00
decnet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-03-15 11:59:10 -07:00
dns_resolver Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-03-03 10:16:38 -08:00
dsa net: dsa: check out-of-range ageing time value 2017-03-15 15:34:13 -07:00
ethernet Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-02-16 21:25:49 -05:00
hsr net/hsr: use eth_hw_addr_random() 2017-02-21 13:25:22 -05:00
ieee802154 lib/vsprintf.c: remove %Z support 2017-02-27 18:43:47 -08:00
ife net: Introduce ife encapsulation module 2017-02-03 15:16:45 -05:00
ipv4 ipv4: fib_rules: Dump FIB rules when registering FIB notifier 2017-03-16 10:18:34 -07:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-03-15 11:59:10 -07:00
ipx ktime: Get rid of the union 2016-12-25 17:21:22 +01:00
irda net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
iucv net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
kcm sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
key netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
l2tp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-28 10:00:39 -08:00
l3mdev
lapb Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
llc net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-03-04 17:31:39 -08:00
mac802154 sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h> 2017-03-02 08:42:38 +01:00
mpls Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-03-15 11:59:10 -07:00
ncsi net/ncsi: Improve HNCDSC AEN handler 2016-10-20 11:23:08 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-03-04 17:31:39 -08:00
netlabel netlabel: add CALIPSO to the list of built-in protocols 2017-01-06 22:20:45 -05:00
netlink net: adjust skb->truesize in pskb_expand_head() 2017-01-27 12:03:29 -05:00
netrom net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
nfc net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
openvswitch openvswitch: actions: fixed a brace coding style warning 2017-03-02 13:14:44 -08:00
packet net: don't call strlen() on the user buffer in packet_bind_spkt() 2017-03-01 20:55:57 -08:00
phonet net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
psample net: Introduce psample, a new genetlink channel for packet sampling 2017-01-24 13:44:28 -05:00
qrtr net: qrtr: Mark 'buf' as little endian 2017-01-10 20:45:04 -05:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-03-15 11:59:10 -07:00
rfkill rfkill: remove rfkill-regulator 2017-01-24 11:07:35 +01:00
rose net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
rxrpc rxrpc: Wake up the transmitter if Rx window size increases on the peer 2017-03-10 09:34:23 -08:00
sched netem: apply correct delay when rate throttling 2017-03-16 20:14:06 -07:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-03-15 11:59:10 -07:00
smc net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
strparser strparser: destroy workqueue on module exit 2017-03-03 20:43:26 -08:00
sunrpc sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
switchdev Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-10-30 12:42:58 -04:00
tipc net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
unix net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
vmw_vsock net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
wimax genetlink: mark families as __ro_after_init 2016-10-27 16:16:09 -04:00
wireless Some more updates: 2017-02-10 14:31:51 -05:00
x25 net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2017-03-07 15:00:37 -08:00
Kconfig bpf: make jited programs visible in traces 2017-02-17 13:40:05 -05:00
Makefile net: Introduce ife encapsulation module 2017-02-03 15:16:45 -05:00
compat.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-02-22 10:15:09 -08:00
socket.c net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
sysctl_net.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2016-10-06 09:52:23 -07:00