Commit Graph

649421 Commits

Author SHA1 Message Date
Eric Dumazet 69629464e0 udp: properly cope with csum errors
Dmitry reported that UDP sockets being destroyed would trigger the
WARN_ON(atomic_read(&sk->sk_rmem_alloc)); in inet_sock_destruct()

It turns out we do not properly destroy skb(s) that have wrong UDP
checksum.

Thanks again to syzkaller team.

Fixes : 7c13f97ffd ("udp: do fwd memory scheduling on dequeue")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 11:19:00 -05:00
David S. Miller 6a413e269b Merge branch 'net-Fix-on-stack-USB-buffers'
Ben Hutchings says:

====================
net: Fix on-stack USB buffers

Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).  This
series fixes all the instances I could find where USB networking
drivers do that.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:07:03 -05:00
Ben Hutchings 2d6a0e9de0 catc: Use heap buffer for memory size test
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:07:02 -05:00
Ben Hutchings d41149145f catc: Combine failure cleanup code in catc_probe()
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:07:02 -05:00
Ben Hutchings 7926aff5c5 rtl8150: Use heap buffers for all register access
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:07:02 -05:00
Ben Hutchings 5593523f96 pegasus: Use heap buffers for all register access
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
References: https://bugs.debian.org/852556
Reported-by: Lisandro Damián Nicanor Pérez Meyer <lisandro@debian.org>
Tested-by: Lisandro Damián Nicanor Pérez Meyer <lisandro@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 10:07:01 -05:00
David S. Miller 432d4f8ab0 Merge branch 'read-vnet_hdr_sz-once'
Willem de Bruijn says:

====================
read vnet_hdr_sz once

Tuntap devices allow concurrent use and update of field vnet_hdr_sz.
Read the field once to avoid TOCTOU.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 22:41:27 -05:00
Willem de Bruijn 837585a537 macvtap: read vnet_hdr_size once
When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.

Macvtap functions read the value once, but unless READ_ONCE is used,
the compiler may ignore this and read multiple times. Enforce a single
read and locally cached value to avoid updates between test and use.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 22:41:27 -05:00
Willem de Bruijn e1edab87fa tun: read vnet_hdr_sz once
When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.

Read this value once and cache locally, as it can be updated between
the test and use (TOCTOU).

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
CC: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 22:41:27 -05:00
Eric Dumazet ccf7abb93a tcp: avoid infinite loop in tcp_splice_read()
Splicing from TCP socket is vulnerable when a packet with URG flag is
received and stored into receive queue.

__tcp_splice_read() returns 0, and sk_wait_data() immediately
returns since there is the problematic skb in queue.

This is a nice way to burn cpu (aka infinite loop) and trigger
soft lockups.

Again, this gem was found by syzkaller tool.

Fixes: 9c55e01c0c ("[TCP]: Splice receive support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov  <dvyukov@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 22:38:55 -05:00
Arnd Bergmann b3f2d07f46 hns: avoid stack overflow with CONFIG_KASAN
The use of ACCESS_ONCE() looks like a micro-optimization to force gcc to use
an indexed load for the register address, but it has an absolutely detrimental
effect on builds with gcc-5 and CONFIG_KASAN=y, leading to a very likely
kernel stack overflow aside from very complex object code:

hisilicon/hns/hns_dsaf_gmac.c: In function 'hns_gmac_update_stats':
hisilicon/hns/hns_dsaf_gmac.c:419:1: error: the frame size of 2912 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_ppe.c: In function 'hns_ppe_reset_common':
hisilicon/hns/hns_dsaf_ppe.c:390:1: error: the frame size of 1184 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_ppe.c: In function 'hns_ppe_get_regs':
hisilicon/hns/hns_dsaf_ppe.c:621:1: error: the frame size of 3632 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_rcb.c: In function 'hns_rcb_get_common_regs':
hisilicon/hns/hns_dsaf_rcb.c:970:1: error: the frame size of 2784 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_gmac.c: In function 'hns_gmac_get_regs':
hisilicon/hns/hns_dsaf_gmac.c:641:1: error: the frame size of 5728 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_rcb.c: In function 'hns_rcb_get_ring_regs':
hisilicon/hns/hns_dsaf_rcb.c:1021:1: error: the frame size of 2208 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_comm_init':
hisilicon/hns/hns_dsaf_main.c:1209:1: error: the frame size of 1904 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_xgmac.c: In function 'hns_xgmac_get_regs':
hisilicon/hns/hns_dsaf_xgmac.c:748:1: error: the frame size of 4704 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_update_stats':
hisilicon/hns/hns_dsaf_main.c:2420:1: error: the frame size of 1088 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_get_regs':
hisilicon/hns/hns_dsaf_main.c:2753:1: error: the frame size of 10768 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

This does not seem to happen any more with gcc-7, but removing the ACCESS_ONCE
seems safe anyway and it avoids a serious issue for some people. I have verified
that with gcc-5.3.1, the object code we get is better in the new version
both with and without CONFIG_KASAN, as we no longer allocate a 1344 byte
stack frame for hns_dsaf_get_regs() but otherwise have practically identical
object code.

With gcc-7.0.0, removing ACCESS_ONCE has no effect, the object code is already
good either way.

This patch is probably not urgent to get into 4.11 as only KASAN=y builds
with certain compilers are affected, but I still think it makes sense to
backport into older kernels.

Cc: stable@vger.kernel.org
Fixes: 511e6bc ("net: add Hisilicon Network Subsystem DSAF support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 12:02:32 -05:00
Linus Lüssing a088d1d73a ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
When for instance a mobile Linux device roams from one access point to
another with both APs sharing the same broadcast domain and a
multicast snooping switch in between:

1)    (c) <~~~> (AP1) <--[SSW]--> (AP2)

2)              (AP1) <--[SSW]--> (AP2) <~~~> (c)

Then currently IPv6 multicast packets will get lost for (c) until an
MLD Querier sends its next query message. The packet loss occurs
because upon roaming the Linux host so far stayed silent regarding
MLD and the snooping switch will therefore be unaware of the
multicast topology change for a while.

This patch fixes this by always resending MLD reports when an interface
change happens, for instance from NO-CARRIER to CARRIER state.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:43:01 -05:00
David S. Miller 62f01db9cf wireless-drivers fixes for 4.10
Only one important fix for rtlwifi which fixes a regression introduced
 in 4.9 and which caused problems for many users.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJYmJChAAoJEG4XJFUm622b+GIIAJCOUqAC67Mk1/jTgyVlUHZc
 4uLocdhupiozZjBTh7z1lClY3EdT38AAkstcPRXIIQLcVMjkU4B9e5nLUQFv3R/u
 Bt7waNd5KixG+fX0iDPiuLj21SeMNYGtBNQ1PYSiiuuYaWXyAwplK4FW00KkUlqA
 i0V2DHt1BZD3psfhPcKAmx/8kAjCjg2jX1HxMoaLHpC4HUkDdNSd87ZjrEJWN37u
 eLihSdtw5+d8HMqHTmbgGXhiQKNRN9GRv7NMX+iUSSmu4oPwJtE5VLMU24VDeDwW
 3rgprba5p/ddz20iBtWAVRytOHdt5GtrnWCInI2jZk3QpjmM4hqEqoOhc9E47Dc=
 =0wO8
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-for-davem-2017-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
wireless-drivers fixes for 4.10

Only one important fix for rtlwifi which fixes a regression introduced
in 4.9 and which caused problems for many users.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 11:20:48 -05:00
David S. Miller 89389b4d55 A few simple fixes:
* fix FILS AEAD cipher usage to use the correct AAD vectors
    and to use synchronous algorithms
  * fix using mesh HT operation data from userspace
  * fix adding mesh vendor elements to beacons & plink frames
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYmCBmAAoJEGt7eEactAAd5BAP/2oPaRDgJv0ByqoFPh0pzKqx
 RwoXOW9xqtp+wWFA8hPTe2niVtNpexwo4ZQ2I2hkjeomFfbw0gwklBFQQ0Vbq5b9
 6UtClEBHp/xW5vdvooBwMAcUBJQMM25wIFt2jwz9xRIUxjiOisZBIp7avLTtoQKC
 +hsNJOWOmyeJYLXdeJVaJM953dANCKdzL590JX3f6tbr8LPpszrg8TmVLJWklTYQ
 Cm2latv0GezxL/d+KcSWbNoX+X+d5D0gVZXHmp5UFWX6yT0FMkNmSURmkHEfuiuD
 z11befXgvXAr3l7cxE/TEtrNCh57pwDoPtJmBqJ9G68aURK8iVb4XB/ZEB8hEvHi
 EchMXompYU/xPiGVbkb/wOFXlBY+xc85uoEwkSL1CZs4eX6r6JawrHG7RUcTKFsv
 V2zAQU0pDO29OcprHbjD+rnjrG2qtZ/pDKO7X5+eIgHvEzwaqZY3yd1YmJK52d67
 J4slSS/jislTg+rbhFi8NrCONuRlp5rixjmHINUWCsilojrKeDh9thMYrVmXWZjT
 qjoOojMmiGH7ekhvSVDciRxoLgP9aIShuIvbub9uOPQAPXsVf3KHquSiY9JOpJI8
 PpY3hPWQS6j2r5Q2pZu/LM345r0rcj5At1BzCzGqcfKxRUH7rbFDQQ1D3Moehzho
 Gqrkv2/p4FAAGFG+4bJ6
 =ZzHl
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2017-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A few simple fixes:
 * fix FILS AEAD cipher usage to use the correct AAD vectors
   and to use synchronous algorithms
 * fix using mesh HT operation data from userspace
 * fix adding mesh vendor elements to beacons & plink frames
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 10:55:08 -05:00
Eric Dumazet ebf6c9cb23 ipv6: tcp: add a missing tcp_v6_restore_cb()
Dmitry reported use-after-free in ip6_datagram_recv_specific_ctl()

A similar bug was fixed in commit 8ce48623f0 ("ipv6: tcp: restore
IP6CB for pktoptions skbs"), but I missed another spot.

tcp_v6_syn_recv_sock() can indeed set np->pktoptions from ireq->pktopts

Fixes: 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-06 10:52:45 -05:00
Masashi Honma fd551bac47 nl80211: Fix mesh HT operation check
A previous change to fix checks for NL80211_MESHCONF_HT_OPMODE
missed setting the flag when replacing FILL_IN_MESH_PARAM_IF_SET
with checking codes. This results in dropping the received HT
operation value when called by nl80211_update_mesh_config(). Fix
this by setting the flag properly.

Fixes: 9757235f45 ("nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value")
Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
[rewrite commit message to use Fixes: line]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-02-06 07:59:07 +01:00
Thorsten Horstmann da7061c82e mac80211: Fix adding of mesh vendor IEs
The function ieee80211_ie_split_vendor doesn't return 0 on errors. Instead
it returns any offset < ielen when WLAN_EID_VENDOR_SPECIFIC is found. The
return value in mesh_add_vendor_ies must therefore be checked against
ifmsh->ie_len and not 0. Otherwise all ifmsh->ie starting with
WLAN_EID_VENDOR_SPECIFIC will be rejected.

Fixes: 082ebb0c25 ("mac80211: fix mesh beacon format")
Signed-off-by: Thorsten Horstmann <thorsten@defutech.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fit.fraunhofer.de>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
[sven@narfation.org: Add commit message]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-02-06 07:55:44 +01:00
Jouni Malinen 01fba20b59 mac80211: Allocate a sync skcipher explicitly for FILS AEAD
The skcipher could have been of the async variant which may return from
skcipher_encrypt() with -EINPROGRESS after having queued the request.
The FILS AEAD implementation here does not have code for dealing with
that possibility, so allocate a sync cipher explicitly to avoid
potential issues with hardware accelerators.

This is based on the patch sent out by Ard.

Fixes: 39404feee6 ("mac80211: FILS AEAD protection for station mode association frames")
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-02-06 07:42:47 +01:00
Jouni Malinen e479ab651f mac80211: Fix FILS AEAD protection in Association Request frame
Incorrect num_elem parameter value (1 vs. 5) was used in the
aes_siv_encrypt() call. This resulted in only the first one of the five
AAD vectors to SIV getting included in calculation. This does not
protect all the contents correctly and would not interoperate with a
standard compliant implementation.

Fix this by using the correct number. A matching fix is needed in the AP
side (hostapd) to get FILS authentication working properly.

Fixes: 39404feee6 ("mac80211: FILS AEAD protection for station mode association frames")
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-02-06 07:42:39 +01:00
Eric Dumazet 7892032cfe ip6_gre: fix ip6gre_err() invalid reads
Andrey Konovalov reported out of bound accesses in ip6gre_err()

If GRE flags contains GRE_KEY, the following expression
*(((__be32 *)p) + (grehlen / 4) - 1)

accesses data ~40 bytes after the expected point, since
grehlen includes the size of IPv6 headers.

Let's use a "struct gre_base_hdr *greh" pointer to make this
code more readable.

p[1] becomes greh->protocol.
grhlen is the GRE header length.

Fixes: c12b395a46 ("gre: Support GRE over IPv6")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-05 17:23:04 -05:00
Eric Dumazet d71b789688 netlabel: out of bound access in cipso_v4_validate()
syzkaller found another out of bound access in ip_options_compile(),
or more exactly in cipso_v4_validate()

Fixes: 20e2a86485 ("cipso: handle CIPSO options correctly when NetLabel is disabled")
Fixes: 446fda4f26 ("[NetLabel]: CIPSOv4 engine")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov  <dvyukov@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-04 19:44:22 -05:00
Eric Dumazet 34b2cef20f ipv4: keep skb->dst around in presence of IP options
Andrey Konovalov got crashes in __ip_options_echo() when a NULL skb->dst
is accessed.

ipv4_pktinfo_prepare() should not drop the dst if (evil) IP options
are present.

We could refine the test to the presence of ts_needtime or srr,
but IP options are not often used, so let's be conservative.

Thanks to syzkaller team for finding this bug.

Fixes: d826eb14ec ("ipv4: PKTINFO doesnt need dst reference")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-04 19:42:28 -05:00
Eric Dumazet 5fa8bbda38 net: use a work queue to defer net_disable_timestamp() work
Dmitry reported a warning [1] showing that we were calling
net_disable_timestamp() -> static_key_slow_dec() from a non
process context.

Grabbing a mutex while holding a spinlock or rcu_read_lock()
is not allowed.

As Cong suggested, we now use a work queue.

It is possible netstamp_clear() exits while netstamp_needed_deferred
is not zero, but it is probably not worth trying to do better than that.

netstamp_needed_deferred atomic tracks the exact number of deferred
decrements.

[1]
[ INFO: suspicious RCU usage. ]
4.10.0-rc5+ #192 Not tainted
-------------------------------
./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
critical section!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 0
2 locks held by syz-executor14/23111:
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock
include/net/sock.h:1454 [inline]
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>]
rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
 #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook
include/linux/netfilter.h:201 [inline]
 #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>]
__ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160

stack backtrace:
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
 rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
 ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
 sk_destruct+0x47/0x80 net/core/sock.c:1460
 __sk_free+0x57/0x230 net/core/sock.c:1468
 sock_wfree+0xae/0x120 net/core/sock.c:1645
 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
 skb_release_all+0x15/0x60 net/core/skbuff.c:668
 __kfree_skb+0x15/0x20 net/core/skbuff.c:684
 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
 inet_frag_put include/net/inet_frag.h:133 [inline]
 nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
 nf_hook include/linux/netfilter.h:212 [inline]
 __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
 rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
 sock_sendmsg_nosec net/socket.c:635 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:645
 sock_write_iter+0x326/0x600 net/socket.c:848
 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
 vfs_writev+0x87/0xc0 fs/read_write.c:911
 do_writev+0x110/0x2c0 fs/read_write.c:944
 SYSC_writev fs/read_write.c:1017 [inline]
 SyS_writev+0x27/0x30 fs/read_write.c:1014
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:752
in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
INFO: lockdep is turned off.
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
 sk_destruct+0x47/0x80 net/core/sock.c:1460
 __sk_free+0x57/0x230 net/core/sock.c:1468
 sock_wfree+0xae/0x120 net/core/sock.c:1645
 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
 skb_release_all+0x15/0x60 net/core/skbuff.c:668
 __kfree_skb+0x15/0x20 net/core/skbuff.c:684
 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
 inet_frag_put include/net/inet_frag.h:133 [inline]
 nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
 nf_hook include/linux/netfilter.h:212 [inline]
 __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
 rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
 sock_sendmsg_nosec net/socket.c:635 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:645
 sock_write_iter+0x326/0x600 net/socket.c:848
 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
 vfs_writev+0x87/0xc0 fs/read_write.c:911
 do_writev+0x110/0x2c0 fs/read_write.c:944
 SYSC_writev fs/read_write.c:1017 [inline]
 SyS_writev+0x27/0x30 fs/read_write.c:1014
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559

Fixes: b90e5794c5 ("net: dont call jump_label_dec from irq context")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:11:07 -05:00
Stanislaw Gruszka 3808d34838 ethtool: do not vzalloc(0) on registers dump
If ->get_regs_len() callback return 0, we allocate 0 bytes of memory,
what print ugly warning in dmesg, which can be found further below.

This happen on mac80211 devices where ieee80211_get_regs_len() just
return 0 and driver only fills ethtool_regs structure and actually
do not provide any dump. However I assume this can happen on other
drivers i.e. when for some devices driver provide regs dump and for
others do not. Hence preventing to to print warning in ethtool code
seems to be reasonable.

ethtool: vmalloc: allocation failure: 0 bytes, mode:0x24080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO)
<snip>
Call Trace:
[<ffffffff813bde47>] dump_stack+0x63/0x8c
[<ffffffff811b0a1f>] warn_alloc+0x13f/0x170
[<ffffffff811f0476>] __vmalloc_node_range+0x1e6/0x2c0
[<ffffffff811f0874>] vzalloc+0x54/0x60
[<ffffffff8169986c>] dev_ethtool+0xb4c/0x1b30
[<ffffffff816adbb1>] dev_ioctl+0x181/0x520
[<ffffffff816714d2>] sock_do_ioctl+0x42/0x50
<snip>
Mem-Info:
active_anon:435809 inactive_anon:173951 isolated_anon:0
 active_file:835822 inactive_file:196932 isolated_file:0
 unevictable:0 dirty:8 writeback:0 unstable:0
 slab_reclaimable:157732 slab_unreclaimable:10022
 mapped:83042 shmem:306356 pagetables:9507 bounce:0
 free:130041 free_pcp:1080 free_cma:0
Node 0 active_anon:1743236kB inactive_anon:695804kB active_file:3343288kB inactive_file:787728kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:332168kB dirty:32kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1225424kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
Node 0 DMA free:15900kB min:136kB low:168kB high:200kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 3187 7643 7643
Node 0 DMA32 free:419732kB min:28124kB low:35152kB high:42180kB active_anon:541180kB inactive_anon:248988kB active_file:1466388kB inactive_file:389632kB unevictable:0kB writepending:0kB present:3370280kB managed:3290932kB mlocked:0kB slab_reclaimable:217184kB slab_unreclaimable:4180kB kernel_stack:160kB pagetables:984kB bounce:0kB free_pcp:2236kB local_pcp:660kB free_cma:0kB
lowmem_reserve[]: 0 0 4456 4456

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 11:13:00 -05:00
David Lebrun 013e816789 ipv6: sr: remove cleanup flag and fix HMAC computation
In the latest version of the IPv6 Segment Routing IETF draft [1] the
cleanup flag is removed and the flags field length is shrunk from 16 bits
to 8 bits. As a consequence, the input of the HMAC computation is modified
in a non-backward compatible way by covering the whole octet of flags
instead of only the cleanup bit. As such, if an implementation compatible
with the latest draft computes the HMAC of an SRH who has other flags set
to 1, then the HMAC result would differ from the current implementation.

This patch carries those modifications to prevent conflict with other
implementations of IPv6 SR.

[1] https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-05

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 11:05:23 -05:00
Mao Wenan cafe8df8b9 net: phy: Fix lack of reference count on PHY driver
There is currently no reference count being held on the PHY driver,
which makes it possible to remove the PHY driver module while the PHY
state machine is running and polling the PHY. This could cause crashes
similar to this one to show up:

[   43.361162] BUG: unable to handle kernel NULL pointer dereference at 0000000000000140
[   43.361162] IP: phy_state_machine+0x32/0x490
[   43.361162] PGD 59dc067
[   43.361162] PUD 0
[   43.361162]
[   43.361162] Oops: 0000 [#1] SMP
[   43.361162] Modules linked in: dsa_loop [last unloaded: broadcom]
[   43.361162] CPU: 0 PID: 1299 Comm: kworker/0:3 Not tainted 4.10.0-rc5+ #415
[   43.361162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[   43.361162] Workqueue: events_power_efficient phy_state_machine
[   43.361162] task: ffff880006782b80 task.stack: ffffc90000184000
[   43.361162] RIP: 0010:phy_state_machine+0x32/0x490
[   43.361162] RSP: 0018:ffffc90000187e18 EFLAGS: 00000246
[   43.361162] RAX: 0000000000000000 RBX: ffff8800059e53c0 RCX:
ffff880006a15c60
[   43.361162] RDX: ffff880006782b80 RSI: 0000000000000000 RDI:
ffff8800059e5428
[   43.361162] RBP: ffffc90000187e48 R08: ffff880006a15c40 R09:
0000000000000000
[   43.361162] R10: 0000000000000000 R11: 0000000000000000 R12:
ffff8800059e5428
[   43.361162] R13: ffff8800059e5000 R14: 0000000000000000 R15:
ffff880006a15c40
[   43.361162] FS:  0000000000000000(0000) GS:ffff880006a00000(0000)
knlGS:0000000000000000
[   43.361162] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.361162] CR2: 0000000000000140 CR3: 0000000005979000 CR4:
00000000000006f0
[   43.361162] Call Trace:
[   43.361162]  process_one_work+0x1b4/0x3e0
[   43.361162]  worker_thread+0x43/0x4d0
[   43.361162]  ? __schedule+0x17f/0x4e0
[   43.361162]  kthread+0xf7/0x130
[   43.361162]  ? process_one_work+0x3e0/0x3e0
[   43.361162]  ? kthread_create_on_node+0x40/0x40
[   43.361162]  ret_from_fork+0x29/0x40
[   43.361162] Code: 56 41 55 41 54 4c 8d 67 68 53 4c 8d af 40 fc ff ff
48 89 fb 4c 89 e7 48 83 ec 08 e8 c9 9d 27 00 48 8b 83 60 ff ff ff 44 8b
73 98 <48> 8b 90 40 01 00 00 44 89 f0 48 85 d2 74 08 4c 89 ef ff d2 8b

Keep references on the PHY driver module right before we are going to
utilize it in phy_attach_direct(), and conversely when we don't use it
anymore in phy_detach().

Signed-off-by: Mao Wenan <maowenan@huawei.com>
[florian: rebase, rework commit message]
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 22:59:43 -05:00
David S. Miller 2372bcda5e Merge branch 'mlx4-queue-reinit'
Martin KaFai Lau says:

====================
mlx4: Misc bug fixes after reinitializing queues

This patchset fixes misc bugs after reinitializing
queues (e.g. by ethtool -L).

v2:
* Add another fix to mem leak in tx_ring[t] and tx_cq[t]
* In mlx4_en_try_alloc_resources(),
  move all xdp_prog logic after calling mlx4_en_alloc_resources()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 21:27:05 -05:00
Martin KaFai Lau 770f82253d mlx4: xdp_prog becomes inactive after ethtool '-L' or '-G'
After calling mlx4_en_try_alloc_resources (e.g. by changing the
number of rx-queues with ethtool -L), the existing xdp_prog becomes
inactive.

The bug is that the xdp_prog ptr has not been carried over from
the old rx-queues to the new rx-queues

Fixes: 47a38e1550 ("net/mlx4_en: add support for fast rx drop bpf program")
Cc: Brenden Blanco <bblanco@plumgrid.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 21:27:05 -05:00
Martin KaFai Lau f32b20e89e mlx4: Fix memory leak after mlx4_en_update_priv()
In mlx4_en_update_priv(), dst->tx_ring[t] and dst->tx_cq[t]
are over-written by src->tx_ring[t] and src->tx_cq[t] without
first calling kfree.

One of the reproducible code paths is by doing 'ethtool -L'.

The fix is to do the kfree in mlx4_en_free_resources().

Here is the kmemleak report:
unreferenced object 0xffff880841211800 (size 2048):
  comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81930718>] kmemleak_alloc+0x28/0x50
    [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
    [<ffffffff8170e0a8>] mlx4_en_try_alloc_resources+0x118/0x1a0
    [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
    [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190
    [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0
    [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
    [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
    [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
    [<ffffffff812480f9>] SyS_ioctl+0x79/0x90
    [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff880841213000 (size 2048):
  comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81930718>] kmemleak_alloc+0x28/0x50
    [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
    [<ffffffff8170e0cb>] mlx4_en_try_alloc_resources+0x13b/0x1a0
    [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
    [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190
    [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0
    [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
    [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
    [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
    [<ffffffff812480f9>] SyS_ioctl+0x79/0x90
    [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff

(gdb) list *mlx4_en_try_alloc_resources+0x118
0xffffffff8170e0a8 is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2145).
2140                    if (!dst->tx_ring_num[t])
2141                            continue;
2142
2143                    dst->tx_ring[t] = kzalloc(sizeof(struct mlx4_en_tx_ring *) *
2144                                              MAX_TX_RINGS, GFP_KERNEL);
2145                    if (!dst->tx_ring[t])
2146                            goto err_free_tx;
2147
2148                    dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
2149                                            MAX_TX_RINGS, GFP_KERNEL);
(gdb) list *mlx4_en_try_alloc_resources+0x13b
0xffffffff8170e0cb is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2150).
2145                    if (!dst->tx_ring[t])
2146                            goto err_free_tx;
2147
2148                    dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
2149                                            MAX_TX_RINGS, GFP_KERNEL);
2150                    if (!dst->tx_cq[t]) {
2151                            kfree(dst->tx_ring[t]);
2152                            goto err_free_tx;
2153                    }
2154            }

Fixes: ec25bc04ed ("net/mlx4_en: Add resilience in low memory systems")
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 21:27:05 -05:00
Linus Torvalds 6d04dfc896 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix handling of interrupt status in stmmac driver. Just because we
    have masked the event from generating interrupts, doesn't mean the
    bit won't still be set in the interrupt status register. From Alexey
    Brodkin.

 2) Fix DMA API debugging splats in gianfar driver, from Arseny Solokha.

 3) Fix off-by-one error in __ip6_append_data(), from Vlad Yasevich.

 4) cls_flow does not match on icmpv6 codes properly, from Simon Horman.

 5) Initial MAC address can be set incorrectly in some scenerios, from
    Ivan Vecera.

 6) Packet header pointer arithmetic fix in ip6_tnl_parse_tlv_end_lim(),
    from Dan Carpenter.

 7) Fix divide by zero in __tcp_select_window(), from Eric Dumazet.

 8) Fix crash in iwlwifi when unregistering thermal zone, from Jens
    Axboe.

 9) Check for DMA mapping errors in starfire driver, from Alexey
    Khoroshilov.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits)
  tcp: fix 0 divide in __tcp_select_window()
  ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim()
  net: fix ndo_features_check/ndo_fix_features comment ordering
  net/sched: matchall: Fix configuration race
  be2net: fix initial MAC setting
  ipv6: fix flow labels when the traffic class is non-0
  net: thunderx: avoid dereferencing xcv when NULL
  net/sched: cls_flower: Correct matching on ICMPv6 code
  ipv6: Paritially checksum full MTU frames
  net/mlx4_core: Avoid command timeouts during VF driver device shutdown
  gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page
  net: ethtool: add support for 2500BaseT and 5000BaseT link modes
  can: bcm: fix hrtimer/tasklet termination in bcm op removal
  net: adaptec: starfire: add checks for dma mapping errors
  net: phy: micrel: KSZ8795 do not set SUPPORTED_[Asym_]Pause
  can: Fix kernel panic at security_sock_rcv_skb
  net: macb: Fix 64 bit addressing support for GEM
  stmmac: Discard masked flags in interrupt status register
  net/mlx5e: Check ets capability before ets query FW command
  net/mlx5e: Fix update of hash function/key via ethtool
  ...
2017-02-01 11:52:27 -08:00
Linus Torvalds 2883aaea36 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fscache fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fscache: Fix dead object requeue
  fscache: Clear outstanding writes when disabling a cookie
  FS-Cache: Initialise stores_lock in netfs cookie
2017-02-01 10:30:56 -08:00
Eric Dumazet 06425c308b tcp: fix 0 divide in __tcp_select_window()
syszkaller fuzzer was able to trigger a divide by zero, when
TCP window scaling is not enabled.

SO_RCVBUF can be used not only to increase sk_rcvbuf, also
to decrease it below current receive buffers utilization.

If mss is negative or 0, just return a zero TCP window.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov  <dvyukov@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 12:55:42 -05:00
Dan Carpenter 63117f09c7 ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim()
Casting is a high precedence operation but "off" and "i" are in terms of
bytes so we need to have some parenthesis here.

Fixes: fbfa743a9d ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 12:27:33 -05:00
Linus Torvalds e387dc122f Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes a bug in CBC/CTR on ARM64 that breaks chaining as well as a
  bug in the core API that causes registration failures when a driver
  unloads and then reloads an algorithm"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes
  crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg
2017-02-01 09:24:00 -08:00
Linus Torvalds 35609502ac dmaengine-fix-4.10-rc7
Few late fixes for this cycle
 
 - pl330 double lock fix
 - more fixes for runtime pm handling on cppi
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYkgb+AAoJEHwUBw8lI4NH1LIP/2vG1GjnkbVP4lKlgFQkHU3f
 VaG60A3/70j247NLa3Fu5HzctRsfYSI8ES/fhZOXrVYRrPUkH+6itYDAECEwRhhJ
 bez1gcU3BDRhkmVc4hTUVfcBMDVnyliHwRaNsyWIS5FB95qmrODQLYuXHSvij56j
 Cxb6INU5fJuoGu3K0y+/1ABURwPWjx8RQR9+Yv3RAbZMn079MFu4JXKPDS0r0Bm6
 nYZrpaky9zQLQQytTfUU3SYhe/K5/IFJro/EPdf/5kHYg/20Ub/zEqlBfKY1jL6u
 2XgqhPABgkE4GwU8fr4nuOT9+xU7DEFPwuWiCphO4SQHVw6O/BC8R6Pd4x7gVX4E
 wK9NpOWZWQQHom0pezn6wF3GrYyaHUULg1SI805HVogEvLAufKYTGIP8WJC8ORZi
 5nStEyU4k2GS9SVaoH8dh/VQdb7X0G9tkRqcuCj+I7Yuq9LpCNYar2cZ7P+bqeII
 zZ0HNAoBvQdVKn8PeHQUbdhblZ13pK0k2Lf0RT88aG0mYEl0dt6CknYEUURbYwqQ
 KRN3tRB0+C99Hdi8DCsUGitFyLnIfMeJ++JJfR4rw2uGskvk/VEPStj7zbuFo1qg
 MD8ydwZvYr9lOQBOC55K4BCsNbXHWb216OH5sOdzMsInJrLpjgeD0Ya7Il11KEer
 v/OmLjYybzFPWq3pp4/3
 =4hDL
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fix-4.10-rc7' of git://git.infradead.org/users/vkoul/slave-dma

Pull dmaengine fixes from Vinod Koul:
 "A couple of fixes showed up late in the cycle so sending them up and
  sending early in the week and not on Friday :).

  They fix a double lock in pl330 driver and runtime pm fixes for cppi
  driver"

* tag 'dmaengine-fix-4.10-rc7' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: pl330: fix double lock
  dmaengine: cppi41: Clean up pointless warnings
  dmaengine: cppi41: Fix oops in cppi41_runtime_resume
  dmaengine: cppi41: Fix runtime PM timeouts with USB mass storage
2017-02-01 09:22:08 -08:00
Dimitris Michailidis 1a2a14444d net: fix ndo_features_check/ndo_fix_features comment ordering
Commit cdba756f58 ("net: move ndo_features_check() close to
ndo_start_xmit()") inadvertently moved the doc comment for
.ndo_fix_features instead of .ndo_features_check. Fix the comment
ordering.

Fixes: cdba756f58 ("net: move ndo_features_check() close to ndo_start_xmit()")
Signed-off-by: Dimitris Michailidis <dmichail@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 12:10:56 -05:00
Yotam Gigi fd62d9f5c5 net/sched: matchall: Fix configuration race
In the current version, the matchall internal state is split into two
structs: cls_matchall_head and cls_matchall_filter. This makes little
sense, as matchall instance supports only one filter, and there is no
situation where one exists and the other does not. In addition, that led
to some races when filter was deleted while packet was processed.

Unify that two structs into one, thus simplifying the process of matchall
creation and deletion. As a result, the new, delete and get callbacks have
a dummy implementation where all the work is done in destroy and change
callbacks, as was done in cls_cgroup.

Fixes: bf3994d2ed ("net/sched: introduce Match-all classifier")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 11:57:33 -05:00
Linus Torvalds c325b35337 Pin control fixes for the v4.10 series.
Driver fixes only:
 
 - One fix to the Berlin driver making the SD card work fully
   again.
 
 - One fix to the Allwinner/sunxi bias function: one premature
   change needs to be partially reverted.
 
 - The remaining four patches are to Intel embedded SoCs: baytrail
   (three patches) and merrifield (one patch): register access
   debounce fixes and a missing spinlock.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYkaCaAAoJEEEQszewGV1zcFIQALbJdTw3NP4PhFCdOFH3PJLq
 JoFGNf5plpcQOV8GcMZvallUPmckdE99aO1VZkpDlJJSbrx3SrqhbxcVdL98Hj8+
 MrMHXwb/tB5i35Ykn8P/2uh40qKCfTNyJg5KC3xW/jPBt2/yMj3jpCUuhsQYFC66
 1gDiM+nwrWD/8+TQ4C0d34+Of2FLSOddH9oyH7xA7A54/w53glfg09Zx49pJSg+d
 IRtvFVKYTSSS3gQlRkmXxV+gyjQA1oKgpR7EAM5USBYMObdPMona6/I93wcgT/8s
 tPlg+wtspIkTShZ1KtbWghz4Gjsc4tYNxwsUkJ2tiqUYHHt9j+Z4UYxyYGBJMZPh
 9YLcZJWErMvnxYoQXGE31UlPYaq7+al9D1woX7RlRnGQavATFxJZ2Smt+CqqXoUE
 WcJcWGNMA1jpF5rmpnz+7a65W2fXtGJg2KRwCsk1XEi/5qISndSAscXocX7AZtjv
 mk5X/d6F2fwzeRJkjeCIdZ62NjlDJtlrbK3LqWh72rlyId7tycGKDwzv7gAWwI6B
 AQIy5n8iakjGL8eTYGiPlz/HlORRAAxw5Qg0NMT9Q6Npqg9aozwuptJGOTA6Gb+Z
 azU0c7bca59/bmNrH8iAuFPgz/TWu6wQH3Qvlr8xWXJ4xTz2Ol8br5QBKp3xdck1
 ++CT6tcdmi6c4poX8W3K
 =Smhk
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "Another week, another set of pin control fixes. The subsystem has seen
  high patch-spot activity recently.

  The majority of the patches are for Intel, I vaguely think it mostly
  concern phones, tablets and maybe chromebooks and even laptops with
  this Intel Atom family chips.

  Driver fixes only:

   - one fix to the Berlin driver making the SD card work fully again.

   - one fix to the Allwinner/sunxi bias function: one premature change
     needs to be partially reverted.

   - the remaining four patches are to Intel embedded SoCs: baytrail
     (three patches) and merrifield (one patch): register access
     debounce fixes and a missing spinlock"

* tag 'pinctrl-v4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: baytrail: Add missing spinlock usage in byt_gpio_irq_handler
  pinctrl: baytrail: Debounce register is one per community
  pinctrl: baytrail: Rectify debounce support (part 2)
  pinctrl: intel: merrifield: Add missed check in mrfld_config_set()
  pinctrl: sunxi: Don't enforce bias disable (for now)
  pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES
2017-02-01 08:34:13 -08:00
Ivan Vecera 4993b39ab0 be2net: fix initial MAC setting
Recent commit 3439352916 ("be2net: fix MAC addr setting on privileged
BE3 VFs") allows privileged BE3 VFs to set its MAC address during
initialization. Although the initial MAC for such VFs is already
programmed by parent PF the subsequent setting performed by VF is OK,
but in certain cases (after fresh boot) this command in VF can fail.

The MAC should be initialized only when:
1) no MAC is programmed (always except BE3 VFs during first init)
2) programmed MAC is different from requested (e.g. MAC is set when
   interface is down). In this case the initial MAC programmed by PF
   needs to be deleted.

The adapter->dev_mac contains MAC address currently programmed in HW so
it should be zeroed when the MAC is deleted from HW and should not be
filled when MAC is set when interface is down in be_mac_addr_set() as
no programming is performed in this case.

Example of failure without the fix (immediately after fresh boot):

# ip link set eth0 up  <- eth0 is BE3 PF
be2net 0000:01:00.0 eth0: Link is Up

# echo 1 > /sys/class/net/eth0/device/sriov_numvfs  <- Create 1 VF
...
be2net 0000:01:04.0: Emulex OneConnect(be3): VF  port 0

# ip link set eth8 up  <- eth8 is created privileged VF
be2net 0000:01:04.0: opcode 59-1 failed:status 1-76
RTNETLINK answers: Input/output error

# echo 0 > /sys/class/net/eth0/device/sriov_numvfs  <- Delete VF
iommu: Removing device 0000:01:04.0 from group 33
...

# echo 1 > /sys/class/net/eth0/device/sriov_numvfs  <- Create it again
iommu: Removing device 0000:01:04.0 from group 33
...

# ip link set eth8 up
be2net 0000:01:04.0 eth8: Link is Up

Initialization is now OK.

v2 - Corrected the comment and condition check suggested by Suresh & Harsha

Fixes: 3439352916 ("be2net: fix MAC addr setting on privileged BE3 VFs")
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Ivan Vecera <cera@cera.cz>
Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 11:11:41 -05:00
Linus Torvalds a2ca3d6179 It was reported to me that the thread created by the hwlat tracer does
not migrate after the first instance. I found that there was as small
 bug in the logic, and fixed it. It's minor, but should be fixed regardless.
 There's not much impact outside the hwlat tracer.
 -----BEGIN PGP SIGNATURE-----
 
 iQExBAABCAAbBQJYkP7tFBxyb3N0ZWR0QGdvb2RtaXMub3JnAAoJEMm5BfJq2Y3L
 srEH/iqi0mC4rJF57yLmwGXVAuwSRsxHlkGfyG3RBRnXOhiPLaM9Y6QZDWj8f99D
 DEnVJ7IMpW7tjJB0pDQBCMBi+NX6GvR9Zu4ub8k69UzDlw81CPdMiJ/HfEFsP5XA
 EOiDH/xGsYmbgGyqbU2VTb4lS8CijStht0jhydriLT1ga/W3bOl0/w6TeQW+AwqJ
 0tcksaKwAcH6iN11FUQfWPlirQ/aCTvj8FelgbT7MjYnwxDdbJSfKG+GwJjytLNQ
 Rj/STWD56OZpl91OuEYA50BZVxp3eF870xJXsLjg3mGIYBseW/T6ZTP8TOM+6WoH
 u2J08p8Owdtdo+9gkw+IwEuoFP4=
 =VCLk
 -----END PGP SIGNATURE-----

Merge tag 'trace-4.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "It was reported to me that the thread created by the hwlat tracer does
  not migrate after the first instance. I found that there was as small
  bug in the logic, and fixed it. It's minor, but should be fixed
  regardless. There's not much impact outside the hwlat tracer"

* tag 'trace-4.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix hwlat kthread migration
2017-01-31 16:32:40 -08:00
Linus Torvalds 283725af0b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:
 "A fix for a crash in the wm97xx driver and synaptics-rmi4 will stop
  throwing erroneous warnings."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: synaptics-rmi4 - fix reversed conditions in enable/disable_irq_wake
  Input: wm97xx - make missing platform data non-fatal
2017-01-31 13:59:10 -08:00
Linus Torvalds f1774f46d4 Merge branch 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
 "The cgroup creation path was getting the order of operations wrong and
  exposing cgroups which don't have their names set yet to controllers
  which can lead to NULL derefs.

  This contains the fix for the bug"

* 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: don't online subsystems before cgroup_name/path() are operational
2017-01-31 13:54:41 -08:00
Linus Torvalds 298a2d8751 Merge branch 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu fix from Tejun Heo:
 "Douglas found and fixed a ref leak bug in percpu_ref_tryget[_live]().

  The bug is caused by storing the return value of atomic_long_inc_not_zero()
  into an int temp variable before returning it as a bool. The interim
  cast to int loses the upper bits and can lead to false negatives. As
  percpu_ref uses a high bit to mark a draining counter, this can happen
  relatively easily.

  Fixed by using bool for the temp variable"

* 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu-refcount: fix reference leak during percpu-atomic transition
2017-01-31 13:10:59 -08:00
Linus Torvalds 52e02f2797 Merge branch 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
 "Three libata fixes: an error handling fix, blacklist addition for
  another fallout from upping the default max sectors, and fix for a
  sense data reporting bug which affects new harddrives which can report
  sense data"

* 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ata: sata_mv:- Handle return value of devm_ioremap.
  libata: Fix ATA request sense
  libata: apply MAX_SEC_1024 to all CX1-JB*-HP devices
2017-01-31 13:07:04 -08:00
Linus Torvalds c9194b99ae Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:

 - regression fix (sleeping while atomic) for cp2112, from Johan Hovold

 - regression fix for proximity handling under certain circumstances in
   Wacom driver, from Jason Gerecke

 - functional fix for Logitech Rumblepad 2, from Ardinartsev Nikita

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: cp2112: fix gpio-callback error handling
  HID: cp2112: fix sleep-while-atomic
  HID: hid-lg: Fix immediate disconnection of Logitech Rumblepad 2
  HID: usbhid: Quirk a AMI virtual mouse and keyboard with ALWAYS_POLL
  HID: wacom: Fix poor prox handling in 'wacom_pl_irq'
2017-01-31 13:05:15 -08:00
Linus Torvalds 415f9b71d1 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fix from Steve French:
 "A small cifs fix for stable"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: initialize file_info_lock
2017-01-31 12:36:39 -08:00
David Howells e26bfebdfc fscache: Fix dead object requeue
Under some circumstances, an fscache object can become queued such that it
fscache_object_work_func() can be called once the object is in the
OBJECT_DEAD state.  This results in the kernel oopsing when it tries to
invoke the handler for the state (which is hard coded to 0x2).

The way this comes about is something like the following:

 (1) The object dispatcher is processing a work state for an object.  This
     is done in workqueue context.

 (2) An out-of-band event comes in that isn't masked, causing the object to
     be queued, say EV_KILL.

 (3) The object dispatcher finishes processing the current work state on
     that object and then sees there's another event to process, so,
     without returning to the workqueue core, it processes that event too.
     It then follows the chain of events that initiates until we reach
     OBJECT_DEAD without going through a wait state (such as
     WAIT_FOR_CLEARANCE).

     At this point, object->events may be 0, object->event_mask will be 0
     and oob_event_mask will be 0.

 (4) The object dispatcher returns to the workqueue processor, and in due
     course, this sees that the object's work item is still queued and
     invokes it again.

 (5) The current state is a work state (OBJECT_DEAD), so the dispatcher
     jumps to it - resulting in an OOPS.

When I'm seeing this, the work state in (1) appears to have been either
LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is
fscache_osm_lookup_oob).

The window for (2) is very small:

 (A) object->event_mask is cleared whilst the event dispatch process is
     underway - though there's no memory barrier to force this to the top
     of the function.

     The window, therefore is from the time the object was selected by the
     workqueue processor and made requeueable to the time the mask was
     cleared.

 (B) fscache_raise_event() will only queue the object if it manages to set
     the event bit and the corresponding event_mask bit was set.

     The enqueuement is then deferred slightly whilst we get a ref on the
     object and get the per-CPU variable for workqueue congestion.  This
     slight deferral slightly increases the probability by allowing extra
     time for the workqueue to make the item requeueable.

Handle this by giving the dead state a processor function and checking the
for the dead state address rather than seeing if the processor function is
address 0x2.  The dead state processor function can then set a flag to
indicate that it's occurred and give a warning if it occurs more than once
per object.

If this race occurs, an oops similar to the following is seen (note the RIP
value):

BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
IP: [<0000000000000002>] 0x1
PGD 0
Oops: 0010 [#1] SMP
Modules linked in: ...
CPU: 17 PID: 16077 Comm: kworker/u48:9 Not tainted 3.10.0-327.18.2.el7.x86_64 #1
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
Workqueue: fscache_object fscache_object_work_func [fscache]
task: ffff880302b63980 ti: ffff880717544000 task.ti: ffff880717544000
RIP: 0010:[<0000000000000002>]  [<0000000000000002>] 0x1
RSP: 0018:ffff880717547df8  EFLAGS: 00010202
RAX: ffffffffa0368640 RBX: ffff880edf7a4480 RCX: dead000000200200
RDX: 0000000000000002 RSI: 00000000ffffffff RDI: ffff880edf7a4480
RBP: ffff880717547e18 R08: 0000000000000000 R09: dfc40a25cb3a4510
R10: dfc40a25cb3a4510 R11: 0000000000000400 R12: 0000000000000000
R13: ffff880edf7a4510 R14: ffff8817f6153400 R15: 0000000000000600
FS:  0000000000000000(0000) GS:ffff88181f420000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000002 CR3: 000000000194a000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffffffffa0363695 ffff880edf7a4510 ffff88093f16f900 ffff8817faa4ec00
 ffff880717547e60 ffffffff8109d5db 00000000faa4ec18 0000000000000000
 ffff8817faa4ec18 ffff88093f16f930 ffff880302b63980 ffff88093f16f900
Call Trace:
 [<ffffffffa0363695>] ? fscache_object_work_func+0xa5/0x200 [fscache]
 [<ffffffff8109d5db>] process_one_work+0x17b/0x470
 [<ffffffff8109e4ac>] worker_thread+0x21c/0x400
 [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
 [<ffffffff810a5acf>] kthread+0xcf/0xe0
 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
 [<ffffffff816460d8>] ret_from_fork+0x58/0x90
 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeremy McNicoll <jeremymc@redhat.com>
Tested-by: Frank Sorenson <sorenson@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-31 13:23:09 -05:00
David Howells 6bdded59c8 fscache: Clear outstanding writes when disabling a cookie
fscache_disable_cookie() needs to clear the outstanding writes on the
cookie it's disabling because they cannot be completed after.

Without this, fscache_nfs_open_file() gets stuck because it disables the
cookie when the file is opened for writing but can't uncache the pages till
afterwards - otherwise there's a race between the open routine and anyone
who already has it open R/O and is still reading from it.

Looking in /proc/pid/stack of the offending process shows:

[<ffffffffa0142883>] __fscache_wait_on_page_write+0x82/0x9b [fscache]
[<ffffffffa014336e>] __fscache_uncache_all_inode_pages+0x91/0xe1 [fscache]
[<ffffffffa01740fa>] nfs_fscache_open_file+0x59/0x9e [nfs]
[<ffffffffa01ccf41>] nfs4_file_open+0x17f/0x1b8 [nfsv4]
[<ffffffff8117350e>] do_dentry_open+0x16d/0x2b7
[<ffffffff811743ac>] vfs_open+0x5c/0x65
[<ffffffff81184185>] path_openat+0x785/0x8fb
[<ffffffff81184343>] do_filp_open+0x48/0x9e
[<ffffffff81174710>] do_sys_open+0x13b/0x1cb
[<ffffffff811747b9>] SyS_open+0x19/0x1b
[<ffffffff81001c44>] do_syscall_64+0x80/0x17a
[<ffffffff8165c2da>] return_from_SYSCALL_64+0x0/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff

Reported-by: Jianhong Yin <jiyin@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-31 13:23:09 -05:00
David Howells 62deb8187d FS-Cache: Initialise stores_lock in netfs cookie
Initialise the stores_lock in fscache netfs cookies.  Technically, it
shouldn't be necessary, since the netfs cookie is an index and stores no
data, but initialising it anyway adds insignificant overhead.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-31 13:23:09 -05:00
Dimitris Michailidis 90427ef5d2 ipv6: fix flow labels when the traffic class is non-0
ip6_make_flowlabel() determines the flow label for IPv6 packets. It's
supposed to be passed a flow label, which it returns as is if non-0 and
in some other cases, otherwise it calculates a new value.

The problem is callers often pass a flowi6.flowlabel, which may also
contain traffic class bits. If the traffic class is non-0
ip6_make_flowlabel() mistakes the non-0 it gets as a flow label and
returns the whole thing. Thus it can return a 'flow label' longer than
20b and the low 20b of that is typically 0 resulting in packets with 0
label. Moreover, different packets of a flow may be labeled differently.
For a TCP flow with ECN non-payload and payload packets get different
labels as exemplified by this pair of consecutive packets:

(pure ACK)
Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2::
    0110 .... = Version: 6
    .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
        .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
        .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    .... .... .... 0001 1100 1110 0100 1001 = Flow Label: 0x1ce49
    Payload Length: 32
    Next Header: TCP (6)

(payload)
Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2::
    0110 .... = Version: 6
    .... 0000 0010 .... .... .... .... .... = Traffic Class: 0x02 (DSCP: CS0, ECN: ECT(0))
        .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
        .... .... ..10 .... .... .... .... .... = Explicit Congestion Notification: ECN-Capable Transport codepoint '10' (2)
    .... .... .... 0000 0000 0000 0000 0000 = Flow Label: 0x00000
    Payload Length: 688
    Next Header: TCP (6)

This patch allows ip6_make_flowlabel() to be passed more than just a
flow label and has it extract the part it really wants. This was simpler
than modifying the callers. With this patch packets like the above become

Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2::
    0110 .... = Version: 6
    .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
        .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
        .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    .... .... .... 1010 1111 1010 0101 1110 = Flow Label: 0xafa5e
    Payload Length: 32
    Next Header: TCP (6)

Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2::
    0110 .... = Version: 6
    .... 0000 0010 .... .... .... .... .... = Traffic Class: 0x02 (DSCP: CS0, ECN: ECT(0))
        .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
        .... .... ..10 .... .... .... .... .... = Explicit Congestion Notification: ECN-Capable Transport codepoint '10' (2)
    .... .... .... 1010 1111 1010 0101 1110 = Flow Label: 0xafa5e
    Payload Length: 688
    Next Header: TCP (6)

Signed-off-by: Dimitris Michailidis <dmichail@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-31 13:16:59 -05:00