Commit 109980b894 ("bpf: don't select potentially stale
ri->map from buggy xdp progs") passed the pointer to the prog
itself to be loaded into r4 prior on bpf_redirect_map() helper
call, so that we can store the owner into ri->map_owner out of
the helper.
Issue with that is that the actual address of the prog is still
subject to change when subsequent rewrites occur that require
slow path in bpf_prog_realloc() to alloc more memory, e.g. from
patching inlining helper functions or constant blinding. Thus,
we really need to take prog->aux as the address we're holding,
which also works with prog clones as they share the same aux
object.
Instead of then fetching aux->prog during runtime, which could
potentially incur cache misses due to false sharing, we are
going to just use aux for comparison on the map owner. This
will also keep the patchlet of the same size, and later check
in xdp_map_invalid() only accesses read-only aux pointer from
the prog, it's also in the same cacheline already from prior
access when calling bpf_func.
Fixes: 109980b894 ("bpf: don't select potentially stale ri->map from buggy xdp progs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is pretty much a carbon copy of
commit 3079c65214 ("caif: Fix napi poll list corruption")
with "caif" replaced by "emac".
The commit d75b1ade56 ("net: less interrupt masking in NAPI")
breaks emac.
It is now required that if the entire budget is consumed when poll
returns, the napi poll_list must remain empty. However, like some
other drivers emac tries to do a last-ditch check and if there is
more work it will call napi_reschedule and then immediately process
some of this new work. Should the entire budget be consumed while
processing such new work then we will violate the new caller
contract.
This patch fixes this by not touching any work when we reschedule
in emac.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our recent change exposed a bug in TCP Fastopen Client that syzkaller
found right away [1]
When we prepare skb with SYN+DATA, we attempt to transmit it,
and we update socket state as if the transmit was a success.
In socket RTX queue we have two skbs, one with the SYN alone,
and a second one containing the DATA.
When (malicious) ACK comes in, we now complain that second one had no
skb_mstamp.
The proper fix is to make sure that if the transmit failed, we do not
pretend we sent the DATA skb, and make it our send_head.
When 3WHS completes, we can now send the DATA right away, without having
to wait for a timeout.
[1]
WARNING: CPU: 0 PID: 100189 at net/ipv4/tcp_input.c:3117 tcp_clean_rtx_queue+0x2057/0x2ab0 net/ipv4/tcp_input.c:3117()
WARN_ON_ONCE(last_ackt == 0);
Modules linked in:
CPU: 0 PID: 100189 Comm: syz-executor1 Not tainted
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
0000000000000000 ffff8800b35cb1d8 ffffffff81cad00d 0000000000000000
ffffffff828a4347 ffff88009f86c080 ffffffff8316eb20 0000000000000d7f
ffff8800b35cb220 ffffffff812c33c2 ffff8800baad2440 00000009d46575c0
Call Trace:
[<ffffffff81cad00d>] __dump_stack
[<ffffffff81cad00d>] dump_stack+0xc1/0x124
[<ffffffff812c33c2>] warn_slowpath_common+0xe2/0x150
[<ffffffff812c361e>] warn_slowpath_null+0x2e/0x40
[<ffffffff828a4347>] tcp_clean_rtx_queue+0x2057/0x2ab0 n
[<ffffffff828ae6fd>] tcp_ack+0x151d/0x3930
[<ffffffff828baa09>] tcp_rcv_state_process+0x1c69/0x4fd0
[<ffffffff828efb7f>] tcp_v4_do_rcv+0x54f/0x7c0
[<ffffffff8258aacb>] sk_backlog_rcv
[<ffffffff8258aacb>] __release_sock+0x12b/0x3a0
[<ffffffff8258ad9e>] release_sock+0x5e/0x1c0
[<ffffffff8294a785>] inet_wait_for_connect
[<ffffffff8294a785>] __inet_stream_connect+0x545/0xc50
[<ffffffff82886f08>] tcp_sendmsg_fastopen
[<ffffffff82886f08>] tcp_sendmsg+0x2298/0x35a0
[<ffffffff82952515>] inet_sendmsg+0xe5/0x520
[<ffffffff8257152f>] sock_sendmsg_nosec
[<ffffffff8257152f>] sock_sendmsg+0xcf/0x110
Fixes: 8c72c65b42 ("tcp: update skb->skb_mstamp more carefully")
Fixes: 783237e8da ("net-tcp: Fast Open client - sending SYN-data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta says:
====================
Bug fixes for the HNS3 Ethernet Driver for Hip08 SoC
This patch set presents some bug fixes for the HNS3 Ethernet driver identified
during internal testing & stabilization efforts.
Change Log:
Patch V2: Resolved comments from Leon Romanovsky
Patch V1: Initial Submit
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When register/unregister ae_dev, ae_dev should match all client
in the client_list. Enet and roce can co-exists together so we
should continue checking for enet and roce presence together.
So break should not be there.
Above caused problems in loading and unloading of modules.
Fixes: 38eddd126772 ("net: hns3: Add support of the HNAE3 framework")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When there is no vlan id in the packets, hardware will treat the vlan id
as 0 and look for the mac_vlan table. This patch set the default vlan id
of PF as 0. Without this config, it will fail when look for mac_vlan
table, and hardware will drop packets.
Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces the ethernet address copy instance with more
appropriate ether_addr_copy() function.
Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the initialization of MAC address, fetched from HNS3
firmware i.e. when it is not randomly generated, to the HNS3 hardware.
Fixes: ca60906d2795 ("net: hns3: Add support of HNS3 Ethernet Driver for
hip08 SoC")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the vector-to-ring map and unmap command and adds
INT_GL(for, Gap Limiting Interrupts) and VF id to it as required
by the hardware interface.
Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Mingguang Qu <qumingguang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the IMP command being used to unmap the vector
from the corresponding ring.
Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Default phy address of every port is 0. Therefore, phy address for
each port need to be fetched from firmware and device initialized
with fetched non-default phy address.
Fixes: 6427264ef330 ("net: hns3: Add HNS3 Acceleration Engine &
Compatibility Layer Support")
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clarify that rhashtable_walk_{stop,start} will not reset the iterator to
the beginning of the hash table. Confusion between rhashtable_walk_enter
and rhashtable_walk_start has already lead to a bug.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the integration of PHYLINK, the configuration option which
used to be under the PHY infrastructure menu in menuconfig ended
up one level up (the network device driver section)
By placing PHYLINK option right after PHYLIB entry, it broke the
way Kconfig used to build the menu. See kconfig-language.txt, section
"Menu structure", 2nd method.
This is fixed by placing the PHYLINK option just before PHYLIB.
Fixes: 9525ae8395 ("phylink: add phylink infrastructure")
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
to crash the kernel with malformed netlink messages.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAlnAxN8ACgkQa3t4Rpy0
AB30Xw/+I4GXH2gVfqgizM8nBwSsC5qP4QFqRQmvM+Z9k4hShptb1xZEUMWRCXr5
vZJUjAb3X+YiqfuhCgNnoyb9ZEKCsQ7zOYLP1sO7LmTsTn/BX33XGdFSPC+XQNXl
UEsCuWnX/BtmL6rxRbFxR4suJzWF7bnlyMeQLqso153OGUoZHcMlp9zTWlwLlVzg
Q33iBhoNN6PY6ZiFKsYhq3w60EozLMKIQO7NHUj4DVYoQRzQxxJImJ6/44ZmXjvX
Fsu7tBNlcC/9sS3qYcdFWMrN4T9vpAYJFhFTGvlkf0rB7aXXBizpNuiYoDUALcl/
llTT4jVwglP1oKyXlJ3zUrTRnMuA4kw/d03Be+f5n2oloxZRGFPVDNLdIDkqEKBM
kc3BSfBTo3jH8xR57d6KZVaiS8C+0uvvmJYd9y+fltMqMUFOMa48GB9gcGZa3myJ
R5eeb7CVIiiXxKIH+Ma6LRksavzQg0qqt5vtY4TWy3pz6NdL2lEjsPqli2NgGgGU
5DD14Qs2rUflqtAW+KA321pO02aciZi/MAHHHgQu5wDhH6+20pPSqX9ypIJK1AbI
35oFwCauWnfANe3GwFmkvKnRGOs+z8gLBE5kRtp3YX9b+tRbhvnVSYDyf78tU7u3
SYJk+mKBlvTthBB1vMy+7s2xUhcTjCdLXs7e1BNJdN2gr/gDIJY=
=097A
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2017-11-19' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just two netlink fixes, both allowing privileged users
to crash the kernel with malformed netlink messages.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove Yuval from maintaining the bnx2x & qed* modules as he is no longer
working for the company. Thanks Yuval for your huge contributions and
tireless efforts over the many years and various companies.
Ariel
Signed-off-by: Ariel Elior <aelior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several problems with commit 10377ba767 ("net: systemport:
Support 64bit statistics", first one got fixed in 7095c97345 ("net:
systemport: Fix 64-bit stats deadlock").
The second problem is that this specific code updates the
stats64.tx_{packets,bytes} from ndo_get_stats64() and that is what we
are returning to ethtool -S. If we are not running a tool that involves
calling ndo_get_stats64(), then we won't get updated ethtool stats.
The solution to this is to update the stats from both call sites,
factoring that into a specific function, While at it, don't just check
the sizeof() but also the type of the statistics in order to use the
64-bit stats seqlock.
Fixes: 10377ba767 ("net: systemport: Support 64bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems we have to be more careful in napi_complete_done()
use. This patch is not a revert, as it seems we can
avoid bug that Ville reported by moving the napi_complete_done()
test in the spinlock section.
Many thanks to Ville for detective work and all tests.
Fixes: 617f01211b ("8139too: use napi_complete_done()")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
remove tcp_may_send_now and tcp_snd_test that are no longer used
Fixes: 840a3cbe89 ("tcp: remove forward retransmit feature")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If bpf_map_precharge_memlock in dev_map_alloc, -ENOMEM is returned
regardless of the actual error produced by bpf_map_precharge_memlock.
Fix it by passing on the error returned by bpf_map_precharge_memlock.
Also return -EINVAL instead of -ENOMEM if the page count overflow check
fails.
This makes dev_map_alloc match the behavior of other bpf maps' alloc
functions wrt. return values.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check for ingress-only qdisc for flower offload, as other qdiscs
are not supported for flower offload.
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ASCII art in Documentation/networking/switchdev.txt:
Change non-ASCII "spaces" to ASCII spaces.
Change 2 erroneous '+' characters in ASCII art to '-' (at the '*'
characters below):
line 32:
+--+----+----+----+-*--+----+---+ +-----+-----+
line 41:
+--------------+---*------------+
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
this script, edited from Linux Advanced Routing and Traffic Control guide
tc q a dev en0 root handle 1: htb default a
tc c a dev en0 parent 1: classid 1:1 htb rate 6mbit burst 15k
tc c a dev en0 parent 1:1 classid 1:a htb rate 5mbit ceil 6mbit burst 15k
tc c a dev en0 parent 1:1 classid 1:b htb rate 1mbit ceil 6mbit burst 15k
tc f a dev en0 parent 1:0 prio 1 $clsname $clsargs classid 1:b
ping $address -c1
tc -s c s dev en0
classifies traffic to 1:b or 1:a, depending on whether the packet matches
or not the pattern $clsargs of filter $clsname. However, when $clsname is
'matchall', a systematic crash can be observed in htb_classify(). HTB and
classful qdiscs don't assign initial value to struct tcf_result, but then
they expect it to contain valid values after filters have been run. Thus,
current 'matchall' ignores the TCA_MATCHALL_CLASSID attribute, configured
by user, and makes HTB (and classful qdiscs) dereference random pointers.
By assigning head->res to *res in mall_classify(), before the actions are
invoked, we fix this crash and enable TCA_MATCHALL_CLASSID functionality,
that had no effect on 'matchall' classifier since its first introduction.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1460213
Reported-by: Jiri Benc <jbenc@redhat.com>
Fixes: b87f7936a9 ("net/sched: introduce Match-all classifier")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If ipv6 has been disabled from cmdline since kernel started, it makes
no sense to allow users to create any ip6 tunnel. Otherwise, it could
some potential problem.
Jianlin found a kernel crash caused by this in ip6_gre when he set
ipv6.disable=1 in grub:
[ 209.588865] Unable to handle kernel paging request for data at address 0x00000080
[ 209.588872] Faulting instruction address: 0xc000000000a3aa6c
[ 209.588879] Oops: Kernel access of bad area, sig: 11 [#1]
[ 209.589062] NIP [c000000000a3aa6c] fib_rules_lookup+0x4c/0x260
[ 209.589071] LR [c000000000b9ad90] fib6_rule_lookup+0x50/0xb0
[ 209.589076] Call Trace:
[ 209.589097] fib6_rule_lookup+0x50/0xb0
[ 209.589106] rt6_lookup+0xc4/0x110
[ 209.589116] ip6gre_tnl_link_config+0x214/0x2f0 [ip6_gre]
[ 209.589125] ip6gre_newlink+0x138/0x3a0 [ip6_gre]
[ 209.589134] rtnl_newlink+0x798/0xb80
[ 209.589142] rtnetlink_rcv_msg+0xec/0x390
[ 209.589151] netlink_rcv_skb+0x138/0x150
[ 209.589159] rtnetlink_rcv+0x48/0x70
[ 209.589169] netlink_unicast+0x538/0x640
[ 209.589175] netlink_sendmsg+0x40c/0x480
[ 209.589184] ___sys_sendmsg+0x384/0x4e0
[ 209.589194] SyS_sendmsg+0xd4/0x140
[ 209.589201] SyS_socketcall+0x3e0/0x4f0
[ 209.589209] system_call+0x38/0xe0
This patch is to return -EOPNOTSUPP in ip6_tunnel_init if ipv6 has been
disabled from cmdline.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To clear Speed Selection in MDIO control register(0x10),
ie, clear bits 6 and 13 to zero while keeping other bits same.
Before AND operation,The Mask value has to be perform with bitwise NOT
operation (ie, ~ operator)
This patch clears current speed selection before writing the
new speed settings to gmii2rgmii converter
Fixes: f411a6160b ("net: phy: Add gmiitorgmii converter support")
Signed-off-by: Fahad Kunnathadi <fahad.kunnathadi@dexceldesigns.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now in ip6gre_header before packing the ipv6 header, it skb_push t->hlen
which only includes encap_hlen + tun_hlen. It means greh and inner header
would be over written by ipv6 stuff and ipv6h might have no chance to set
up.
Jianlin found this issue when using remote any on ip6_gre, the packets he
captured on gre dev are truncated:
22:50:26.210866 Out ethertype IPv6 (0x86dd), length 120: truncated-ip6 -\
8128 bytes missing!(flowlabel 0x92f40, hlim 0, next-header Options (0) \
payload length: 8192) ::1:2000:0 > ::1:0:86dd: HBH [trunc] ip-proto-128 \
8184
It should also skb_push ipv6hdr so that ipv6h points to the right position
to set ipv6 stuff up.
This patch is to skb_push hlen + sizeof(*ipv6h) and also fix some indents
in ip6gre_header.
Fixes: c12b395a46 ("gre: Support GRE over IPv6")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If TX rates are specified during mesh join, the channel must
also be specified. Check the channel pointer to avoid a null
pointer dereference if it isn't.
Reported-by: Jouni Malinen <j@w1.fi>
Fixes: 8564e38206 ("cfg80211: add checks for beacon rate, extend to mesh")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
While trying an ESP transport mode encryption for UDPv6 packets of
datagram size 1436 with MTU 1500, checksum error was observed in
the secondary fragment.
This error occurs due to the UDP payload checksum being missed out
when computing the full checksum for these packets in
udp6_hwcsum_outgoing().
Fixes: d39d938c82 ("ipv6: Introduce udpv6_send_skb()")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Minor improvements
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZvXBZAAoJEGb5WYXrGLvB7agQAIAUlQCUuVIChJVs2YYEyyR5
PCkgFC/sjQdT3nAUpvZohP5sTwdSzBaxWWFv6lbPmh/1AFzukF95sfehsBn2qPI7
enuaSEmc5CUiuj76aTspS10l0G8oLS721pnTbTxicaq1ZHl+NIMkAFwRtaTdileq
4WBS+cfdqdj4aR/qWOTmJ8OXWEL7oHLFT1Xnc+uJpD/SltvWBj9gG8fJiwQsWIi7
qiB5KeUoU3XxYI4hQkvuhr81xhvM9iz9M/4fDJocdKS4aU1Aw/btnhXgU1yG1nOm
1Jwqxr3idrbdkLBajn9MK/DxfmdkYbcb93O7kw5QuHMvXzzhVjB824PAR/k2iK3B
3JSpGFfRgtCEQk5NeiQb0NIVTbc+H133ppxoDr9CTxgikvza3Utc27oDIxUkQKuZ
Cp/2TcoD0e1/+ul22QO8TQLjfpSVo34wLOO7aBEfuWtwYwPoSkfJv0BjBzHGdQUj
XQXjdItPhlNp/tEHdto62LIZzae2C+FEHYWWIRRSh505tM8V/69skOaOp3NRmFUQ
SmX6Wzuc1i6b3UjegvIJsZfrY+0eZNE9n3UufSyqhT6GKV0oOZhyvTsrIkGuJ3L5
LJm+3BApQGEVi2rHFa8dBpcibVn7kIFVVTmKTPkOpen9AnWXlsrjg41N02FxNo+J
ewhIkECzGMxLxtxHy+yp
=gwDT
-----END PGP SIGNATURE-----
Merge tag 'upstream-4.14-rc1' of git://git.infradead.org/linux-ubifs
Pull UBI updates from Richard Weinberger:
"Minor improvements"
* tag 'upstream-4.14-rc1' of git://git.infradead.org/linux-ubifs:
UBI: Fix two typos in comments
ubi: fastmap: fix spelling mistake: "invalidiate" -> "invalidate"
ubi: pr_err() strings should end with newlines
ubi: pr_err() strings should end with newlines
ubi: pr_err() strings should end with newlines
Pull UML updates from Richard Weinberger:
- minor improvements
- fixes for Debian's new gcc defaults (pie enabled by default)
- fixes for XSTATE/XSAVE to make UML work again on modern systems
* 'for-linus-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: return negative in tuntap_open_tramp()
um: remove a stray tab
um: Use relative modversions with LD_SCRIPT_DYN
um: link vmlinux with -no-pie
um: Fix CONFIG_GCOV for modules.
Fix minor typos and grammar in UML start_up help
um: defconfig: Cleanup from old Kconfig options
um: Fix FP register size for XSTATE/XSAVE
Pull networking fixes from David Miller:
1) Fix hotplug deadlock in hv_netvsc, from Stephen Hemminger.
2) Fix double-free in rmnet driver, from Dan Carpenter.
3) INET connection socket layer can double put request sockets, fix
from Eric Dumazet.
4) Don't match collect metadata-mode tunnels if the device is down,
from Haishuang Yan.
5) Do not perform TSO6/GSO on ipv6 packets with extensions headers in
be2net driver, from Suresh Reddy.
6) Fix scaling error in gen_estimator, from Eric Dumazet.
7) Fix 64-bit statistics deadlock in systemport driver, from Florian
Fainelli.
8) Fix use-after-free in sctp_sock_dump, from Xin Long.
9) Reject invalid BPF_END instructions in verifier, from Edward Cree.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
mlxsw: spectrum_router: Only handle IPv4 and IPv6 events
Documentation: link in networking docs
tcp: fix data delivery rate
bpf/verifier: reject BPF_ALU64|BPF_END
sctp: do not mark sk dumped when inet_sctp_diag_fill returns err
sctp: fix an use-after-free issue in sctp_sock_dump
netvsc: increase default receive buffer size
tcp: update skb->skb_mstamp more carefully
net: ipv4: fix l3slave check for index returned in IP_PKTINFO
net: smsc911x: Quieten netif during suspend
net: systemport: Fix 64-bit stats deadlock
net: vrf: avoid gcc-4.6 warning
qed: remove unnecessary call to memset
tg3: clean up redundant initialization of tnapi
tls: make tls_sw_free_resources static
sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
MAINTAINERS: review Renesas DT bindings as well
net_sched: gen_estimator: fix scaling error in bytes/packets samples
nfp: wait for the NSP resource to appear on boot
nfp: wait for board state before talking to the NSP
...
Pull more input updates from Dmitry Torokhov:
"A second round of updates for the input subsystem:
- a new driver for PWM-controlled vibrators
- ucb1400 touchscreen driver had completely busted suspend/resume
handling
- we now handle "home" button found on some devices with Goodix
touchscreens
- assorted other fixups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - add Gigabyte P57 to the keyboard reset table
Input: xpad - validate USB endpoint type during probe
Input: ucb1400_ts - fix suspend and resume handling
Input: edt-ft5x06 - fix access to non-existing register
Input: elantech - make arrays debounce_packet static, reduces object code size
Input: surface3_spi - make const array header static, reduces object code size
Input: goodix - add support for capacitive home button
Input: add a driver for PWM controllable vibrators
Input: adi - make array seq static, reduces object code size
Commit 5620a0d1aa ("firmware: delete in-kernel firmware") removed the
entire firmware directory. Unfortunately it thereby also removed the
support for built-in firmware.
This restores the ability to build firmware directly into the kernel by
pruning the original Makefile to the necessary minimum. The default for
EXTRA_FIRMWARE_DIR is now the standard directory /lib/firmware/.
Fixes: 5620a0d1aa ("firmware: delete in-kernel firmware")
Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Acked-by: Greg K-H <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The driver doesn't support events from address families other than IPv4
and IPv6, so ignore them. Otherwise, we risk queueing a work item before
it's initialized.
This can happen in case a VRF is configured when MROUTE_MULTIPLE_TABLES
is enabled, as the VRF driver will try to add an l3mdev rule for the
IPMR family.
Fixes: 65e65ec137 ("mlxsw: spectrum_router: Don't ignore IPv6 notifications")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Andreas Rammhold <andreas@rammhold.de>
Reported-by: Florian Klink <flokli@flokli.de>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now skb->mstamp_skb is updated later, we also need to call
tcp_rate_skb_sent() after the update is done.
Fixes: 8c72c65b42 ("tcp: update skb->skb_mstamp more carefully")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull MIPS updates from Ralf Baechle:
"This is the main pull request for 4.14 for MIPS; below a summary of
the non-merge commits:
CM:
- Rename mips_cm_base to mips_gcr_base
- Specify register size when generating accessors
- Use BIT/GENMASK for register fields, order & drop shifts
- Add cluster & block args to mips_cm_lock_other()
CPC:
- Use common CPS accessor generation macros
- Use BIT/GENMASK for register fields, order & drop shifts
- Introduce register modify (set/clear/change) accessors
- Use change_*, set_* & clear_* where appropriate
- Add CM/CPC 3.5 register definitions
- Use GlobalNumber macros rather than magic numbers
- Have asm/mips-cps.h include CM & CPC headers
- Cluster support for topology functions
- Detect CPUs in secondary clusters
CPS:
- Read GIC_VL_IDENT directly, not via irqchip driver
DMA:
- Consolidate coherent and non-coherent dma_alloc code
- Don't use dma_cache_sync to implement fd_cacheflush
FPU emulation / FP assist code:
- Another series of 14 commits fixing corner cases such as NaN
propgagation and other special input values.
- Zero bits 32-63 of the result for a CLASS.D instruction.
- Enhanced statics via debugfs
- Do not use bools for arithmetic. GCC 7.1 moans about this.
- Correct user fault_addr type
Generic MIPS:
- Enhancement of stack backtraces
- Cleanup from non-existing options
- Handle non word sized instructions when examining frame
- Fix detection and decoding of ADDIUSP instruction
- Fix decoding of SWSP16 instruction
- Refactor handling of stack pointer in get_frame_info
- Remove unreachable code from force_fcr31_sig()
- Convert to using %pOF instead of full_name
- Remove the R6000 support.
- Move FP code from *_switch.S to *_fpu.S
- Remove unused ST_OFF from r2300_switch.S
- Allow platform to specify multiple its.S files
- Add #includes to various files to ensure code builds reliable and
without warning..
- Remove __invalidate_kernel_vmap_range
- Remove plat_timer_setup
- Declare various variables & functions static
- Abstract CPU core & VP(E) ID access through accessor functions
- Store core & VP IDs in GlobalNumber-style variable
- Unify checks for sibling CPUs
- Add CPU cluster number accessors
- Prevent direct use of generic_defconfig
- Make CONFIG_MIPS_MT_SMP default y
- Add __ioread64_copy
- Remove unnecessary inclusions of linux/irqchip/mips-gic.h
GIC:
- Introduce asm/mips-gic.h with accessor functions
- Use new GIC accessor functions in mips-gic-timer
- Remove counter access functions from irq-mips-gic.c
- Remove gic_read_local_vp_id() from irq-mips-gic.c
- Simplify shared interrupt pending/mask reads in irq-mips-gic.c
- Simplify gic_local_irq_domain_map() in irq-mips-gic.c
- Drop gic_(re)set_mask() functions in irq-mips-gic.c
- Remove gic_set_polarity(), gic_set_trigger(), gic_set_dual_edge(),
gic_map_to_pin() and gic_map_to_vpe() from irq-mips-gic.c.
- Convert remaining shared reg access, local int mask access and
remaining local reg access to new accessors
- Move GIC_LOCAL_INT_* to asm/mips-gic.h
- Remove GIC_CPU_INT* macros from irq-mips-gic.c
- Move various definitions to the driver
- Remove gic_get_usm_range()
- Remove __gic_irq_dispatch() forward declaration
- Remove gic_init()
- Use mips_gic_present() in place of gic_present and remove
gic_present
- Move gic_get_c0_*_int() to asm/mips-gic.h
- Remove linux/irqchip/mips-gic.h
- Inline __gic_init()
- Inline gic_basic_init()
- Make pcpu_masks a per-cpu variable
- Use pcpu_masks to avoid reading GIC_SH_MASK*
- Clean up mti, reserved-cpu-vectors handling
- Use cpumask_first_and() in gic_set_affinity()
- Let the core set struct irq_common_data affinity
microMIPS:
- Fix microMIPS stack unwinding on big endian systems
MIPS-GIC:
- SYNC after enabling GIC region
NUMA:
- Remove the unused parent_node() macro
R6:
- Constify r2_decoder_tables
- Add accessor & bit definitions for GlobalNumber
SMP:
- Constify smp ops
- Allow boot_secondary SMP op to return errors
VDSO:
- Drop gic_get_usm_range() usage
- Avoid use of linux/irqchip/mips-gic.h
Platform changes:
Alchemy:
- Add devboard machine type to cpuinfo
- update cpu feature overrides
- Threaded carddetect irqs for devboards
AR7:
- allow NULL clock for clk_get_rate
BCM63xx:
- Fix ENETDMA_6345_MAXBURST_REG offset
- Allow NULL clock for clk_get_rate
CI20:
- Enable GPIO and RTC drivers in defconfig
- Add ethernet and fixed-regulator nodes to DTS
Generic platform:
- Move Boston and NI 169445 FIT image source to their own files
- Include asm/bootinfo.h for plat_fdt_relocated()
- Include asm/time.h for get_c0_*_int()
- Include asm/bootinfo.h for plat_fdt_relocated()
- Include asm/time.h for get_c0_*_int()
- Allow filtering enabled boards by requirements
- Don't explicitly disable CONFIG_USB_SUPPORT
- Bump default NR_CPUS to 16
JZ4700:
- Probe the jz4740-rtc driver from devicetree
Lantiq:
- Drop check of boot select from the spi-falcon driver.
- Drop check of boot select from the lantiq-flash MTD driver.
- Access boot cause register in the watchdog driver through regmap
- Add device tree binding documentation for the watchdog driver
- Add docs for the RCU DT bindings.
- Convert the fpi bus driver to a platform_driver
- Remove ltq_reset_cause() and ltq_boot_select(
- Switch to a proper reset driver
- Switch to a new drivers/soc GPHY driver
- Add an USB PHY driver for the Lantiq SoCs using the RCU module
- Use of_platform_default_populate instead of __dt_register_buses
- Enable MFD_SYSCON to be able to use it for the RCU MFD
- Replace ltq_boot_select() with dummy implementation.
Loongson 2F:
- Allow NULL clock for clk_get_rate
Malta:
- Use new GIC accessor functions
NI 169445:
- Add support for NI 169445 board.
- Only include in 32r2el kernels
Octeon:
- Add support for watchdog of 78XX SOCs.
- Add support for watchdog of CN68XX SOCs.
- Expose support for mips32r1, mips32r2 and mips64r1
- Enable more drivers in config file
- Add support for accessing the boot vector.
- Remove old boot vector code from watchdog driver
- Define watchdog registers for 70xx, 73xx, 78xx, F75xx.
- Make CSR functions node aware.
- Allow access to CIU3 IRQ domains.
- Misc cleanups in the watchdog driver
Omega2+:
- New board, add support and defconfig
Pistachio:
- Enable Root FS on NFS in defconfig
Ralink:
- Add Mediatek MT7628A SoC
- Allow NULL clock for clk_get_rate
- Explicitly request exclusive reset control in the pci-mt7620 PCI driver.
SEAD3:
- Only include in 32 bit kernels by default
VoCore:
- Add VoCore as a vendor t0 dt-bindings
- Add defconfig file"
* '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (167 commits)
MIPS: Refactor handling of stack pointer in get_frame_info
MIPS: Stacktrace: Fix microMIPS stack unwinding on big endian systems
MIPS: microMIPS: Fix decoding of swsp16 instruction
MIPS: microMIPS: Fix decoding of addiusp instruction
MIPS: microMIPS: Fix detection of addiusp instruction
MIPS: Handle non word sized instructions when examining frame
MIPS: ralink: allow NULL clock for clk_get_rate
MIPS: Loongson 2F: allow NULL clock for clk_get_rate
MIPS: BCM63XX: allow NULL clock for clk_get_rate
MIPS: AR7: allow NULL clock for clk_get_rate
MIPS: BCM63XX: fix ENETDMA_6345_MAXBURST_REG offset
mips: Save all registers when saving the frame
MIPS: Add DWARF unwinding to assembly
MIPS: Make SAVE_SOME more standard
MIPS: Fix issues in backtraces
MIPS: jz4780: DTS: Probe the jz4740-rtc driver from devicetree
MIPS: Ci20: Enable RTC driver
watchdog: octeon-wdt: Add support for 78XX SOCs.
watchdog: octeon-wdt: Add support for cn68XX SOCs.
watchdog: octeon-wdt: File cleaning.
...
Pull more i2c updates from Wolfram Sang:
"I2C has two more new drivers: Altera FPGA and STM32F7"
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i2c-stm32f7: add driver
i2c: i2c-stm32f4: use generic definition of speed enum
dt-bindings: i2c-stm32: Document the STM32F7 I2C bindings
i2c: altera: Add Altera I2C Controller driver
dt-bindings: i2c: Add Altera I2C Controller
Neither ___bpf_prog_run nor the JITs accept it.
Also adds a new test case.
Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_diag would not actually dump out sk/asoc if inet_sctp_diag_fill
returns err, in which case it shouldn't mark sk dumped by setting
cb->args[3] as 1 in sctp_sock_dump().
Otherwise, it could cause some asocs to have no parent's sk dumped
in 'ss --sctp'.
So this patch is to not set cb->args[3] when inet_sctp_diag_fill()
returns err in sctp_sock_dump().
Fixes: 8f840e47f1 ("sctp: add the sctp_diag.c file")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 86fdb3448c ("sctp: ensure ep is not destroyed before doing the
dump") tried to fix an use-after-free issue by checking !sctp_sk(sk)->ep
with holding sock and sock lock.
But Paolo noticed that endpoint could be destroyed in sctp_rcv without
sock lock protection. It means the use-after-free issue still could be
triggered when sctp_rcv put and destroy ep after sctp_sock_dump checks
!ep, although it's pretty hard to reproduce.
I could reproduce it by mdelay in sctp_rcv while msleep in sctp_close
and sctp_sock_dump long time.
This patch is to add another param cb_done to sctp_for_each_transport
and dump ep->assocs with holding tsp after jumping out of transport's
traversal in it to avoid this issue.
It can also improve sctp diag dump to make it run faster, as no need
to save sk into cb->args[5] and keep calling sctp_for_each_transport
any more.
This patch is also to use int * instead of int for the pos argument
in sctp_for_each_transport, which could make postion increment only
in sctp_for_each_transport and no need to keep changing cb->args[2]
in sctp_sock_filter and sctp_sock_dump any more.
Fixes: 86fdb3448c ("sctp: ensure ep is not destroyed before doing the dump")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The default receive buffer size was reduced by recent change
to a value which was appropriate for 10G and Windows Server 2016.
But the value is too small for full performance with 40G on Azure.
Increase the default back to maximum supported by host.
Fixes: 8b5327975a ("netvsc: allow controlling send/recv buffer size")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
liujian reported a problem in TCP_USER_TIMEOUT processing with a patch
in tcp_probe_timer() :
https://www.spinics.net/lists/netdev/msg454496.html
After investigations, the root cause of the problem is that we update
skb->skb_mstamp of skbs in write queue, even if the attempt to send a
clone or copy of it failed. One reason being a routing problem.
This patch prevents this, solving liujian issue.
It also removes a potential RTT miscalculation, since
__tcp_retransmit_skb() is not OR-ing TCP_SKB_CB(skb)->sacked with
TCPCB_EVER_RETRANS if a failure happens, but skb->skb_mstamp has
been changed.
A future ACK would then lead to a very small RTT sample and min_rtt
would then be lowered to this too small value.
Tested:
# cat user_timeout.pkt
--local_ip=192.168.102.64
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 `ifconfig tun0 192.168.102.64/16; ip ro add 192.0.2.1 dev tun0`
+0 < S 0:0(0) win 0 <mss 1460>
+0 > S. 0:0(0) ack 1 <mss 1460>
+.1 < . 1:1(0) ack 1 win 65530
+0 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
+0 write(4, ..., 24) = 24
+0 > P. 1:25(24) ack 1 win 29200
+.1 < . 1:1(0) ack 25 win 65530
//change the ipaddress
+1 `ifconfig tun0 192.168.0.10/16`
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+1 write(4, ..., 24) = 24
+0 `ifconfig tun0 192.168.102.64/16`
+0 < . 1:2(1) ack 25 win 65530
+0 `ifconfig tun0 192.168.0.10/16`
+3 write(4, ..., 24) = -1
# ./packetdrill user_timeout.pkt
Signed-off-by: Eric Dumazet <edumazet@googl.com>
Reported-by: liujian <liujian56@huawei.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rt_iif is only set to the actual egress device for the output path. The
recent change to consider the l3slave flag when returning IP_PKTINFO
works for local traffic (the correct device index is returned), but it
broke the more typical use case of packets received from a remote host
always returning the VRF index rather than the original ingress device.
Update the fixup to consider l3slave and rt_iif actually getting set.
Fixes: 1dfa76390b ("net: ipv4: add check for l3slave for index returned in IP_PKTINFO")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the network interface is kept running during suspend, the net core
may call net_device_ops.ndo_start_xmit() while the Ethernet device is
still suspended, which may lead to a system crash.
E.g. on sh73a0/kzm9g and r8a73a4/ape6evm, the external Ethernet chip is
driven by a PM controlled clock. If the Ethernet registers are accessed
while the clock is not running, the system will crash with an imprecise
external abort.
As this is a race condition with a small time window, it is not so easy
to trigger at will. Using pm_test may increase your chances:
# echo 0 > /sys/module/printk/parameters/console_suspend
# echo platform > /sys/power/pm_test
# echo mem > /sys/power/state
To fix this, make sure the network interface is quietened during
suspend.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can enter a deadlock situation because there is no sufficient protection
when ndo_get_stats64() runs in process context to guard against RX or TX NAPI
contexts running in softirq, this can lead to the following lockdep splat and
actual deadlock was experienced as well with an iperf session in the background
and a while loop doing ifconfig + ethtool.
[ 5.780350] ================================
[ 5.784679] WARNING: inconsistent lock state
[ 5.789011] 4.13.0-rc7-02179-g32fae27c725d #70 Not tainted
[ 5.794561] --------------------------------
[ 5.798890] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 5.804971] swapper/0/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 5.810175] (&syncp->seq#2){+.?...}, at: [<c0768a28>] bcm_sysport_tx_reclaim+0x30/0x54
[ 5.818327] {SOFTIRQ-ON-W} state was registered at:
[ 5.823278] bcm_sysport_get_stats64+0x17c/0x258
[ 5.828053] dev_get_stats+0x38/0xac
[ 5.831776] rtnl_fill_stats+0x30/0x118
[ 5.835761] rtnl_fill_ifinfo+0x538/0xe24
[ 5.839921] rtmsg_ifinfo_build_skb+0x6c/0xd8
[ 5.844430] rtmsg_ifinfo_event.part.5+0x14/0x44
[ 5.849201] rtmsg_ifinfo+0x20/0x28
[ 5.852837] register_netdevice+0x628/0x6b8
[ 5.857171] register_netdev+0x14/0x24
[ 5.861051] bcm_sysport_probe+0x30c/0x438
[ 5.865280] platform_drv_probe+0x50/0xb0
[ 5.869418] driver_probe_device+0x2e8/0x450
[ 5.873817] __driver_attach+0x104/0x120
[ 5.877871] bus_for_each_dev+0x7c/0xc0
[ 5.881834] bus_add_driver+0x1b0/0x270
[ 5.885797] driver_register+0x78/0xf4
[ 5.889675] do_one_initcall+0x54/0x190
[ 5.893646] kernel_init_freeable+0x144/0x1d0
[ 5.898135] kernel_init+0x8/0x110
[ 5.901665] ret_from_fork+0x14/0x2c
[ 5.905363] irq event stamp: 24263
[ 5.908804] hardirqs last enabled at (24262): [<c08eecf0>] net_rx_action+0xc4/0x4e4
[ 5.916624] hardirqs last disabled at (24263): [<c0a7da00>] _raw_spin_lock_irqsave+0x1c/0x98
[ 5.925143] softirqs last enabled at (24258): [<c022a7fc>] irq_enter+0x84/0x98
[ 5.932524] softirqs last disabled at (24259): [<c022a918>] irq_exit+0x108/0x16c
[ 5.939985]
[ 5.939985] other info that might help us debug this:
[ 5.946576] Possible unsafe locking scenario:
[ 5.946576]
[ 5.952556] CPU0
[ 5.955031] ----
[ 5.957506] lock(&syncp->seq#2);
[ 5.960955] <Interrupt>
[ 5.963604] lock(&syncp->seq#2);
[ 5.967227]
[ 5.967227] *** DEADLOCK ***
[ 5.967227]
[ 5.973222] 1 lock held by swapper/0/0:
[ 5.977092] #0: (&(&ring->lock)->rlock){..-...}, at: [<c0768a18>] bcm_sysport_tx_reclaim+0x20/0x54
So just remove the u64_stats_update_begin()/end() pair in ndo_get_stats64()
since it does not appear to be useful for anything. No inconsistency was
observed with either ifconfig or ethtool, global TX counts equal the sum of
per-queue TX counts on a 32-bit architecture.
Fixes: 10377ba767 ("net: systemport: Support 64bit statistics")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>