Commit Graph

54624 Commits

Author SHA1 Message Date
Eric Dumazet 5bf325a532 net: fix possible overflow in __sk_mem_raise_allocated()
With many active TCP sockets, fat TCP sockets could fool
__sk_mem_raise_allocated() thanks to an overflow.

They would increase their share of the memory, instead
of decreasing it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 21:05:18 -08:00
Xin Long af98c5a785 sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrate
In sctp_stream_init(), after sctp_stream_outq_migrate() freed the
surplus streams' ext, but sctp_stream_alloc_out() returns -ENOMEM,
stream->outcnt will not be set to 'outcnt'.

With the bigger value on stream->outcnt, when closing the assoc and
freeing its streams, the ext of those surplus streams will be freed
again since those stream exts were not set to NULL after freeing in
sctp_stream_outq_migrate(). Then the invalid-free issue reported by
syzbot would be triggered.

We fix it by simply setting them to NULL after freeing.

Fixes: 5bbbbe32a4 ("sctp: introduce stream scheduler foundations")
Reported-by: syzbot+58e480e7b28f2d890bfd@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 19:33:44 -05:00
Xin Long fc228abc23 sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment
Jianlin reported a panic when running sctp gso over gre over vlan device:

  [   84.772930] RIP: 0010:do_csum+0x6d/0x170
  [   84.790605] Call Trace:
  [   84.791054]  csum_partial+0xd/0x20
  [   84.791657]  gre_gso_segment+0x2c3/0x390
  [   84.792364]  inet_gso_segment+0x161/0x3e0
  [   84.793071]  skb_mac_gso_segment+0xb8/0x120
  [   84.793846]  __skb_gso_segment+0x7e/0x180
  [   84.794581]  validate_xmit_skb+0x141/0x2e0
  [   84.795297]  __dev_queue_xmit+0x258/0x8f0
  [   84.795949]  ? eth_header+0x26/0xc0
  [   84.796581]  ip_finish_output2+0x196/0x430
  [   84.797295]  ? skb_gso_validate_network_len+0x11/0x80
  [   84.798183]  ? ip_finish_output+0x169/0x270
  [   84.798875]  ip_output+0x6c/0xe0
  [   84.799413]  ? ip_append_data.part.50+0xc0/0xc0
  [   84.800145]  iptunnel_xmit+0x144/0x1c0
  [   84.800814]  ip_tunnel_xmit+0x62d/0x930 [ip_tunnel]
  [   84.801699]  gre_tap_xmit+0xac/0xf0 [ip_gre]
  [   84.802395]  dev_hard_start_xmit+0xa5/0x210
  [   84.803086]  sch_direct_xmit+0x14f/0x340
  [   84.803733]  __dev_queue_xmit+0x799/0x8f0
  [   84.804472]  ip_finish_output2+0x2e0/0x430
  [   84.805255]  ? skb_gso_validate_network_len+0x11/0x80
  [   84.806154]  ip_output+0x6c/0xe0
  [   84.806721]  ? ip_append_data.part.50+0xc0/0xc0
  [   84.807516]  sctp_packet_transmit+0x716/0xa10 [sctp]
  [   84.808337]  sctp_outq_flush+0xd7/0x880 [sctp]

It was caused by SKB_GSO_CB(skb)->csum_start not set in sctp_gso_segment.
sctp_gso_segment() calls skb_segment() with 'feature | NETIF_F_HW_CSUM',
which causes SKB_GSO_CB(skb)->csum_start not to be set in skb_segment().

For TCP/UDP, when feature supports HW_CSUM, CHECKSUM_PARTIAL will be set
and gso_reset_checksum will be called to set SKB_GSO_CB(skb)->csum_start.

So SCTP should do the same as TCP/UDP, to call gso_reset_checksum() when
computing checksum in sctp_gso_segment.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 19:31:43 -05:00
David S. Miller f325ef7297 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for net:

1) Missing structure initialization in ebtables causes splat with
   32-bit user level on a 64-bit kernel, from Francesco Ruggeri.

2) Missing dependency on nf_defrag in IPVS IPv6 codebase, from
   Andrea Claudi.

3) Fix possible use-after-free from release path of target extensions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13 19:13:47 -05:00
Pablo Neira Ayuso 753c111f65 netfilter: nft_compat: use-after-free when deleting targets
Fetch pointer to module before target object is released.

Fixes: 29e3880109 ("netfilter: nf_tables: fix use-after-free when deleting compat expressions")
Fixes: 0ca743a559 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-13 18:14:54 +01:00
Greg Kroah-Hartman ad6fef7769 rpc: properly check debugfs dentry before using it
debugfs can now report an error code if something went wrong instead of
just NULL.  So if the return value is to be used as a "real" dentry, it
needs to be checked if it is an error before dereferencing it.

This is now happening because of ff9fb72bc0 ("debugfs: return error
values, not NULL"), but why debugfs files are not being created properly
is an older issue, probably one that has always been there and should
probably be looked at...

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
Reported-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-12 15:51:39 -05:00
Nicolas Morey-Chaisemartin a4cb5bdb75 xprtrdma: Make sure Send CQ is allocated on an existing compvec
Make sure the device has at least 2 completion vectors
before allocating to compvec#1

Fixes: a4699f5647 (xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE mode)
Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-02-12 15:31:45 -05:00
Cong Wang 1db817e75f net_sched: fix two more memory leaks in cls_tcindex
struct tcindex_filter_result contains two parts:
struct tcf_exts and struct tcf_result.

For the local variable 'cr', its exts part is never used but
initialized without being released properly on success path. So
just completely remove the exts part to fix this leak.

For the local variable 'new_filter_result', it is never properly
released if not used by 'r' on success path.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:10:56 -05:00
Cong Wang 033b228e7f net_sched: fix a memory leak in cls_tcindex
When tcindex_destroy() destroys all the filter results in
the perfect hash table, it invokes the walker to delete
each of them. However, results with class==0 are skipped
in either tcindex_walk() or tcindex_delete(), which causes
a memory leak reported by kmemleak.

This patch fixes it by skipping the walker and directly
deleting these filter results so we don't miss any filter
result.

As a result of this change, we have to initialize exts->net
properly in tcindex_alloc_perfect_hash(). For net-next, we
need to consider whether we should initialize ->net in
tcf_exts_init() instead, before that just directly test
CONFIG_NET_CLS_ACT=y.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:10:56 -05:00
Cong Wang 8015d93ebd net_sched: fix a race condition in tcindex_destroy()
tcindex_destroy() invokes tcindex_destroy_element() via
a walker to delete each filter result in its perfect hash
table, and tcindex_destroy_element() calls tcindex_delete()
which schedules tcf RCU works to do the final deletion work.
Unfortunately this races with the RCU callback
__tcindex_destroy(), which could lead to use-after-free as
reported by Adrian.

Fix this by migrating this RCU callback to tcf RCU work too,
as that workqueue is ordered, we will not have use-after-free.

Note, we don't need to hold netns refcnt because we don't call
tcf_exts_destroy() here.

Fixes: 27ce4f05e2 ("net_sched: use tcf_queue_work() in tcindex filter")
Reported-by: Adrian <bugs@abtelecom.ro>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 14:10:56 -05:00
Kal Conley fc62814d69 net/packet: fix 4gb buffer limit due to overflow check
When calculating rb->frames_per_block * req->tp_block_nr the result
can overflow. Check it for overflow without limiting the total buffer
size to UINT_MAX.

This change fixes support for packet ring buffers >= UINT_MAX.

Fixes: 8f8d28e4d6 ("net/packet: fix overflow in check for tp_frame_nr")
Signed-off-by: Kal Conley <kal.conley@dectris.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:37:23 -05:00
Konstantin Khlebnikov 1ec17dbd90 inet_diag: fix reporting cgroup classid and fallback to priority
Field idiag_ext in struct inet_diag_req_v2 used as bitmap of requested
extensions has only 8 bits. Thus extensions starting from DCTCPINFO
cannot be requested directly. Some of them included into response
unconditionally or hook into some of lower 8 bits.

Extension INET_DIAG_CLASS_ID has not way to request from the beginning.

This patch bundle it with INET_DIAG_TCLASS (ipv6 tos), fixes space
reservation, and documents behavior for other extensions.

Also this patch adds fallback to reporting socket priority. This filed
is more widely used for traffic classification because ipv4 sockets
automatically maps TOS to priority and default qdisc pfifo_fast knows
about that. But priority could be changed via setsockopt SO_PRIORITY so
INET_DIAG_TOS isn't enough for predicting class.

Also cgroup2 obsoletes net_cls classid (it always zero), but we cannot
reuse this field for reporting cgroup2 id because it is 64-bit (ino+gen).

So, after this patch INET_DIAG_CLASS_ID will report socket priority
for most common setup when net_cls isn't set and/or cgroup2 in use.

Fixes: 0888e372c3 ("net: inet: diag: expose sockets cgroup classid")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:35:57 -05:00
Eric Dumazet 4ffcbfac60 batman-adv: fix uninit-value in batadv_interface_tx()
KMSAN reported batadv_interface_tx() was possibly using a
garbage value [1]

batadv_get_vid() does have a pskb_may_pull() call
but batadv_interface_tx() does not actually make sure
this did not fail.

[1]
BUG: KMSAN: uninit-value in batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
CPU: 0 PID: 10006 Comm: syz-executor469 Not tainted 4.20.0-rc7+ #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x173/0x1d0 lib/dump_stack.c:113
 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
 __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
 batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
 __netdev_start_xmit include/linux/netdevice.h:4356 [inline]
 netdev_start_xmit include/linux/netdevice.h:4365 [inline]
 xmit_one net/core/dev.c:3257 [inline]
 dev_hard_start_xmit+0x607/0xc40 net/core/dev.c:3273
 __dev_queue_xmit+0x2e42/0x3bc0 net/core/dev.c:3843
 dev_queue_xmit+0x4b/0x60 net/core/dev.c:3876
 packet_snd net/packet/af_packet.c:2928 [inline]
 packet_sendmsg+0x8306/0x8f30 net/packet/af_packet.c:2953
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg net/socket.c:631 [inline]
 __sys_sendto+0x8c4/0xac0 net/socket.c:1788
 __do_sys_sendto net/socket.c:1800 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:1796
 __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x441889
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffdda6fd468 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000441889
RDX: 000000000000000e RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00007ffdda6fd4c0
R13: 00007ffdda6fd4b0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
 kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
 kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
 kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
 slab_post_alloc_hook mm/slab.h:446 [inline]
 slab_alloc_node mm/slub.c:2759 [inline]
 __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
 __kmalloc_reserve net/core/skbuff.c:137 [inline]
 __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
 alloc_skb include/linux/skbuff.h:998 [inline]
 alloc_skb_with_frags+0x1c7/0xac0 net/core/skbuff.c:5220
 sock_alloc_send_pskb+0xafd/0x10e0 net/core/sock.c:2083
 packet_alloc_skb net/packet/af_packet.c:2781 [inline]
 packet_snd net/packet/af_packet.c:2872 [inline]
 packet_sendmsg+0x661a/0x8f30 net/packet/af_packet.c:2953
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg net/socket.c:631 [inline]
 __sys_sendto+0x8c4/0xac0 net/socket.c:1788
 __do_sys_sendto net/socket.c:1800 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:1796
 __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7

Fixes: c6c8fea297 ("net: Add batman-adv meshing protocol")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc:	Marek Lindner <mareklindner@neomailbox.ch>
Cc:	Simon Wunderlich <sw@simonwunderlich.de>
Cc:	Antonio Quartulli <a@unstable.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 13:30:43 -05:00
Li RongQing d1f20798a1 ipv6: propagate genlmsg_reply return code
genlmsg_reply can fail, so propagate its return code

Fixes: 915d7e5e59 ("ipv6: sr: add code base for control plane support of SR-IPv6")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 12:36:02 -05:00
David S. Miller 1ea06107b0 Just a few fixes:
* aggregation session teardown with internal TXQs was
    continuing to send some frames marked as aggregation,
    fix from Ilan
  * IBSS join was missed during firmware restart, should
    such a thing happen
  * speculative execution based on the return value of
    cfg80211_classify8021d() - which is controlled by the
    sender of the packet - could be problematic in some
    code using it, prevent it
  * a few peer measurement fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlxisjgACgkQB8qZga/f
 l8T5aw//WUuc0SIHWvEAMVjndz6uz3y0nLiurVEwi/kMggRx9HYQZpp0P4wyQEDK
 KZ8tOpzvzfeE7hJe8Tz/6mrvkgxHOGJyxTCikB03z3lI+oRbqrfLJOONCqKafWua
 T/9E2LZmwz++8eodShWcVgcm3S0VAZCjFXybNQhRT3BT4rM3v+px8cVw8CtbW2G/
 IOLrgbkr9c8DuaJltO287/y8vjUJZTZHbFcYDFKdNmo+gX/zpKXFSjWrB3zkMdZr
 66bsiTJ7OP4efpPl1WY4Wbmo4yRt+gkN4aMmlI0ssa04IoQuET0SoyAPXR/T0KL5
 rEt1jO2PcNOLBND4QHHfjzT17zmqZaI30/1xnOuo6RE7b7plQEglZn/ssZcmchSD
 UPLShFx/bule+4+hyRhn3FdhMKkrUL+6SdYVPs8p6AneGV1auH22SPJB8GDJfkjM
 qWm8SKgvph+A2JzJzWzcUFXZoOlsnxP+hYdwseWiPimK/k63Afe1KCE2Is8PgLwZ
 v2o3qNmerxM2jwxGYa8dk5ezfvLAqYM1X/qhgodguUQKnFCAnuIxjoxR8o3uuiwe
 bbynlpN5NG8tG32SY1y5fU7TDBeyBZO9BtpHhyPu3p/tIDX63XZApoe1fy/V/q6M
 y+ETFzDL18yrfivplrSSa3i2hej+sjKmpPxHsgFd6I++7CdMZcI=
 =NssM
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2019-02-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just a few fixes:
 * aggregation session teardown with internal TXQs was
   continuing to send some frames marked as aggregation,
   fix from Ilan
 * IBSS join was missed during firmware restart, should
   such a thing happen
 * speculative execution based on the return value of
   cfg80211_classify8021d() - which is controlled by the
   sender of the packet - could be problematic in some
   code using it, prevent it
 * a few peer measurement fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12 11:43:19 -05:00
Björn Töpel 1e405c1a3f xsk: do not remove umem from netdevice on fall-back to copy-mode
Commit c9b47cc1fa ("xsk: fix bug when trying to use both copy and
zero-copy on one queue id") stores the umem into the netdev._rx
struct. However, the patch incorrectly removed the umem from the
netdev._rx struct when user-space passed "best-effort" mode
(i.e. select the fastest possible option available), and zero-copy
mode was not available. This commit fixes that.

Fixes: c9b47cc1fa ("xsk: fix bug when trying to use both copy and zero-copy on one queue id")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-12 17:04:45 +01:00
Andrea Claudi 098e13f5b2 ipvs: fix dependency on nf_defrag_ipv6
ipvs relies on nf_defrag_ipv6 module to manage IPv6 fragmentation,
but lacks proper Kconfig dependencies and does not explicitly
request defrag features.

As a result, if netfilter hooks are not loaded, when IPv6 fragmented
packet are handled by ipvs only the first fragment makes through.

Fix it properly declaring the dependency on Kconfig and registering
netfilter hooks on ip_vs_add_service() and ip_vs_new_dest().

Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Andrea Claudi <aclaudi@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12 11:24:01 +01:00
Sean Tranchetti fc2d5cfdcf af_key: unconditionally clone on broadcast
Attempting to avoid cloning the skb when broadcasting by inflating
the refcount with sock_hold/sock_put while under RCU lock is dangerous
and violates RCU principles. It leads to subtle race conditions when
attempting to free the SKB, as we may reference sockets that have
already been freed by the stack.

Unable to handle kernel paging request at virtual address 6b6b6b6b6b6c4b
[006b6b6b6b6b6c4b] address between user and kernel address ranges
Internal error: Oops: 96000004 [#1] PREEMPT SMP
task: fffffff78f65b380 task.stack: ffffff8049a88000
pc : sock_rfree+0x38/0x6c
lr : skb_release_head_state+0x6c/0xcc
Process repro (pid: 7117, stack limit = 0xffffff8049a88000)
Call trace:
	sock_rfree+0x38/0x6c
	skb_release_head_state+0x6c/0xcc
	skb_release_all+0x1c/0x38
	__kfree_skb+0x1c/0x30
	kfree_skb+0xd0/0xf4
	pfkey_broadcast+0x14c/0x18c
	pfkey_sendmsg+0x1d8/0x408
	sock_sendmsg+0x44/0x60
	___sys_sendmsg+0x1d0/0x2a8
	__sys_sendmsg+0x64/0xb4
	SyS_sendmsg+0x34/0x4c
	el0_svc_naked+0x34/0x38
Kernel panic - not syncing: Fatal exception

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-02-12 10:36:42 +01:00
Tuong Lien 91986ee166 tipc: fix link session and re-establish issues
When a link endpoint is re-created (e.g. after a node reboot or
interface reset), the link session number is varied by random, the peer
endpoint will be synced with this new session number before the link is
re-established.

However, there is a shortcoming in this mechanism that can lead to the
link never re-established or faced with a failure then. It happens when
the peer endpoint is ready in ESTABLISHING state, the 'peer_session' as
well as the 'in_session' flag have been set, but suddenly this link
endpoint leaves. When it comes back with a random session number, there
are two situations possible:

1/ If the random session number is larger than (or equal to) the
previous one, the peer endpoint will be updated with this new session
upon receipt of a RESET_MSG from this endpoint, and the link can be re-
established as normal. Otherwise, all the RESET_MSGs from this endpoint
will be rejected by the peer. In turn, when this link endpoint receives
one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and start
to send STATE_MSGs, but again these messages will be dropped by the
peer due to wrong session.
The peer link endpoint can still become ESTABLISHED after receiving a
traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
NAME_DISTRIBUTOR), but since all the STATE_MSGs are invalid, the link
will be forced down sooner or later!

Even in case the random session number is larger than the previous one,
it can be that the ACTIVATE_MSG from the peer arrives first, and this
link endpoint moves quickly to ESTABLISHED without sending out any
RESET_MSG yet. Consequently, the peer link will not be updated with the
new session number, and the same link failure scenario as above will
happen.

2/ Another situation can be that, the peer link endpoint was reset due
to any reasons in the meantime, its link state was set to RESET from
ESTABLISHING but still in session, i.e. the 'in_session' flag is not
reset...
Now, if the random session number from this endpoint is less than the
previous one, all the RESET_MSGs from this endpoint will be rejected by
the peer. In the other direction, when this link endpoint receives a
RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
As a result, the link cannot be re-established but gets stuck with this
link endpoint in state ESTABLISHING and the peer in RESET!

Solution:

===========

This link endpoint should not go directly to ESTABLISHED when getting
ACTIVATE_MSG from the peer which may belong to the old session if the
link was re-created. To ensure the session to be correct before the
link is re-established, the peer endpoint in ESTABLISHING state will
send back the last session number in ACTIVATE_MSG for a verification at
this endpoint. Then, if needed, a new and more appropriate session
number will be regenerated to force a re-synch first.

In addition, when a link in ESTABLISHING state is reset, its state will
move to RESET according to the link FSM, along with resetting the
'in_session' flag (and the other data) as a normal link reset, it will
also be deleted if requested.

The solution is backward compatible.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11 21:26:20 -08:00
Zhiqiang Liu e75913c93f net: fix IPv6 prefix route residue
Follow those steps:
 # ip addr add 2001:123::1/32 dev eth0
 # ip addr add 2001:123:456::2/64 dev eth0
 # ip addr del 2001:123::1/32 dev eth0
 # ip addr del 2001:123:456::2/64 dev eth0
and then prefix route of 2001:123::1/32 will still exist.

This is because ipv6_prefix_equal in check_cleanup_prefix_route
func does not check whether two IPv6 addresses have the same
prefix length. If the prefix of one address starts with another
shorter address prefix, even though their prefix lengths are
different, the return value of ipv6_prefix_equal is true.

Here I add a check of whether two addresses have the same prefix
to decide whether their prefixes are equal.

Fixes: 5b84efecb7 ("ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE")
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Reported-by: Wenhao Zhang <zhangwenhao8@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11 20:36:50 -08:00
Hoang Le 7384b538d3 tipc: fix skb may be leaky in tipc_link_input
When we free skb at tipc_data_input, we return a 'false' boolean.
Then, skb passed to subcalling tipc_link_input in tipc_link_rcv,

<snip>
1303 int tipc_link_rcv:
...
1354    if (!tipc_data_input(l, skb, l->inputq))
1355        rc |= tipc_link_input(l, skb, l->inputq);
</snip>

Fix it by simple changing to a 'true' boolean when skb is being free-ed.
Then, tipc_link_rcv will bypassed to subcalling tipc_link_input as above
condition.

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <maloy@donjonn.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11 18:35:52 -08:00
Francesco Ruggeri 8d29d16d21 netfilter: compat: initialize all fields in xt_init
If a non zero value happens to be in xt[NFPROTO_BRIDGE].cur at init
time, the following panic can be caused by running

% ebtables -t broute -F BROUTING

from a 32-bit user level on a 64-bit kernel. This patch replaces
kmalloc_array with kcalloc when allocating xt.

[  474.680846] BUG: unable to handle kernel paging request at 0000000009600920
[  474.687869] PGD 2037006067 P4D 2037006067 PUD 2038938067 PMD 0
[  474.693838] Oops: 0000 [#1] SMP
[  474.697055] CPU: 9 PID: 4662 Comm: ebtables Kdump: loaded Not tainted 4.19.17-11302235.AroraKernelnext.fc18.x86_64 #1
[  474.707721] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.0 06/28/2013
[  474.714313] RIP: 0010:xt_compat_calc_jump+0x2f/0x63 [x_tables]
[  474.720201] Code: 40 0f b6 ff 55 31 c0 48 6b ff 70 48 03 3d dc 45 00 00 48 89 e5 8b 4f 6c 4c 8b 47 60 ff c9 39 c8 7f 2f 8d 14 08 d1 fa 48 63 fa <41> 39 34 f8 4c 8d 0c fd 00 00 00 00 73 05 8d 42 01 eb e1 76 05 8d
[  474.739023] RSP: 0018:ffffc9000943fc58 EFLAGS: 00010207
[  474.744296] RAX: 0000000000000000 RBX: ffffc90006465000 RCX: 0000000002580249
[  474.751485] RDX: 00000000012c0124 RSI: fffffffff7be17e9 RDI: 00000000012c0124
[  474.758670] RBP: ffffc9000943fc58 R08: 0000000000000000 R09: ffffffff8117cf8f
[  474.765855] R10: ffffc90006477000 R11: 0000000000000000 R12: 0000000000000001
[  474.773048] R13: 0000000000000000 R14: ffffc9000943fcb8 R15: ffffc9000943fcb8
[  474.780234] FS:  0000000000000000(0000) GS:ffff88a03f840000(0063) knlGS:00000000f7ac7700
[  474.788612] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[  474.794632] CR2: 0000000009600920 CR3: 0000002037422006 CR4: 00000000000606e0
[  474.802052] Call Trace:
[  474.804789]  compat_do_replace+0x1fb/0x2a3 [ebtables]
[  474.810105]  compat_do_ebt_set_ctl+0x69/0xe6 [ebtables]
[  474.815605]  ? try_module_get+0x37/0x42
[  474.819716]  compat_nf_setsockopt+0x4f/0x6d
[  474.824172]  compat_ip_setsockopt+0x7e/0x8c
[  474.828641]  compat_raw_setsockopt+0x16/0x3a
[  474.833220]  compat_sock_common_setsockopt+0x1d/0x24
[  474.838458]  __compat_sys_setsockopt+0x17e/0x1b1
[  474.843343]  ? __check_object_size+0x76/0x19a
[  474.847960]  __ia32_compat_sys_socketcall+0x1cb/0x25b
[  474.853276]  do_fast_syscall_32+0xaf/0xf6
[  474.857548]  entry_SYSENTER_compat+0x6b/0x7a

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-12 00:35:18 +01:00
Eric Dumazet cf657d22ee net/x25: do not hold the cpu too long in x25_new_lci()
Due to quadratic behavior of x25_new_lci(), syzbot was able
to trigger an rcu stall.

Fix this by not blocking BH for the whole duration of
the function, and inserting a reschedule point when possible.

If we care enough, using a bitmap could get rid of the quadratic
behavior.

syzbot report :

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:    0-...!: (10500 ticks this GP) idle=4fa/1/0x4000000000000002 softirq=283376/283376 fqs=0
rcu:     (t=10501 jiffies g=383105 q=136)
rcu: rcu_preempt kthread starved for 10502 jiffies! g383105 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
rcu: RCU grace-period kthread stack dump:
rcu_preempt     I28928    10      2 0x80000000
Call Trace:
 context_switch kernel/sched/core.c:2844 [inline]
 __schedule+0x817/0x1cc0 kernel/sched/core.c:3485
 schedule+0x92/0x180 kernel/sched/core.c:3529
 schedule_timeout+0x4db/0xfd0 kernel/time/timer.c:1803
 rcu_gp_fqs_loop kernel/rcu/tree.c:1948 [inline]
 rcu_gp_kthread+0x956/0x17a0 kernel/rcu/tree.c:2105
 kthread+0x357/0x430 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
NMI backtrace for cpu 0
CPU: 0 PID: 8759 Comm: syz-executor2 Not tainted 5.0.0-rc4+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
 arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x183/0x1cf kernel/rcu/tree.c:1211
 print_cpu_stall kernel/rcu/tree.c:1348 [inline]
 check_cpu_stall kernel/rcu/tree.c:1422 [inline]
 rcu_pending kernel/rcu/tree.c:3018 [inline]
 rcu_check_callbacks.cold+0x500/0xa4a kernel/rcu/tree.c:2521
 update_process_times+0x32/0x80 kernel/time/timer.c:1635
 tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
 tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
 __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
 __hrtimer_run_queues+0x33e/0xde0 kernel/time/hrtimer.c:1451
 hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
 smp_apic_timer_interrupt+0x120/0x570 arch/x86/kernel/apic/apic.c:1060
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
 </IRQ>
RIP: 0010:__read_once_size include/linux/compiler.h:193 [inline]
RIP: 0010:queued_write_lock_slowpath+0x13e/0x290 kernel/locking/qrwlock.c:86
Code: 00 00 fc ff df 4c 8d 2c 01 41 83 c7 03 41 0f b6 45 00 41 38 c7 7c 08 84 c0 0f 85 0c 01 00 00 8b 03 3d 00 01 00 00 74 1a f3 90 <41> 0f b6 55 00 41 38 d7 7c eb 84 d2 74 e7 48 89 df e8 6c 0f 4f 00
RSP: 0018:ffff88805f117bd8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000300 RBX: ffffffff89413ba0 RCX: 1ffffffff1282774
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff89413ba0
RBP: ffff88805f117c70 R08: 1ffffffff1282774 R09: fffffbfff1282775
R10: fffffbfff1282774 R11: ffffffff89413ba3 R12: 00000000000000ff
R13: fffffbfff1282774 R14: 1ffff1100be22f7d R15: 0000000000000003
 queued_write_lock include/asm-generic/qrwlock.h:104 [inline]
 do_raw_write_lock+0x1d6/0x290 kernel/locking/spinlock_debug.c:203
 __raw_write_lock_bh include/linux/rwlock_api_smp.h:204 [inline]
 _raw_write_lock_bh+0x3b/0x50 kernel/locking/spinlock.c:312
 x25_insert_socket+0x21/0xe0 net/x25/af_x25.c:267
 x25_bind+0x273/0x340 net/x25/af_x25.c:705
 __sys_bind+0x23f/0x290 net/socket.c:1505
 __do_sys_bind net/socket.c:1516 [inline]
 __se_sys_bind net/socket.c:1514 [inline]
 __x64_sys_bind+0x73/0xb0 net/socket.c:1514
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457e39
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fafccd0dc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e39
RDX: 0000000000000012 RSI: 0000000020000240 RDI: 0000000000000004
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fafccd0e6d4
R13: 00000000004bdf8b R14: 00000000004ce4b8 R15: 00000000ffffffff
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 8752 Comm: syz-executor4 Not tainted 5.0.0-rc4+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__x25_find_socket+0x78/0x120 net/x25/af_x25.c:328
Code: 89 f8 48 c1 e8 03 80 3c 18 00 0f 85 a6 00 00 00 4d 8b 64 24 68 4d 85 e4 74 7f e8 03 97 3d fb 49 83 ec 68 74 74 e8 f8 96 3d fb <49> 8d bc 24 88 04 00 00 48 89 f8 48 c1 e8 03 0f b6 04 18 84 c0 74
RSP: 0018:ffff8880639efc58 EFLAGS: 00000246
RAX: 0000000000040000 RBX: dffffc0000000000 RCX: ffffc9000e677000
RDX: 0000000000040000 RSI: ffffffff863244b8 RDI: ffff88806a764628
RBP: ffff8880639efc80 R08: ffff8880a80d05c0 R09: fffffbfff1282775
R10: fffffbfff1282774 R11: ffffffff89413ba3 R12: ffff88806a7645c0
R13: 0000000000000001 R14: ffff88809f29ac00 R15: 0000000000000000
FS:  00007fe8d0c58700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32823000 CR3: 00000000672eb000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 x25_new_lci net/x25/af_x25.c:357 [inline]
 x25_connect+0x374/0xdf0 net/x25/af_x25.c:786
 __sys_connect+0x266/0x330 net/socket.c:1686
 __do_sys_connect net/socket.c:1697 [inline]
 __se_sys_connect net/socket.c:1694 [inline]
 __x64_sys_connect+0x73/0xb0 net/socket.c:1694
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457e39
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fe8d0c57c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e39
RDX: 0000000000000012 RSI: 0000000020000200 RDI: 0000000000000004
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe8d0c586d4
R13: 00000000004be378 R14: 00000000004ceb00 R15: 00000000ffffffff

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Cc: linux-x25@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11 13:20:14 -08:00
Jouke Witteveen 989723b00b Documentation: bring operstate documentation up-to-date
Netlink has moved from bitmasks to group numbers long ago.

Signed-off-by: Jouke Witteveen <j.witteveen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11 12:38:51 -08:00
Davidlohr Bueso e451eb5106 xsk: share the mmap_sem for page pinning
Holding mmap_sem exclusively for a gup() is an overkill. Lets
share the lock and replace the gup call for gup_longterm(), as
it is better suited for the lifetime of the pinning.

Fixes: c0c77d8fb7 ("xsk: add user memory registration support sockopt")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Bjorn Topel <bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
CC: netdev@vger.kernel.org
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-11 20:51:09 +01:00
Florian Westphal 8303b7e8f0 netfilter: nat: fix spurious connection timeouts
Sander Eikelenboom bisected a NAT related regression down
to the l4proto->manip_pkt indirection removal.

I forgot that ICMP(v6) errors (e.g. PKTTOOBIG) can be set as related
to the existing conntrack entry.

Therefore, when passing the skb to nf_nat_ipv4/6_manip_pkt(), that
ended up calling the wrong l4 manip function, as tuple->dst.protonum
is the original flows l4 protocol (TCP, UDP, etc).

Set the dst protocol field to ICMP(v6), we already have a private copy
of the tuple due to the inversion of src/dst.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Fixes: faec18dbb0 ("netfilter: nat: remove l4proto->manip_pkt")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-11 17:43:17 +01:00
Jann Horn c4c07b4d6f netfilter: nf_nat_snmp_basic: add missing length checks in ASN.1 cbs
The generic ASN.1 decoder infrastructure doesn't guarantee that callbacks
will get as much data as they expect; callbacks have to check the `datalen`
parameter before looking at `data`. Make sure that snmp_version() and
snmp_helper() don't read/write beyond the end of the packet data.

(Also move the assignment to `pdata` down below the check to make it clear
that it isn't necessarily a pointer we can use before the `datalen` check.)

Fixes: cc2d58634e ("netfilter: nf_nat_snmp_basic: use asn1 decoder library")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-11 17:43:17 +01:00
Ilan Peer 6157ca0d6b mac80211: Fix Tx aggregation session tear down with ITXQs
When mac80211 requests the low level driver to stop an ongoing
Tx aggregation, the low level driver is expected to call
ieee80211_stop_tx_ba_cb_irqsafe() to indicate that it is ready
to stop the session. The callback in turn schedules a worker
to complete the session tear down, which in turn also handles
the relevant state for the intermediate Tx queue.

However, as this flow in asynchronous, the intermediate queue
should be stopped and not continue servicing frames, as in
such a case frames that are dequeued would be marked as part
of an aggregation, although the aggregation is already been
stopped.

Fix this by stopping the intermediate Tx queue, before
calling the low level driver to stop the Tx aggregation.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-11 15:50:56 +01:00
Johannes Berg 1fc9b72533 cfg80211: prevent speculation on cfg80211_classify8021d() return
It's possible that the caller of cfg80211_classify8021d() uses the
value to index an array, like mac80211 in ieee80211_downgrade_queue().
Prevent speculation on the return value.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-11 15:50:56 +01:00
Johannes Berg ff1bab1ba1 cfg80211: pmsr: record netlink port ID
Without recording the netlink port ID, we cannot return the
results or complete messages to userspace, nor will we be
able to abort if the socket is closed, so clearly we need
to fill the value.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-11 15:50:56 +01:00
Aviya Erenfeld ea18709a6f nl80211: Fix FTM per burst maximum value
Fix FTM per burst maximum value from 15 to 31
(The maximal bits that represents that number in the frame
is 5 hence a maximal value of 31)

Signed-off-by: Aviya Erenfeld <aviya.erenfeld@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-11 15:50:56 +01:00
Johannes Berg 4926b51bfa mac80211: call drv_ibss_join() on restart
If a driver does any significant activity in its ibss_join method,
then it will very well expect that to be called during restart,
before any stations are added. Do that.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-11 15:50:56 +01:00
Magnus Karlsson e6762c8bcf xsk: add missing smp_rmb() in xsk_mmap
All the setup code in AF_XDP is protected by a mutex with the
exception of the mmap code that cannot use it. To make sure that a
process banging on the mmap call at the same time as another process
is setting up the socket, smp_wmb() calls were added in the umem
registration code and the queue creation code, so that the published
structures that xsk_mmap needs would be consistent. However, the
corresponding smp_rmb() calls were not added to the xsk_mmap
code. This patch adds these calls.

Fixes: 37b076933a ("xsk: add missing write- and data-dependency barrier")
Fixes: c0c77d8fb7 ("xsk: add user memory registration support sockopt")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-10 20:07:17 -08:00
Willem de Bruijn b90efd2258 bpf: only adjust gso_size on bytestream protocols
bpf_skb_change_proto and bpf_skb_adjust_room change skb header length.
For GSO packets they adjust gso_size to maintain the same MTU.

The gso size can only be safely adjusted on bytestream protocols.
Commit d02f51cbcf ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
to deal with gso sctp skbs") excluded SKB_GSO_SCTP.

Since then type SKB_GSO_UDP_L4 has been added, whose contents are one
gso_size unit per datagram. Also exclude these.

Move from a blacklist to a whitelist check to future proof against
additional such new GSO types, e.g., for fraglist based GRO.

Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-02-10 19:57:34 -08:00
Ursula Braun ccc8ca9b90 net/smc: fix byte_order for rx_curs_confirmed
The recent change in the rx_curs_confirmed assignment disregards
byte order, which causes problems on little endian architectures.
This patch fixes it.

Fixes: b8649efad8 ("net/smc: fix sender_free computation") (net-tree)
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 22:33:25 -08:00
Paolo Abeni 225d946426 vsock: cope with memory allocation failure at socket creation time
In the unlikely event that the kmalloc call in vmci_transport_socket_init()
fails, we end-up calling vmci_transport_destruct() with a NULL vmci_trans()
and oopsing.

This change addresses the above explicitly checking for zero vmci_trans()
at destruction time.

Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: d021c34405 ("VSOCK: Introduce VM Sockets")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 22:32:05 -08:00
Lorenzo Bianconi c09551c6ff net: ipv4: use a dedicated counter for icmp_v4 redirect packets
According to the algorithm described in the comment block at the
beginning of ip_rt_send_redirect, the host should try to send
'ip_rt_redirect_number' ICMP redirect packets with an exponential
backoff and then stop sending them at all assuming that the destination
ignores redirects.
If the device has previously sent some ICMP error packets that are
rate-limited (e.g TTL expired) and continues to receive traffic,
the redirect packets will never be transmitted. This happens since
peer->rate_tokens will be typically greater than 'ip_rt_redirect_number'
and so it will never be reset even if the redirect silence timeout
(ip_rt_redirect_silence) has elapsed without receiving any packet
requiring redirects.

Fix it by using a dedicated counter for the number of ICMP redirect
packets that has been sent by the host

I have not been able to identify a given commit that introduced the
issue since ip_rt_send_redirect implements the same rate-limiting
algorithm from commit 1da177e4c3 ("Linux-2.6.12-rc2")

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 21:50:15 -08:00
Linus Torvalds 27b4ad621e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "This pull request is dedicated to the upcoming snowpocalypse parts 2
  and 3 in the Pacific Northwest:

   1) Drop profiles are broken because some drivers use dev_kfree_skb*
      instead of dev_consume_skb*, from Yang Wei.

   2) Fix IWLWIFI kconfig deps, from Luca Coelho.

   3) Fix percpu maps updating in bpftool, from Paolo Abeni.

   4) Missing station release in batman-adv, from Felix Fietkau.

   5) Fix some networking compat ioctl bugs, from Johannes Berg.

   6) ucc_geth must reset the BQL queue state when stopping the device,
      from Mathias Thore.

   7) Several XDP bug fixes in virtio_net from Toshiaki Makita.

   8) TSO packets must be sent always on queue 0 in stmmac, from Jose
      Abreu.

   9) Fix socket refcounting bug in RDS, from Eric Dumazet.

  10) Handle sparse cpu allocations in bpf selftests, from Martynas
      Pumputis.

  11) Make sure mgmt frames have enough tailroom in mac80211, from Felix
      Feitkau.

  12) Use safe list walking in sctp_sendmsg() asoc list traversal, from
      Greg Kroah-Hartman.

  13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL
      ccid, from Eric Dumazet.

  14) Need to reload WoL password into bcmsysport device after deep
      sleeps, from Florian Fainelli.

  15) Remove filter from mask before freeing in cls_flower, from Petr
      Machata.

  16) Missing release and use after free in error paths of s390 qeth
      code, from Julian Wiedmann.

  17) Fix lockdep false positive in dsa code, from Marc Zyngier.

  18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn.

  19) Fix EQ firmware assert in qed driver, from Manish Chopra.

  20) Don't default Caivum PTP to Y in kconfig, from Bjorn Helgaas"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  net: dsa: b53: Fix for failure when irq is not defined in dt
  sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
  geneve: should not call rt6_lookup() when ipv6 was disabled
  net: Don't default Cavium PTP driver to 'y'
  net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net/mlx5e: Don't overwrite pedit action when multiple pedit used
  net/mlx5e: Update hw flows when encap source mac changed
  qed*: Advance drivers version to 8.37.0.20
  qed: Change verbosity for coalescing message.
  qede: Fix system crash on configuring channels.
  qed: Consider TX tcs while deriving the max num_queues for PF.
  ...
2019-02-08 11:21:54 -08:00
Hangbin Liu 173656acca sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
If we disabled IPv6 from the kernel command line (ipv6.disable=1), we should
not call ip6_err_gen_icmpv6_unreach(). This:

  ip link add sit1 type sit local 192.0.2.1 remote 192.0.2.2 ttl 1
  ip link set sit1 up
  ip addr add 198.51.100.1/24 dev sit1
  ping 198.51.100.2

if IPv6 is disabled at boot time, will crash the kernel.

v2: there's no need to use in6_dev_get(), use __in6_dev_get() instead,
    as we only need to check that idev exists and we are under
    rcu_read_lock() (from netif_receive_skb_internal()).

Reported-by: Jianlin Shi <jishi@redhat.com>
Fixes: ca15a078bd ("sit: generate icmpv6 error when receiving icmpv4 error")
Cc: Oussama Ghorbel <ghorbel@pivasoftware.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07 10:48:42 -08:00
Dan Carpenter 00670cb8a7 net: dsa: Fix NULL checking in dsa_slave_set_eee()
This function can't succeed if dp->pl is NULL.  It will Oops inside the
call to return phylink_ethtool_get_eee(dp->pl, e);

Fixes: 1be52e97ed ("dsa: slave: eee: Allow ports to use phylink")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 13:42:54 -08:00
Chuck Lever e248aa7be8 svcrdma: Remove max_sge check at connect time
Two and a half years ago, the client was changed to use gathered
Send for larger inline messages, in commit 655fec6987 ("xprtrdma:
Use gathered Send for large inline messages"). Several fixes were
required because there are a few in-kernel device drivers whose
max_sge is 3, and these were broken by the change.

Apparently my memory is going, because some time later, I submitted
commit 25fd86eca1 ("svcrdma: Don't overrun the SGE array in
svc_rdma_send_ctxt"), and after that, commit f3c1fd0ee2 ("svcrdma:
Reduce max_send_sges"). These too incorrectly assumed in-kernel
device drivers would have more than a few Send SGEs available.

The fix for the server side is not the same. This is because the
fundamental problem on the server is that, whether or not the client
has provisioned a chunk for the RPC reply, the server must squeeze
even the most complex RPC replies into a single RDMA Send. Failing
in the send path because of Send SGE exhaustion should never be an
option.

Therefore, instead of failing when the send path runs out of SGEs,
switch to using a bounce buffer mechanism to handle RPC replies that
are too complex for the device to send directly. That allows us to
remove the max_sge check to enable drivers with small max_sge to
work again.

Reported-by: Don Dutile <ddutile@redhat.com>
Fixes: 25fd86eca1 ("svcrdma: Don't overrun the SGE array in ...")
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-02-06 15:32:34 -05:00
Eric Dumazet 6dce3c20ac rxrpc: bad unlock balance in rxrpc_recvmsg
When either "goto wait_interrupted;" or "goto wait_error;"
paths are taken, socket lock has already been released.

This patch fixes following syzbot splat :

WARNING: bad unlock balance detected!
5.0.0-rc4+ #59 Not tainted
-------------------------------------
syz-executor223/8256 is trying to release lock (sk_lock-AF_RXRPC) at:
[<ffffffff86651353>] rxrpc_recvmsg+0x6d3/0x3099 net/rxrpc/recvmsg.c:598
but there are no more locks to release!

other info that might help us debug this:
1 lock held by syz-executor223/8256:
 #0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: spin_lock_bh include/linux/spinlock.h:334 [inline]
 #0: 00000000fa9ed0f4 (slock-AF_RXRPC){+...}, at: release_sock+0x20/0x1c0 net/core/sock.c:2798

stack backtrace:
CPU: 1 PID: 8256 Comm: syz-executor223 Not tainted 5.0.0-rc4+ #59
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 print_unlock_imbalance_bug kernel/locking/lockdep.c:3391 [inline]
 print_unlock_imbalance_bug.cold+0x114/0x123 kernel/locking/lockdep.c:3368
 __lock_release kernel/locking/lockdep.c:3601 [inline]
 lock_release+0x67e/0xa00 kernel/locking/lockdep.c:3860
 sock_release_ownership include/net/sock.h:1471 [inline]
 release_sock+0x183/0x1c0 net/core/sock.c:2808
 rxrpc_recvmsg+0x6d3/0x3099 net/rxrpc/recvmsg.c:598
 sock_recvmsg_nosec net/socket.c:794 [inline]
 sock_recvmsg net/socket.c:801 [inline]
 sock_recvmsg+0xd0/0x110 net/socket.c:797
 __sys_recvfrom+0x1ff/0x350 net/socket.c:1845
 __do_sys_recvfrom net/socket.c:1863 [inline]
 __se_sys_recvfrom net/socket.c:1859 [inline]
 __x64_sys_recvfrom+0xe1/0x1a0 net/socket.c:1859
 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446379
Code: e8 2c b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fe5da89fd98 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
RAX: ffffffffffffffda RBX: 00000000006dbc28 RCX: 0000000000446379
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00000000006dbc20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c
R13: 0000000000000000 R14: 0000000000000000 R15: 20c49ba5e353f7cf

Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Howells <dhowells@redhat.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 10:54:07 -08:00
Johannes Berg 73350424be cfg80211: pmsr: fix abort locking
When we destroy the interface we already hold the wdev->mtx
while calling cfg80211_pmsr_wdev_down(), which assumes this
isn't true and flushes the worker that takes the lock, thus
leading to a deadlock.

Fix this by refactoring the worker and calling its code in
cfg80211_pmsr_wdev_down() directly.

We still need to flush the work later to make sure it's not
still running and will crash, but it will not do anything.

Fixes: 9bb7e0f24e ("cfg80211: add peer measurement with FTM initiator API")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-06 10:38:00 +01:00
Johannes Berg 0acd99282b cfg80211: pmsr: fix MAC address setting
When we *don't* have a MAC address attribute, we shouldn't
try to use this - this was intended to copy the local MAC
address instead, so fix it.

Fixes: 9bb7e0f24e ("cfg80211: add peer measurement with FTM initiator API")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-06 10:37:42 +01:00
David S. Miller f09bef61f1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Use CONFIG_NF_TABLES_INET from seltests, not NF_TABLES_INET.
   From Naresh Kamboju.

2) Add a test to cover masquerading and redirect case, from Florian
   Westphal.

3) Two packets coming from the same socket may race to set up NAT,
   ending up with different tuples and the packet losing race being
   dropped. Update nf_conntrack_tuple_taken() to exercise clash
   resolution for this case. From Martynas Pumputis and Florian
   Westphal.

4) Unbind anonymous sets from the commit and abort path, this fixes
   a splat due to double set list removal/release in case that the
   transaction needs to be aborted.

5) Do not preserve original output interface for packets that are
   redirected in the output chain when ip6_route_me_harder() is
   called. Otherwise packets end up going not going to the loopback
   device. From Eli Cooper.

6) Fix bogus splat in nft_compat with CONFIG_REFCOUNT_FULL=y, this
   also simplifies the existing logic to deal with the list insertions
   of the xtables extensions. From Florian Westphal.

Diffstat look rather larger than usual because of the new selftest, but
Florian and I consider that having tests soon into the tree is good to
improve coverage. If there's a different policy in this regard, please,
let me know.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-05 11:23:23 -08:00
Florian Westphal 947e492c0f netfilter: nft_compat: don't use refcount_inc on newly allocated entry
When I moved the refcount to refcount_t type I missed the fact that
refcount_inc() will result in use-after-free warning with
CONFIG_REFCOUNT_FULL=y builds.

The correct fix would be to init the reference count to 1 at allocation
time, but, unfortunately we cannot do this, as we can't undo that
in case something else fails later in the batch.

So only solution I see is to special-case the 'new entry' condition
and replace refcount_inc() with a "delayed" refcount_set(1) in this case,
as done here.

The .activate callback can be removed to simplify things, we only
need to make sure that deactivate() decrements/unlinks the entry
from the list at end of transaction phase (commit or abort).

Fixes: 12c44aba66 ("netfilter: nft_compat: use refcnt_t type for nft_xt reference count")
Reported-by: Jordan Glover <Golden_Miller83@protonmail.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-05 14:10:33 +01:00
Eli Cooper 15df03c661 netfilter: ipv6: Don't preserve original oif for loopback address
Commit 508b09046c ("netfilter: ipv6: Preserve link scope traffic
original oif") made ip6_route_me_harder() keep the original oif for
link-local and multicast packets. However, it also affected packets
for the loopback address because it used rt6_need_strict().

REDIRECT rules in the OUTPUT chain rewrite the destination to loopback
address; thus its oif should not be preserved. This commit fixes the bug
that redirected local packets are being dropped. Actually the packet was
not exactly dropped; Instead it was sent out to the original oif rather
than lo. When a packet with daddr ::1 is sent to the router, it is
effectively dropped.

Fixes: 508b09046c ("netfilter: ipv6: Preserve link scope traffic original oif")
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-05 14:10:33 +01:00
Cong Wang f75a2804da xfrm: destroy xfrm_state synchronously on net exit path
xfrm_state_put() moves struct xfrm_state to the GC list
and schedules the GC work to clean it up. On net exit call
path, xfrm_state_flush() is called to clean up and
xfrm_flush_gc() is called to wait for the GC work to complete
before exit.

However, this doesn't work because one of the ->destructor(),
ipcomp_destroy(), schedules the same GC work again inside
the GC work. It is hard to wait for such a nested async
callback. This is also why syzbot still reports the following
warning:

 WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
 ...
  ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
  cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
  process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
  worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
  kthread+0x357/0x430 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

In fact, it is perfectly fine to bypass GC and destroy xfrm_state
synchronously on net exit call path, because it is in process context
and doesn't need a work struct to do any blocking work.

This patch introduces xfrm_state_put_sync() which simply bypasses
GC, and lets its callers to decide whether to use this synchronous
version. On net exit path, xfrm_state_fini() and
xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
blocking, it can use xfrm_state_put_sync() directly too.

Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
reflect this change.

Fixes: b48c05ab5d ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-02-05 06:29:20 +01:00
Marc Zyngier c8101f7729 net: dsa: Fix lockdep false positive splat
Creating a macvtap on a DSA-backed interface results in the following
splat when lockdep is enabled:

[   19.638080] IPv6: ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready
[   23.041198] device lan0 entered promiscuous mode
[   23.043445] device eth0 entered promiscuous mode
[   23.049255]
[   23.049557] ============================================
[   23.055021] WARNING: possible recursive locking detected
[   23.060490] 5.0.0-rc3-00013-g56c857a1b8d3 #118 Not tainted
[   23.066132] --------------------------------------------
[   23.071598] ip/2861 is trying to acquire lock:
[   23.076171] 00000000f61990cb (_xmit_ETHER){+...}, at: dev_set_rx_mode+0x1c/0x38
[   23.083693]
[   23.083693] but task is already holding lock:
[   23.089696] 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70
[   23.096774]
[   23.096774] other info that might help us debug this:
[   23.103494]  Possible unsafe locking scenario:
[   23.103494]
[   23.109584]        CPU0
[   23.112093]        ----
[   23.114601]   lock(_xmit_ETHER);
[   23.117917]   lock(_xmit_ETHER);
[   23.121233]
[   23.121233]  *** DEADLOCK ***
[   23.121233]
[   23.127325]  May be due to missing lock nesting notation
[   23.127325]
[   23.134315] 2 locks held by ip/2861:
[   23.137987]  #0: 000000003b766c72 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x338/0x4e0
[   23.146231]  #1: 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70
[   23.153757]
[   23.153757] stack backtrace:
[   23.158243] CPU: 0 PID: 2861 Comm: ip Not tainted 5.0.0-rc3-00013-g56c857a1b8d3 #118
[   23.166212] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[   23.172843] Call trace:
[   23.175358]  dump_backtrace+0x0/0x188
[   23.179116]  show_stack+0x14/0x20
[   23.182524]  dump_stack+0xb4/0xec
[   23.185928]  __lock_acquire+0x123c/0x1860
[   23.190048]  lock_acquire+0xc8/0x248
[   23.193724]  _raw_spin_lock_bh+0x40/0x58
[   23.197755]  dev_set_rx_mode+0x1c/0x38
[   23.201607]  dev_set_promiscuity+0x3c/0x50
[   23.205820]  dsa_slave_change_rx_flags+0x5c/0x70
[   23.210567]  __dev_set_promiscuity+0x148/0x1e0
[   23.215136]  __dev_set_rx_mode+0x74/0x98
[   23.219167]  dev_uc_add+0x54/0x70
[   23.222575]  macvlan_open+0x170/0x1d0
[   23.226336]  __dev_open+0xe0/0x160
[   23.229830]  __dev_change_flags+0x16c/0x1b8
[   23.234132]  dev_change_flags+0x20/0x60
[   23.238074]  do_setlink+0x2d0/0xc50
[   23.241658]  __rtnl_newlink+0x5f8/0x6e8
[   23.245601]  rtnl_newlink+0x50/0x78
[   23.249184]  rtnetlink_rcv_msg+0x360/0x4e0
[   23.253397]  netlink_rcv_skb+0xe8/0x130
[   23.257338]  rtnetlink_rcv+0x14/0x20
[   23.261012]  netlink_unicast+0x190/0x210
[   23.265043]  netlink_sendmsg+0x288/0x350
[   23.269075]  sock_sendmsg+0x18/0x30
[   23.272659]  ___sys_sendmsg+0x29c/0x2c8
[   23.276602]  __sys_sendmsg+0x60/0xb8
[   23.280276]  __arm64_sys_sendmsg+0x1c/0x28
[   23.284488]  el0_svc_common+0xd8/0x138
[   23.288340]  el0_svc_handler+0x24/0x80
[   23.292192]  el0_svc+0x8/0xc

This looks fairly harmless (no actual deadlock occurs), and is
fixed in a similar way to c6894dec8e ("bridge: fix lockdep
addr_list_lock false positive splat") by putting the addr_list_lock
in its own lockdep class.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 18:30:43 -08:00
Rundong Ge 17ab4f61b8 net: dsa: slave: Don't propagate flag changes on down slave interfaces
The unbalance of master's promiscuity or allmulti will happen after ifdown
and ifup a slave interface which is in a bridge.

When we ifdown a slave interface , both the 'dsa_slave_close' and
'dsa_slave_change_rx_flags' will clear the master's flags. The flags
of master will be decrease twice.
In the other hand, if we ifup the slave interface again, since the
slave's flags were cleared the 'dsa_slave_open' won't set the master's
flag, only 'dsa_slave_change_rx_flags' that triggered by 'br_add_if'
will set the master's flags. The flags of master is increase once.

Only propagating flag changes when a slave interface is up makes
sure this does not happen. The 'vlan_dev_change_rx_flags' had the
same problem and was fixed, and changes here follows that fix.

Fixes: 91da11f870 ("net: Distributed Switch Architecture protocol support")
Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 18:29:35 -08:00
Petr Machata c1f7e02979 net: cls_flower: Remove filter from mask before freeing it
In fl_change(), when adding a new rule (i.e. fold == NULL), a driver may
reject the new rule, for example due to resource exhaustion. By that
point, the new rule was already assigned a mask, and it was added to
that mask's hash table. The clean-up path that's invoked as a result of
the rejection however neglects to undo the hash table addition, and
proceeds to free the new rule, thus leaving a dangling pointer in the
hash table.

Fix by removing fnew from the mask's hash table before it is freed.

Fixes: 35cc3cefc4 ("net/sched: cls_flower: Reject duplicated rules also under skip_sw")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 09:19:14 -08:00
Ursula Braun 84b799a292 net/smc: correct state change for peer closing
If some kind of closing is received from the peer while still in
state SMC_INIT, it means the peer has had an active connection and
closed the socket quickly before listen_work finished. This should
not result in a shortcut from state SMC_INIT to state SMC_CLOSED.
This patch adds the socket to the accept queue in state
SMC_APPCLOSEWAIT1. The socket reaches state SMC_CLOSED once being
accepted and closed with smc_release().

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 09:11:19 -08:00
Ursula Braun a5e04318c8 net/smc: delete rkey first before switching to unused
Once RMBs are flagged as unused they are candidates for reuse.
Thus the LLC DELETE RKEY operaton should be made before flagging
the RMB as unused.

Fixes: c7674c001b ("net/smc: unregister rkeys of unused buffer")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 09:11:19 -08:00
Ursula Braun b8649efad8 net/smc: fix sender_free computation
In some scenarios a separate consumer cursor update is necessary.
The decision is made in smc_tx_consumer_cursor_update(). The
sender_free computation could be wrong:

The rx confirmed cursor is always smaller than or equal to the
rx producer cursor. The parameters in the smc_curs_diff() call
have to be exchanged, otherwise sender_free might even be negative.

And if more data arrives local_rx_ctrl.prod might be updated, enabling
a cursor difference between local_rx_ctrl.prod and rx confirmed cursor
larger than the RMB size. This case is not covered by smc_curs_diff().
Thus function smc_curs_diff_large() is introduced here.

If a recvmsg() is processed in parallel, local_tx_ctrl.cons might
change during smc_cdc_msg_send. Make sure rx_curs_confirmed is updated
with the actually sent local_tx_ctrl.cons value.

Fixes: e82f2e31f5 ("net/smc: optimize consumer cursor updates")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 09:11:19 -08:00
Ursula Braun ad6f317f72 net/smc: preallocated memory for rdma work requests
The work requests for rdma writes are built in local variables within
function smc_tx_rdma_write(). This violates the rule that the work
request storage has to stay till the work request is confirmed by
a completion queue response.
This patch introduces preallocated memory for these work requests.
The storage is allocated, once a link (and thus a queue pair) is
established.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 09:11:19 -08:00
Pablo Neira Ayuso f6ac858589 netfilter: nf_tables: unbind set in rule from commit path
Anonymous sets that are bound to rules from the same transaction trigger
a kernel splat from the abort path due to double set list removal and
double free.

This patch updates the logic to search for the transaction that is
responsible for creating the set and disable the set list removal and
release, given the rule is now responsible for this. Lookup is reverse
since the transaction that adds the set is likely to be at the tail of
the list.

Moreover, this patch adds the unbind step to deliver the event from the
commit path.  This should not be done from the worker thread, since we
have no guarantees of in-order delivery to the listener.

This patch removes the assumption that both activate and deactivate
callbacks need to be provided.

Fixes: cd5125d8f5 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase")
Reported-by: Mikhail Morfikov <mmorfikov@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04 17:29:17 +01:00
Martynas Pumputis 4e35c1cb94 netfilter: nf_nat: skip nat clash resolution for same-origin entries
It is possible that two concurrent packets originating from the same
socket of a connection-less protocol (e.g. UDP) can end up having
different IP_CT_DIR_REPLY tuples which results in one of the packets
being dropped.

To illustrate this, consider the following simplified scenario:

1. Packet A and B are sent at the same time from two different threads
   by same UDP socket.  No matching conntrack entry exists yet.
   Both packets cause allocation of a new conntrack entry.
2. get_unique_tuple gets called for A.  No clashing entry found.
   conntrack entry for A is added to main conntrack table.
3. get_unique_tuple is called for B and will find that the reply
   tuple of B is already taken by A.
   It will allocate a new UDP source port for B to resolve the clash.
4. conntrack entry for B cannot be added to main conntrack table
   because its ORIGINAL direction is clashing with A and the REPLY
   directions of A and B are not the same anymore due to UDP source
   port reallocation done in step 3.

This patch modifies nf_conntrack_tuple_taken so it doesn't consider
colliding reply tuples if the IP_CT_DIR_ORIGINAL tuples are equal.

[ Florian: simplify patch to not use .allow_clash setting
  and always ignore identical flows ]

Signed-off-by: Martynas Pumputis <martynas@weave.works>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-04 14:26:11 +01:00
Xin Long cfe4bd7a25 sctp: check and update stream->out_curr when allocating stream_out
Now when using stream reconfig to add out streams, stream->out
will get re-allocated, and all old streams' information will
be copied to the new ones and the old ones will be freed.

So without stream->out_curr updated, next time when trying to
send from stream->out_curr stream, a panic would be caused.

This patch is to check and update stream->out_curr when
allocating stream_out.

v1->v2:
  - define fa_index() to get elem index from stream->out_curr.
v2->v3:
  - repost with no change.

Fixes: 5bbbbe32a4 ("sctp: introduce stream scheduler foundations")
Reported-by: Ying Xu <yinxu@redhat.com>
Reported-by: syzbot+e33a3a138267ca119c7d@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 14:27:47 -08:00
Stefano Garzarella 85965487ab vsock/virtio: reset connected sockets on device removal
When the virtio transport device disappear, we should reset all
connected sockets in order to inform the users.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 11:06:25 -08:00
Stefano Garzarella 22b5c0b63f vsock/virtio: fix kernel panic after device hot-unplug
virtio_vsock_remove() invokes the vsock_core_exit() also if there
are opened sockets for the AF_VSOCK protocol family. In this way
the vsock "transport" pointer is set to NULL, triggering the
kernel panic at the first socket activity.

This patch move the vsock_core_init()/vsock_core_exit() in the
virtio_vsock respectively in module_init and module_exit functions,
that cannot be invoked until there are open sockets.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1609699
Reported-by: Yan Fu <yafu@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 11:06:25 -08:00
David S. Miller e7b816415e Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2019-01-31

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) disable preemption in sender side of socket filters, from Alexei.

2) fix two potential deadlocks in syscall bpf lookup and prog_register,
   from Martin and Alexei.

3) fix BTF to allow typedef on func_proto, from Yonghong.

4) two bpftool fixes, from Jiri and Paolo.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 15:28:07 -08:00
Eric Dumazet 9b1f19d810 dccp: fool proof ccid_hc_[rt]x_parse_options()
Similarly to commit 276bdb82de ("dccp: check ccid before dereferencing")
it is wise to test for a NULL ccid.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3+ #37
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
kobject: 'loop5' (0000000080f78fc1): kobject_uevent_env
RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0defa33518 CR3: 000000008db5e000 CR4: 00000000001406e0
kobject: 'loop5' (0000000080f78fc1): fill_kobj_path: path = '/devices/virtual/block/loop5'
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 dccp_rcv_state_process+0x2b6/0x1af6 net/dccp/input.c:654
 dccp_v4_do_rcv+0x100/0x190 net/dccp/ipv4.c:688
 sk_backlog_rcv include/net/sock.h:936 [inline]
 __sk_receive_skb+0x3a9/0xea0 net/core/sock.c:473
 dccp_v4_rcv+0x10cb/0x1f80 net/dccp/ipv4.c:880
 ip_protocol_deliver_rcu+0xb6/0xa20 net/ipv4/ip_input.c:208
 ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_local_deliver+0x1f0/0x740 net/ipv4/ip_input.c:255
 dst_input include/net/dst.h:450 [inline]
 ip_rcv_finish+0x1f4/0x2f0 net/ipv4/ip_input.c:414
 NF_HOOK include/linux/netfilter.h:289 [inline]
 NF_HOOK include/linux/netfilter.h:283 [inline]
 ip_rcv+0xed/0x620 net/ipv4/ip_input.c:524
 __netif_receive_skb_one_core+0x160/0x210 net/core/dev.c:4973
 __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5083
 process_backlog+0x206/0x750 net/core/dev.c:5923
 napi_poll net/core/dev.c:6346 [inline]
 net_rx_action+0x76d/0x1930 net/core/dev.c:6412
 __do_softirq+0x30b/0xb11 kernel/softirq.c:292
 run_ksoftirqd kernel/softirq.c:654 [inline]
 run_ksoftirqd+0x8e/0x110 kernel/softirq.c:646
 smpboot_thread_fn+0x6ab/0xa10 kernel/smpboot.c:164
 kthread+0x357/0x430 kernel/kthread.c:246
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Modules linked in:
---[ end trace 58a0ba03bea2c376 ]---
RIP: 0010:ccid_hc_tx_parse_options net/dccp/ccid.h:205 [inline]
RIP: 0010:dccp_parse_options+0x8d9/0x12b0 net/dccp/options.c:233
Code: c5 0f b6 75 b3 80 38 00 0f 85 d6 08 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 b8 4c 8b b8 f8 07 00 00 4c 89 f8 48 c1 e8 03 <80> 3c 08 00 0f 85 95 08 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b
RSP: 0018:ffff8880a94df0b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880858ac723 RCX: dffffc0000000000
RDX: 0000000000000100 RSI: 0000000000000007 RDI: 0000000000000001
RBP: ffff8880a94df140 R08: 0000000000000001 R09: ffff888061b83a80
R10: ffffed100c370752 R11: ffff888061b83a97 R12: 0000000000000026
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0defa33518 CR3: 0000000009871000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:49:10 -08:00
Karsten Graul 46ad02229d net/smc: fix use of variable in cleared area
Do not use pend->idx as index for the arrays because its value is
located in the cleared area. Use the existing local variable instead.
Without this fix the wrong area might be cleared.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:45 -08:00
Karsten Graul e5f3aa04db net/smc: use device link provided in qp_context
The device field of the IB event structure does not always point to the
SMC IB device. Load the pointer from the qp_context which is always
provided to smc_ib_qp_event_handler() in the priv field. And for qp
events the affected port is given in the qp structure of the ibevent,
derive it from there.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:45 -08:00
Karsten Graul 2dee25af42 net/smc: call smc_cdc_msg_send() under send_lock
Call smc_cdc_msg_send() under the connection send_lock to make sure all
send operations for one connection are serialized.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:45 -08:00
Karsten Graul 33f3fcc290 net/smc: do not wait under send_lock
smc_cdc_get_free_slot() might wait for free transfer buffers when using
SMC-R. This wait should not be done under the send_lock, which is a
spin_lock. This fixes a cpu loop in parallel threads waiting for the
send_lock.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:45 -08:00
Karsten Graul 51c5aba3b6 net/smc: recvmsg and splice_read should return 0 after shutdown
When a socket was connected and is now shut down for read, return 0 to
indicate end of data in recvmsg and splice_read (like TCP) and do not
return ENOTCONN. This behavior is required by the socket api.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:45 -08:00
Karsten Graul 6889b36da7 net/smc: don't wait for send buffer space when data was already sent
When there is no more send buffer space and at least 1 byte was already
sent then return to user space. The wait is only done when no data was
sent by the sendmsg() call.
This fixes smc_tx_sendmsg() which tried to always send all user data and
started to wait for free send buffer space when needed. During this wait
the user space program was blocked in the sendmsg() call and hence not
able to receive incoming data. When both sides were in such a situation
then the connection stalled forever.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:44 -08:00
Karsten Graul 77f838ace7 net/smc: prevent races between smc_lgr_terminate() and smc_conn_free()
To prevent races between smc_lgr_terminate() and smc_conn_free() add an
extra check of the lgr field before accessing it, and cancel a delayed
free_work when a new smc connection is created.
This fixes the problem that free_work cleared the lgr variable but
smc_lgr_terminate() or smc_conn_free() still access it in parallel.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:44 -08:00
Hans Wippel ca8dc1334a net/smc: allow 16 byte pnetids in netlink policy
Currently, users can only send pnetids with a maximum length of 15 bytes
over the SMC netlink interface although the maximum pnetid length is 16
bytes. This patch changes the SMC netlink policy to accept 16 byte
pnetids.

Signed-off-by: Hans Wippel <hwippel@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:44 -08:00
Ursula Braun 14d22d4d61 net/smc: fix another sizeof to int comparison
Comparing an int to a size, which is unsigned, causes the int to become
unsigned, giving the wrong result. kernel_sendmsg can return a negative
error code.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 14:45:44 -08:00
Greg Kroah-Hartman ba59fb0273 sctp: walk the list of asoc safely
In sctp_sendmesg(), when walking the list of endpoint associations, the
association can be dropped from the list, making the list corrupt.
Properly handle this by using list_for_each_entry_safe()

Fixes: 4910280503 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg")
Reported-by: Secunia Research <vuln@secunia.com>
Tested-by: Secunia Research <vuln@secunia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 10:41:46 -08:00
David S. Miller 34dbba5d2d Here are some batman-adv bugfixes:
- Avoid WARN to report incorrect configuration, by Sven Eckelmann
 
  - Fix mac header position setting, by Sven Eckelmann
 
  - Fix releasing station statistics, by Felix Fietkau
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlxUKaYWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoZZXEACeYSbn6Z6/B6fUg0pYiDtupnr5
 wTCiomEoJHSh9Ko4xz3NhTdr68b1dkq+rZrANtMf7nGJau+VDSsPdbKG5+efdU0O
 or0F2gzBB1O4YF4zX50IwVz/kV81qsumzosrISDgJ1vbcEwUOf6Tu1BW+5IQjrS7
 rJPYQ5V9OABoPhRd0zjnQq5efxTcouVbKDwDNubHLzjUW6fRKfEH+R8IjhzSq8Wq
 E4KCEKdfFQLkaIqNYRbfaRSqD1diK5xQm1Z1ioZXPGJYLaHyDctVmC9b5vULm1fg
 BGxBOcuniEvoSMLTnfSRxKHkxhioSJjIiHu4yABcCZFI1RCsDAQchTissUjtWirP
 JAnR+/0wRzGKxdNc+rSISwBB0yxKTiSxNvBQRwnFoX3PkKlPLMQOZT45UH6qKFYj
 yBu+2hLiG+BaJksz2R1K/23COmAJO3JktfcvvKVwki+EmXlU8TFDj7HWzM3oZO/F
 b4GXR5+lO1nt92llGSn7y+IhD/VYM8mwPG40ANVPTaaWiMeAUd/0YaVw8xyf+jte
 zedLOYRlQQs1ab92CJVHKXP3IH+IY9m7HNSdFjmR3mAN2aIq7ETR1ZulAwflaYe3
 VI3tMe5kDPihrCkJm7GLIbjGxVJmcwqJFSa+usStsTv2xcSvcK1pxQKEoFjkFZXs
 wpOsiHI1nWRQBxW5SA==
 =M/+V
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-for-davem-20190201' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here are some batman-adv bugfixes:

 - Avoid WARN to report incorrect configuration, by Sven Eckelmann

 - Fix mac header position setting, by Sven Eckelmann

 - Fix releasing station statistics, by Felix Fietkau
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01 10:19:26 -08:00
Johannes Berg e005bd7dde cfg80211: call disconnect_wk when AP stops
Since we now prevent regulatory restore during STA disconnect
if concurrent AP interfaces are active, we need to reschedule
this check when the AP state changes. This fixes never doing
a restore when an AP is the last interface to stop. Or to put
it another way: we need to re-check after anything we check
here changes.

Cc: stable@vger.kernel.org
Fixes: 113f3aaa81 ("cfg80211: Prevent regulatory restore during STA disconnect in concurrent interfaces")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-01 11:12:50 +01:00
Felix Fietkau 9d0f50b802 mac80211: ensure that mgmt tx skbs have tailroom for encryption
Some drivers use IEEE80211_KEY_FLAG_SW_MGMT_TX to indicate that management
frames need to be software encrypted. Since normal data packets are still
encrypted by the hardware, crypto_tx_tailroom_needed_cnt gets decremented
after key upload to hw. This can lead to passing skbs to ccmp_encrypt_skb,
which don't have the necessary tailroom for software encryption.

Change the code to add tailroom for encrypted management packets, even if
crypto_tx_tailroom_needed_cnt is 0.

Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-02-01 11:08:02 +01:00
Eric Dumazet 6fa19f5637 rds: fix refcount bug in rds_sock_addref
syzbot was able to catch a bug in rds [1]

The issue here is that the socket might be found in a hash table
but that its refcount has already be set to 0 by another cpu.

We need to use refcount_inc_not_zero() to be safe here.

[1]

refcount_t: increment on 0; use-after-free.
WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked lib/refcount.c:153 [inline]
WARNING: CPU: 1 PID: 23129 at lib/refcount.c:153 refcount_inc_checked+0x61/0x70 lib/refcount.c:151
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 23129 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #53
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
 panic+0x2cb/0x65c kernel/panic.c:214
 __warn.cold+0x20/0x48 kernel/panic.c:571
 report_bug+0x263/0x2b0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 fixup_bug arch/x86/kernel/traps.c:173 [inline]
 do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:refcount_inc_checked lib/refcount.c:153 [inline]
RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:151
Code: 1d 51 63 c8 06 31 ff 89 de e8 eb 1b f2 fd 84 db 75 dd e8 a2 1a f2 fd 48 c7 c7 60 9f 81 88 c6 05 31 63 c8 06 01 e8 af 65 bb fd <0f> 0b eb c1 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 54 49
RSP: 0018:ffff8880a0cbf1e8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90006113000
RDX: 000000000001047d RSI: ffffffff81685776 RDI: 0000000000000005
RBP: ffff8880a0cbf1f8 R08: ffff888097c9e100 R09: ffffed1015ce5021
R10: ffffed1015ce5020 R11: ffff8880ae728107 R12: ffff8880723c20c0
R13: ffff8880723c24b0 R14: dffffc0000000000 R15: ffffed1014197e64
 sock_hold include/net/sock.h:647 [inline]
 rds_sock_addref+0x19/0x20 net/rds/af_rds.c:675
 rds_find_bound+0x97c/0x1080 net/rds/bind.c:82
 rds_recv_incoming+0x3be/0x1430 net/rds/recv.c:362
 rds_loop_xmit+0xf3/0x2a0 net/rds/loop.c:96
 rds_send_xmit+0x1355/0x2a10 net/rds/send.c:355
 rds_sendmsg+0x323c/0x44e0 net/rds/send.c:1368
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg+0xdd/0x130 net/socket.c:631
 __sys_sendto+0x387/0x5f0 net/socket.c:1788
 __do_sys_sendto net/socket.c:1800 [inline]
 __se_sys_sendto net/socket.c:1796 [inline]
 __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1796
 do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458089
Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fc266df8c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000458089
RDX: 0000000000000000 RSI: 00000000204b3fff RDI: 0000000000000005
RBP: 000000000073bf00 R08: 00000000202b4000 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc266df96d4
R13: 00000000004c56e4 R14: 00000000004d94a8 R15: 00000000ffffffff

Fixes: cc4dfb7f70 ("rds: fix two RCU related problems")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: rds-devel@oss.oracle.com
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-31 09:43:27 -08:00
Jacob Wen 91c524708d l2tp: copy 4 more bytes to linear part if necessary
The size of L2TPv2 header with all optional fields is 14 bytes.
l2tp_udp_recv_core only moves 10 bytes to the linear part of a
skb. This may lead to l2tp_recv_common read data outside of a skb.

This patch make sure that there is at least 14 bytes in the linear
part of a skb to meet the maximum need of l2tp_udp_recv_core and
l2tp_recv_common. The minimum size of both PPP HDLC-like frame and
Ethernet frame is larger than 14 bytes, so we are safe to do so.

Also remove L2TP_HDR_SIZE_NOSEQ, it is unused now.

Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-31 08:58:46 -08:00
Jacob Wen 4522a70db7 l2tp: fix reading optional fields of L2TPv3
Use pskb_may_pull() to make sure the optional fields are in skb linear
parts, so we can safely read them later.

It's easy to reproduce the issue with a net driver that supports paged
skb data. Just create a L2TPv3 over IP tunnel and then generates some
network traffic.
Once reproduced, rx err in /sys/kernel/debug/l2tp/tunnels will increase.

Changes in v4:
1. s/l2tp_v3_pull_opt/l2tp_v3_ensure_opt_in_linear/
2. s/tunnel->version != L2TP_HDR_VER_2/tunnel->version == L2TP_HDR_VER_3/
3. Add 'Fixes' in commit messages.

Changes in v3:
1. To keep consistency, move the code out of l2tp_recv_common.
2. Use "net" instead of "net-next", since this is a bug fix.

Changes in v2:
1. Only fix L2TPv3 to make code simple.
   To fix both L2TPv3 and L2TPv2, we'd better refactor l2tp_recv_common.
   It's complicated to do so.
2. Reloading pointers after pskb_may_pull

Fixes: f7faffa3ff ("l2tp: Add L2TPv3 protocol support")
Fixes: 0d76751fad ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support")
Fixes: a32e0eec70 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
Signed-off-by: Jacob Wen <jian.w.wen@oracle.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 21:44:17 -08:00
Yohei Kanemaru ef489749aa ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation
skb->cb may contain data from previous layers (in an observed case
IPv4 with L3 Master Device). In the observed scenario, the data in
IPCB(skb)->frags was misinterpreted as IP6CB(skb)->frag_max_size,
eventually caused an unexpected IPv6 fragmentation in ip6_fragment()
through ip6_finish_output().

This patch clears IP6CB(skb), which potentially contains garbage data,
on the SRH ip4ip6 encapsulation.

Fixes: 32d99d0b67 ("ipv6: sr: add support for ip4ip6 encapsulation")
Signed-off-by: Yohei Kanemaru <yohei.kanemaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 14:06:12 -08:00
Lorenzo Bianconi c706863bc8 net: ip6_gre: always reports o_key to userspace
As Erspan_v4, Erspan_v6 protocol relies on o_key to configure
session id header field. However TUNNEL_KEY bit is cleared in
ip6erspan_tunnel_xmit since ERSPAN protocol does not set the key field
of the external GRE header and so the configured o_key is not reported
to userspace. The issue can be triggered with the following reproducer:

$ip link add ip6erspan1 type ip6erspan local 2000::1 remote 2000::2 \
    key 1 seq erspan_ver 1
$ip link set ip6erspan1 up
ip -d link sh ip6erspan1

ip6erspan1@NONE: <BROADCAST,MULTICAST> mtu 1422 qdisc noop state DOWN mode DEFAULT
    link/ether ba:ff:09:24:c3:0e brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
    ip6erspan remote 2000::2 local 2000::1 encaplimit 4 flowlabel 0x00000 ikey 0.0.0.1 iseq oseq

Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in
ip6gre_fill_info

Fixes: 5a963eb61b ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 14:00:02 -08:00
Lorenzo Bianconi feaf5c796b net: ip_gre: always reports o_key to userspace
Erspan protocol (version 1 and 2) relies on o_key to configure
session id header field. However TUNNEL_KEY bit is cleared in
erspan_xmit since ERSPAN protocol does not set the key field
of the external GRE header and so the configured o_key is not reported
to userspace. The issue can be triggered with the following reproducer:

$ip link add erspan1 type erspan local 192.168.0.1 remote 192.168.0.2 \
    key 1 seq erspan_ver 1
$ip link set erspan1 up
$ip -d link sh erspan1

erspan1@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UNKNOWN mode DEFAULT
  link/ether 52:aa:99:95:9a:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
  erspan remote 192.168.0.2 local 192.168.0.1 ttl inherit ikey 0.0.0.1 iseq oseq erspan_index 0

Fix the issue adding TUNNEL_KEY bit to the o_flags parameter in
ipgre_fill_info

Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 14:00:02 -08:00
Johannes Berg 98406133dd net: socket: make bond ioctls go through compat_ifreq_ioctl()
Same story as before, these use struct ifreq and thus need
to be read with the shorter version to not cause faults.

Cc: stable@vger.kernel.org
Fixes: f92d4fc953 ("kill bond_ioctl()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 10:19:31 -08:00
Johannes Berg c6c9fee35d net: socket: fix SIOCGIFNAME in compat
As reported by Robert O'Callahan in
https://bugzilla.kernel.org/show_bug.cgi?id=202273
reverting the previous changes in this area broke
the SIOCGIFNAME ioctl in compat again (I'd previously
fixed it after his previous report of breakage in
https://bugzilla.kernel.org/show_bug.cgi?id=199469).

This is obviously because I fixed SIOCGIFNAME more or
less by accident.

Fix it explicitly now by making it pass through the
restored compat translation code.

Cc: stable@vger.kernel.org
Fixes: 4cf808e7ac ("kill dev_ifname32()")
Reported-by: Robert O'Callahan <robert@ocallahan.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 10:19:31 -08:00
Johannes Berg 37ac39bddd Revert "kill dev_ifsioc()"
This reverts commit bf4405737f ("kill dev_ifsioc()").

This wasn't really unused as implied by the original commit,
it still handles the copy to/from user differently, and the
commit thus caused issues such as
  https://bugzilla.kernel.org/show_bug.cgi?id=199469
and
  https://bugzilla.kernel.org/show_bug.cgi?id=202273

However, deviating from a strict revert, rename dev_ifsioc()
to compat_ifreq_ioctl() to be clearer as to its purpose and
add a comment.

Cc: stable@vger.kernel.org
Fixes: bf4405737f ("kill dev_ifsioc()")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 10:19:31 -08:00
Johannes Berg 63ff03ab78 Revert "socket: fix struct ifreq size in compat ioctl"
This reverts commit 1cebf8f143 ("socket: fix struct ifreq
size in compat ioctl"), it's a bugfix for another commit that
I'll revert next.

This is not a 'perfect' revert, I'm keeping some coding style
intact rather than revert to the state with indentation errors.

Cc: stable@vger.kernel.org
Fixes: 1cebf8f143 ("socket: fix struct ifreq size in compat ioctl")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 10:19:31 -08:00
Josh Elsasser 35edfdc77f net: set default network namespace in init_dummy_netdev()
Assign a default net namespace to netdevs created by init_dummy_netdev().
Fixes a NULL pointer dereference caused by busy-polling a socket bound to
an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
if napi_poll() received packets:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
  IP: napi_busy_loop+0xd6/0x200
  Call Trace:
    sock_poll+0x5e/0x80
    do_sys_poll+0x324/0x5a0
    SyS_poll+0x6c/0xf0
    do_syscall_64+0x6b/0x1f0
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Fixes: 7db6b048da ("net: Commonize busy polling code to focus on napi_id instead of socket")
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-29 11:29:55 -08:00
Dave Watson 1023121375 net: tls: Fix deadlock in free_resources tx
If there are outstanding async tx requests (when crypto returns EINPROGRESS),
there is a potential deadlock: the tx work acquires the lock, while we
cancel_delayed_work_sync() while holding the lock.  Drop the lock while waiting
for the work to complete.

Fixes: a42055e8d2 ("Add support for async encryption of records...")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 23:07:08 -08:00
Dave Watson 32eb67b93c net: tls: Save iv in tls_rec for async crypto requests
aead_request_set_crypt takes an iv pointer, and we change the iv
soon after setting it.  Some async crypto algorithms don't save the iv,
so we need to save it in the tls_rec for async requests.

Found by hardcoding x64 aesni to use async crypto manager (to test the async
codepath), however I don't think this combination can happen in the wild.
Presumably other hardware offloads will need this fix, but there have been
no user reports.

Fixes: a42055e8d2 ("Add support for async encryption of records...")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 23:05:55 -08:00
Jakub Sitnicki 1d79895aef sk_msg: Always cancel strp work before freeing the psock
Despite having stopped the parser, we still need to deinitialize it
by calling strp_done so that it cancels its work. Otherwise the worker
thread can run after we have freed the parser, and attempt to access
its workqueue resulting in a use-after-free:

==================================================================
BUG: KASAN: use-after-free in pwq_activate_delayed_work+0x1b/0x1d0
Read of size 8 at addr ffff888069975240 by task kworker/u2:2/93

CPU: 0 PID: 93 Comm: kworker/u2:2 Not tainted 5.0.0-rc2-00335-g28f9d1a3d4fe-dirty #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
Workqueue:            (null) (kstrp)
Call Trace:
 print_address_description+0x6e/0x2b0
 ? pwq_activate_delayed_work+0x1b/0x1d0
 kasan_report+0xfd/0x177
 ? pwq_activate_delayed_work+0x1b/0x1d0
 ? pwq_activate_delayed_work+0x1b/0x1d0
 pwq_activate_delayed_work+0x1b/0x1d0
 ? process_one_work+0x4aa/0x660
 pwq_dec_nr_in_flight+0x9b/0x100
 worker_thread+0x82/0x680
 ? process_one_work+0x660/0x660
 kthread+0x1b9/0x1e0
 ? __kthread_create_on_node+0x250/0x250
 ret_from_fork+0x1f/0x30

Allocated by task 111:
 sk_psock_init+0x3c/0x1b0
 sock_map_link.isra.2+0x103/0x4b0
 sock_map_update_common+0x94/0x270
 sock_map_update_elem+0x145/0x160
 __se_sys_bpf+0x152e/0x1e10
 do_syscall_64+0xb2/0x3e0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 112:
 kfree+0x7f/0x140
 process_one_work+0x40b/0x660
 worker_thread+0x82/0x680
 kthread+0x1b9/0x1e0
 ret_from_fork+0x1f/0x30

The buggy address belongs to the object at ffff888069975180
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 192 bytes inside of
 512-byte region [ffff888069975180, ffff888069975380)
The buggy address belongs to the page:
page:ffffea0001a65d00 count:1 mapcount:0 mapping:ffff88806d401280 index:0x0 compound_mapcount: 0
flags: 0x4000000000010200(slab|head)
raw: 4000000000010200 dead000000000100 dead000000000200 ffff88806d401280
raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888069975100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff888069975180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888069975200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
 ffff888069975280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888069975300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Reported-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/netdev/CAJPywTLwgXNEZ2dZVoa=udiZmtrWJ0q5SuBW64aYs0Y1khXX3A@mail.gmail.com
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-29 00:05:03 +01:00
David S. Miller ff44a8373c Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/IPVS fixes for net

The following patchset contains Netfilter/IPVS fixes for your net tree:

1) The nftnl mutex is now per-netns, therefore use reference counter
   for matches and targets to deal with concurrent updates from netns.
   Moreover, place extensions in a pernet list. Patches from Florian Westphal.

2) Bail out with EINVAL in case of negative timeouts via setsockopt()
   through ip_vs_set_timeout(), from ZhangXiaoxu.

3) Spurious EINVAL on ebtables 32bit binary with 64bit kernel, also
   from Florian.

4) Reset TCP option header parser in case of fingerprint mismatch,
   otherwise follow up overlapping fingerprint definitions including
   TCP options do not work, from Fernando Fernandez Mancera.

5) Compilation warning in ipt_CLUSTER with CONFIG_PROC_FS unset.
   From Anders Roxell.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 10:51:51 -08:00
Martin Willi 09db512411 esp: Skip TX bytes accounting when sending from a request socket
On ESP output, sk_wmem_alloc is incremented for the added padding if a
socket is associated to the skb. When replying with TCP SYNACKs over
IPsec, the associated sk is a casted request socket, only. Increasing
sk_wmem_alloc on a request socket results in a write at an arbitrary
struct offset. In the best case, this produces the following WARNING:

WARNING: CPU: 1 PID: 0 at lib/refcount.c:102 esp_output_head+0x2e4/0x308 [esp4]
refcount_t: addition on 0; use-after-free.
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-rc3 #2
Hardware name: Marvell Armada 380/385 (Device Tree)
[...]
[<bf0ff354>] (esp_output_head [esp4]) from [<bf1006a4>] (esp_output+0xb8/0x180 [esp4])
[<bf1006a4>] (esp_output [esp4]) from [<c05dee64>] (xfrm_output_resume+0x558/0x664)
[<c05dee64>] (xfrm_output_resume) from [<c05d07b0>] (xfrm4_output+0x44/0xc4)
[<c05d07b0>] (xfrm4_output) from [<c05956bc>] (tcp_v4_send_synack+0xa8/0xe8)
[<c05956bc>] (tcp_v4_send_synack) from [<c0586ad8>] (tcp_conn_request+0x7f4/0x948)
[<c0586ad8>] (tcp_conn_request) from [<c058c404>] (tcp_rcv_state_process+0x2a0/0xe64)
[<c058c404>] (tcp_rcv_state_process) from [<c05958ac>] (tcp_v4_do_rcv+0xf0/0x1f4)
[<c05958ac>] (tcp_v4_do_rcv) from [<c0598a4c>] (tcp_v4_rcv+0xdb8/0xe20)
[<c0598a4c>] (tcp_v4_rcv) from [<c056eb74>] (ip_protocol_deliver_rcu+0x2c/0x2dc)
[<c056eb74>] (ip_protocol_deliver_rcu) from [<c056ee6c>] (ip_local_deliver_finish+0x48/0x54)
[<c056ee6c>] (ip_local_deliver_finish) from [<c056eecc>] (ip_local_deliver+0x54/0xec)
[<c056eecc>] (ip_local_deliver) from [<c056efac>] (ip_rcv+0x48/0xb8)
[<c056efac>] (ip_rcv) from [<c0519c2c>] (__netif_receive_skb_one_core+0x50/0x6c)
[...]

The issue triggers only when not using TCP syncookies, as for syncookies
no socket is associated.

Fixes: cac2661c53 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2019-01-28 11:20:58 +01:00
Anders Roxell 206b8cc514 netfilter: ipt_CLUSTERIP: fix warning unused variable cn
When CONFIG_PROC_FS isn't set the variable cn isn't used.

net/ipv4/netfilter/ipt_CLUSTERIP.c: In function ‘clusterip_net_exit’:
net/ipv4/netfilter/ipt_CLUSTERIP.c:849:24: warning: unused variable ‘cn’ [-Wunused-variable]
  struct clusterip_net *cn = clusterip_pernet(net);
                        ^~

Rework so the variable 'cn' is declared inside "#ifdef CONFIG_PROC_FS".

Fixes: b12f7bad5a ("netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28 11:09:12 +01:00
Fernando Fernandez Mancera 1a6a0951fc netfilter: nfnetlink_osf: add missing fmatch check
When we check the tcp options of a packet and it doesn't match the current
fingerprint, the tcp packet option pointer must be restored to its initial
value in order to do the proper tcp options check for the next fingerprint.

Here we can see an example.
Assumming the following fingerprint base with two lines:

S10:64:1:60:M*,S,T,N,W6:      Linux:3.0::Linux 3.0
S20:64:1:60:M*,S,T,N,W7:      Linux:4.19:arch:Linux 4.1

Where TCP options are the last field in the OS signature, all of them overlap
except by the last one, ie. 'W6' versus 'W7'.

In case a packet for Linux 4.19 kicks in, the osf finds no matching because the
TCP options pointer is updated after checking for the TCP options in the first
line.

Therefore, reset pointer back to where it should be.

Fixes: 11eeef41d5 ("netfilter: passive OS fingerprint xtables match")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28 11:09:11 +01:00
Florian Westphal 2035f3ff8e netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present
Unlike ip(6)tables ebtables only counts user-defined chains.

The effect is that a 32bit ebtables binary on a 64bit kernel can do
'ebtables -N FOO' only after adding at least one rule, else the request
fails with -EINVAL.

This is a similar fix as done in
3f1e53abff ("netfilter: ebtables: don't attempt to allocate 0-sized compat array").

Fixes: 7d7d7e0211 ("netfilter: compat: reject huge allocation requests")
Reported-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-28 10:49:43 +01:00
Nir Dotan 146820cc24 ip6mr: Fix notifiers call on mroute_clean_tables()
When the MC route socket is closed, mroute_clean_tables() is called to
cleanup existing routes. Mistakenly notifiers call was put on the cleanup
of the unresolved MC route entries cache.
In a case where the MC socket closes before an unresolved route expires,
the notifier call leads to a crash, caused by the driver trying to
increment a non initialized refcount_t object [1] and then when handling
is done, to decrement it [2]. This was detected by a test recently added in
commit 6d4efada3b ("selftests: forwarding: Add multicast routing test").

Fix that by putting notifiers call on the resolved entries traversal,
instead of on the unresolved entries traversal.

[1]

[  245.748967] refcount_t: increment on 0; use-after-free.
[  245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30
...
[  245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
[  245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30
...
[  245.907487] Call Trace:
[  245.910231]  mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum]
[  245.917913]  notifier_call_chain+0x45/0x7
[  245.922484]  atomic_notifier_call_chain+0x15/0x20
[  245.927729]  call_fib_notifiers+0x15/0x30
[  245.932205]  mroute_clean_tables+0x372/0x3f
[  245.936971]  ip6mr_sk_done+0xb1/0xc0
[  245.940960]  ip6_mroute_setsockopt+0x1da/0x5f0
...

[2]

[  246.128487] refcount_t: underflow; use-after-free.
[  246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60
[  246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
...
[  246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum]
[  246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60
...
[  246.298889] Call Trace:
[  246.301617]  refcount_dec_and_test_checked+0x11/0x20
[  246.307170]  mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum]
[  246.315531]  process_one_work+0x1fa/0x3f0
[  246.320005]  worker_thread+0x2f/0x3e0
[  246.324083]  kthread+0x118/0x130
[  246.327683]  ? wq_update_unbound_numa+0x1b0/0x1b0
[  246.332926]  ? kthread_park+0x80/0x80
[  246.337013]  ret_from_fork+0x1f/0x30

Fixes: 088aa3eec2 ("ip6mr: Support fib notifications")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-27 23:16:07 -08:00
Johannes Berg 50c2936634 decnet: fix DN_IFREQ_SIZE
Digging through the ioctls with Al because of the previous
patches, we found that on 64-bit decnet's dn_dev_ioctl()
is wrong, because struct ifreq::ifr_ifru is actually 24
bytes (not 16 as expected from struct sockaddr) due to the
ifru_map and ifru_settings members.

Clearly, decnet expects the ioctl to be called with a struct
like
  struct ifreq_dn {
    char ifr_name[IFNAMSIZ];
    struct sockaddr_dn ifr_addr;
  };

since it does
  struct ifreq *ifr = ...;
  struct sockaddr_dn *sdn = (struct sockaddr_dn *)&ifr->ifr_addr;

This means that DN_IFREQ_SIZE is too big for what it wants on
64-bit, as it is
  sizeof(struct ifreq) - sizeof(struct sockaddr) +
  sizeof(struct sockaddr_dn)

This assumes that sizeof(struct sockaddr) is the size of ifr_ifru
but that isn't true.

Fix this to use offsetof(struct ifreq, ifr_ifru).

This indeed doesn't really matter much - the result is that we
copy in/out 8 bytes more than we should on 64-bit platforms. In
case the "struct ifreq_dn" lands just on the end of a page though
it might lead to faults.

As far as I can tell, it has been like this forever, so it seems
very likely that nobody cares.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-27 23:11:55 -08:00
Bernard Pidoux b0cf029234 net/rose: fix NULL ax25_cb kernel panic
When an internally generated frame is handled by rose_xmit(),
rose_route_frame() is called:

        if (!rose_route_frame(skb, NULL)) {
                dev_kfree_skb(skb);
                stats->tx_errors++;
                return NETDEV_TX_OK;
        }

We have the same code sequence in Net/Rom where an internally generated
frame is handled by nr_xmit() calling nr_route_frame(skb, NULL).
However, in this function NULL argument is tested while it is not in
rose_route_frame().
Then kernel panic occurs later on when calling ax25cmp() with a NULL
ax25_cb argument as reported many times and recently with syzbot.

We need to test if ax25 is NULL before using it.

Testing:
Built kernel with CONFIG_ROSE=y.

Signed-off-by: Bernard Pidoux <f6bvp@free.fr>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+1a2c456a1ea08fa5b5f7@syzkaller.appspotmail.com
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bernard Pidoux <f6bvp@free.fr>
Cc: linux-hams@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-27 10:40:01 -08:00
Cong Wang 63346650c1 netrom: switch to sock timer API
sk_reset_timer() and sk_stop_timer() properly handle
sock refcnt for timer function. Switching to them
could fix a refcounting bug reported by syzbot.

Reported-and-tested-by: syzbot+defa700d16f1bd1b9a05@syzkaller.appspotmail.com
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-hams@vger.kernel.org
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-27 10:38:04 -08:00
David S. Miller c303a9b297 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2019-01-25

1) Several patches to fix the fallout from the recent
   tree based policy lookup work. From Florian Westphal.

2) Fix VTI for IPCOMP for 'not compressed' IPCOMP packets.
   We need an extra IPIP handler to process these packets
   correctly. From Su Yanjun.

3) Fix validation of template and selector families for
   MODE_ROUTEOPTIMIZATION with ipv4-in-ipv6 packets.
   This can lead to a stack-out-of-bounds because
   flowi4 struct is treated as flowi6 struct.
   Fix from Florian Westphal.

4) Restore the default behaviour of the xfrm set-mark
   in the output path. This was changed accidentally
   when mark setting was extended to the input path.
   From Benedict Wong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-27 10:30:01 -08:00
Linus Torvalds 037222ad3f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Count ttl-dropped frames properly in mac80211, from Bob Copeland.

 2) Integer overflow in ktime handling of bcm can code, from Oliver
    Hartkopp.

 3) Fix RX desc handling wrt. hw checksumming in ravb, from Simon
    Horman.

 4) Various hash key fixes in hv_netvsc, from Haiyang Zhang.

 5) Use after free in ax25, from Eric Dumazet.

 6) Several fixes to the SSN support in SCTP, from Xin Long.

 7) Do not process frames after a NAPI reschedule in ibmveth, from
    Thomas Falcon.

 8) Fix NLA_POLICY_NESTED arguments, from Johannes Berg.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (42 commits)
  qed: Revert error handling changes.
  cfg80211: extend range deviation for DMG
  cfg80211: reg: remove warn_on for a normal case
  mac80211: Add attribute aligned(2) to struct 'action'
  mac80211: don't initiate TDLS connection if station is not associated to AP
  nl80211: fix NLA_POLICY_NESTED() arguments
  ibmveth: Do not process frames after calling napi_reschedule
  net: dev_is_mac_header_xmit() true for ARPHRD_RAWIP
  net: usb: asix: ax88772_bind return error when hw_reset fail
  MAINTAINERS: Update cavium networking drivers
  net/mlx4_core: Fix error handling when initializing CQ bufs in the driver
  net/mlx4_core: Add masking for a few queries on HCA caps
  sctp: set flow sport from saddr only when it's 0
  sctp: set chunk transport correctly when it's a new asoc
  sctp: improve the events for sctp stream adding
  sctp: improve the events for sctp stream reset
  ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel
  ax25: fix possible use-after-free
  sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe
  hv_netvsc: fix typos in code comments
  ...
2019-01-27 08:59:12 -08:00