Commit Graph

871436 Commits

Author SHA1 Message Date
Juergen Gross 0951570685 xen/efi: have a common runtime setup function
Today the EFI runtime functions are setup in architecture specific
code (x86 and arm), with the functions themselves living in drivers/xen
as they are not architecture dependent.

As the setup is exactly the same for arm and x86 move the setup to
drivers/xen, too. This at once removes the need to make the single
functions global visible.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[boris: "Dropped EXPORT_SYMBOL_GPL(xen_efi_runtime_setup)"]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-02 10:31:07 -04:00
Linus Walleij b8cd253a06 gpio: fixes for v5.4-rc2
- fix a bug with emulated open-drain/source where lines' values can no longer
   be changed
 - fix getting nonexclusive gpiods from DT
 - fix an incorrect offset for the level trigger in gpio-eic-sprd
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAl2TIpgACgkQEacuoBRx
 13JSkw//UW+uD/qfzosg0CXI5CpISoMPsy692xy2O9thn6eYsxcW8hAJAsLz2eDx
 Q8ctyysJ6jSoqC0Y8SNkbfm0sBx4QgUb6Ty8pT8cmV6XS2G/iLekuIqJQGHOQtvv
 ha1z50bCi43pfVQmEVlTkKN5iE9d+VVl52aGXCY9ZBlFPQz6G7QUQ77TwkpdRMJY
 YfQMoFZkWDVZc4DvAJYBuou5+RkB+L53u/cyzo5nz1z9SDM1HhyrooidO1oXW+1F
 DYj0xqsBni2OTvxp3q/r9gOPz2lJ5jVZiGj/TWfQaWtNxjputZ9+rfPNrK5D1P05
 O3uib/srIhA0Be3Y5lrJsrdpLtrExF7dBiXBTHqS07Tdjg9aa5OcOi14DQDy+r6u
 3Ux07QQV+3bin+KVsXnWmKlpyeFJldO28j1Gk31AZg4Q6CsphPtdHg1qD9HPXKU9
 H86MGsFTo+Xnf99/Yh+8xf3Zg1b0LpDxTrizF2ztaPyue/P5mbmYf9q03MWtMZO6
 dSUy/bD5tfx09nNe4yWWD+L+K51/QjV10Jv1eMx58I88oXtAFrceMHHmNUIN4EPY
 R8l+H2/1aysczS+kYyTwSzWDsnCfN8xGv1t0yb8HJb+0AUgF+p/rQJLWMh1IH7c+
 IEOr5JbE8M1h1TXk1eYNJMAh4qiorFWgL7bfRqbMOMVKzrNCYAg=
 =mKSK
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v5.4-rc2-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into fixes

gpio: fixes for v5.4-rc2

- fix a bug with emulated open-drain/source where lines' values can no longer
  be changed
- fix getting nonexclusive gpiods from DT
- fix an incorrect offset for the level trigger in gpio-eic-sprd
2019-10-02 16:13:28 +02:00
ZhangXiaoxu 33ea5aaa87 nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
When xfstests testing, there are some WARNING as below:

WARNING: CPU: 0 PID: 6235 at fs/nfs/inode.c:122 nfs_clear_inode+0x9c/0xd8
Modules linked in:
CPU: 0 PID: 6235 Comm: umount.nfs
Hardware name: linux,dummy-virt (DT)
pstate: 60000005 (nZCv daif -PAN -UAO)
pc : nfs_clear_inode+0x9c/0xd8
lr : nfs_evict_inode+0x60/0x78
sp : fffffc000f68fc00
x29: fffffc000f68fc00 x28: fffffe00c53155c0
x27: fffffe00c5315000 x26: fffffc0009a63748
x25: fffffc000f68fd18 x24: fffffc000bfaaf40
x23: fffffc000936d3c0 x22: fffffe00c4ff5e20
x21: fffffc000bfaaf40 x20: fffffe00c4ff5d10
x19: fffffc000c056000 x18: 000000000000003c
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000040 x14: 0000000000000228
x13: fffffc000c3a2000 x12: 0000000000000045
x11: 0000000000000000 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 0000000000000000 x6 : fffffc00084b027c
x5 : fffffc0009a64000 x4 : fffffe00c0e77400
x3 : fffffc000c0563a8 x2 : fffffffffffffffb
x1 : 000000000000764e x0 : 0000000000000001
Call trace:
 nfs_clear_inode+0x9c/0xd8
 nfs_evict_inode+0x60/0x78
 evict+0x108/0x380
 dispose_list+0x70/0xa0
 evict_inodes+0x194/0x210
 generic_shutdown_super+0xb0/0x220
 nfs_kill_super+0x40/0x88
 deactivate_locked_super+0xb4/0x120
 deactivate_super+0x144/0x160
 cleanup_mnt+0x98/0x148
 __cleanup_mnt+0x38/0x50
 task_work_run+0x114/0x160
 do_notify_resume+0x2f8/0x308
 work_pending+0x8/0x14

The nrequest should be increased/decreased only if PG_INODE_REF flag
was setted.

But in the nfs_inode_remove_request function, it maybe decrease when
no PG_INODE_REF flag, this maybe lead nrequests count error.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-10-02 08:52:17 -04:00
Peng Fan ec066de1a5 arm: xen: mm: use __GPF_DMA32 for arm64
arm64 shares some code under arch/arm/xen, including mm.c.
However ZONE_DMA is removed by commit
ad67f5a6545("arm64: replace ZONE_DMA with ZONE_DMA32").

So add a check if CONFIG_ZONE_DMA32 is enabled use __GFP_DMA32.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2019-10-01 18:50:07 -07:00
Dongli Zhang a761129e36 xen-netfront: do not use ~0U as error return value for xennet_fill_frags()
xennet_fill_frags() uses ~0U as return value when the sk_buff is not able
to cache extra fragments. This is incorrect because the return type of
xennet_fill_frags() is RING_IDX and 0xffffffff is an expected value for
ring buffer index.

In the situation when the rsp_cons is approaching 0xffffffff, the return
value of xennet_fill_frags() may become 0xffffffff which xennet_poll() (the
caller) would regard as error. As a result, queue->rx.rsp_cons is set
incorrectly because it is updated only when there is error. If there is no
error, xennet_poll() would be responsible to update queue->rx.rsp_cons.
Finally, queue->rx.rsp_cons would point to the rx ring buffer entries whose
queue->rx_skbs[i] and queue->grant_rx_ref[i] are already cleared to NULL.
This leads to NULL pointer access in the next iteration to process rx ring
buffer entries.

The symptom is similar to the one fixed in
commit 00b368502d ("xen-netfront: do not assume sk_buff_head list is
empty in error handling").

This patch changes the return type of xennet_fill_frags() to indicate
whether it is successful or failed. The queue->rx.rsp_cons will be
always updated inside this function.

Fixes: ad4f15dc2c ("xen/netfront: don't bug in case of too many frags")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:49:51 -04:00
David Ahern a3ce2a21bb ipv6: Handle race in addrconf_dad_work
Rajendra reported a kernel panic when a link was taken down:

[ 6870.263084] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 6870.271856] IP: [<ffffffff8efc5764>] __ipv6_ifa_notify+0x154/0x290

<snip>

[ 6870.570501] Call Trace:
[ 6870.573238] [<ffffffff8efc58c6>] ? ipv6_ifa_notify+0x26/0x40
[ 6870.579665] [<ffffffff8efc98ec>] ? addrconf_dad_completed+0x4c/0x2c0
[ 6870.586869] [<ffffffff8efe70c6>] ? ipv6_dev_mc_inc+0x196/0x260
[ 6870.593491] [<ffffffff8efc9c6a>] ? addrconf_dad_work+0x10a/0x430
[ 6870.600305] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.606732] [<ffffffff8ea93a7a>] ? process_one_work+0x18a/0x430
[ 6870.613449] [<ffffffff8ea93d6d>] ? worker_thread+0x4d/0x490
[ 6870.619778] [<ffffffff8ea93d20>] ? process_one_work+0x430/0x430
[ 6870.626495] [<ffffffff8ea99dd9>] ? kthread+0xd9/0xf0
[ 6870.632145] [<ffffffff8f01ade4>] ? __switch_to_asm+0x34/0x70
[ 6870.638573] [<ffffffff8ea99d00>] ? kthread_park+0x60/0x60
[ 6870.644707] [<ffffffff8f01ae77>] ? ret_from_fork+0x57/0x70
[ 6870.650936] Code: 31 c0 31 d2 41 b9 20 00 08 02 b9 09 00 00 0

addrconf_dad_work is kicked to be scheduled when a device is brought
up. There is a race between addrcond_dad_work getting scheduled and
taking the rtnl lock and a process taking the link down (under rtnl).
The latter removes the host route from the inet6_addr as part of
addrconf_ifdown which is run for NETDEV_DOWN. The former attempts
to use the host route in ipv6_ifa_notify. If the down event removes
the host route due to the race to the rtnl, then the BUG listed above
occurs.

This scenario does not occur when the ipv6 address is not kept
(net.ipv6.conf.all.keep_addr_on_down = 0) as addrconf_ifdown sets the
state of the ifp to DEAD. Handle when the addresses are kept by checking
IF_READY which is reset by addrconf_ifdown.

The 'dead' flag for an inet6_addr is set only under rtnl, in
addrconf_ifdown and it means the device is getting removed (or IPv6 is
disabled). The interesting cases for changing the idev flag are
addrconf_notify (NETDEV_UP and NETDEV_CHANGE) and addrconf_ifdown
(reset the flag). The former does not have the idev lock - only rtnl;
the latter has both. Based on that the existing dead + IF_READY check
can be moved to right after the rtnl_lock in addrconf_dad_work.

Fixes: f1705ec197 ("net: ipv6: Make address flushing on ifdown optional")
Reported-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:43:41 -04:00
Eric Dumazet 3256a2d6ab tcp: adjust rto_base in retransmits_timed_out()
The cited commit exposed an old retransmits_timed_out() bug
which assumed it could call tcp_model_timeout() with
TCP_RTO_MIN as rto_base for all states.

But flows in SYN_SENT or SYN_RECV state uses a different
RTO base (1 sec instead of 200 ms, unless BPF choses
another value)

This caused a reduction of SYN retransmits from 6 to 4 with
the default /proc/sys/net/ipv4/tcp_syn_retries value.

Fixes: a41e8a88b0 ("tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:40:49 -04:00
Dexuan Cui 0d9138ffac vsock: Fix a lockdep warning in __vsock_release()
Lockdep is unhappy if two locks from the same class are held.

Fix the below warning for hyperv and virtio sockets (vmci socket code
doesn't have the issue) by using lock_sock_nested() when __vsock_release()
is called recursively:

============================================
WARNING: possible recursive locking detected
5.3.0+ #1 Not tainted
--------------------------------------------
server/1795 is trying to acquire lock:
ffff8880c5158990 (sk_lock-AF_VSOCK){+.+.}, at: hvs_release+0x10/0x120 [hv_sock]

but task is already holding lock:
ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(sk_lock-AF_VSOCK);
  lock(sk_lock-AF_VSOCK);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by server/1795:
 #0: ffff8880c5d05ff8 (&sb->s_type->i_mutex_key#10){+.+.}, at: __sock_release+0x2d/0xa0
 #1: ffff8880c5158150 (sk_lock-AF_VSOCK){+.+.}, at: __vsock_release+0x2e/0xf0 [vsock]

stack backtrace:
CPU: 5 PID: 1795 Comm: server Not tainted 5.3.0+ #1
Call Trace:
 dump_stack+0x67/0x90
 __lock_acquire.cold.67+0xd2/0x20b
 lock_acquire+0xb5/0x1c0
 lock_sock_nested+0x6d/0x90
 hvs_release+0x10/0x120 [hv_sock]
 __vsock_release+0x24/0xf0 [vsock]
 __vsock_release+0xa0/0xf0 [vsock]
 vsock_release+0x12/0x30 [vsock]
 __sock_release+0x37/0xa0
 sock_close+0x14/0x20
 __fput+0xc1/0x250
 task_work_run+0x98/0xc0
 do_exit+0x344/0xc60
 do_group_exit+0x47/0xb0
 get_signal+0x15c/0xc50
 do_signal+0x30/0x720
 exit_to_usermode_loop+0x50/0xa0
 do_syscall_64+0x24e/0x270
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f4184e85f31

Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:23:35 -04:00
Johan Hovold 8353da9fa6 hso: fix NULL-deref on tty open
Fix NULL-pointer dereference on tty open due to a failure to handle a
missing interrupt-in endpoint when probing modem ports:

	BUG: kernel NULL pointer dereference, address: 0000000000000006
	...
	RIP: 0010:tiocmget_submit_urb+0x1c/0xe0 [hso]
	...
	Call Trace:
	hso_start_serial_device+0xdc/0x140 [hso]
	hso_serial_open+0x118/0x1b0 [hso]
	tty_open+0xf1/0x490

Fixes: 542f548236 ("tty: Modem functions for the HSO driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 21:21:10 -04:00
Albert Ou 922b0375fc riscv: Fix memblock reservation for device tree blob
This fixes an error with how the FDT blob is reserved in memblock.
An incorrect physical address calculation exposed the FDT header to
unintended corruption, which typically manifested with of_fdt_raw_init()
faulting during late boot after fdt_totalsize() returned a wrong value.
Systems with smaller physical memory sizes more frequently trigger this
issue, as the kernel is more likely to allocate from the DMA32 zone
where bbl places the DTB after the kernel image.

Commit 671f9a3e2e ("RISC-V: Setup initial page tables in two stages")
changed the mapping of the DTB to reside in the fixmap area.
Consequently, early_init_fdt_reserve_self() cannot be used anymore in
setup_bootmem() since it relies on __pa() to derive a physical address,
which does not work with dtb_early_va that is no longer a valid kernel
logical address.

The reserved[0x1] region shows the effect of the pointer underflow
resulting from the __pa(initial_boot_params) offset subtraction:

[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x000000001fe00000 reserved size = 0x0000000000a2e514
[    0.000000]  memory.cnt  = 0x1
[    0.000000]  memory[0x0]     [0x0000000080200000-0x000000009fffffff], 0x000000001fe00000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x2
[    0.000000]  reserved[0x0]   [0x0000000080200000-0x0000000080c2dfeb], 0x0000000000a2dfec bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0xfffffff080100000-0xfffffff080100527], 0x0000000000000528 bytes flags: 0x0

With the fix applied:

[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x000000001fe00000 reserved size = 0x0000000000a2e514
[    0.000000]  memory.cnt  = 0x1
[    0.000000]  memory[0x0]     [0x0000000080200000-0x000000009fffffff], 0x000000001fe00000 bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x2
[    0.000000]  reserved[0x0]   [0x0000000080200000-0x0000000080c2dfeb], 0x0000000000a2dfec bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0x0000000080e00000-0x0000000080e00527], 0x0000000000000528 bytes flags: 0x0

Fixes: 671f9a3e2e ("RISC-V: Setup initial page tables in two stages")
Signed-off-by: Albert Ou <aou@eecs.berkeley.edu>
Tested-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-10-01 13:22:39 -07:00
Palmer Dabbelt 18856604b3 RISC-V: Clear load reservations while restoring hart contexts
This is almost entirely a comment.  The bug is unlikely to manifest on
existing hardware because there is a timeout on load reservations, but
manifests on QEMU because there is no timeout.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2019-10-01 13:16:40 -07:00
Jiaxun Yang d345d9cad2
MIPS: cpu-bugs64: Mark inline functions as __always_inline
Commit ac7c3e4ff4 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") allows compiler to uninline functions marked as 'inline'.
Leading to section mismatch in this case.

Since we're using const variables to pass assembly flags, 'inline's
can't be dropped. So we simply mark them as __always_inline.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: linux-mips@vger.kernel.org
[paul.burton@mips.com:
  - Annotate these functions with __init, even if it only serves to
    inform human readers when the code can be used.
  - Drop the __always_inline from check_daddi() & check_daddiu() which
    don't use arguments as immediates in inline asm.
  - Rewrap the commit message.]
Signed-off-by: Paul Burton <paul.burton@mips.com>
2019-10-01 12:59:59 -07:00
David Hildenbrand 04ccbdc390 xen/balloon: Clear PG_offline in balloon_retrieve()
Let's move the clearing to balloon_retrieve(). In
bp_state increase_reservation(), we now clear the flag a little earlier
than before, however, this should not matter for XEN.

Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-01 15:52:53 -04:00
David Hildenbrand 59b52f105f xen/balloon: Mark pages PG_offline in balloon_append()
Let's move the __SetPageOffline() call which all callers perform into
balloon_append().

In bp_state decrease_reservation(), pages are now marked PG_offline a
little later than before, however, this should not matter for XEN.

Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-01 15:52:53 -04:00
David Hildenbrand dde3285ffa xen/balloon: Drop __balloon_append()
Let's simply use balloon_append() directly.

Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-01 15:52:53 -04:00
David Hildenbrand c5ad81eb02 xen/balloon: Set pages PageOffline() in balloon_add_region()
We are missing a __SetPageOffline(), which is why we can get
!PageOffline() pages onto the balloon list, where
alloc_xenballooned_pages() will complain:

page:ffffea0003e7ffc0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0xffffe00001000(reserved)
raw: 000ffffe00001000 dead000000000100 dead000000000200 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageOffline(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:744!
invalid opcode: 0000 [#1] SMP NOPTI

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Fixes: 77c4adf6a6 ("xen/balloon: mark inflated pages PG_offline")
Cc: stable@vger.kernel.org # v5.1+
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2019-10-01 15:52:53 -04:00
Stefano Stabellini ade77a2da8 ARM: xen: unexport HYPERVISOR_platform_op function
HYPERVISOR_platform_op() is an inline function and should not
be exported. Since commit 15bfc2348d ("modpost: check for
static EXPORT_SYMBOL* functions"), this causes a warning:

WARNING: "HYPERVISOR_platform_op" [vmlinux] is a static EXPORT_SYMBOL_GPL

Instead, export the underlying function called by the static inline:
HYPERVISOR_platform_op_raw.

Fixes: 15bfc2348d ("modpost: check for static EXPORT_SYMBOL* functions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
2019-10-01 12:43:12 -07:00
Kees Cook 852c8cbf34 selftests/kselftest/runner.sh: Add 45 second timeout per test
Commit a745f7af3c ("selftests/harness: Add 30 second timeout per
test") solves the problem of kselftest_harness.h-using binary tests
possibly hanging forever. However, scripts and other binaries can still
hang forever. This adds a global timeout to each test script run.

To make this configurable (e.g. as needed in the "rtc" test case),
include a new per-test-directory "settings" file (similar to "config")
that can contain kselftest-specific settings. The first recognized field
is "timeout".

Additionally, this splits the reporting for timeouts into a specific
"TIMEOUT" not-ok (and adds exit code reporting in the remaining case).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-10-01 13:33:28 -06:00
Cristian Marussi 131b30c94f kselftest: exclude failed TARGETS from runlist
A TARGET which failed to be built/installed should not be included in the
runlist generated inside the run_kselftest.sh script.

Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-10-01 13:28:38 -06:00
Peter Zijlstra 73956fc07d membarrier: Fix RCU locking bug caused by faulty merge
The following commit:

  227a4aadc7 ("sched/membarrier: Fix p->mm->membarrier_state racy load")

got fat fingered by me when merging it with other patches. It meant to move
the RCU section out of the for loop but ended up doing it partially, leaving
a superfluous rcu_read_lock() inside, causing havok.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-tip-commits@vger.kernel.org
Fixes: 227a4aadc7 ("sched/membarrier: Fix p->mm->membarrier_state racy load")
Link: https://lkml.kernel.org/r/20191001085033.GP4519@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-10-01 21:27:50 +02:00
Cristian Marussi 3a24f7f6b6 kselftest: add capability to skip chosen TARGETS
Let the user specify an optional TARGETS skiplist through the new optional
SKIP_TARGETS Makefile variable.

It is easier to skip at will using a reduced and well defined list of
possibly problematic targets with SKIP_TARGETS than to provide a partially
stripped down list of good targets using the usual TARGETS variable.

Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2019-10-01 13:19:04 -06:00
Dexuan Cui 83b50f83a9 Drivers: hv: vmbus: Fix harmless building warnings without CONFIG_PM_SLEEP
If CONFIG_PM_SLEEP is not set, we can comment out these functions to avoid
the below warnings:

drivers/hv/vmbus_drv.c:2208:12: warning: ‘vmbus_bus_resume’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:2128:12: warning: ‘vmbus_bus_suspend’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:937:12: warning: ‘vmbus_resume’ defined but not used [-Wunused-function]
drivers/hv/vmbus_drv.c:918:12: warning: ‘vmbus_suspend’ defined but not used [-Wunused-function]

Fixes: 271b2224d4 ("Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation")
Fixes: f53335e328 ("Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by:  Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-01 14:49:45 -04:00
Dexuan Cui 6a297c90ef HID: hyperv: Use in-place iterator API in the channel callback
Simplify the ring buffer handling with the in-place API.

Also avoid the dynamic allocation and the memory leak in the channel
callback function.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-10-01 14:49:41 -04:00
Nishad Kamdar 6184488a19 x86: Use the correct SPDX License Identifier in headers
Correct the SPDX License Identifier format in a couple of headers.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/697848ff866ade29e78e872525d7a3067642fd37.1555427420.git.nishadkamdar@gmail.com
2019-10-01 20:31:35 +02:00
Josef Bacik 11a19a9087 btrfs: allocate new inode in NOFS context
A user reported a lockdep splat

 ======================================================
 WARNING: possible circular locking dependency detected
 5.2.11-gentoo #2 Not tainted
 ------------------------------------------------------
 kswapd0/711 is trying to acquire lock:
 000000007777a663 (sb_internal){.+.+}, at: start_transaction+0x3a8/0x500

but task is already holding lock:
 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (fs_reclaim){+.+.}:
 kmem_cache_alloc+0x1f/0x1c0
 btrfs_alloc_inode+0x1f/0x260
 alloc_inode+0x16/0xa0
 new_inode+0xe/0xb0
 btrfs_new_inode+0x70/0x610
 btrfs_symlink+0xd0/0x420
 vfs_symlink+0x9c/0x100
 do_symlinkat+0x66/0xe0
 do_syscall_64+0x55/0x1c0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (sb_internal){.+.+}:
 __sb_start_write+0xf6/0x150
 start_transaction+0x3a8/0x500
 btrfs_commit_inode_delayed_inode+0x59/0x110
 btrfs_evict_inode+0x19e/0x4c0
 evict+0xbc/0x1f0
 inode_lru_isolate+0x113/0x190
 __list_lru_walk_one.isra.4+0x5c/0x100
 list_lru_walk_one+0x32/0x50
 prune_icache_sb+0x36/0x80
 super_cache_scan+0x14a/0x1d0
 do_shrink_slab+0x131/0x320
 shrink_node+0xf7/0x380
 balance_pgdat+0x2d5/0x640
 kswapd+0x2ba/0x5e0
 kthread+0x147/0x160
 ret_from_fork+0x24/0x30

other info that might help us debug this:

 Possible unsafe locking scenario:

 CPU0 CPU1
 ---- ----
 lock(fs_reclaim);
 lock(sb_internal);
 lock(fs_reclaim);
 lock(sb_internal);
*** DEADLOCK ***

 3 locks held by kswapd0/711:
 #0: 000000000ba86300 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x0/0x30
 #1: 000000004a5100f8 (shrinker_rwsem){++++}, at: shrink_node+0x9a/0x380
 #2: 00000000f956fa46 (&type->s_umount_key#30){++++}, at: super_cache_scan+0x35/0x1d0

stack backtrace:
 CPU: 7 PID: 711 Comm: kswapd0 Not tainted 5.2.11-gentoo #2
 Hardware name: Dell Inc. Precision Tower 3620/0MWYPT, BIOS 2.4.2 09/29/2017
 Call Trace:
 dump_stack+0x85/0xc7
 print_circular_bug.cold.40+0x1d9/0x235
 __lock_acquire+0x18b1/0x1f00
 lock_acquire+0xa6/0x170
 ? start_transaction+0x3a8/0x500
 __sb_start_write+0xf6/0x150
 ? start_transaction+0x3a8/0x500
 start_transaction+0x3a8/0x500
 btrfs_commit_inode_delayed_inode+0x59/0x110
 btrfs_evict_inode+0x19e/0x4c0
 ? var_wake_function+0x20/0x20
 evict+0xbc/0x1f0
 inode_lru_isolate+0x113/0x190
 ? discard_new_inode+0xc0/0xc0
 __list_lru_walk_one.isra.4+0x5c/0x100
 ? discard_new_inode+0xc0/0xc0
 list_lru_walk_one+0x32/0x50
 prune_icache_sb+0x36/0x80
 super_cache_scan+0x14a/0x1d0
 do_shrink_slab+0x131/0x320
 shrink_node+0xf7/0x380
 balance_pgdat+0x2d5/0x640
 kswapd+0x2ba/0x5e0
 ? __wake_up_common_lock+0x90/0x90
 kthread+0x147/0x160
 ? balance_pgdat+0x640/0x640
 ? __kthread_create_on_node+0x160/0x160
 ret_from_fork+0x24/0x30

This is because btrfs_new_inode() calls new_inode() under the
transaction.  We could probably move the new_inode() outside of this but
for now just wrap it in memalloc_nofs_save().

Reported-by: Zdenek Sojka <zsojka@seznam.cz>
Fixes: 712e36c5f2 ("btrfs: use GFP_KERNEL in btrfs_alloc_inode")
CC: stable@vger.kernel.org # 4.16+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-01 20:12:27 +02:00
Linus Walleij cdee3b60af ARM: dts: ux500: Fix up the CPU thermal zone
This fixes up the default ux500 CPU thermal zone:

- Set polling delay to 0 and explain why
- Set passive polling delay to 250
- Remove restrictions from the CPU cooling device,
  we should use all cpufreq steps to cool down if
  needed.

Link: https://lore.kernel.org/r/20191001074628.8122-1-linus.walleij@linaro.org
Fixes: b786a05f6c ("ARM: dts: ux500: Update thermal zone")
Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
2019-10-01 11:01:27 -07:00
Zygo Blaxell 7a54789074 btrfs: fix balance convert to single on 32-bit host CPUs
Currently, the command:

	btrfs balance start -dconvert=single,soft .

on a Raspberry Pi produces the following kernel message:

	BTRFS error (device mmcblk0p2): balance: invalid convert data profile single

This fails because we use is_power_of_2(unsigned long) to validate
the new data profile, the constant for 'single' profile uses bit 48,
and there are only 32 bits in a long on ARM.

Fix by open-coding the check using u64 variables.

Tested by completing the original balance command on several Raspberry
Pis.

Fixes: 818255feec ("btrfs: use common helper instead of open coding a bit test")
CC: stable@vger.kernel.org # 4.20+
Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-01 19:37:29 +02:00
Oleksij Rempel 569aad4fcd net: ag71xx: fix mdio subnode support
This patch is syncing driver with actual devicetree documentation:
Documentation/devicetree/bindings/net/qca,ar71xx.txt
|Optional subnodes:
|- mdio : specifies the mdio bus, used as a container for phy nodes
|  according to phy.txt in the same directory

The driver was working with fixed phy without any noticeable issues. This bug
was uncovered by introducing dsa ar9331-switch driver.
Since no one reported this bug until now, I assume no body is using it
and this patch should not brake existing system.

Fixes: d51b6ce441 ("net: ethernet: add ag71xx driver")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:19:19 -07:00
David S. Miller b33210e379 Merge branch 'stmmac-fixes'
Jose Abreu says:

====================
net: stmmac: Fixes for -net

Misc fixes for -net tree. More info in commit logs.

v2 is just a rebase of v1 against -net and we added a new patch (09/09) to
fix RSS feature.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:38 -07:00
Jose Abreu 56627336b6 net: stmmac: xgmac: Fix RSS writing wrong keys
Commit b6b6cc9acd, changed the call to dwxgmac2_rss_write_reg()
passing it the variable cfg->key[i].

As key is an u8 but we write 32 bits at a time we need to cast it into
an u32 so that the correct key values are written. Notice that the for
loop already takes this into account so we don't try to write past the
keys size.

Fixes: b6b6cc9acd ("net: stmmac: selftest: avoid large stack usage")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:38 -07:00
Jose Abreu 3c72d4d330 net: stmmac: xgmac: Fix RSS not writing all Keys to HW
The sizeof(cfg->key) is != ARRAY_SIZE(cfg->key). Fix it. This warning is
triggered when running with cc flag -Wsizeof-array-div.

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Fixes: 76067459c6 ("net: stmmac: Implement RSS and enable it in XGMAC core")
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:38 -07:00
Jose Abreu 30300d9f91 net: stmmac: xgmac: Disable the Timestamp interrupt by default
We don't use it anyway as XGMAC only supports polling for timestamp (in
current SW implementation). This greatly reduces the system load by
reducing the number of interrupts.

Fixes: 2142754f8b ("net: stmmac: Add MAC related callbacks for XGMAC2")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:37 -07:00
Jose Abreu 3e2bf04fb0 net: stmmac: Do not stop PHY if WoL is enabled
If WoL is enabled we can't really stop the PHY, otherwise we will not
receive the WoL packet. Fix this by telling phylink that only the MAC is
down and only stop the PHY if WoL is not enabled.

Fixes: 74371272f9 ("net: stmmac: Convert to phylink and remove phylib logic")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:37 -07:00
Jose Abreu 14f347334b net: stmmac: Correctly take timestamp for PTPv2
The case for PTPV2_EVENT requires event packets to be captured so add
this setting to the list of enabled captures.

Fixes: 891434b18e ("stmmac: add IEEE PTPv1 and PTPv2 support.")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:37 -07:00
Jose Abreu f79bfda375 net: stmmac: dwmac4: Always update the MAC Hash Filter
We need to always update the MAC Hash Filter so that previous entries
are invalidated.

Found out while running stmmac selftests.

Fixes: b8ef7020d6 ("net: stmmac: add support for hash table size 128/256 in dwmac4")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:37 -07:00
Jose Abreu 432439fef9 net: stmmac: selftests: Always use max DMA size in Jumbo Test
Although some XGMAC setups support frames larger than DMA size, some of
them may not. As we can't know before-hand which ones support let's use
the maximum DMA buffer size in the Jumbo Tests.

User can always reconfigure the MTU to achieve larger frames.

Fixes: 427849e8c3 ("net: stmmac: selftests: Add Jumbo Frame tests")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:37 -07:00
Jose Abreu c11986b9fa net: stmmac: xgmac: Detect Hash Table size dinamically
Since commit b8ef7020d6 ("net: stmmac: add support for hash table size
128/256 in dwmac4"), we can detect the Hash Table dinamically.

Let's implement this feature in XGMAC cores and fix possible setups that
don't support the maximum size for Hash Table.

Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:37 -07:00
Jose Abreu 9a2ae7b396 net: stmmac: xgmac: Not all Unicast addresses may be available
Some setups may not have all Unicast addresses filters available. Let's
check this before trying to setup filters.

Fixes: 0efedbf11f ("net: stmmac: xgmac: Fix XGMAC selftests")
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:12:37 -07:00
Vasundhara Volam 93c2fcb01a devlink: Fix error handling in param and info_get dumpit cb
If any of the param or info_get op returns error, dumpit cb is
skipping to dump remaining params or info_get ops for all the
drivers.

Fix to not return if any of the param/info_get op returns error
as not supported and continue to dump remaining information.

v2: Modify the patch to return error, except for params/info_get
op that return -EOPNOTSUPP as suggested by Andrew Lunn. Also, modify
commit message to reflect the same.

Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:08:30 -07:00
Wen Yang f32eb9d804 net: dsa: rtl8366rb: add missing of_node_put after calling of_get_child_by_name
of_node_put needs to be called when the device node which is got
from of_get_child_by_name finished using.
irq_domain_add_linear() also calls of_node_get() to increase refcount,
so irq_domain will not be affected when it is released.

Fixes: d8652956cf ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:02:56 -07:00
Wen Yang d2c50b1cd9 net: mscc: ocelot: add missing of_node_put after calling of_get_child_by_name
of_node_put needs to be called when the device node which is got
from of_get_child_by_name finished using.
In both cases of success and failure, we need to release 'ports',
so clean up the code using goto.

fixes: a556c76adc ("net: mscc: Add initial Ocelot switch support")
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 10:01:58 -07:00
Vladimir Oltean 83c8c3cf45 net: sched: cbs: Avoid division by zero when calculating the port rate
As explained in the "net: sched: taprio: Avoid division by zero on
invalid link speed" commit, it is legal for the ethtool API to return
zero as a link speed. So guard against it to ensure we don't perform a
division by zero in kernel.

Fixes: e0a7683d30 ("net/sched: cbs: fix port_rate miscalculation")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 09:51:39 -07:00
Vladimir Oltean 9a9251a353 net: sched: taprio: Avoid division by zero on invalid link speed
The check in taprio_set_picos_per_byte is currently not robust enough
and will trigger this division by zero, due to e.g. PHYLINK not setting
kset->base.speed when there is no PHY connected:

[   27.109992] Division by zero in kernel.
[   27.113842] CPU: 1 PID: 198 Comm: tc Not tainted 5.3.0-rc5-01246-gc4006b8c2637-dirty #212
[   27.121974] Hardware name: Freescale LS1021A
[   27.126234] [<c03132e0>] (unwind_backtrace) from [<c030d8b8>] (show_stack+0x10/0x14)
[   27.133938] [<c030d8b8>] (show_stack) from [<c10b21b0>] (dump_stack+0xb0/0xc4)
[   27.141124] [<c10b21b0>] (dump_stack) from [<c10af97c>] (Ldiv0_64+0x8/0x18)
[   27.148052] [<c10af97c>] (Ldiv0_64) from [<c0700260>] (div64_u64+0xcc/0xf0)
[   27.154978] [<c0700260>] (div64_u64) from [<c07002d0>] (div64_s64+0x4c/0x68)
[   27.161993] [<c07002d0>] (div64_s64) from [<c0f3d890>] (taprio_set_picos_per_byte+0xe8/0xf4)
[   27.170388] [<c0f3d890>] (taprio_set_picos_per_byte) from [<c0f3f614>] (taprio_change+0x668/0xcec)
[   27.179302] [<c0f3f614>] (taprio_change) from [<c0f2bc24>] (qdisc_create+0x1fc/0x4f4)
[   27.187091] [<c0f2bc24>] (qdisc_create) from [<c0f2c0c8>] (tc_modify_qdisc+0x1ac/0x6f8)
[   27.195055] [<c0f2c0c8>] (tc_modify_qdisc) from [<c0ee9604>] (rtnetlink_rcv_msg+0x268/0x2dc)
[   27.203449] [<c0ee9604>] (rtnetlink_rcv_msg) from [<c0f4fef0>] (netlink_rcv_skb+0xe0/0x114)
[   27.211756] [<c0f4fef0>] (netlink_rcv_skb) from [<c0f4f6cc>] (netlink_unicast+0x1b4/0x22c)
[   27.219977] [<c0f4f6cc>] (netlink_unicast) from [<c0f4fa84>] (netlink_sendmsg+0x284/0x340)
[   27.228198] [<c0f4fa84>] (netlink_sendmsg) from [<c0eae5fc>] (sock_sendmsg+0x14/0x24)
[   27.235988] [<c0eae5fc>] (sock_sendmsg) from [<c0eaedf8>] (___sys_sendmsg+0x214/0x228)
[   27.243863] [<c0eaedf8>] (___sys_sendmsg) from [<c0eb015c>] (__sys_sendmsg+0x50/0x8c)
[   27.251652] [<c0eb015c>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54)
[   27.259524] Exception stack(0xe8045fa8 to 0xe8045ff0)
[   27.264546] 5fa0:                   b6f608c8 000000f8 00000003 bed7e2f0 00000000 00000000
[   27.272681] 5fc0: b6f608c8 000000f8 004ce54c 00000128 5d3ce8c7 00000000 00000026 00505c9c
[   27.280812] 5fe0: 00000070 bed7e298 004ddd64 b6dd1e64

Russell King points out that the ethtool API says zero is a valid return
value of __ethtool_get_link_ksettings:

   * If it is enabled then they are read-only; if the link
   * is up they represent the negotiated link mode; if the link is down,
   * the speed is 0, %SPEED_UNKNOWN or the highest enabled speed and
   * @duplex is %DUPLEX_UNKNOWN or the best enabled duplex mode.

  So, it seems that taprio is not following the API... I'd suggest either
  fixing taprio, or getting agreement to change the ethtool API.

The chosen path was to fix taprio.

Fixes: 7b9eba7ba0 ("net/sched: taprio: fix picos_per_byte miscalculation")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 09:44:03 -07:00
Pablo Neira Ayuso 34a4c95abd netfilter: nft_connlimit: disable bh on garbage collection
BH must be disabled when invoking nf_conncount_gc_list() to perform
garbage collection, otherwise deadlock might happen.

  nf_conncount_add+0x1f/0x50 [nf_conncount]
  nft_connlimit_eval+0x4c/0xe0 [nft_connlimit]
  nft_dynset_eval+0xb5/0x100 [nf_tables]
  nft_do_chain+0xea/0x420 [nf_tables]
  ? sch_direct_xmit+0x111/0x360
  ? noqueue_init+0x10/0x10
  ? __qdisc_run+0x84/0x510
  ? tcp_packet+0x655/0x1610 [nf_conntrack]
  ? ip_finish_output2+0x1a7/0x430
  ? tcp_error+0x130/0x150 [nf_conntrack]
  ? nf_conntrack_in+0x1fc/0x4c0 [nf_conntrack]
  nft_do_chain_ipv4+0x66/0x80 [nf_tables]
  nf_hook_slow+0x44/0xc0
  ip_rcv+0xb5/0xd0
  ? ip_rcv_finish_core.isra.19+0x360/0x360
  __netif_receive_skb_one_core+0x52/0x70
  netif_receive_skb_internal+0x34/0xe0
  napi_gro_receive+0xba/0xe0
  e1000_clean_rx_irq+0x1e9/0x420 [e1000e]
  e1000e_poll+0xbe/0x290 [e1000e]
  net_rx_action+0x149/0x3b0
  __do_softirq+0xde/0x2d8
  irq_exit+0xba/0xc0
  do_IRQ+0x85/0xd0
  common_interrupt+0xf/0xf
  </IRQ>
  RIP: 0010:nf_conncount_gc_list+0x3b/0x130 [nf_conncount]

Fixes: 2f971a8f42 ("netfilter: nf_conncount: move all list iterations under spinlock")
Reported-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-01 18:42:15 +02:00
Florian Westphal 895b5c9f20 netfilter: drop bridge nf reset from nf_reset
commit 174e23810c
("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
recycle always drop skb extensions.  The additional skb_ext_del() that is
performed via nf_reset on napi skb recycle is not needed anymore.

Most nf_reset() calls in the stack are there so queued skb won't block
'rmmod nf_conntrack' indefinitely.

This removes the skb_ext_del from nf_reset, and renames it to a more
fitting nf_reset_ct().

In a few selected places, add a call to skb_ext_reset to make sure that
no active extensions remain.

I am submitting this for "net", because we're still early in the release
cycle.  The patch applies to net-next too, but I think the rename causes
needless divergence between those trees.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-10-01 18:42:15 +02:00
Josef Bacik 4203e96894 btrfs: fix incorrect updating of log root tree
We've historically had reports of being unable to mount file systems
because the tree log root couldn't be read.  Usually this is the "parent
transid failure", but could be any of the related errors, including
"fsid mismatch" or "bad tree block", depending on which block got
allocated.

The modification of the individual log root items are serialized on the
per-log root root_mutex.  This means that any modification to the
per-subvol log root_item is completely protected.

However we update the root item in the log root tree outside of the log
root tree log_mutex.  We do this in order to allow multiple subvolumes
to be updated in each log transaction.

This is problematic however because when we are writing the log root
tree out we update the super block with the _current_ log root node
information.  Since these two operations happen independently of each
other, you can end up updating the log root tree in between writing out
the dirty blocks and setting the super block to point at the current
root.

This means we'll point at the new root node that hasn't been written
out, instead of the one we should be pointing at.  Thus whatever garbage
or old block we end up pointing at complains when we mount the file
system later and try to replay the log.

Fix this by copying the log's root item into a local root item copy.
Then once we're safely under the log_root_tree->log_mutex we update the
root item in the log_root_tree.  This way we do not modify the
log_root_tree while we're committing it, fixing the problem.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Chris Mason <clm@fb.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-01 18:41:02 +02:00
Filipe Manana c67d970f0e Btrfs: fix memory leak due to concurrent append writes with fiemap
When we have a buffered write that starts at an offset greater than or
equals to the file's size happening concurrently with a full ranged
fiemap, we can end up leaking an extent state structure.

Suppose we have a file with a size of 1Mb, and before the buffered write
and fiemap are performed, it has a single extent state in its io tree
representing the range from 0 to 1Mb, with the EXTENT_DELALLOC bit set.

The following sequence diagram shows how the memory leak happens if a
fiemap a buffered write, starting at offset 1Mb and with a length of
4Kb, are performed concurrently.

          CPU 1                                                  CPU 2

  extent_fiemap()
    --> it's a full ranged fiemap
        range from 0 to LLONG_MAX - 1
        (9223372036854775807)

    --> locks range in the inode's
        io tree
      --> after this we have 2 extent
          states in the io tree:
          --> 1 for range [0, 1Mb[ with
              the bits EXTENT_LOCKED and
              EXTENT_DELALLOC_BITS set
          --> 1 for the range
              [1Mb, LLONG_MAX[ with
              the EXTENT_LOCKED bit set

                                                  --> start buffered write at offset
                                                      1Mb with a length of 4Kb

                                                  btrfs_file_write_iter()

                                                    btrfs_buffered_write()
                                                      --> cached_state is NULL

                                                      lock_and_cleanup_extent_if_need()
                                                        --> returns 0 and does not lock
                                                            range because it starts
                                                            at current i_size / eof

                                                      --> cached_state remains NULL

                                                      btrfs_dirty_pages()
                                                        btrfs_set_extent_delalloc()
                                                          (...)
                                                          __set_extent_bit()

                                                            --> splits extent state for range
                                                                [1Mb, LLONG_MAX[ and now we
                                                                have 2 extent states:

                                                                --> one for the range
                                                                    [1Mb, 1Mb + 4Kb[ with
                                                                    EXTENT_LOCKED set
                                                                --> another one for the range
                                                                    [1Mb + 4Kb, LLONG_MAX[ with
                                                                    EXTENT_LOCKED set as well

                                                            --> sets EXTENT_DELALLOC on the
                                                                extent state for the range
                                                                [1Mb, 1Mb + 4Kb[
                                                            --> caches extent state
                                                                [1Mb, 1Mb + 4Kb[ into
                                                                @cached_state because it has
                                                                the bit EXTENT_LOCKED set

                                                    --> btrfs_buffered_write() ends up
                                                        with a non-NULL cached_state and
                                                        never calls anything to release its
                                                        reference on it, resulting in a
                                                        memory leak

Fix this by calling free_extent_state() on cached_state if the range was
not locked by lock_and_cleanup_extent_if_need().

The same issue can happen if anything else other than fiemap locks a range
that covers eof and beyond.

This could be triggered, sporadically, by test case generic/561 from the
fstests suite, which makes duperemove run concurrently with fsstress, and
duperemove does plenty of calls to fiemap. When CONFIG_BTRFS_DEBUG is set
the leak is reported in dmesg/syslog when removing the btrfs module with
a message like the following:

  [77100.039461] BTRFS: state leak: start 6574080 end 6582271 state 16402 in tree 0 refs 1

Otherwise (CONFIG_BTRFS_DEBUG not set) detectable with kmemleak.

CC: stable@vger.kernel.org # 4.16+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-10-01 18:40:58 +02:00
David S. Miller 9cfc370240 A small list of fixes this time:
* two null pointer dereference fixes
  * a fix for preempt-enabled/BHs-enabled (lockdep) splats
    (that correctly pointed out a bug)
  * a fix for multi-BSSID ordering assumptions
  * a fix for the EDMG support, on-stack chandefs need to
    be initialized properly (now that they're bigger)
  * beacon (head) data from userspace should be validated
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl2Td78ACgkQB8qZga/f
 l8SnIA/9GatOHIbmXb0BE/ojm3FEQlQOfzdO2VgC40Z8oR0qMFIwkWPzbsqy2Qhl
 xjzhh35q6iZunwp49LXRH1kDQn8xqo+RKpYDvrBSPvJW7jQj8l3UUK6tGaPL55RN
 NN5Tk/nWQVun70qPF/JIFeA/S7GpWJuyAj28hVgyukzNfksaYHqoAZQ1yU1otuou
 OmzsrXXzGVO9Xu0DU6U5b6UxcTUHiILLywr0kdE35oUATct7AijrU1E4f94/wmXG
 O4S3BMgLG4Ggxqdn+GPNdHLstEH/z0nyoon3LeautkOSEDgAeZoXNAgRGHSzaLdn
 YsTZ9mD1uKopSlro0obtyQPYswejnJ1dcEhMV6gpNSUqlf8hrzwzvh1ZxvZwZXpZ
 bislxnLcA+t10tkRApYQ0JhpvNm2O2lHlXqWz8tug0szoR/GKawrpPEJXrb/9yxF
 PFVI8TzXA0bLkO6clNV3vWWAf2Hg9My/hmPpbuORWdIw3KbpcMPfoWHkhAspTdpO
 CmpHurDt1u0Oh/8NawrqUTYXKZkGfseoDXQvQCDOfDCfGl8RPrdthzfPJhh8w4rd
 NCJa+WYNbFrvYcwi4FLCdRuO2dQjHLTclmZ/yXcVp8mxG5e8eihEIAfm30pyUxov
 uH29GwctmoA9CBYOAfHsEFWJNeGcpLSa9hzogmBygrPh61eoP14=
 =nhNe
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2019-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A small list of fixes this time:
 * two null pointer dereference fixes
 * a fix for preempt-enabled/BHs-enabled (lockdep) splats
   (that correctly pointed out a bug)
 * a fix for multi-BSSID ordering assumptions
 * a fix for the EDMG support, on-stack chandefs need to
   be initialized properly (now that they're bigger)
 * beacon (head) data from userspace should be validated
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 09:28:56 -07:00
Arnd Bergmann 6de6d18591 ionic: select CONFIG_NET_DEVLINK
When no other driver selects the devlink library code, ionic
produces a link failure:

drivers/net/ethernet/pensando/ionic/ionic_devlink.o: In function `ionic_devlink_alloc':
ionic_devlink.c:(.text+0xd): undefined reference to `devlink_alloc'
drivers/net/ethernet/pensando/ionic/ionic_devlink.o: In function `ionic_devlink_register':
ionic_devlink.c:(.text+0x71): undefined reference to `devlink_register'

Add the same 'select' statement that the other drivers use here.

Fixes: fbfb803153 ("ionic: Add hardware init and device commands")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 09:24:05 -07:00
Adam Zerella c5f75a14a0 docs: networking: Add title caret and missing doc
Resolving a couple of Sphinx documentation warnings
that are generated in the networking section.

- WARNING: document isn't included in any toctree
- WARNING: Title underline too short.

Signed-off-by: Adam Zerella <adam.zerella@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01 09:19:49 -07:00