Commit Graph

662299 Commits

Author SHA1 Message Date
Sowmini Varadhan b21dd4506b rds: tcp: Sequence teardown of listen and acceptor sockets to avoid races
Commit a93d01f577 ("RDS: TCP: avoid bad page reference in
rds_tcp_listen_data_ready") added the function
rds_tcp_listen_sock_def_readable() to handle the case when a
partially set-up acceptor socket drops into rds_tcp_listen_data_ready().
However, if the listen socket (rtn->rds_tcp_listen_sock) is itself going
through a tear-down via rds_tcp_listen_stop(), the (*ready)() will be
null and we would hit a panic of the form
  BUG: unable to handle kernel NULL pointer dereference at   (null)
  IP:           (null)
   :
  ? rds_tcp_listen_data_ready+0x59/0xb0 [rds_tcp]
  tcp_data_queue+0x39d/0x5b0
  tcp_rcv_established+0x2e5/0x660
  tcp_v4_do_rcv+0x122/0x220
  tcp_v4_rcv+0x8b7/0x980
    :
In the above case, it is not fatal to encounter a NULL value for
ready; we should just drop the packet and let the flush of the
acceptor thread finish gracefully.

In general, the tear-down sequence for listen() and accept() sockets
that is ensured by this commit is:
     rtn->rds_tcp_listen_sock = NULL; /* prevent any new accepts */
     In rds_tcp_listen_stop():
         serialize with, and prevent, further callbacks using lock_sock()
         flush rds_wq
         flush acceptor workq
         sock_release(listen socket)
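
The NULL-callback case above can be modelled in user-space C. This is only a sketch under stated assumptions, not the rds_tcp code: struct listen_sock, saved_ready, default_ready() and listen_data_ready() are hypothetical names, and the locking and workqueue flushes from the sequence above are left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for a listen socket with a saved callback. */
struct listen_sock {
	void (*saved_ready)(struct listen_sock *sk); /* NULL once teardown began */
	int delivered;                               /* packets handed to the callback */
	int dropped;                                 /* packets dropped during teardown */
};

/* Hypothetical original callback: just counts the delivery. */
static void default_ready(struct listen_sock *sk)
{
	sk->delivered++;
}

/* Data-ready handler that tolerates a NULL callback instead of calling it. */
static void listen_data_ready(struct listen_sock *sk)
{
	void (*ready)(struct listen_sock *) = sk->saved_ready;

	if (ready == NULL) {
		sk->dropped++; /* teardown in progress: drop the packet */
		return;
	}
	ready(sk);
}
```

Clearing saved_ready models the listen socket going through tear-down: later calls fall into the drop branch rather than jumping through a NULL pointer.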

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 14:09:59 -08:00
Sowmini Varadhan 16c09b1c76 rds: tcp: Reorder initialization sequence in rds_tcp_init to avoid races
Initialization in rds_tcp_init must be ordered so that resources are
set up and destroyed in the correct sequence with respect to both the
data path and the netns create/destroy path. Specifically,

- we must call register_pernet_subsys and get the rds_tcp_netid
  before calling register_netdevice_notifier, otherwise we risk
  the sequence
    1. register_netdevice_notifier sets up netdev notifier callback
    2. rds_tcp_dev_event -> rds_tcp_kill_sock uses netid 0, and finds
       the wrong rtn, resulting in a panic of the form:

  BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
  IP: rds_tcp_kill_sock+0x3a/0x1d0 [rds_tcp]
         :

- the rds_tcp_incoming_slab kmem_cache must be initialized before the
  datapath starts up. The latter can happen any time after the
  pernet_subsys registration of rds_tcp_net_ops, whose -> init
  function sets up the listen socket. If the rds_tcp_incoming_slab has
  not been set up at that time, a panic of the form below may be
  encountered

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
  IP: kmem_cache_alloc+0x90/0x1c0
     :
  rds_tcp_data_recv+0x1e7/0x370 [rds_tcp]
  tcp_read_sock+0x96/0x1c0
  rds_tcp_recv_path+0x65/0x80 [rds_tcp]
     :

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 14:09:59 -08:00
Sowmini Varadhan 8edc3affc0 rds: tcp: Take explicit refcounts on struct net
It is incorrect for the rds_connection to piggyback on the
sock_net() refcount for the netns because this gives rise to
a chicken-and-egg problem during rds_conn_destroy. Instead explicitly
take a ref on the net, and hold the netns down till the connection
tear-down is complete.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 14:09:59 -08:00
David S. Miller fa4c7fb2ad Merge branch 'sock_hold-misuses'
Eric Dumazet says:

====================
net: fix possible sock_hold() misuses

skb_complete_wifi_ack() and skb_complete_tx_timestamp() currently
call sock_hold() on sockets that might have transitioned their sk_refcnt
to zero already.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 14:06:15 -08:00
Eric Dumazet 9ac25fc063 net: fix socket refcounting in skb_complete_tx_timestamp()
TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt.
By the time TX completion happens, sk_refcnt might be already 0.

sock_hold()/sock_put() would then corrupt critical state, like
sk_wmem_alloc and lead to leaks or use after free.
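
One common way to avoid reviving a dead object is an increment that refuses to move a count off zero (the kernel has a primitive of this kind, atomic_inc_not_zero()). The commit message does not say which primitive the fix uses, so the following is only a user-space sketch with C11 atomics, and ref_get_unless_zero() is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object is still live (count > 0). */
static bool ref_get_unless_zero(atomic_int *refcnt)
{
	int old = atomic_load(refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
			return true;
	}
	return false; /* count already reached zero: do not touch the object */
}
```

A caller that gets false must treat the socket as gone and skip the work that needed the reference, instead of pairing a hold and a put on an object that is already being freed.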

Fixes: 62bccb8cdb ("net-timestamp: Make the clone operation stand-alone from phy timestamping")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 14:06:15 -08:00
Eric Dumazet dd4f10722a net: fix socket refcounting in skb_complete_wifi_ack()
TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt.
By the time TX completion happens, sk_refcnt might be already 0.

sock_hold()/sock_put() would then corrupt critical state, like
sk_wmem_alloc.

Fixes: bf7fa551e0 ("mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 14:06:14 -08:00
Linus Torvalds c688f14ccd Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
 "A couple of sched.h splitup related build fixes, plus an objtool fix"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix another GCC jump table detection issue
  drivers/char/nwbutton: Fix build breakage caused by include file reshuffling
  h8300: Fix build breakage caused by header file changes
  avr32: Fix build error caused by include file reshuffling
2017-03-07 14:02:56 -08:00
David Howells 146d8fef9d rxrpc: Call state should be read with READ_ONCE() under some circumstances
The call state may be changed at any time by the data-ready routine in
response to received packets, so if the call state is to be read and acted
upon several times in a function, READ_ONCE() must be used unless the call
state lock is held.
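
The idea of reading the state once and acting on that snapshot can be sketched in user-space C. The macro below is an approximation built on a volatile access and the GNU __typeof__ extension, not the kernel's READ_ONCE(); the enum values, struct call and classify() are hypothetical.

```c
/* User-space approximation of a single read; assumes GCC or Clang. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

enum call_state { CALL_AWAIT_REPLY, CALL_RECV_REPLY, CALL_COMPLETE };

struct call {
	enum call_state state; /* may be changed by another context at any time */
};

/* Take one snapshot and make every decision from it. */
static int classify(struct call *call)
{
	enum call_state state = READ_ONCE_SKETCH(call->state);

	if (state == CALL_COMPLETE)
		return -1;
	if (state == CALL_RECV_REPLY)
		return 1;
	return 0;
}
```

Without the snapshot, the compiler is free to reload call->state for each comparison, so the two tests could observe different values.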

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 13:59:06 -08:00
Eric Dumazet 02b2faaf0a tcp: fix various issues for sockets morphing to listen state
Dmitry Vyukov reported a divide by 0 triggered by syzkaller, exploiting
tcp_disconnect() path that was never really considered and/or used
before syzkaller ;)

I was not able to reproduce the bug, but it seems issues here are the
three possible actions that assumed they would never trigger on a
listener.

1) tcp_write_timer_handler
2) tcp_delack_timer_handler
3) MTU reduction

Only IPv6 MTU reduction was properly testing TCP_CLOSE and TCP_LISTEN
 states from tcp_v6_mtu_reduced()
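
A state test of the kind described, where a handler bails out for closed or listening sockets, can be written as a bitmask check. The sketch below is self-contained and hypothetical: the ST_* values and handler_should_run() are illustrative and are not taken from the kernel headers or from this patch.

```c
/* Hypothetical state numbers, for illustration only. */
enum { ST_ESTABLISHED = 1, ST_CLOSE = 7, ST_LISTEN = 10 };

/* Turn a state number into a one-bit flag so several states test at once. */
#define STF(state) (1u << (state))

/* Returns 1 if a timer or MTU handler may act on a socket in this state. */
static int handler_should_run(int state)
{
	return !(STF(state) & (STF(ST_CLOSE) | STF(ST_LISTEN)));
}
```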

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 13:58:33 -08:00
David S. Miller b73d2da8c7 Merge branch 'bnx2x-fixes'
Michal Schmidt says:

====================
bnx2x: PTP crash, VF VLAN fixes

here are fixes for a crash with PTP, a crash in setting of VF multicast
addresses, and non-working VLAN filters configuration from the VF side.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 13:47:16 -08:00
Michal Schmidt e395132594 bnx2x: add missing configuration of VF VLAN filters
Configuring VLANs from the VF side had no effect, because the PF ignored
filters of type VFPF_VLAN_FILTER in the VF-PF message.

Add the missing filter type to configure.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 13:47:15 -08:00
Michal Schmidt 74bcbeb7d7 bnx2x: fix incorrect filter count in an error message
filters->count is the number of filters we were supposed to configure.
There is no reason to increase it by +1 when printing the count in an error
message.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 13:47:15 -08:00
Michal Schmidt 78d5505432 bnx2x: do not rollback VF MAC/VLAN filters we did not configure
On failure to configure a VF MAC/VLAN filter we should not attempt to
rollback filters that we failed to configure with -EEXIST.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 13:47:15 -08:00
Michal Schmidt 83bd9eb8fc bnx2x: fix detection of VLAN filtering feature for VF
VFs are currently missing the VLAN filtering feature, because we were
checking the PF's acquire response before actually performing the acquire.

Fix it by setting the feature flag later when we have the PF response.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 13:47:15 -08:00
Michal Schmidt 22118d861c bnx2x: fix possible overrun of VFPF multicast addresses array
It is too late to check for the limit of the number of VF multicast
addresses after they have already been copied to the req->multicast[]
array, possibly overflowing it.

Do the check before copying.

Also fix the error path to not skip unlocking vf2pf_mutex.
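
The check-before-copy ordering can be illustrated with a small user-space sketch. Everything here is an assumption made for the example: MAX_MC, ADDR_LEN, struct mc_req, set_mc_list() and the -EINVAL return value are hypothetical and do not come from the bnx2x driver, and the mutex handling mentioned above is not modelled.

```c
#include <errno.h>
#include <string.h>

#define MAX_MC   4 /* hypothetical capacity of the request array */
#define ADDR_LEN 6 /* bytes per MAC address */

struct mc_req {
	unsigned char multicast[MAX_MC][ADDR_LEN];
	int n_multicast;
};

/* Validate the count first; only then copy into the fixed-size array. */
static int set_mc_list(struct mc_req *req, const unsigned char *addrs, int count)
{
	if (count < 0 || count > MAX_MC)
		return -EINVAL; /* reject before any byte is written */

	memcpy(req->multicast, addrs, (size_t)count * ADDR_LEN);
	req->n_multicast = count;
	return 0;
}
```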

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 13:47:15 -08:00
Michal Schmidt 850268d320 bnx2x: lower verbosity of VF stats debug messages
When BNX2X_MSG_IOV is enabled, the driver produces too many VF statistics
messages. Lower the verbosity of the VF stats messages, as was done in
commit 76ca70fabb ("bnx2x: [Debug] change verbosity of some prints").

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 13:47:14 -08:00
Michal Schmidt 466e8bf10a bnx2x: prevent crash when accessing PTP with interface down
It is possible to crash the kernel by accessing a PTP device while its
associated bnx2x interface is down. Before the interface is brought up,
the timecounter is not initialized, so accessing it results in NULL
dereference.

Fix it by checking if the interface is up.

Use -ENETDOWN as the error code when the interface is down.
 -EFAULT in bnx2x_ptp_adjfreq() did not seem right.
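
The guard can be sketched as follows. This is a hypothetical user-space model, not the driver code: struct nic, its fields and ptp_gettime() are illustrative names, and the timecounter is reduced to a pointer that stays NULL until the interface comes up.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct nic {
	bool up;
	const uint64_t *timecounter; /* NULL until the interface is brought up */
};

/* Refuse PTP access while the interface is down instead of dereferencing. */
static int ptp_gettime(const struct nic *nic, uint64_t *ns)
{
	if (!nic->up)
		return -ENETDOWN;

	*ns = *nic->timecounter;
	return 0;
}
```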

Tested using phc_ctl get/set/adj/freq commands.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 13:47:14 -08:00
Blomme, Maarten 239870f2a0 spi_ks8995: regs_size incorrect for some devices
Signed-off-by: Maarten Blomme <Maarten.Blomme@flir.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 13:33:24 -08:00
Blomme, Maarten 4342696df7 spi_ks8995: fix "BUG: key accdaa28 not in .data!"
Signed-off-by: Maarten Blomme <Maarten.Blomme@flir.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 13:33:23 -08:00
Arnd Bergmann 0253f2681f net/mlx5e: add IPV6 dependency
The ethernet support now calls directly into the ipv6 core code, which
fails if IPV6 is a loadable module but mlx5 is built-in:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.o: In function `mlx5e_create_encap_header_ipv6':
en_tc.c:(.text.mlx5e_create_encap_header_ipv6+0x110): undefined reference to `ip6_route_output_flags'

This adds a dependency to ensure that MLX5_CORE_EN can only be built
if we are able to link the kernel successfully. The downside is that the
ethernet option can be hidden. Alternatively we could make MLX5_CORE
depend on "IPV6 || !IPV6", which would force MLX5_CORE to be a module
when IPV6 is, including in configurations where we don't use the ethernet
support at all.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 12:25:24 -08:00
Yinghai Lu 3bd7db63a8 PCI/ASPM: Always set link->downstream to avoid NULL dereference on remove
We call pcie_aspm_exit_link_state() when we remove a device.  If the device
is the last PCIe function to be removed below a bridge and the bridge has
an ASPM link_state struct, we disable ASPM on the link.  Disabling ASPM
requires link->downstream (used in pcie_config_aspm_link()).

We previously set link->downstream in pcie_aspm_cap_init(), but only if the
device was not blacklisted.  Removing the blacklisted device caused a NULL
pointer dereference in the pcie_aspm_exit_link_state() ->
pcie_config_aspm_link() path:

  # echo 1 > /sys/bus/pci/devices/0000\:0b\:00.0/remove
  ...
   BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
   IP: pcie_config_aspm_link+0x5d/0x2b0
   Call Trace:
    pcie_aspm_exit_link_state+0x75/0x130
    pci_stop_bus_device+0xa4/0xb0
    pci_stop_and_remove_bus_device_locked+0x1a/0x30
    remove_store+0x50/0x70
    dev_attr_store+0x18/0x30
    sysfs_kf_write+0x44/0x60
    kernfs_fop_write+0x10e/0x190
    __vfs_write+0x28/0x110
    ? rcu_read_lock_sched_held+0x5d/0x80
    ? rcu_sync_lockdep_assert+0x2c/0x60
    ? __sb_start_write+0x173/0x1a0
    ? vfs_write+0xb3/0x180
    vfs_write+0xc4/0x180
    SyS_write+0x49/0xa0
    do_syscall_64+0xa6/0x1c0
    entry_SYSCALL64_slow_path+0x25/0x25
   ---[ end trace bd187ee0267df5d9 ]---

To avoid this, set link->downstream in alloc_pcie_link_state(), so every
pcie_link_state structure has a valid link->downstream pointer.
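
The pattern of setting a field in the allocator, so that no later conditional path can leave it NULL, can be sketched in user-space C. The types and functions below (struct device, struct link_state, alloc_link_state(), capability_init(), link_exit()) are hypothetical simplifications and do not reproduce the ASPM code.

```c
#include <stdlib.h>

struct device {
	int id;
};

struct link_state {
	struct device *downstream; /* needed by the teardown path */
	int aspm_enabled;
};

/* Set downstream unconditionally at allocation time. */
static struct link_state *alloc_link_state(struct device *downstream)
{
	struct link_state *link = calloc(1, sizeof(*link));

	if (!link)
		return NULL;
	link->downstream = downstream;
	return link;
}

/* Capability setup may be skipped for blacklisted devices. */
static void capability_init(struct link_state *link, int blacklisted)
{
	if (blacklisted)
		return; /* early return no longer leaves downstream unset */
	link->aspm_enabled = 1;
}

/* Teardown can rely on downstream regardless of the blacklist. */
static int link_exit(struct link_state *link)
{
	int id = link->downstream->id;

	free(link);
	return id;
}
```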

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rajat Jain <rajatja@google.com>
CC: stable@vger.kernel.org
2017-03-07 14:23:30 -06:00
Ethan Zhao 0d5370d1d8 PCI: Prevent VPD access for QLogic ISP2722
The QLogic ISP2722-based 16/32Gb Fibre Channel to PCIe Adapter has the VPD
access issue too. Reading the common pci-sysfs access interface shown as

 /sys/devices/pci0000:00/0000:00:03.2/0000:0b:00.0/vpd

with a simple 'cat' could cause a system hang and panic:

  Kernel panic - not syncing: An NMI occurred. Depending on your system the reason for the NMI is logged in any one of the following resources:
  1. Integrated Management Log (IML)
  2. OA Syslog
  3. OA Forward Progress Log
  4. iLO Event Log
  CPU: 0 PID: 15070 Comm: udevadm Not tainted 4.1.12
  Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
   0000000000000086 000000007f0cdf51 ffff880c4fa05d58 ffffffff817193de
   ffffffffa00b42d8 0000000000000075 ffff880c4fa05dd8 ffffffff81714072
   0000000000000008 ffff880c4fa05de8 ffff880c4fa05d88 000000007f0cdf51
  Call Trace:
   <NMI>  [<ffffffff817193de>] dump_stack+0x63/0x81
   [<ffffffff81714072>] panic+0xd0/0x20e
   [<ffffffffa00b390d>] hpwdt_pretimeout+0xdd/0xe0 [hpwdt]
   [<ffffffff81021fc9>] ? sched_clock+0x9/0x10
   [<ffffffff8101c101>] nmi_handle+0x91/0x170
   [<ffffffff8101c10c>] ? nmi_handle+0x9c/0x170
   [<ffffffff8101c5fe>] io_check_error+0x1e/0xa0
   [<ffffffff8101c719>] default_do_nmi+0x99/0x140
   [<ffffffff8101c8b4>] do_nmi+0xf4/0x170
   [<ffffffff817232c5>] end_repeat_nmi+0x1a/0x1e
   [<ffffffff815d724b>] ? pci_conf1_read+0xeb/0x120
   [<ffffffff815d724b>] ? pci_conf1_read+0xeb/0x120
   [<ffffffff815d724b>] ? pci_conf1_read+0xeb/0x120
   <<EOE>>  [<ffffffff815db4b3>] raw_pci_read+0x23/0x40
   [<ffffffff815db4fc>] pci_read+0x2c/0x30
   [<ffffffff8136f612>] pci_user_read_config_word+0x72/0x110
   [<ffffffff8136f746>] pci_vpd_pci22_wait+0x96/0x130
   [<ffffffff8136ff9b>] pci_vpd_pci22_read+0xdb/0x1a0
   [<ffffffff8136ea30>] pci_read_vpd+0x20/0x30
   [<ffffffff8137d590>] read_vpd_attr+0x30/0x40
   [<ffffffff8128e037>] sysfs_kf_bin_read+0x47/0x70
   [<ffffffff8128d24e>] kernfs_fop_read+0xae/0x180
   [<ffffffff8120dd97>] __vfs_read+0x37/0x100
   [<ffffffff812ba7e4>] ? security_file_permission+0x84/0xa0
   [<ffffffff8120e366>] ? rw_verify_area+0x56/0xe0
   [<ffffffff8120e476>] vfs_read+0x86/0x140
   [<ffffffff8120f3f5>] SyS_read+0x55/0xd0
   [<ffffffff81720f2e>] system_call_fastpath+0x12/0x71
  Shutting down cpus with NMI
  Kernel Offset: disabled
  drm_kms_helper: panic occurred, switching back to text console

So blacklist the access to its VPD.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org	# v4.6+
2017-03-07 14:16:57 -06:00
Christian Lamparter a3a4a816b4 dt: emac: document device-tree based phy discovery and setup
This patch adds documentation for a new "phy-handle" property,
"fixed-link" and "mdio" sub-node. These allow the enumeration
of PHYs which are supported by the phy library under drivers/net/phy.

The EMAC ethernet controller in IBM and AMCC 4xx chips is
currently stuck with a few privately defined phy
implementations. It has no support for PHYs which
are supported by the generic phylib.

Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-07 12:15:06 -08:00
Krzysztof Kozlowski f98c7bce57 serial: samsung: Continue to work if DMA request fails
If DMA is not available (even when configured in DeviceTree), the driver
will fail the startup procedure, making the serial console
unavailable.

For example this causes boot failure on QEMU ARMv7 (Exynos4210, SMDKC210):
    [    1.302575] OF: amba_device_add() failed (-19) for /amba/pdma@12680000
    ...
    [   11.435732] samsung-uart 13800000.serial: DMA request failed
    [   72.963893] samsung-uart 13800000.serial: DMA request failed
    [   73.143361] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000

DMA is not necessary for serial to work, so continue with UART startup
after emitting a warning.

Fixes: 62c37eedb7 ("serial: samsung: add dma reqest/release functions")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-07 19:58:37 +01:00
Linus Torvalds 9e91c144e6 Merge branch 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax
Pull idr fix (and new tests) from Matthew Wilcox:
 "One urgent patch in here; freeing the correct IDA bitmap.

  Everything else is changes to the test suite"

* 'idr-4.11' of git://git.infradead.org/users/willy/linux-dax:
  radix tree test suite: Specify -m32 in LDFLAGS too
  ida: Free correct IDA bitmap
  radix tree test suite: Depend on Makefile and quieten grep
  radix tree test suite: Fix build with --as-needed
  radix tree test suite: Build 32 bit binaries
  radix tree test suite: Add performance test for radix_tree_join()
  radix tree test suite: Add performance test for radix_tree_split()
  radix tree test suite: Add performance benchmarks
  radix tree test suite: Add test for radix_tree_clear_tags()
  radix tree test suite: Add tests for ida_simple_get() and ida_simple_remove()
  radix tree test suite: Add test for idr_get_next()
2017-03-07 10:52:26 -08:00
Jaehoon Chung 544714d8e1 PCI: exynos: Initialize elbi_base even when using PHY framework
Even when using the PHY framework, we need the elbi_base.  Before this
patch, we didn't initialize elbi_base, which caused NULL pointer
dereferences later.

Fixes: e7cd7ef58e ("PCI: exynos: Support the PHY generic framework")
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-03-07 12:46:38 -06:00
Linus Torvalds f7d6a7283a powerpc fixes for 4.11 #3
Five fairly small fixes for things that went in this cycle.
 
 A fairly large patch to rework the CAS logic on Power9, necessitated by a late
 change to the firmware API, and we can't boot without it.
 
 Three fixes going to stable, allowing more instructions to be emulated on LE,
 fixing a boot crash on 32-bit Freescale BookE machines, and the OPAL XICS
 workaround.
 
 And a patch from me to sort the selects under CONFIG_PPC. Annoying churn, but
 worth it in the long run, and best for it to go in now to avoid conflicts.
 
 Thanks to:
   Alexey Kardashevskiy, Anton Blanchard, Balbir Singh, Gautham R. Shenoy,
   Laurentiu Tudor, Nicholas Piggin, Paul Mackerras, Ravi Bangoria, Sachin Sant,
   Shile Zhang, Suraj Jitindar Singh.

Merge tag 'powerpc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Five fairly small fixes for things that went in this cycle.

  A fairly large patch to rework the CAS logic on Power9, necessitated
  by a late change to the firmware API, and we can't boot without it.

  Three fixes going to stable, allowing more instructions to be emulated
  on LE, fixing a boot crash on 32-bit Freescale BookE machines, and the
  OPAL XICS workaround.

  And a patch from me to sort the selects under CONFIG_PPC. Annoying
  churn, but worth it in the long run, and best for it to go in now to
  avoid conflicts.

  Thanks to:
    Alexey Kardashevskiy, Anton Blanchard, Balbir Singh, Gautham R.
    Shenoy, Laurentiu Tudor, Nicholas Piggin, Paul Mackerras, Ravi
    Bangoria, Sachin Sant, Shile Zhang, Suraj Jitindar Singh"

* tag 'powerpc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Sort the selects under CONFIG_PPC
  powerpc/64: Fix L1D cache shape vector reporting L1I values
  powerpc/64: Avoid panic during boot due to divide by zero in init_cache_info()
  powerpc: Update to new option-vector-5 format for CAS
  powerpc: Parse the command line before calling CAS
  powerpc/xics: Work around limitations of OPAL XICS priority handling
  powerpc/64: Fix checksum folding in csum_add()
  powerpc/powernv: Fix opal tracepoints with JUMP_LABEL=n
  powerpc/booke: Fix boot crash due to null hugepd
  powerpc: Fix compiling a BE kernel with a powerpc64le toolchain
  selftest/powerpc: Fix false failures for skipped tests
  powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop
  powerpc/64: Invalidate process table caching after setting process table
  powerpc: emulate_step() tests for load/store instructions
  powerpc: Emulation support for load/store instructions on LE
2017-03-07 10:46:10 -08:00
Linus Torvalds 8c2c8ed8b8 Merge branch 'stable/for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb updates from Konrad Rzeszutek Wilk:
 "Two tiny implementations of the DMA API for callback in ARM (for Xen)"

* 'stable/for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb-xen: implement xen_swiotlb_get_sgtable callback
  swiotlb-xen: implement xen_swiotlb_dma_mmap callback
2017-03-07 10:23:17 -08:00
Matthew Wilcox f0f3f2d0a3 radix tree test suite: Specify -m32 in LDFLAGS too
Michael's patch to use the default make rule for linking and the patch
from Rehas to use -m32 if building a 32-bit test-suite on a 64-bit
platform don't work well together.

Reported-by: Rehas Sachdeva <aquannie@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07 13:18:24 -05:00
Matthew Wilcox 4ecd9542db ida: Free correct IDA bitmap
There's a relatively rare race where we look at the per-cpu preallocated
IDA bitmap, see it's NULL, allocate a new one, and atomically update it.
If the kmalloc() happened to sleep and we were rescheduled to a different
CPU, or an interrupt came in at the exact right time, another task
might have successfully allocated a bitmap and already deposited it.
I forgot what the semantics of cmpxchg() were and ended up freeing the
wrong bitmap leading to KASAN reporting a use-after-free.
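
The compare-and-exchange pitfall can be shown with C11 atomics in user space. This is a sketch, not the IDA code: struct bitmap, the prealloc slot and preload() are hypothetical, and a single shared slot stands in for the per-cpu pointer. Note the API difference: the kernel's cmpxchg() returns the old value, while C11 atomic_compare_exchange_strong() returns a success flag and writes the observed value into its second argument.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct bitmap {
	unsigned long bits[4];
};

/* Hypothetical shared slot holding one preallocated bitmap. */
static _Atomic(struct bitmap *) prealloc;

/* Returns 1 if the slot is populated on return, 0 if allocation failed. */
static int preload(void)
{
	struct bitmap *expected = NULL;
	struct bitmap *fresh;

	if (atomic_load(&prealloc) != NULL)
		return 1;

	fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return 0;

	/* If someone else installed a bitmap first, free OUR copy, not theirs. */
	if (!atomic_compare_exchange_strong(&prealloc, &expected, fresh))
		free(fresh);
	return 1;
}
```

Freeing the value that the failed exchange reports back would release the bitmap that is still installed in the slot, which is the use-after-free pattern described above.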

Dmitry found the bug with syzkaller & wrote the patch.  I wrote the test
case that will reproduce the bug without his patch being applied.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07 13:18:23 -05:00
Matthew Wilcox 3f1b6f9d49 radix tree test suite: Depend on Makefile and quieten grep
Changing the CFLAGS in the Makefile didn't always lead to a
recompilation because the OFILES didn't depend on the Makefile.
Also, after doing make clean, grep would still complain about
a missing map-shift.h; we need -s as well as -q.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07 13:18:22 -05:00
Michael Ellerman 284d96a494 radix tree test suite: Fix build with --as-needed
Currently the radix tree test suite doesn't build with toolchains that
use --as-needed by default, for example Ubuntu's:

  cc -I. -I../../include -g -O2 -Wall -D_LGPL_SOURCE -fsanitize=address -lpthread -lurcu main.o ... -o main
  /usr/bin/ld: regression1.o: undefined reference to symbol 'pthread_join@@GLIBC_2.17'
  /lib/powerpc64le-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
  collect2: error: ld returned 1 exit status

This is caused by the custom makefile rules placing LDFLAGS before the
.o files that need the libraries.

We could fix it by using --no-as-needed, or rewriting the custom rules.
But we can also just drop the custom rules and move the libraries to
LDLIBS, and then the default rules work correctly - with the one caveat
that we need to add -fsanitize=address to LDFLAGS because that must be
passed to the linker as well as the compiler.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07 13:18:22 -05:00
Rehas Sachdeva c4634b08d9 radix tree test suite: Build 32 bit binaries
Add option 'make BUILD=32' for building 32-bit binaries.

Signed-off-by: Rehas Sachdeva <aquannie@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07 13:18:21 -05:00
Rehas Sachdeva 54f4d3341c radix tree test suite: Add performance test for radix_tree_join()
Signed-off-by: Rehas Sachdeva <aquannie@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07 13:18:21 -05:00
Rehas Sachdeva 6478581c85 radix tree test suite: Add performance test for radix_tree_split()
Signed-off-by: Rehas Sachdeva <aquannie@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07 13:18:20 -05:00
Rehas Sachdeva 0d4a41c1a0 radix tree test suite: Add performance benchmarks
Add performance benchmarks for radix tree insertion, tagging and deletion.

Signed-off-by: Rehas Sachdeva <aquannie@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07 13:18:20 -05:00
Rehas Sachdeva c629a344ac radix tree test suite: Add test for radix_tree_clear_tags()
Assert that radix_tree_clear_tags() clears the tags on the passed node and
slot. Assert that the case where the radix tree has only one entry at index
zero and the node is NULL, is also handled.

Signed-off-by: Rehas Sachdeva <aquannie@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07 13:18:19 -05:00
Rehas Sachdeva 166bb1f532 radix tree test suite: Add tests for ida_simple_get() and ida_simple_remove()
Assert that ida_simple_get() allocates an id in the passed range or returns
error on failure, and ida_simple_remove() releases an allocated id.

Signed-off-by: Rehas Sachdeva <aquannie@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07 13:18:19 -05:00
Rehas Sachdeva 2eacc79c27 radix tree test suite: Add test for idr_get_next()
Assert that idr_get_next() returns the next populated entry in the tree with
an ID greater than or equal to the value pointed to by @nextid argument.

Signed-off-by: Rehas Sachdeva <aquannie@gmail.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-03-07 13:18:18 -05:00
Linus Torvalds 304362a8bc Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace fix from Eric Biederman:
 "This fixes a race between put_ucounts and get_ucounts that can cause a
  use after free. The fix works by simplifying the code and so there is
  not even a temptation to be clever and play spinlock vs atomic
  reference games"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucount: Remove the atomicity from ucount->count
2017-03-07 10:06:25 -08:00
Alexander Popov 82f2341c94 tty: n_hdlc: get rid of racy n_hdlc.tbuf
Currently N_HDLC line discipline uses a self-made singly linked list for
data buffers and has n_hdlc.tbuf pointer for buffer retransmitting after
an error.

Commit be10eb7589
("tty: n_hdlc add buffer flushing") introduced racy access to n_hdlc.tbuf.
After a tx error, concurrent flush_tx_queue() and n_hdlc_send_frames() can put
one data buffer on tx_free_buf_list twice. That causes a double free in
n_hdlc_release().

Let's use standard kernel linked list and get rid of n_hdlc.tbuf:
in case of tx error put current data buffer after the head of tx_buf_list.
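
Requeueing a failed buffer at the head of the transmit list, rather than parking it in a separate pointer, can be sketched as below. This is a simplified user-space model with a hand-written singly linked queue and no locking. It does not use the kernel list API that the patch adopts, and struct buf, struct buf_list and the buf_*() helpers are hypothetical.

```c
#include <stddef.h>

struct buf {
	struct buf *next;
	int id;
};

struct buf_list {
	struct buf *head;
	struct buf *tail;
};

/* Normal path: queue a new buffer at the tail. */
static void buf_put_tail(struct buf_list *list, struct buf *b)
{
	b->next = NULL;
	if (list->tail)
		list->tail->next = b;
	else
		list->head = b;
	list->tail = b;
}

/* Error path: put the current buffer back at the head so it is sent next. */
static void buf_put_head(struct buf_list *list, struct buf *b)
{
	b->next = list->head;
	list->head = b;
	if (!list->tail)
		list->tail = b;
}

/* Take the first buffer off the list, or NULL if the list is empty. */
static struct buf *buf_get(struct buf_list *list)
{
	struct buf *b = list->head;

	if (b) {
		list->head = b->next;
		if (!list->head)
			list->tail = NULL;
		b->next = NULL;
	}
	return b;
}
```

Because the buffer is always either owned by the sender or on exactly one list, a flush that drains the list cannot see a second, stale reference to it.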

Signed-off-by: Alexander Popov <alex.popov@linux.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-07 18:54:30 +01:00
Linus Torvalds f26db9649a There was some breakage with the changes for jump labels in the 4.11 merge
window. Namely powerpc broke as jump labels use the two LSB bits as flags
 in initialization. A check was added to make sure that all jump label
 entries were 4 bytes aligned, but powerpc didn't work that way for modules.
 Adding an alignment in the module linker script appeared to be the best
 solution.
 
 Jump labels also added an anonymous union to access those LSB bits as a
 normal long. But because this structure had static initialization, it broke
 older compilers that could not statically initialize anonymous unions
 without brackets.
 
 The command line parameter for setting function graph filter broke the
 "EMPTY_HASH" descriptor by modifying it instead of creating a new hash to
 hold the entries.
 
 The command line parameter ftrace_graph_max_depth was added to allow its
 setting at boot time. It uses existing code and only the command line hook
 was added. This is not really a fix, but as it uses existing code without
 affecting anything else, I added it to this release. It was ready before the
 merge window closed, but I wanted to let it sit in linux-next for a couple
 of days first.

Merge tag 'trace-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "There was some breakage with the changes for jump labels in the 4.11
  merge window:

   - powerpc broke as jump labels use the two LSB bits as flags in
     initialization.

     A check was added to make sure that all jump label entries were 4
     bytes aligned, but powerpc didn't work that way for modules. Adding
     an alignment in the module linker script appeared to be the best
     solution.

   - Jump labels also added an anonymous union to access those LSB bits
     as a normal long. But because this structure had static
     initialization, it broke older compilers that could not statically
     initialize anonymous unions without brackets.

   - The command line parameter for setting function graph filter broke
     the "EMPTY_HASH" descriptor by modifying it instead of creating a
     new hash to hold the entries.

   - The command line parameter ftrace_graph_max_depth was added to
     allow its setting at boot time. It uses existing code and only the
     command line hook was added.

     This is not really a fix, but as it uses existing code without
     affecting anything else, I added it to this release. It was ready
     before the merge window closed, but I wanted to let it sit in
     linux-next for a couple of days first"

* tag 'trace-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace/graph: Add ftrace_graph_max_depth kernel parameter
  tracing: Add #undef to fix compile error
  jump_label: Add comment about initialization order for anonymous unions
  jump_label: Fix anonymous union initialization
  module: set __jump_table alignment to 8
  ftrace/graph: Do not modify the EMPTY_HASH for the function_graph filter
  tracing: Fix code comment for ftrace_ops_get_func()
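
As a rough illustration of why the alignment check in the first item
matters, here is a minimal user-space sketch. The helper names are
hypothetical and this is not the kernel's jump label layout: when the two
least significant bits of a pointer are reused as flags, the pointee must
be at least 4-byte aligned, otherwise the flags would overwrite address
bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration only; names are not taken from the kernel. */
#define TAG_MASK ((uintptr_t)3)	/* the two least significant bits */

/* True if the address leaves the two low bits free for flags. */
static bool tag_bits_free(const void *p)
{
	return ((uintptr_t)p & TAG_MASK) == 0;
}

/* Pack a 4-byte-aligned pointer and a 2-bit flag value into one word. */
static uintptr_t tag_pack(const void *p, unsigned int flags)
{
	return (uintptr_t)p | ((uintptr_t)flags & TAG_MASK);
}

/* Recover the pointer by masking the flag bits off again. */
static void *tag_ptr(uintptr_t word)
{
	return (void *)(word & ~TAG_MASK);
}

/* Recover the 2-bit flag value. */
static unsigned int tag_flags(uintptr_t word)
{
	return (unsigned int)(word & TAG_MASK);
}
```

An entry placed at an address where tag_bits_free() is false cannot be
packed this way, which is the situation an alignment directive in a linker
script is meant to rule out.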
2017-03-07 09:37:28 -08:00
Kieran Bingham 8c71fff434 [media] v4l: vsp1: Adapt vsp1_du_setup_lif() interface to use a structure
The interface to configure the LIF in the VSP1 requires adapting the
function prototype for any changes. This makes extending the interface
difficult.

Change the function prototype to pass a structure which can be easily
extended.

This changes the means of disabling the pipeline, by now passing a NULL
configuration rather than passing either a 0 width or height.
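
The shape of such an interface can be sketched as follows. This is a
simplified illustration, not the driver's code: struct lif_config, its
fields and setup_lif() are hypothetical names chosen for the example.

```c
#include <stddef.h>

/* Hypothetical configuration structure; the field names are assumptions. */
struct lif_config {
	unsigned int width;
	unsigned int height;
};

enum lif_state { LIF_DISABLED, LIF_ENABLED };

/*
 * A NULL configuration disables the pipeline. New parameters can be added
 * to struct lif_config without touching this prototype.
 */
static enum lif_state setup_lif(const struct lif_config *cfg)
{
	if (cfg == NULL)
		return LIF_DISABLED;
	/* A real driver would program the hardware from *cfg here. */
	return LIF_ENABLED;
}
```

Callers that used to pass a zero width or height to stop the pipeline
would pass NULL instead.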

[Fixed kerneldoc, made vsp1_du_setup_lif() cfg argument const]

Signed-off-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
2017-03-07 13:34:11 -03:00
Andre Przywara a5e1e6ca94 KVM: arm/arm64: VGIC: Fix command handling while ITS being disabled
The ITS spec says that ITS commands are only processed when the ITS
is enabled (section 8.19.4, Enabled, bit[0]). Our emulation was not taking
this into account.
Fix this by checking the enabled state before handling CWRITER writes.

On the other hand that means that CWRITER could advance while the ITS
is disabled, and enabling it would need those commands to be processed.
Fix this case as well by refactoring actual command processing and
calling this from both the GITS_CWRITER and GITS_CTLR handlers.
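
A minimal sketch of that refactoring, assuming a much simplified model of
the ITS: the structure and function names below are hypothetical, and the
command queue is modelled as plain counters without wrap-around.

```c
#include <stdbool.h>

/* Hypothetical, simplified ITS state; not the kernel's data structure. */
struct its_state {
	bool enabled;
	unsigned int creadr;	/* next command the ITS will read */
	unsigned int cwriter;	/* next slot software will write */
	unsigned int handled;	/* commands processed so far */
};

/* Shared helper: consume queued commands, but only while enabled. */
static void its_process_commands(struct its_state *its)
{
	if (!its->enabled)
		return;
	while (its->creadr != its->cwriter) {
		its->handled++;
		its->creadr++;
	}
}

/* CWRITER write: always record the new value, then try to process. */
static void its_write_cwriter(struct its_state *its, unsigned int val)
{
	its->cwriter = val;
	its_process_commands(its);
}

/* CTLR write: enabling the ITS drains commands queued while disabled. */
static void its_write_ctlr(struct its_state *its, bool enable)
{
	its->enabled = enable;
	its_process_commands(its);
}
```

Both register handlers funnel into the same helper, so commands written
while the ITS is disabled are picked up as soon as it is enabled.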

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-03-07 15:44:08 +00:00
Mark Rutland ba4dd156ea arm64: KVM: Survive unknown traps from guests
Currently we BUG() if we see an ESR_EL2.EC value we don't recognise. As
configurable disables/enables are added to the architecture (controlled
by RES1/RES0 bits respectively), with associated synchronous exceptions,
it may be possible for a guest to trigger exceptions with classes that
we don't recognise.

While we can't service these exceptions in a manner useful to the guest,
we can avoid bringing down the host. Per ARM DDI 0487A.k_iss10775, page
D7-1937, EC values within the range 0x00 - 0x2c are reserved for future
use with synchronous exceptions, and EC values within the range 0x2d -
0x3f may be used for either synchronous or asynchronous exceptions.

The patch makes KVM handle any unknown EC by injecting an UNDEFINED
exception into the guest, with a corresponding (ratelimited) warning in
the host dmesg. We could later improve on this with a new (opt-in)
exit to the host userspace.
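
The dispatch behaviour described above can be sketched roughly like this.
It is an illustration under assumptions, not the KVM code: the handler
table, the result enum and dispatch() are hypothetical, and only one known
class is filled in.

```c
#include <stddef.h>

#define EC_MAX 0x3f	/* the exception class field is 6 bits wide */

enum trap_result { TRAP_HANDLED, TRAP_INJECT_UNDEF };

typedef enum trap_result (*trap_handler)(void);

/* Hypothetical handler for one recognised exception class. */
static enum trap_result handle_wfx(void)
{
	return TRAP_HANDLED;
}

/* Slots without a handler stay NULL and count as "unknown". */
static const trap_handler handlers[EC_MAX + 1] = {
	[0x01] = handle_wfx,
};

/* Unknown classes inject UNDEFINED into the guest instead of BUG(). */
static enum trap_result dispatch(unsigned int ec)
{
	trap_handler h = handlers[ec & EC_MAX];

	return h ? h() : TRAP_INJECT_UNDEF;
}
```

The point is that the fallback path affects only the guest that raised
the exception and leaves the host running.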

Cc: Dave Martin <dave.martin@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-03-07 14:50:46 +00:00
Mark Rutland f050fe7a91 arm: KVM: Survive unknown traps from guests
Currently we BUG() if we see a HSR.EC value we don't recognise. As
configurable disables/enables are added to the architecture (controlled
by RES1/RES0 bits respectively), with associated synchronous exceptions,
it may be possible for a guest to trigger exceptions with classes that
we don't recognise.

While we can't service these exceptions in a manner useful to the guest,
we can avoid bringing down the host. Per ARM DDI 0406C.c, all currently
unallocated HSR EC encodings are reserved, and per ARM DDI
0487A.k_iss10775, page G6-4395, EC values within the range 0x00 - 0x2c
are reserved for future use with synchronous exceptions, and EC values
within the range 0x2d - 0x3f may be used for either synchronous or
asynchronous exceptions.

The patch makes KVM handle any unknown EC by injecting an UNDEFINED
exception into the guest, with a corresponding (ratelimited) warning in
the host dmesg. We could later improve on this with a new (opt-in)
exit to the host userspace.

Cc: Dave Martin <dave.martin@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-03-07 14:50:45 +00:00
Jintack Lim 370a0ec181 KVM: arm/arm64: Let vcpu thread modify its own active state
Currently, if a vcpu thread tries to change the active state of an
interrupt which is already on the same vcpu's AP list, it will loop
forever. Since the VGIC mmio handler is called after a vcpu has
already synced back the LR state to the struct vgic_irq, we can just
let it proceed safely.

Cc: stable@vger.kernel.org
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-03-07 14:48:16 +00:00
Wanpeng Li 2f707d9798 KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset
Reported by syzkaller:

    WARNING: CPU: 1 PID: 27742 at arch/x86/kvm/vmx.c:11029
    nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029
    CPU: 1 PID: 27742 Comm: a.out Not tainted 4.10.0+ #229
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    Call Trace:
     __dump_stack lib/dump_stack.c:15 [inline]
     dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
     panic+0x1fb/0x412 kernel/panic.c:179
     __warn+0x1c4/0x1e0 kernel/panic.c:540
     warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
     nested_vmx_vmexit+0x5c35/0x74d0 arch/x86/kvm/vmx.c:11029
     vmx_leave_nested arch/x86/kvm/vmx.c:11136 [inline]
     vmx_set_msr+0x1565/0x1910 arch/x86/kvm/vmx.c:3324
     kvm_set_msr+0xd4/0x170 arch/x86/kvm/x86.c:1099
     do_set_msr+0x11e/0x190 arch/x86/kvm/x86.c:1128
     __msr_io arch/x86/kvm/x86.c:2577 [inline]
     msr_io+0x24b/0x450 arch/x86/kvm/x86.c:2614
     kvm_arch_vcpu_ioctl+0x35b/0x46a0 arch/x86/kvm/x86.c:3497
     kvm_vcpu_ioctl+0x232/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2721
     vfs_ioctl fs/ioctl.c:43 [inline]
     do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
     SYSC_ioctl fs/ioctl.c:698 [inline]
     SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
     entry_SYSCALL_64_fastpath+0x1f/0xc2

The syzkaller folks reported a nested_run_pending warning that triggers when
userspace clears the VMX capability after it has been exposed to L1.

The warning gets thrown while doing

(*(uint32_t*)0x20aecfe8 = (uint32_t)0x1);
(*(uint32_t*)0x20aecfec = (uint32_t)0x0);
(*(uint32_t*)0x20aecff0 = (uint32_t)0x3a);
(*(uint32_t*)0x20aecff4 = (uint32_t)0x0);
(*(uint64_t*)0x20aecff8 = (uint64_t)0x0);
r[29] = syscall(__NR_ioctl, r[4], 0x4008ae89ul,
		0x20aecfe8ul, 0, 0, 0, 0, 0, 0);

i.e. the KVM_SET_MSRS ioctl with

struct kvm_msrs {
	.nmsrs = 1,
	.pad = 0,
	.entries = {
		{.index = MSR_IA32_FEATURE_CONTROL,
		 .reserved = 0,
		 .data = 0}
	}
}

The VMLAUNCH/VMRESUME emulation should be stopped since the CPU is going to
be reset here. This patch clears nested_run_pending, because once the CPU is
reset there should be nothing pending.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-03-07 15:41:12 +01:00
Franck Demathieu 4b9de5da7e irqchip/crossbar: Fix incorrect type of register size
The 'size' variable is unsigned according to the dt-bindings.
As this variable is used as a signed integer in other places, introduce a
new variable to fix the following sparse warning (-Wtypesign):

  drivers/irqchip/irq-crossbar.c:279:52: warning: incorrect type in argument 3 (different signedness)
  drivers/irqchip/irq-crossbar.c:279:52:    expected unsigned int [usertype] *out_value
  drivers/irqchip/irq-crossbar.c:279:52:    got int *<noident>
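
The general pattern can be sketched as follows. The function names are
hypothetical stand-ins, not the device tree API or the driver's code: the
value is read into a correctly typed temporary and converted once.

```c
#include <stdint.h>

/* Stand-in for an API that fills an unsigned 32-bit output value. */
static int read_u32_property(uint32_t *out_value)
{
	*out_value = 4;
	return 0;
}

/* Read into an unsigned temporary, then convert for the signed users. */
static int get_register_size(int *size)
{
	uint32_t raw;
	int err = read_u32_property(&raw);

	if (err)
		return err;
	*size = (int)raw;
	return 0;
}
```

Passing an int pointer directly to read_u32_property() is what produces
the "different signedness" warning.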

Signed-off-by: Franck Demathieu <fdemathieu@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-03-07 14:34:39 +00:00
Shanker Donthineni 90922a2d03 irqchip/gicv3-its: Add workaround for QDF2400 ITS erratum 0065
On Qualcomm Datacenter Technologies QDF2400 SoCs, the ITS hardware
implementation uses 16 bytes for an Interrupt Translation Entry (ITE),
but reports an incorrect value of 8 bytes in GITS_TYPER.ITTE_size.

This can cause kernel memory corruption, depending on the number
of MSI(-X) interrupts that are configured and the amount of memory
that has been allocated for ITEs in its_create_device().

This patch fixes the potential memory corruption by setting the
correct ITE size to 16 bytes.
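
A rough sketch of the effect on the allocation size, under stated
assumptions: the field position (bits [7:4], encoded as size minus one)
and all function names below are assumptions made for the example, not
code taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: entry size in bits [7:4], stored as size minus one. */
static unsigned int typer_ite_size(uint64_t typer)
{
	return (unsigned int)((typer >> 4) & 0xf) + 1;
}

/* With the erratum, ignore the reported value and use 16 bytes. */
static unsigned int effective_ite_size(uint64_t typer, bool has_erratum)
{
	return has_erratum ? 16 : typer_ite_size(typer);
}

/* Bytes needed for a table holding nr_ites entries. */
static uint64_t itt_bytes(unsigned int nr_ites, unsigned int ite_size)
{
	return (uint64_t)nr_ites * ite_size;
}
```

With 32 entries, trusting a reported size of 8 bytes would allocate 256
bytes while the hardware addresses 512, which is how the overrun arises.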

Cc: stable@vger.kernel.org
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-03-07 14:34:27 +00:00