Commit Graph

857596 Commits

Author SHA1 Message Date
Trond Myklebust 8f54c7a4ba NFS: Fix spurious EIO read errors
If the client attempts to read a page, but the read fails due to some
spurious error (e.g. an ACCESS error or a timeout, ...) then we need
to allow other processes to retry.
Also try to report errors correctly when doing a synchronous readpage.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26 15:31:29 -04:00
Trond Myklebust 7af46292da pNFS/flexfiles: Don't time out requests on hard mounts
If the mount is hard, we should ignore the 'io_maxretrans' module
parameter so that we always keep retrying.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26 15:31:29 -04:00
Trond Myklebust c82e5472c9 SUNRPC: Handle connection breakages correctly in call_status()
If the connection breaks while we're waiting for a reply from the
server, then we want to immediately try to reconnect.

Fixes: ec6017d903 ("SUNRPC fix regression in umount of a secure mount")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26 15:31:29 -04:00
Trond Myklebust d5711920ec Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"
This reverts commit a79f194aa4.
The mechanism for aborting I/O is racy, since we are not guaranteed that
the request is asleep while we're changing both task->tk_status and
task->tk_action.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v5.1
2019-08-26 15:31:29 -04:00
Trond Myklebust 80f455da6c SUNRPC: Handle EADDRINUSE and ENOBUFS correctly
If a connect or bind attempt returns EADDRINUSE, that means we want to
retry with a different port. It is not a fatal connection error.
Similarly, ENOBUFS is not fatal, but just indicates a memory allocation
issue. Retry after a short delay.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26 15:31:29 -04:00
Trond Myklebust bf2bf9b80e pNFS/flexfiles: Turn off soft RPC calls
The pNFS/flexfiles I/O requests are sent with the SOFTCONN flag set, so
they automatically time out if the connection breaks. It should
therefore not be necessary to have the soft flag set in addition.

Fixes: 5f01d95394 ("nfs41: create NFSv3 DS connection if specified")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26 15:31:29 -04:00
Trond Myklebust bd736ed3e2 SUNRPC: Don't handle errors if the bind/connect succeeded
Don't handle errors in call_bind_status()/call_connect_status()
if it turns out that a previous call caused it to succeed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v5.1+
2019-08-26 15:31:29 -04:00
Bandan Das 558682b529 x86/apic: Include the LDR when clearing out APIC registers
Although APIC initialization will typically clear out the LDR before
setting it, the APIC cleanup code should reset the LDR.

This was discovered with a 32-bit KVM guest jumping into a kdump
kernel. The stale bits in the LDR triggered a bug in the KVM APIC
implementation which caused the destination mapping for VCPUs to be
corrupted.

Note that this isn't intended to paper over the KVM APIC bug. The kernel
has to clear the LDR when resetting the APIC registers except when X2APIC
is enabled.

This lacks a Fixes tag because missing to clear LDR goes way back into pre
git history.

[ tglx: Made x2apic_enabled a function call as required ]

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190826101513.5080-3-bsd@redhat.com
2019-08-26 20:00:57 +02:00
Bandan Das bae3a8d330 x86/apic: Do not initialize LDR and DFR for bigsmp
Legacy apic init uses bigsmp for smp systems with 8 and more CPUs. The
bigsmp APIC implementation uses physical destination mode, but it
nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with
multiple bit being set.

This does not cause a functional problem because LDR and DFR are ignored
when physical destination mode is active, but it triggered a problem on a
32-bit KVM guest which jumps into a kdump kernel.

The multiple bits set unearthed a bug in the KVM APIC implementation. The
code which creates the logical destination map for VCPUs ignores the
disabled state of the APIC and ends up overwriting an existing valid entry
and as a result, APIC calibration hangs in the guest during kdump
initialization.

Remove the bogus LDR/DFR initialization.

This is not intended to work around the KVM APIC bug. The LDR/DFR
ininitalization is wrong on its own.

The issue goes back into the pre git history. The fixes tag is the commit
in the bitkeeper import which introduced bigsmp support in 2003.

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

Fixes: db7b9e9f26b8 ("[PATCH] Clustered APIC setup for >8 CPU systems")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190826101513.5080-2-bsd@redhat.com
2019-08-26 20:00:56 +02:00
Nick Desaulniers 2f029413cb arc: prefer __section from compiler_attributes.h
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-08-26 22:37:12 +05:30
Mischa Jonker d85f6b93a7 dt-bindings: IDU-intc: Add support for edge-triggered interrupts
This updates the documentation for supporting an optional extra interrupt
cell to specify edge vs level triggered.

Signed-off-by: Mischa Jonker <mischa.jonker@synopsys.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-08-26 22:35:51 +05:30
Mischa Jonker 01449985e6 dt-bindings: IDU-intc: Clean up documentation
* Some lines exceeded 80 characters.
* Clarified statement about AUX register interface

Signed-off-by: Mischa Jonker <mischa.jonker@synopsys.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-08-26 22:35:25 +05:30
Mischa Jonker 174ae4e96e ARCv2: IDU-intc: Add support for edge-triggered interrupts
This adds support for an optional extra interrupt cell to specify edge
vs level triggered. It is backward compatible with dts files with only
one cell, and will default to level-triggered in such a case.

Note that I had to make a change to idu_irq_set_affinity as well, as
this function was setting the interrupt type to "level" unconditionally,
since this was the only type supported previously.

Signed-off-by: Mischa Jonker <mischa.jonker@synopsys.com>
Reviewed-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-08-26 22:34:59 +05:30
Takashi Sakamoto 2fd2329393 ALSA: oxfw: fix to handle correct stream for PCM playback
When userspace application calls ioctl(2) to configure hardware for PCM
playback substream, ALSA OXFW driver handles incoming AMDTP stream.
In this case, outgoing AMDTP stream should be handled.

This commit fixes the bug for v5.3-rc kernel.

Fixes: 4f380d0070 ("ALSA: oxfw: configure packet format in pcm.hw_params callback")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-08-26 16:00:30 +02:00
Sebastian Mayr 9212ec7d83 uprobes/x86: Fix detection of 32-bit user mode
32-bit processes running on a 64-bit kernel are not always detected
correctly, causing the process to crash when uretprobes are installed.

The reason for the crash is that in_ia32_syscall() is used to determine the
process's mode, which only works correctly when called from a syscall.

In the case of uretprobes, however, the function is called from a exception
and always returns 'false' on a 64-bit kernel. In consequence this leads to
corruption of the process's return address.

Fix this by using user_64bit_mode() instead of in_ia32_syscall(), which
is correct in any situation.

[ tglx: Add a comment and the following historical info ]

This should have been detected by the rename which happened in commit

  abfb9498ee ("x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()")

which states in the changelog:

    The is_ia32_task()/is_x32_task() function names are a big misnomer: they
    suggests that the compat-ness of a system call is a task property, which
    is not true, the compatness of a system call purely depends on how it
    was invoked through the system call layer.
    .....

and then it went and blindly renamed every call site.

Sadly enough this was already mentioned here:

   8faaed1b9f ("uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and
arch_uretprobe_hijack_return_addr()")

where the changelog says:

    TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
    not necessarily mean 32bit. Fortunately syscall-like insns can't be
    probed so it actually works, but it would be better to rename and
    use is_ia32_frame().

and goes all the way back to:

    0326f5a94d ("uprobes/core: Handle breakpoint and singlestep exceptions")

Oh well. 7+ years until someone actually tried a uretprobe on a 32bit
process on a 64bit kernel....

Fixes: 0326f5a94d ("uprobes/core: Handle breakpoint and singlestep exceptions")
Signed-off-by: Sebastian Mayr <me@sam.st>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190728152617.7308-1-me@sam.st
2019-08-26 15:55:09 +02:00
Bjorn Andersson 14ced7e3a1 phy: qcom-qmp: Correct ready status, again
Despite extensive testing of commit 885bd76596 ("phy: qcom-qmp: Correct
READY_STATUS poll break condition") I failed to conclude that the
PHYSTATUS bit of the PCS_STATUS register used in PCIe and USB3 falls as
the PHY gets ready. Similar to the prior bug with UFS the code will
generally get past the check before the transition and thereby
"succeed".

Correct the name of the register used PCIe and USB3 PHYs, replace
mask_pcs_ready with a constant expression depending on the type of the
PHY and check for the appropriate ready state.

Cc: stable@vger.kernel.org
Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
Cc: Evan Green <evgreen@chromium.org>
Cc: Niklas Cassel <niklas.cassel@linaro.org>
Reported-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Fixes: 885bd76596 ("phy: qcom-qmp: Correct READY_STATUS poll break condition")
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Tested-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2019-08-26 17:20:04 +05:30
Nishka Dasgupta be0345b2cc phy: qualcomm: phy-qcom-qmp: Add of_node_put() before return
Each iteration of for_each_available_child_of_node() puts the previous
node, but in the case of a return from the middle of the loop, there is
no put, thus causing a memory leak. Hence create a new label,
err_node_put, that puts the previous node (child) before returning the
required value. Also include the statement pm_runtime_disable() under
this label in order to avoid repetition among mid-loop return
conditions. Edit the mid-loop return statements to instead go to this
new label err_node_put.
Issue found with Coccinelle.

Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2019-08-26 17:20:02 +05:30
Yoshihiro Shimoda e6839c31a6 phy: renesas: rcar-gen3-usb2: Disable clearing VBUS in over-current
The hardware manual should be revised, but the initial value of
VBCTRL.OCCLREN is set to 1 actually. If the bit is set, the hardware
clears VBCTRL.VBOUT and ADPCTRL.DRVVBUS registers automatically
when the hardware detects over-current signal from a USB power switch.
However, since the hardware doesn't have any registers which
indicates over-current, the driver cannot handle it at all. So, if
"is_otg_channel" hardware detects over-current, since ADPCTRL.DRVVBUS
register is cleared automatically, the channel cannot be used after
that.

To resolve this behavior, this patch sets the VBCTRL.OCCLREN to 0
to keep ADPCTRL.DRVVBUS even if the "is_otg_channel" hardware
detects over-current. (We assume a USB power switch itself protects
over-current and turns the VBUS off.)

This patch is inspired by a BSP patch from Kazuya Mizuguchi.

Fixes: 1114e2d317 ("phy: rcar-gen3-usb2: change the mode to OTG on the combined channel")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2019-08-26 17:20:02 +05:30
Wen Yang 3e64482842 phy: ti: am654-serdes: fix an use-after-free in serdes_am654_clk_register()
The regmap_node variable is still being used in the syscon_node_to_regmap()
call after the of_node_put() call, which may result in use-after-free.

Fixes: 71e2f5c5c2 ("phy: ti: Add a new SERDES driver for TI's AM654x SoC")
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Cc: Kishon Vijay Abraham I <kishon@ti.com>
Cc: Roger Quadros <rogerq@ti.com>
Reviewed-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2019-08-26 17:20:01 +05:30
Thomas Gleixner 3e5bedc2c2 x86/apic: Fix arch_dynirq_lower_bound() bug for DT enabled machines
Rahul Tanwar reported the following bug on DT systems:

> 'ioapic_dynirq_base' contains the virtual IRQ base number. Presently, it is
> updated to the end of hardware IRQ numbers but this is done only when IOAPIC
> configuration type is IOAPIC_DOMAIN_LEGACY or IOAPIC_DOMAIN_STRICT. There is
> a third type IOAPIC_DOMAIN_DYNAMIC which applies when IOAPIC configuration
> comes from devicetree.
>
> See dtb_add_ioapic() in arch/x86/kernel/devicetree.c
>
> In case of IOAPIC_DOMAIN_DYNAMIC (DT/OF based system), 'ioapic_dynirq_base'
> remains to zero initialized value. This means that for OF based systems,
> virtual IRQ base will get set to zero.

Such systems will very likely not even boot.

For DT enabled machines ioapic_dynirq_base is irrelevant and not
updated, so simply map the IRQ base 1:1 instead.

Reported-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
Tested-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: alan@linux.intel.com
Cc: bp@alien8.de
Cc: cheol.yong.kim@intel.com
Cc: qi-ming.wu@intel.com
Cc: rahul.tanwar@intel.com
Cc: rppt@linux.ibm.com
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/20190821081330.1187-1-rahul.tanwar@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26 12:11:23 +02:00
Mika Westerberg dfda204198 ACPI / property: Add two new Thunderbolt property GUIDs to the list
Ice Lake Thunderbolt controller includes two new device property
compatible properties that we need to be able to extract in the driver
so add them to the growing array of GUIDs.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2019-08-26 12:15:11 +03:00
Mika Westerberg 3cdb9446a1 thunderbolt: Add support for Intel Ice Lake
The Thunderbolt controller is integrated into the Ice Lake CPU itself
and requires special flows to power it on and off using force power bit
in NHI VSEC registers. Runtime PM (RTD3) and Sx flows also differ from
the discrete solutions. Now the firmware notifies the driver whether
RTD3 entry or exit are possible. The driver is responsible of sending
Go2Sx command through link controller mailbox when system enters Sx
states (suspend-to-mem/disk). Rest of the ICM firwmare flows follow
Titan Ridge.

Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Tested-by: Mario Limonciello <mario.limonciello@dell.com>
2019-08-26 12:15:06 +03:00
Mika Westerberg 3f415e5ee1 thunderbolt: Expose active parts of NVM even if upgrade is not supported
Ice Lake Thunderbolt controller NVM firmware is part of the BIOS image
which means it is not writable through the DMA port anymore. However, we
can still read it so we can keep nvm_version and active parts of NVM.
This way users still can find out the active NVM version and other
potentially useful information directly from Linux.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Tested-by: Mario Limonciello <mario.limonciello@dell.com>
2019-08-26 12:15:01 +03:00
Mika Westerberg 58f414fa43 thunderbolt: Hide switch attributes that are not set
Thunderbolt host routers may not always contain DROM that includes
device identification information. This is mostly needed for Ice Lake
systems but some Falcon Ridge controllers on PCs also do not have DROM.

In that case hide the identification attributes.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Tested-by: Mario Limonciello <mario.limonciello@dell.com>
2019-08-26 12:14:56 +03:00
Mika Westerberg d94dcbb101 thunderbolt: Do not fail adding switch if some port is not implemented
There are two ways to mark a port as unimplemented. Typical way is to
return port type as TB_TYPE_INACTIVE when its config space is read.
Alternatively if the port is not physically present (such as ports 10
and 11 in ICL) reading from port config space returns
TB_CFG_ERROR_INVALID_CONFIG_SPACE instead. Currently the driver bails
out from adding the switch if it receives any error during port
inititialization which is wrong.

Handle this properly and just leave the port as TB_TYPE_INACTIVE before
continuing to the next port.

This also allows us to get rid of special casing for Light Ridge port 5
in eeprom.c.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Tested-by: Mario Limonciello <mario.limonciello@dell.com>
2019-08-26 12:14:51 +03:00
Mika Westerberg 943795219d thunderbolt: Use 32-bit writes when writing ring producer/consumer
The register access should be using 32-bit reads/writes according to the
datasheet. With the previous generation hardware 16-bit writes have been
working but starting with ICL this is not the case anymore so fix
producer/consumer register update to use correct width register address.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Tested-by: Mario Limonciello <mario.limonciello@dell.com>
2019-08-26 12:14:46 +03:00
Mika Westerberg f437c24bf6 thunderbolt: Move NVM upgrade support flag to struct icm
This is depends on the controller and on the platform/CPU we are
running. Move it to struct icm so we can set it per controller.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <YehezkelShB@gmail.com>
Tested-by: Mario Limonciello <mario.limonciello@dell.com>
2019-08-26 12:14:32 +03:00
Mika Westerberg ce19f91eae thunderbolt: Correct path indices for PCIe tunnel
PCIe tunnel path indices got mixed up when we added support for tunnels
between switches that are not adjacent. This did not affect the
functionality as it is just an index but fix it now nevertheless to make
the code easier to understand.

Reported-by: Rajmohan Mani <rajmohan.mani@intel.com>
Fixes: 8c7acaaf02 ("thunderbolt: Extend tunnel creation to more than 2 adjacent switches")
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Yehezkel Bernat <YehezkelShB@gmail.com>
2019-08-26 12:08:57 +03:00
Prashant Malani f53a7ad189 r8152: Set memory to all 0xFFs on failed reg reads
get_registers() blindly copies the memory written to by the
usb_control_msg() call even if the underlying urb failed.

This could lead to junk register values being read by the driver, since
some indirect callers of get_registers() ignore the return values. One
example is:
  ocp_read_dword() ignores the return value of generic_ocp_read(), which
  calls get_registers().

So, emulate PCI "Master Abort" behavior by setting the buffer to all
0xFFs when usb_control_msg() fails.

This patch is copied from the r8152 driver (v2.12.0) published by
Realtek (www.realtek.com).

Signed-off-by: Prashant Malani <pmalani@chromium.org>
Acked-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-25 19:52:59 -07:00
Yi-Hung Wei 7177895154 openvswitch: Fix conntrack cache with timeout
This patch addresses a conntrack cache issue with timeout policy.
Currently, we do not check if the timeout extension is set properly in the
cached conntrack entry.  Thus, after packet recirculate from conntrack
action, the timeout policy is not applied properly.  This patch fixes the
aforementioned issue.

Fixes: 06bd2bdf19 ("openvswitch: Add timeout support to ct action")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-25 14:48:43 -07:00
Alexey Kodanev 803f3e22ae ipv4: mpls: fix mpls_xmit for iptunnel
When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the
same for rt->rt_gw_family.  Therefore, when rt->rt_gw_family is checked
in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup
doesn't work anymore.

This issue was found with LTP mpls03 tests.

Fixes: 1550c17193 ("ipv4: Prepare rtable for IPv6 gateway")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-25 14:34:08 -07:00
David Ahern 9b5f684182 nexthop: Fix nexthop_num_path for blackhole nexthops
Donald reported this sequence:
  ip next add id 1 blackhole
  ip next add id 2 blackhole
  ip ro add 1.1.1.1/32 nhid 1
  ip ro add 1.1.1.2/32 nhid 2

would cause a crash. Backtrace is:

[  151.302790] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  151.304043] CPU: 1 PID: 277 Comm: ip Not tainted 5.3.0-rc5+ #37
[  151.305078] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[  151.306526] RIP: 0010:fib_add_nexthop+0x8b/0x2aa
[  151.307343] Code: 35 f7 81 48 8d 14 01 c7 02 f1 f1 f1 f1 c7 42 04 01 f4 f4 f4 48 89 f2 48 c1 ea 03 65 48 8b 0c 25 28 00 00 00 48 89 4d d0 31 c9 <80> 3c 02 00 74 08 48 89 f7 e8 1a e8 53 ff be 08 00 00 00 4c 89 e7
[  151.310549] RSP: 0018:ffff888116c27340 EFLAGS: 00010246
[  151.311469] RAX: dffffc0000000000 RBX: ffff8881154ece00 RCX: 0000000000000000
[  151.312713] RDX: 0000000000000004 RSI: 0000000000000020 RDI: ffff888115649b40
[  151.313968] RBP: ffff888116c273d8 R08: ffffed10221e3757 R09: ffff888110f1bab8
[  151.315212] R10: 0000000000000001 R11: ffff888110f1bab3 R12: ffff888115649b40
[  151.316456] R13: 0000000000000020 R14: ffff888116c273b0 R15: ffff888115649b40
[  151.317707] FS:  00007f60b4d8d800(0000) GS:ffff88811ac00000(0000) knlGS:0000000000000000
[  151.319113] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  151.320119] CR2: 0000555671ffdc00 CR3: 00000001136ba005 CR4: 0000000000020ee0
[  151.321367] Call Trace:
[  151.321820]  ? fib_nexthop_info+0x635/0x635
[  151.322572]  fib_dump_info+0xaa4/0xde0
[  151.323247]  ? fib_create_info+0x2431/0x2431
[  151.324008]  ? napi_alloc_frag+0x2a/0x2a
[  151.324711]  rtmsg_fib+0x2c4/0x3be
[  151.325339]  fib_table_insert+0xe2f/0xeee
...

fib_dump_info incorrectly has nhs = 0 for blackhole nexthops, so it
believes the nexthop object is a multipath group (nhs != 1) and ends
up down the nexthop_mpath_fill_node() path which is wrong for a
blackhole.

The blackhole check in nexthop_num_path is leftover from early days
of the blackhole implementation which did not initialize the device.
In the end the design was simpler (fewer special case checks) to set
the device to loopback in nh_info, so the check in nexthop_num_path
should have been removed.

Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Reported-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-25 14:29:10 -07:00
Linus Torvalds a55aa89aab Linux 5.3-rc6 2019-08-25 12:01:23 -07:00
Linus Torvalds c749088f25 A minor auxdisplay improvement:
- ht16k33: Make ht16k33_fb_fix and ht16k33_fb_var constant (Nishka Dasgupta)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAl1ixpEACgkQGXyLc2ht
 IW3Alg//f7+tOooILnDsxByF6T3bD5ObZFuMAW01jnHER8q93sBAuddY28OjiSrI
 MZrwZLbz43Ek9zF+Q2A8RIVYD79vFUZbD33ZbQHJ1CJmD/urapVE13rmQMo+EsiB
 PsCgIKjRByj/WfUexRdJTZ7gbKb+l6l/gvLO9tqLbb0rD/CMEny7rLEzmC5uLwYE
 koM6A74AhXBEQMYR2Vn7HpLF9U3vzo7O0QuDLUlvaSv5TJgpdpZuLDJHXbBOcnRU
 qrD7ruPOXxwo6b218TaIeCP6IDIEOdHz/4XxcZ0rFjiTxF0nLx4OjDHlCYfsxlEw
 6kujamc8kJmdUwHk3xQM2kxUlR/mMSmvpW5bRdUEBk2+Cqe4S5c2OFSxYoHMBiI/
 SpmUJbkLgzQSo33k0rNKiZL49arlrsNN94EV9+QHSHbmTq/HlPWuPleUUfA0Ep46
 mN7wbQkE1FAniwoOu3Tx4T1Kw+L2gTqAmqxNCFf6HoihnkFjf/RAYEGPLBP9mKAN
 o2W9icMSREeM9pKy4NYr0Fcq7eD1vcYGkSY1gpFfNDDEt7TTH7M3L85ty0ky+JvU
 jHRayXNRg/SGtx3CBhDw3iiq4Dj5t2YJ0NTNF2XyzHTkass4dGfE8duHoDDnaFP5
 GphEAjf3kV+f+j9f7Kj8Y5cCeCMxctWp3bv6eZGK+LMiPUzPqC0=
 =qxac
 -----END PGP SIGNATURE-----

Merge tag 'auxdisplay-for-linus-v5.3-rc7' of git://github.com/ojeda/linux

Pull auxdisplay cleanup from Miguel Ojeda:
 "Make ht16k33_fb_fix and ht16k33_fb_var constant (Nishka Dasgupta)"

* tag 'auxdisplay-for-linus-v5.3-rc7' of git://github.com/ojeda/linux:
  auxdisplay: ht16k33: Make ht16k33_fb_fix and ht16k33_fb_var constant
2019-08-25 11:43:17 -07:00
Linus Torvalds 32ae83ffec This pull request contains a single bug fix for UML:
- Fix time travel mode
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl1ikVwWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7waNYEADaqyJu+2UAp2hZGkwric9dqh4l
 IibXY0bPKokDIAt/gGmh5CX8cqBWKjWJSny91mqrINm1SBv4iTm0GLrSq7ZmQmYH
 1JRZSk3QtxRfVMVKizp2L/K22lPSMIViYoAsTGYTbRAmNyjBGJNSZrgCs3BBi/1F
 mxINtpyg2MyWOg9aNIzil6ZfwcPEazt9US6XM/2Tcs3z9wDO5bfRIgD3ILoWcT7D
 RPwLbtMi242Uak+Eyi44QCfwB5UjC1UvDdKjgr3paHiTVm7LS0dCEnBhaDhtGeb8
 bqEnSVH9oHA0XQhUAYdFNMQN0n1+bEDbqnbz9JLg4iJt6jXpvY8oL9xi7k/FglSu
 zXlhRRE4G7AYpBoCvQp/Anh85aCAcsZ9nP4aSN8GXLi7IqyaZ7KRTBHrAFxYi/WP
 dXVaqR984w5bEBDLRUsGosKHlHXHMnAwPDthQhuRrCqqmE/YyzpOaCsG46Wzpriy
 Jg302QmlTOMfx0uUoCVsiEq6rwar6LGTP7raihaR8j9g0EzFr7f4FpzmWxQpvJqG
 YpE3jVwp3OOKJjOETIW6ko2lzai3GOP9rPqoPfOhtqeALHLtORlg7XAhBj7n3Tji
 rLHKmVIxiiAmkfQItMdRjJbu9gFAiW+ZR7nEnDnhMjer1iPkJX+DtCLEZFpui7Me
 WrrQx4ypeO4RFemQCQ==
 =bDrL
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML fix from Richard Weinberger:
 "Fix time travel mode"

* tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: fix time travel mode
2019-08-25 11:40:24 -07:00
Linus Torvalds 94a76d9b52 This pull request contains the following fixes for UBIFS and JFFS2:
UBIFS:
 
 - Don't block too long in writeback_inodes_sb()
 - Fix for a possible overrun of the log head
 - Fix double unlock in orphan_delete()
 
 JFFS2:
 
 - Remove C++ style from UAPI header and unbreak picky toolchains
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl1ik14WHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wbP2D/4xVW7YP5Yyt6YrABJuclfoib30
 2LI6eOz0+5OojQKUbOzXCN9N7Dv4TLJKrCjRc9qKYTIB1DiQXuBDqtYKg6CTBhHb
 MjiftEDiBQ6j3jVmRxkQRXZEB9I3Uu9CkA8s65+UmL8peJfgNElpH34omsU1fzup
 y0NhZhj77P5jsAG6r7yXvuaofCOTlZIZVPya9FX17J0Ra+3rMOCtVEqnaHk2E5RB
 EQPAEByqXUIx7+9mOi1Krw7B7fesB7oOVbCykE5knX1pZQCTURP64yNr35WxN+7Z
 crcpdEQtf54qWMCKf4ClIBHiPmmsDIHYJy3JXjgJKOwIYvrB3dZ5E170qPr3JixY
 nS+l8x69IYZhWUzHg8gxDizk92iFYKbO1h5vBwI7NUFHkHLzylsgonBK0KdaUnol
 OvI5oCO/rdJEMBPr5LEFpOjZJIEptPtXpDvQCpm5tWd5tuW+8edNpI38lDO9LThC
 O0diZZUQfsuzD1XrvKRORPU+4lskzGV5b1UA0DWXdGKALqM5VrQZo1XftvA74Zkv
 oZQcHNK5wdecQX81Oadfb/0a5SN7FGGtTUCKTpOyBIu0adarGIasC6TQr2aDiiNh
 7jLjBoV2XEGhXZQrK2lm8G+6rJ7Mp11B6aoTFgDELzt+SB7htp6dARR2+4aGWXh9
 iXgme0n9HXDDeuosag==
 =Bsgx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBIFS and JFFS2 fixes from Richard Weinberger:
 "UBIFS:
   - Don't block too long in writeback_inodes_sb()
   - Fix for a possible overrun of the log head
   - Fix double unlock in orphan_delete()

  JFFS2:
   - Remove C++ style from UAPI header and unbreak picky toolchains"

* tag 'for-linus-5.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubifs: Limit the number of pages in shrink_liability
  ubifs: Correctly initialize c->min_log_bytes
  ubifs: Fix double unlock around orphan_delete()
  jffs2: Remove C++ style comments from uapi header
2019-08-25 11:29:27 -07:00
Linus Torvalds 146c3d3220 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A few fixes for x86:

   - Fix a boot regression caused by the recent bootparam sanitizing
     change, which escaped the attention of all people who reviewed that
     code.

   - Address a boot problem on machines with broken E820 tables caused
     by an underflow which ended up placing the trampoline start at
     physical address 0.

   - Handle machines which do not advertise a legacy timer of any form,
     but need calibration of the local APIC timer gracefully by making
     the calibration routine independent from the tick interrupt. Marked
     for stable as well as there seems to be quite some new laptops
     rolled out which expose this.

   - Clear the RDRAND CPUID bit on AMD family 15h and 16h CPUs which are
     affected by broken firmware which does not initialize RDRAND
     correctly after resume. Add a command line parameter to override
     this for machine which either do not use suspend/resume or have a
     fixed BIOS. Unfortunately there is no way to detect this on boot,
     so the only safe decision is to turn it off by default.

   - Prevent RFLAGS from being clobbers in CALL_NOSPEC on 32bit which
     caused fast KVM instruction emulation to break.

   - Explain the Intel CPU model naming convention so that the repeating
     discussions come to an end"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386
  x86/boot: Fix boot regression caused by bootparam sanitizing
  x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h
  x86/boot/compressed/64: Fix boot on machines with broken E820 table
  x86/apic: Handle missing global clockevent gracefully
  x86/cpu: Explain Intel model naming convention
2019-08-25 10:10:15 -07:00
Linus Torvalds 5a13fc3d8b Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping fix from Thomas Gleixner:
 "A single fix for a regression caused by the generic VDSO
  implementation where a math overflow causes CLOCK_BOOTTIME to become a
  random number generator"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping/vsyscall: Prevent math overflow in BOOTTIME update
2019-08-25 10:08:01 -07:00
Linus Torvalds 8a04c2ee62 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
 "Handle the worker management in situations where a task is scheduled
  out on a PI lock contention correctly and schedule a new worker if
  possible"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Schedule new worker even if PI-blocked
2019-08-25 10:06:12 -07:00
Linus Torvalds 05bbb9360a Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Two small fixes for kprobes and perf:

   - Prevent a deadlock in kprobe_optimizer() causes by reverse lock
     ordering

   - Fix a comment typo"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes: Fix potential deadlock in kprobe_optimizer()
  perf/x86: Fix typo in comment
2019-08-25 10:03:32 -07:00
Linus Torvalds 44c471e436 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
 "A single fix for a imbalanced kobject operation in the irq decriptor
  code which was unearthed by the new warnings in the kobject code"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Properly pair kobject_del() with kobject_add()
2019-08-25 10:00:21 -07:00
Linus Torvalds f47edb59bb Merge branch 'akpm' (patches from Andrew)
Mergr misc fixes from Andrew Morton:
 "11 fixes"

Mostly VM fixes, one psi polling fix, and one parisc build fix.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
  mm/zsmalloc.c: fix race condition in zs_destroy_pool
  mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
  mm, page_owner: handle THP splits correctly
  userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
  psi: get poll_work to run when calling poll syscall next time
  mm: memcontrol: flush percpu vmevents before releasing memcg
  mm: memcontrol: flush percpu vmstats before releasing memcg
  parisc: fix compilation errrors
  mm, page_alloc: move_freepages should not examine struct page of reserved memory
  mm/z3fold.c: fix race between migration and destruction
2019-08-25 09:56:27 -07:00
Takashi Iwai 75545304eb ALSA: seq: Fix potential concurrent access to the deleted pool
The input pool of a client might be deleted via the resize ioctl, the
the access to it should be covered by the proper locks.  Currently the
only missing place is the call in snd_seq_ioctl_get_client_pool(), and
this patch papers over it.

Reported-by: syzbot+4a75454b9ca2777f35c7@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-08-25 09:31:10 +02:00
Linus Torvalds e67095fd2f dma-mapping fixes for 5.3-rc
Two fixes for regressions in this merge window:
 
  - select the Kconfig symbols for the noncoherent dma arch helpers
    on arm if swiotlb is selected, not just for LPAE to not break then
    Xen build, that uses swiotlb indirectly through swiotlb-xen
  - fix the page allocator fallback in dma_alloc_contiguous if the CMA
    allocation fails
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl1hvn4LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYON4w//Recfoy5T2Q4Gfjp1xVKGbr2sP7J93Vs7VCyQNZmX
 PrtzhmNKs4gxCEXVgHm+GVA+IJwQFqDtSFaPb8q3GQ+qM9NUDF4ScMFpfrLZsFr1
 dorm5kC1xcwrQtWjS1CQS/Gj0VBtWiMQOoUcAESMqgBIUo4ssj3Ny+vnh8hWgAOs
 oVDgOM4wt35bW0Pv/iY44uQzOq7xcYJUUYtPIiP9vMDrhPsxe6D1DgFQ4HZKJWix
 uS3BjZnsZDnLltXM/0CKdRV9wLF+jHYP/wJTztksRlr/A5V3FJ8lJIvgphxG1v3J
 tDfQs4BNuGWBjqdg+Qo6qOPEL9krvVYYVVql93DXwtPK/cJW1Z+0glgC2rbbHmIy
 ew35DFnYm9v0sFLZnbpuoHd6sQ9G59nTZstkqt/Z/hldBvKotwBpeuILAcMC9Nlw
 3iYW6Sz5L7cmkifC8OvopKKJWVoW5rVtMrVQw5niBiZVERtWbY825r/7ju2xYhZC
 iSAaUHT5wNtXsXQOTrFQ5LzTDBtgGyXRXgvNagEHhBf120jBQfOhvOCVT2HHOxdy
 5vx7xeeRS0M2HpxIsmd3XQjIUQEY9x1to4FKiYczGM1kcKeyWWBMFOXfLxe2Rmhg
 h14lbfsAxIEWdFkJAVFhjyjzC6IzxyVGtHCxw1iw0VgGzYATO/K6Oo8T2hG3HagR
 abQ=
 =DXk9
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:
 "Two fixes for regressions in this merge window:

   - select the Kconfig symbols for the noncoherent dma arch helpers on
     arm if swiotlb is selected, not just for LPAE to not break then Xen
     build, that uses swiotlb indirectly through swiotlb-xen

   - fix the page allocator fallback in dma_alloc_contiguous if the CMA
     allocation fails"

* tag 'dma-mapping-5.3-5' of git://git.infradead.org/users/hch/dma-mapping:
  dma-direct: fix zone selection after an unaddressable CMA allocation
  arm: select the dma-noncoherent symbols for all swiotlb builds
2019-08-24 20:00:11 -07:00
Andrey Ryabinin 00fb24a42a mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
The code like this:

	ptr = kmalloc(size, GFP_KERNEL);
	page = virt_to_page(ptr);
	offset = offset_in_page(ptr);
	kfree(page_address(page) + offset);

may produce false-positive invalid-free reports on the kernel with
CONFIG_KASAN_SW_TAGS=y.

In the example above we lose the original tag assigned to 'ptr', so
kfree() gets the pointer with 0xFF tag.  In kfree() we check that 0xFF
tag is different from the tag in shadow hence print false report.

Instead of just comparing tags, do the following:

1) Check that shadow doesn't contain KASAN_TAG_INVALID.  Otherwise it's
   double-free and it doesn't matter what tag the pointer have.

2) If pointer tag is different from 0xFF, make sure that tag in the
   shadow is the same as in the pointer.

Link: http://lkml.kernel.org/r/20190819172540.19581-1-aryabinin@virtuozzo.com
Fixes: 7f94ffbc4c ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Walter Wu <walter-zh.wu@mediatek.com>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Henry Burns 701d678599 mm/zsmalloc.c: fix race condition in zs_destroy_pool
In zs_destroy_pool() we call flush_work(&pool->free_work).  However, we
have no guarantee that migration isn't happening in the background at
that time.

Since migration can't directly free pages, it relies on free_work being
scheduled to free the pages.  But there's nothing preventing an
in-progress migrate from queuing the work *after*
zs_unregister_migration() has called flush_work().  Which would mean
pages still pointing at the inode when we free it.

Since we know at destroy time all objects should be free, no new
migrations can come in (since zs_page_isolate() fails for fully-free
zspages).  This means it is sufficient to track a "# isolated zspages"
count by class, and have the destroy logic ensure all such pages have
drained before proceeding.  Keeping that state under the class spinlock
keeps the logic straightforward.

In this case a memory leak could lead to an eventual crash if compaction
hits the leaked page.  This crash would only occur if people are
changing their zswap backend at runtime (which eventually starts
destruction).

Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
Fixes: 48b4800a1c ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Henry Burns 1a87aa0359 mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
In zs_page_migrate() we call putback_zspage() after we have finished
migrating all pages in this zspage.  However, the return value is
ignored.  If a zs_free() races in between zs_page_isolate() and
zs_page_migrate(), freeing the last object in the zspage,
putback_zspage() will leave the page in ZS_EMPTY for potentially an
unbounded amount of time.

To fix this, we need to do the same thing as zs_page_putback() does:
schedule free_work to occur.

To avoid duplicated code, move the sequence to a new
putback_zspage_deferred() function which both zs_page_migrate() and
zs_page_putback() call.

Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com
Fixes: 48b4800a1c ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Henry Burns <henrywolfeburns@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Jonathan Adams <jwadams@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Vlastimil Babka f7da677bc6 mm, page_owner: handle THP splits correctly
THP splitting path is missing the split_page_owner() call that
split_page() has.

As a result, split THP pages are wrongly reported in the page_owner file
as order-9 pages.  Furthermore when the former head page is freed, the
remaining former tail pages are not listed in the page_owner file at
all.  This patch fixes that by adding the split_page_owner() call into
__split_huge_page().

Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz
Fixes: a9627bc5e3 ("mm/page_owner: introduce split_page_owner and replace manual handling")
Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Oleg Nesterov 46d0b24c5e userfaultfd_release: always remove uffd flags and clear vm_userfaultfd_ctx
userfaultfd_release() should clear vm_flags/vm_userfaultfd_ctx even if
mm->core_state != NULL.

Otherwise a page fault can see userfaultfd_missing() == T and use an
already freed userfaultfd_ctx.

Link: http://lkml.kernel.org/r/20190820160237.GB4983@redhat.com
Fixes: 04f5866e41 ("coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00
Jason Xing 7b2b55da1d psi: get poll_work to run when calling poll syscall next time
Only when calling the poll syscall the first time can user receive
POLLPRI correctly.  After that, user always fails to acquire the event
signal.

Reproduce case:
 1. Get the monitor code in Documentation/accounting/psi.txt
 2. Run it, and wait for the event triggered.
 3. Kill and restart the process.

The question is why we can end up with poll_scheduled = 1 but the work
not running (which would reset it to 0).  And the answer is because the
scheduling side sees group->poll_kworker under RCU protection and then
schedules it, but here we cancel the work and destroy the worker.  The
cancel needs to pair with resetting the poll_scheduled flag.

Link: http://lkml.kernel.org/r/1566357985-97781-1-git-send-email-joseph.qi@linux.alibaba.com
Signed-off-by: Jason Xing <kerneljasonxing@linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24 19:48:42 -07:00