Commit Graph

17555 Commits

Author SHA1 Message Date
Benjamin Herrenschmidt bba9bc89b9 powerpc: Add aacraid and nvme to powernv_defconfig
These adapters can be found in a number of our systems, so let's
enable the corresponding drivers by default.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:15 +11:00
Christophe Leroy 4f94b2c746 powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP
When CONFIG_SWAP is set, the TLB miss handlers have to also take
into account _PAGE_ACCESSED flag. At the moment it is done by
anding _PAGE_ACCESSED into _PAGE_PRESENT using 3 instructions.

This patch uses APG for handling _PAGE_ACCESSED, allowing to
just copy _PAGE_ACCESSED bit into APG field, hence reducing the
action to a single instruction.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:15 +11:00
Christophe Leroy de0f938739 powerpc/8xx: Remove _PAGE_USER and handle user access at PMD level
As Linux kernel separates KERNEL and USER address spaces, there is
therefore no need to flag USER access at page level.

Today, the 8xx TLB handlers already handle user access in the L1 entry
through Access Protection Groups, it is then natural to move the user
access handling at PMD level once _PAGE_NA allows to handle PAGE_NONE
protection without _PAGE_USER

In the mean time, as we free up one bit in the PTE, we can use it to
include SPS (page size flag) in the PTE and avoid handling it at every
TLB miss hence removing special handling based on compiled page size.

For _PAGE_EXEC, we rework it to use PP PTE bits, avoiding the copy
of _PAGE_EXEC bit into the L1 entry. Unfortunatly we are not
able to put it at the correct location as it conflicts with
NA/RO/RW bits for data entries.

Upper bits of APG in L1 entry overlap with PMD base address. In
order to avoid having to filter that out, we set up all groups so that
upper bits can have any value.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:14 +11:00
Christophe Leroy 351750331f powerpc/mm: Introduce _PAGE_NA
Today, PAGE_NONE is defined as a page not having _PAGE_USER.
In some circunstances, when the CPU supports it, it might be
better to be able to flag a page with NO ACCESS.

In a following patch, the 8xx will switch user access being flagged
in the PMD, therefore it will not be possible anymore to use
_PAGE_USER as a way to flag a page with no access.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:14 +11:00
Christophe Leroy 812fadcb94 powerpc/mm: extend _PAGE_PRIVILEGED to all CPUs
commit ac29c64089 ("powerpc/mm: Replace _PAGE_USER with
_PAGE_PRIVILEGED") introduced _PAGE_PRIVILEGED for BOOK3S/64

This patch generalises _PAGE_PRIVILEGED for all CPUs, allowing
to have either _PAGE_PRIVILEGED or _PAGE_USER or both.

PPC_8xx has a _PAGE_SHARED flag which is set for and only for
all non user pages. Lets rename it _PAGE_PRIVILEGED to remove
confusion as it has nothing to do with Linux shared pages.

On BookE, there's a _PAGE_BAP_SR which has to be set for kernel
pages: defining _PAGE_PRIVILEGED as _PAGE_BAP_SR will make
this generic

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:13 +11:00
Christophe Leroy 5f356497c3 powerpc/8xx: remove unused _PAGE_WRITETHRU
_PAGE_WRITETHRU is only used in:
* AMIGA_Z2RAM block driver which is never activated on powerPC
* Video/FB driver which is for PPC_PMAC

Therefore, no need to spend time in 8xx TLB miss handlers for
handling it.

And by removing it, we free up bit 20 which then avoids having
to clear it on each TLB miss.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:13 +11:00
Christophe Leroy cd99ddbea2 powerpc/8xx: Only perform perf counting when perf is in use.
In TLB miss handlers, updating the perf counter is only useful
when performing a perf analysis. As it has a noticeable overhead,
let's only do it when needed.

In order to do so, the exit of the miss handlers will be patched
when starting/stopping 'perf': the first register restore
instruction of each exit point will be replaced by a jump to
the counting code.

Once this is done, CONFIG_PPC_8xx_PERF_EVENT becomes useless as
this feature doesn't add any overhead.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:12 +11:00
Christophe Leroy bb9b5a8332 powerpc/8xx: remove EXCEPTION_PROLOG/EPILOG_0 and change r3 to r12
EXCEPTION_PROLOG_0 and EXCEPTION_EPILOG_0 were added some
time ago in order to regroup the two mtspr/mfspr to SCRATCH0 and
SCRATCH1 and the mfcr/mtcr in order to ease entry and exit of
function not using the full EXCEPTION_PROLOG.

Since then, the mfcr/mtcr has been taken out, hence just leaving
the two mtspr/mfspr in the macro.

In order to improve readability of the exception functions, we
remove those two macros and copy back the two mtspr/mfspr instead.

As r10 and r11 are used for SCRATCH0 and SCRATCH1, lets also use
r12 for SCRATCH2. It will also improve the readability/maintenance.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:12 +11:00
Christophe Leroy 2a45addd21 powerpc/8xx: Remove CPU6 ERRATA Workaround
CPU6 ERRATA affects only MPC860 revisions prior to C.0. Manufacturing
of those revisiosn was stopped in 1999-2000.
Therefore, it has been almost 20 years since this ERRATA has been
fixed in the silicon.

This patch removes the workaround for that ERRATA.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:12 +11:00
Christophe Leroy 9a3b849bfe powerpc/8xx: do not select CONFIG_PPC_LIB_RHEAP
Since commit 0e6e01ff69 ("CPM/QE: use genalloc to manage CPM/QE
muram"), rheap is not used anymore.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:11 +11:00
Balbir Singh 4145f35864 powernv/kdump: Fix cases where the kdump kernel can get HMI's
Certain HMI's such as malfunction error propagate through
all threads/core on the system. If a thread was offline
prior to us crashing the system and jumping to the kdump
kernel, bad things happen when it wakes up due to an HMI
in the kdump kernel.

There are several possible ways to solve this problem

1. Put the offline cores in a state such that they are
not woken up for machine check and HMI errors. This
does not work, since we might need to wake up offline
threads to handle TB errors
2. Ignore HMI errors, setup HMEER to mask HMI errors,
but this still leads the window open for any MCEs
and masking them for the duration of the dump might
be a concern
3. Wake up offline CPUs, as in send them to
crash_ipi_callback (not wake them up as in mark them
online as seen by the hotplug). kexec does a
wake_online_cpus() call, this patch does something
similar, but instead sends an IPI and forces them to
crash_ipi_callback()

This patch takes approach #3.

Care is taken to enable this only for powenv platforms
via crash_wake_offline (a global value set at setup
time). The crash code sends out IPI's to all CPU's
which then move to crash_ipi_callback and kexec_smp_wait().

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:11 +11:00
Balbir Singh 04b9c96eae powerpc/crash: Remove the test for cpu_online in the IPI callback
Our check was extra cautious, we've audited crash_send_ipi
and it sends an IPI only to online CPU's. Removal of this
check should have not functional impact on crash kdump.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:10 +11:00
Dmitry Torokhov 9625e69a38 powerpc: make use of for_each_node_by_type() instead of open-coding it
Instead of manually coding the loop with of_find_node_by_type(), let's
switch to the standard macro for iterating over nodes with given type.

Also fixed a couple of refcount leaks in the aforementioned loops.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:47:10 +11:00
Christophe Leroy 94d3084a0f powerpc/32s: Fix compile error with CONFIG_PPC_PTDUMP
This patch remove CONFIG_PPC_HTDUMP if not PPC_BOOK3S_64 to avoid
below compile failure on BOOK3S_32:

  In file included from arch/powerpc/mm/dump_hashpagetable.c:27:0:
  ./arch/powerpc/include/asm/plpar_wrappers.h: In function 'get_cede_latency_hint':
  ./arch/powerpc/include/asm/plpar_wrappers.h:27:2: error: implicit declaration of function 'get_lppaca' [-Werror=implicit-function-declaration]
  ...
  arch/powerpc/mm/dump_hashpagetable.c: At top level:
  arch/powerpc/mm/dump_hashpagetable.c:69:13: error: 'SLB_VSID_B' undeclared here (not in a function)
  ...
  arch/powerpc/mm/dump_hashpagetable.c:506:38: error: 'VMEMMAP_BASE' undeclared (first use in this function)
  arch/powerpc/mm/dump_hashpagetable.c:506:35: error: assignment makes integer from pointer without a cast [-Werror]

Fixes: dd5ac03e09 ("powerpc/mm: Fix page table dump build on non-Book3S")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Trim change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:35:02 +11:00
Nathan Fontenot 0c38ed6f6f powerpc/pseries: Enable support of ibm,dynamic-memory-v2
Add required bits to the architecture vector to enable support
of the ibm,dynamic-memory-v2 device tree property.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:26:30 +11:00
Nathan Fontenot 2b31e3aec1 powerpc/drmem: Add support for ibm, dynamic-memory-v2 property
The Power Hypervisor has introduced a new device tree format for
the property describing the dynamic reconfiguration LMBs for a system,
ibm,dynamic-memory-v2. This new format condenses the size of the
property, especially on large memory systems, by reporting sets
of LMBs that have the same properties (flags and associativity array
index).

This patch updates the powerpc/mm/drmem.c code to provide routines
that can parse the new device tree format during the walk_drmem_lmb*
routines used during boot, the creation of the LMB array, and updating
the device tree to create a new property in the proper format for
ibm,dynamic-memory-v2.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:26:29 +11:00
Nathan Fontenot 2c77721552 powerpc: Move of_drconf_cell struct to asm/drmem.h
Now that the powerpc code parses dynamic reconfiguration memory
LMB information from the LMB array and not the device tree
directly we can move the of_drconf_cell struct to drmem.h where
it fits better.

In addition, the struct is renamed to of_drconf_cell_v1 in
anticipation of upcoming support for version 2 of the dynamic
reconfiguration property and the members are typed as __be*
values to reflect how they exist in the device tree.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:26:29 +11:00
Nathan Fontenot 6195a5001f powerpc/pseries: Update memory hotplug code to use drmem LMB array
Update the pseries memory hotplug code to use the newly added
dynamic reconfiguration LMB array. Doing this is required for the
upcoming support of version 2 of the dynamic reconfiguration
device tree property.

In addition, making this change cleans up the code that parses the
LMB information as we no longer need to worry about device tree
format. This allows us to discard one of the first steps on memory
hotplug where we make a working copy of the device tree property and
convert the entire property to cpu format. Instead we just use the
LMB array directly while holding the memory hotplug lock.

This patch also moves the updating of the device tree property to
powerpc/mm/drmem.c. This allows to the hotplug code to work without
needing to know the device tree format and provides a single
routine for updating the device tree property. This new routine
will handle determination of the proper device tree format and
generate a properly formatted device tree property.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:26:28 +11:00
Nathan Fontenot 514a9cb331 powerpc/numa: Update numa code use walk_drmem_lmbs
Update code in powerpc/numa.c to use the walk_drmem_lmbs()
routine instead of parsing the device tree directly. This is
in anticipation of introducing a new ibm,dynamic-memory-v2
property with a different format. This will allow the numa code
to use a single initialization routine per-LMB irregardless of
the device tree format.

Additionally, to support additional routines in numa.c that need
to look up LMB information, an late_init routine is added to drmem.c
to allocate the array of LMB information. This LMB array will provide
per-LMB information to separate the LMB data from the device tree
format.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:26:28 +11:00
Nathan Fontenot 6c6ea53725 powerpc/mm: Separate ibm, dynamic-memory data from DT format
We currently have code to parse the dynamic reconfiguration LMB
information from the ibm,dynamic-meory device tree property in
multiple locations; numa.c, prom.c, and pseries/hotplug-memory.c.
In anticipation of adding support for a version 2 of the
ibm,dynamic-memory property this patch aims to separate the device
tree information from the device tree format.

Doing this requires a two step process to avoid a possibly very large
bootmem allocation early in boot. During initial boot, new routines
are provided to walk the device tree property and make a call-back
for each LMB.

The second step (introduced in later patches) will allocate an
array of LMB information that can be used directly without needing
to know the DT format.

This approach provides the benefit of consolidating the device tree
property parsing to a single location and (eventually) providing
a common data structure for retrieving LMB information.

This patch introduces a routine to walk the ibm,dynamic-memory
property in the flattened device tree and updates the prom.c code
to use this to initialize memory.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 23:26:27 +11:00
Nathan Fontenot b88fc309d6 powerpc/numa: Look up associativity array in of_drconf_to_nid_single
Look up the associativity arrays in of_drconf_to_nid_single when
deriving the nid for a LMB instead of having it passed in as a
parameter.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 18:23:48 +11:00
Nathan Fontenot 22508f3dc9 powerpc/numa: Look up device node in of_get_usable_memory()
Look up the device node for the usable memory property instead
of having it passed in as a parameter. This changes precedes an update
in which the calling routines for of_get_usable_memory() will not have
the device node pointer to pass in.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 18:23:48 +11:00
Nathan Fontenot 35f80debae powerpc/numa: Look up device node in of_get_assoc_arrays()
Look up the device node for the associativity array property instead
of having it passed in as a parameter. This changes precedes an update
in which the calling routines for of_get_assoc_arrays() will not have
the device node pointer to pass in.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16 18:23:47 +11:00
Aneesh Kumar K.V 5769beaf18 powerpc/mm: Add proper pte access check helper for other platforms
pte_access_premitted get called in get_user_pages_fast path. If we
have marked the pte PROT_NONE, we should not allow a read access on
the address. With the current implementation we are not checking the
READ and only check for WRITE. This is needed on archs like ppc64 that
implement PROT_NONE using _PAGE_USER access instead of _PAGE_PRESENT.
Also add pte_user check just to make sure we are not accessing kernel
mapping.

Even though there is code duplication, keeping the low level pte
accessors different for different platforms helps in code readability.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-22 22:28:40 +11:00
Aneesh Kumar K.V f72a85e347 powerpc/mm/book3s/64: Add proper pte access check helper
pte_access_premitted get called in get_user_pages_fast path. If we
have marked the pte PROT_NONE, we should not allow a read access on
the address. With the current implementation we are not checking the
READ and only check for WRITE. This is needed on archs like ppc64 that
implement PROT_NONE using RWX access instead of _PAGE_PRESENT. Also
add pte_user check just to make sure we are not accessing kernel
mapping.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-22 22:28:35 +11:00
Aneesh Kumar K.V 5fa5b16be5 powerpc/mm/hugetlb: Use pte_access_permitted for hugetlb access check
No functional change in this patch. This update gup_hugepte to use the
helper. This will help later when we add memory keys.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-22 22:28:31 +11:00
Ram Pai 7e4363550c powerpc: capture the PTE format changes in the dump pte report
The H_PAGE_F_SECOND,H_PAGE_F_GIX are not in the 64K main-PTE.
capture these changes in the dump pte report.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-20 18:57:26 +11:00
Ram Pai a854868646 powerpc: use helper functions to get and set hash slots
replace redundant code in __hash_page_4K() and flush_hash_page()
with helper functions pte_get_hash_gslot() and pte_set_hidx()

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-20 18:57:25 +11:00
Ram Pai 273b493689 powerpc: Swizzle around 4K PTE bits to free up bit 5 and bit 6
We need PTE bits 3 ,4, 5, 6 and 57 to support protection-keys,
because these are the bits we want to consolidate on across all
configuration to support protection keys.

Bit 3,4,5 and 6 are currently used on 4K-pte kernels. But bit 9
and 10 are available. Hence we use the two available bits and
free up bit 5 and 6. We will still not be able to free up bit 3
and 4. In the absence of any other free bits, we will have to
stay satisfied with what we have :-(. This means we will not
be able to support 32 protection keys, but only 8. The bit
numbers are big-endian as defined in the ISA3.0

This patch does the following change to 4K PTE.

H_PAGE_F_SECOND (S) which occupied bit 4 moves to bit 7.
H_PAGE_F_GIX (G,I,X) which occupied bit 5, 6 and 7 also moves
to bit 8,9, 10 respectively.
H_PAGE_HASHPTE (H) which occupied bit 8 moves to bit 4.

Before the patch, the 4k PTE format was as follows

 0 1 2 3 4  5  6  7  8 9 10....................57.....63
 : : : : :  :  :  :  : : :                      :     :
 v v v v v  v  v  v  v v v                      v     v
,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-,
|x|x|x|B|S |G |I |X |H| | |x|x|................| |x|x|x|
'_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_'

After the patch, the 4k PTE format is as follows

 0 1 2 3 4  5  6  7  8 9 10....................57.....63
 : : : : :  :  :  :  : : :                      :     :
 v v v v v  v  v  v  v v v                      v     v
,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-,
|x|x|x|B|H |  |  |S |G|I|X|x|x|................| |.|.|.|
'_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_'

The patch has no code changes; just swizzles around bits.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-20 18:57:24 +11:00
Ram Pai 7b84947cad powerpc: shifted-by-one hidx value
0xf is considered invalid hidx value. It indicates absence of a backing
HPTE. A PTE is initialized to 0xf either
a) when it is new it is newly allocated to hold 4k-backing-HPTE
	or
b) Any time it gets demoted to a 4k-backing-HPTE

This patch shifts the representation by one-modulo-0xf; i.e hidx 0 is
represented as 1, 1 as 2,... , and 0xf as 0. This convention lets us
initialize the secondary-part of the PTE to all zeroes. PTEs are anyway
zero'd when allocated. We do not have to zero them again; thus saving on
the initialization.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-20 18:57:23 +11:00
Ram Pai bf9a95f9a6 powerpc: Free up four 64K PTE bits in 64K backed HPTE pages
Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6
in the 64K backed HPTE pages. This along with the earlier
patch will entirely free up the four bits from 64K PTE.
The bit numbers are big-endian as defined in the ISA3.0

This patch does the following change to 64K PTE backed
by 64K HPTE.

H_PAGE_F_SECOND (S) which occupied bit 4 moves to the
	second part of the pte to bit 60.
H_PAGE_F_GIX (G,I,X) which occupied bit 5, 6 and 7 also
	moves to the second part of the pte to bit 61,
 	62, 63, 64 respectively

since bit 7 is now freed up, we move H_PAGE_BUSY (B) from
bit 9 to bit 7.

The second part of the PTE will hold
(H_PAGE_F_SECOND|H_PAGE_F_GIX) at bit 60,61,62,63.
NOTE: None of the bits in the secondary PTE were not used
by 64k-HPTE backed PTE.

Before the patch, the 64K HPTE backed 64k PTE format was
as follows

 0 1 2 3 4  5  6  7  8 9 10...........................63
 : : : : :  :  :  :  : : :                            :
 v v v v v  v  v  v  v v v                            v

,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-,
|x|x|x| |S |G |I |X |x|B| |x|x|................|x|x|x|x| <- primary pte
'_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_'
| | | | |  |  |  |  | | | | |..................| | | | | <- secondary pte
'_'_'_'_'__'__'__'__'_'_'_'_'__________________'_'_'_'_'

After the patch, the 64k HPTE backed 64k PTE format is
as follows

 0 1 2 3 4  5  6  7  8 9 10...........................63
 : : : : :  :  :  :  : : :                            :
 v v v v v  v  v  v  v v v                            v

,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-,
|x|x|x| |  |  |  |B |x| | |x|x|................|.|.|.|.| <- primary pte
'_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_'
| | | | |  |  |  |  | | | | |..................|S|G|I|X| <- secondary pte
'_'_'_'_'__'__'__'__'_'_'_'_'__________________'_'_'_'_'

The above PTE changes is applicable to hugetlbpages aswell.

The patch does the following code changes:

a) moves the H_PAGE_F_SECOND and H_PAGE_F_GIX to 4k PTE
	header since it is no more needed b the 64k PTEs.
b) abstracts out __real_pte() and __rpte_to_hidx() so the
	caller need not know the bit location of the slot.
c) moves the slot bits to the secondary pte.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-20 18:57:22 +11:00
Ram Pai 9d2edb1848 powerpc: Free up four 64K PTE bits in 4K backed HPTE pages
Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6,
in the 4K backed HPTE pages.These bits continue to be used
for 64K backed HPTE pages in this patch, but will be freed
up in the next patch. The bit numbers are big-endian as
defined in the ISA3.0

The patch does the following change to the 4k HTPE backed
64K PTE's format.

H_PAGE_BUSY moves from bit 3 to bit 9 (B bit in the figure
		below)
V0 which occupied bit 4 is not used anymore.
V1 which occupied bit 5 is not used anymore.
V2 which occupied bit 6 is not used anymore.
V3 which occupied bit 7 is not used anymore.

Before the patch, the 4k backed 64k PTE format was as follows

 0 1 2 3 4  5  6  7  8 9 10...........................63
 : : : : :  :  :  :  : : :                            :
 v v v v v  v  v  v  v v v                            v

,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-,
|x|x|x|B|V0|V1|V2|V3|x| | |x|x|................|x|x|x|x| <- primary pte
'_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_'
|S|G|I|X|S |G |I |X |S|G|I|X|..................|S|G|I|X| <- secondary pte
'_'_'_'_'__'__'__'__'_'_'_'_'__________________'_'_'_'_'

After the patch, the 4k backed 64k PTE format is as follows

 0 1 2 3 4  5  6  7  8 9 10...........................63
 : : : : :  :  :  :  : : :                            :
 v v v v v  v  v  v  v v v                            v

,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-,
|x|x|x| |  |  |  |  |x|B| |x|x|................|.|.|.|.| <- primary pte
'_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_'
|S|G|I|X|S |G |I |X |S|G|I|X|..................|S|G|I|X| <- secondary pte
'_'_'_'_'__'__'__'__'_'_'_'_'__________________'_'_'_'_'

the four bits S,G,I,X (one quadruplet per 4k HPTE) that
cache the hash-bucket slot value, is initialized to
1,1,1,1 indicating -- an invalid slot. If a HPTE gets
cached in a 1111 slot(i.e 7th slot of secondary hash
bucket), it is released immediately. In other words,
even though 1111 is a valid slot value in the hash
bucket, we consider it invalid and release the slot and
the HPTE. This gives us the opportunity to determine
the validity of S,G,I,X bits based on its contents and
not on any of the bits V0,V1,V2 or V3 in the primary PTE

When we release a HPTE cached in the 1111 slot
we also release a legitimate slot in the primary
hash bucket and unmap its corresponding HPTE. This
is to ensure that we do get a HPTE cached in a slot
of the primary hash bucket, the next time we retry.

Though treating 1111 slot as invalid, reduces the
number of available slots in the hash bucket and may
have an effect on the performance, the probabilty of
hitting a 1111 slot is extermely low.

Compared to the current scheme, the above scheme
reduces the number of false hash table updates
significantly and has the added advantage of releasing
four valuable PTE bits for other purpose.

NOTE:even though bits 3, 4, 5, 6, 7 are not used when
the 64K PTE is backed by 4k HPTE, they continue to be
used if the PTE gets backed by 64k HPTE. The next
patch will decouple that aswell, and truely release the
bits.

This idea was jointly developed by Paul Mackerras,
Aneesh, Michael Ellermen and myself.

4K PTE format remains unchanged currently.

The patch does the following code changes
a) PTE flags are split between 64k and 4k header files.
b) __hash_page_4K() is reimplemented to reflect the
 above logic.

Acked-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-20 18:57:20 +11:00
Ram Pai 318995b4f5 powerpc: introduce pte_get_hash_gslot() helper
Introduce pte_get_hash_gslot()() which returns the global slot number of
the HPTE in the global hash table.

This function will come in handy as we work towards re-arranging the PTE
bits in the later patches.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-20 18:57:19 +11:00
Ram Pai 59aa31fd6f powerpc: introduce pte_set_hidx() helper
Introduce pte_set_hidx().It sets the (H_PAGE_F_SECOND|H_PAGE_F_GIX) bits
at the appropriate location in the PTE of 4K PTE. For 64K PTE, it sets
the bits in the second part of the PTE. Though the implementation for
the former just needs the slot parameter, it does take some additional
parameters to keep the prototype consistent.

This function will be handy as we work towards re-arranging the bits in
the subsequent patches.

Acked-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-20 18:57:19 +11:00
Bryant G. Ly 608c0d8804 PCI/IOV: Add pci_vf_drivers_autoprobe() interface
Add a pci_vf_drivers_autoprobe() interface. Setting autoprobe to false
on the PF prevents drivers from binding to VFs when they are enabled.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:36 +11:00
Bryant G. Ly dae7253f9f powerpc/pseries: Add pseries SR-IOV Machine dependent calls
Add calls for pseries platform to configure/deconfigure SR-IOV.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@us.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:36 +11:00
Bryant G. Ly 988fc3ba56 powerpc/pci: Separate SR-IOV Calls
SR-IOV can now be enabled for the powernv platform and pseries
platform. Therefore move the appropriate calls to machine dependent
code instead of relying on definition at compile time.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@us.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:35 +11:00
Alan Modra 5c45b52801 powerpc/modules: Fix alignment of .toc section in kernel modules
powerpc64 gcc can generate code that offsets an address, to access
part of an object in memory. If the address is a -mcmodel=medium toc
pointer relative address then code like the following is possible.

  addis r9,r2,var@toc@ha
  ld r3,var@toc@l(r9)
  ld r4,(var+8)@toc@l(r9)

This works fine so long as var is naturally aligned, *and* r2 is
sufficiently aligned. If not, there is a possibility that the offset
added to access var+8 wraps over a n*64k+32k boundary. Modules don't
have any guarantee that r2 is sufficiently aligned. Moreover, code
generated by older compilers generates a .toc section with 2**0
alignment, which can result in relocation failures at module load time
even without the wrap problem.

Thus, this patch links modules with an aligned .toc section (Makefile
and module.lds changes), and forces alignment for out of tree modules
or those without a .toc section (module_64.c changes).

Signed-off-by: Alan Modra <amodra@gmail.com>
[desnesn: updated patch to apply to powerpc-next kernel v4.15 ]
Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.vnet.ibm.com>
[mpe: Fix out-of-tree build, swap -256 for ~0xff, reflow comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:35 +11:00
Nicholas Piggin acb1feab32 powerpc/64: Don't trace irqs-off at interrupt return to soft-disabled context
When an interrupt is returning to a soft-disabled context (which can
happen for non-maskable interrupts or synchronous interrupts), it goes
through the motions of soft-disabling again, including calling
TRACE_DISABLE_INTS (i.e., trace_hardirqs_off()).

This is not necessary, because we must already be soft-disabled in the
interrupt context, it also may be causing crashes in the irq tracing
code to re-enter as an nmi. Replace it with a warning to ensure that
soft-interrupts are still disabled.

Fixes: 7c0482e3d0 ("powerpc/irq: Fix another case of lazy IRQ state getting out of sync")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:34 +11:00
Ivan Mikhaylov 7813043e1b powerpc/44x/fsp2: Add irq error handlers
Add irq error handlers for cmu, plb, opb, mcue, conf
with debug information output in case of problems.

Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:34 +11:00
Ivan Mikhaylov 50f01c57d7 powerpc/44x/fsp2: tvsense workaround for dd1
TVSENSE(temperature and voltage sensors) reset is blocked (clock gated)
by the POR default of the TVS sleep config bit. As a consequence,
TVSENSE will provide erratic sensor values, which may result in
spurious (parity) errors recorded in the CMU FIR and leading to
erroneous interrupt requests once the CMU interrupt is unmasked.
Purpose of this to set up CMU in working state in any cases even
in case of parity errors.

Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:33 +11:00
Ivan Mikhaylov 9c4c374676 powerpc/44x/fsp2: Interrupt handling setup
* clear out any possible plb6 errors
* board interrupt handling setup within l2 reg set
* fsp2 parity error setup

All those points are needed for correct interrupt
handling on board level including error handling report.

Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:32 +11:00
Ivan Mikhaylov 494d82ceae powerpc/44x/fsp2: Add fsp2 headers
Add cmu, plbX, l2, ddr3/4, crcs register definitions.
Add mfcmu, mtcmu functions for indirect access to cmu.
Add mtl2, mfl2 same for l2 cache core reg set.

Signed-off-by: Ivan Mikhaylov <ivan@de.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:32 +11:00
Nicholas Piggin a10726075d powerpc/32: Add .data.rel* sections explicitly
Match powerpc/64 and include .data.rel* input sections in the .data output
section explicitly.

This solves the warning:

powerpc-linux-gnu-ld: warning: orphan section `.data.rel.ro' from `arch/powerpc/kernel/head_44x.o' being placed in section `.data.rel.ro'.

Link: https://lists.01.org/pipermail/kbuild-all/2017-November/040010.html
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:31 +11:00
Michael Ellerman c3bb690dc6 powerpc/boot: Only build CPM code when necessary
As far as I can tell CONFIG_CPM is the right symbol to use to
conditionally compile the cpm-serial.c code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:31 +11:00
Michael Ellerman f8e8e69cea powerpc/boot: Only build OPAL code when necessary
Only build the OPAL console code in when necessary. This looks like it
should use CONFIG_PPC_POWERNV, but because the opal-call.S code is
64-bit only, we must only build it when we're building the boot
wrapper 64-bit.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:30 +11:00
Michael Ellerman 3d6bf693d8 powerpc/boot: Only build uartlite if XILINX_VIRTEX=y
The serial code in uartlite.c only matches if we find one of two
Xilinx (xlnx) nodes in the device tree, there's no need to build or
link the code on other platforms.

As far as I can tell CONFIG_XILINX_VIRTEX is the appropriate symbol to
use to conditionally compile the code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:30 +11:00
Josh Poimboeuf 1ea61ea239 powerpc/modules: Improve restore_r2() error message
Print the function address associated with the restore_r2() error to
make it easier to debug the problem.

Also clarify the wording a bit.

Before:

  module_64: patch_foo: Expect noop after relocate, got 3c820000

After:

  module_64: patch_foo: Expected nop after call, got 7c630034 at netdev_has_upper_dev+0x54/0xb0 [patch_foo]

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
[mpe: Change noop to nop, as that's the name of the instruction]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:29 +11:00
Josh Poimboeuf b9eab08d01 powerpc/modules: Don't try to restore r2 after a sibling call
When attempting to load a livepatch module, I got the following error:

  module_64: patch_module: Expect noop after relocate, got 3c820000

The error was triggered by the following code in
unregister_netdevice_queue():

  14c:   00 00 00 48     b       14c <unregister_netdevice_queue+0x14c>
                         14c: R_PPC64_REL24      net_set_todo
  150:   00 00 82 3c     addis   r4,r2,0

GCC didn't insert a nop after the branch to net_set_todo() because it's
a sibling call, so it never returns.  The nop isn't needed after the
branch in that case.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-and-tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:29 +11:00
Kamalesh Babulal a443bf6e8a powerpc/modules: Add REL24 relocation support of livepatch symbols
Livepatch re-uses module loader function apply_relocate_add() to write
relocations, instead of managing them by arch-dependent
klp_write_module_reloc() function.

apply_relocate_add() doesn't understand livepatch symbols (marked with
SHN_LIVEPATCH symbol section index) and assumes them to be local
symbols by default for R_PPC64_REL24 relocation type. It fails with an
error, when trying to calculate offset with local_entry_offset():

  module_64: kpatch_meminfo: REL24 -1152921504897399800 out of range!

Whereas livepatch symbols are essentially SHN_UNDEF, should be called
via stub used for global calls. This issue can be fixed by teaching
apply_relocate_add() to handle both SHN_UNDEF/SHN_LIVEPATCH symbols
via the same stub. This patch extends SHN_UNDEF code to handle
livepatch symbols too.

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:28 +11:00
Benjamin Herrenschmidt 5138b31422 powerpc: Reduce log level of "OPAL detected !" message
This message isn't terribly useful.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:28 +11:00
Benjamin Herrenschmidt 10de741fd3 powerpc: Remove DEBUG define in 64-bit early setup code
This statement causes some not very useful messages to always
be printed on the serial port at boot, even on quiet boots.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:27 +11:00
Santosh Sivaraj 5c929885f1 powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE
Current vDSO64 implementation does not have support for coarse clocks
(CLOCK_MONOTONIC_COARSE, CLOCK_REALTIME_COARSE), for which it falls back
to system call, increasing the response time, vDSO implementation reduces
the cycle time. Below is a benchmark of the difference in execution times.

(Non-coarse clocks are also included just for completion)

clock-gettime-realtime: syscall: 172 nsec/call
clock-gettime-realtime:    libc: 28 nsec/call
clock-gettime-realtime:    vdso: 22 nsec/call
clock-gettime-monotonic: syscall: 171 nsec/call
clock-gettime-monotonic:    libc: 30 nsec/call
clock-gettime-monotonic:    vdso: 25 nsec/call
clock-gettime-realtime-coarse: syscall: 153 nsec/call
clock-gettime-realtime-coarse:    libc: 16 nsec/call
clock-gettime-realtime-coarse:    vdso: 10 nsec/call
clock-gettime-monotonic-coarse: syscall: 167 nsec/call
clock-gettime-monotonic-coarse:    libc: 17 nsec/call
clock-gettime-monotonic-coarse:    vdso: 11 nsec/call

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-04 15:01:09 +11:00
Dmitry Torokhov 0aa8ff9b76 powerpc: Use of for_each_node_by_name() instead of open-coding it
Instead of manually coding the loop with of_find_node_by_name(), let's
switch to the standard macro for iterating over nodes with given name.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
[mpe: Fix build failures due to typo in mpc832x_mds.c]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-04 13:10:59 +11:00
Dmitry Torokhov df26200299 powerpc/powermac: drop useless call to of_find_node_by_name
We are not using result, so this simply results in a leaked refcount.

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-04 11:54:34 +11:00
Dmitry Torokhov 2aaf0a198d powerpc/powermac: fix OF node refcount leak
We need to call of_node_put() for device nodes obtained with
of_find_node_by_name().

Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-04 11:54:34 +11:00
Joe Perches f2c2cbcc35 powerpc: Use pr_warn instead of pr_warning
At some point, pr_warning will be removed so all logging messages use
a consistent <prefix>_warn style.

Update arch/powerpc/

Miscellanea:

o Coalesce formats
o Realign arguments
o Use %s, __func__ instead of embedded function names
o Remove unnecessary line continuations

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Geoff Levand <geoff@infradead.org>
[mpe: Rebase due to some %pOF changes.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-04 11:54:34 +11:00
Linus Torvalds a0651c7fa2 powerpc fixes for 4.15 #3
Two fixes for nasty kexec/kdump crashes in certain configurations.
 
 A couple of minor fixes for the new TIDR code.
 
 A fix for an oops in a CXL error handling path.
 
 Thanks to:
   Andrew Donnellan, Christophe Lombard, David Gibson, Mahesh Salgaonkar, Vaibhav Jain.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJaITuLExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYBS
 9A/+IYjSoDdTcJXqfR46xn2147RGBQIym2rqaBJd+WwZj8Br0Yap5hrPtr1zAilD
 75aj0CRR0Y91nodnfishjujsJZckyQYOHi0/WQluLbpRlWEmeQt47gDjz70Wt1T8
 BZUEVqPF2k6Mk5WJV6sSIHBtw2uKrl/lJZAUJbTobOWgsMdopO504MkFxvySWKMV
 AX7UEXrcxPLb/yVGk9Ih9iwXxm/ymvQrkljp4s3jWqkc7bWwN93CmimIQ+X6bop0
 yqmAzCiUJsPsulmkBkmsY78llPg0roUrh98R4JIe0+cUiQROa5Kvt/u0zohN/rqS
 6SkPT0ds2Fs1z5cHayyQWMN0j0A5sfwW2KRMLHCJjAwAxzoT2CdMZDv0+QLi0ETy
 RGtYvnew8eCqrfBpyBneEP1JySARJ85ML4rZvudewSHJoMzTkYDnSEKU8+wlqRIf
 KHdvHmErRMlF7OB6Om3Uxz6oIXan/Puj7HsdL8f7MazjFPqb/r+/AuTDzUov17Fs
 7Y0qVawFyJyAJ8zkUAGB1kN2FN+eYnsFxUa7ubpeJY7VX+8pUOwT24rFc803eAu4
 p/ad1CpBy+8xaq83WeaM6BpMqW80ao2BzzbQUhDcEQl4qovO/ZEZxQt0ySoQp+SY
 MqE8SnZMkL/30CasbKTAqmt+P44GCSYZVtOUwTmvLNMTjSg=
 =/LD/
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Two fixes for nasty kexec/kdump crashes in certain configurations.

  A couple of minor fixes for the new TIDR code.

  A fix for an oops in a CXL error handling path.

  Thanks to: Andrew Donnellan, Christophe Lombard, David Gibson, Mahesh
  Salgaonkar, Vaibhav Jain"

* tag 'powerpc-4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Do not assign thread.tidr if already assigned
  powerpc: Avoid signed to unsigned conversion in set_thread_tidr()
  powerpc/kexec: Fix kexec/kdump in P9 guest kernels
  powerpc/powernv: Fix kexec crashes caused by tlbie tracing
  cxl: Check if vphb exists before iterating over AFU devices
2017-12-01 08:40:17 -05:00
Linus Torvalds 9e0600f5cf * x86 bugfixes: APIC, nested virtualization, IOAPIC
* PPC bugfix: HPT guests on a POWER9 radix host
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJaICi1AAoJEL/70l94x66DjvEIAIML/e9YX1YrJZi0rsB9cbm0
 Le3o5b3wKxPrlZdnpOZQ2mVWubUQdiHMPGX6BkpgyiJWUchnbj5ql1gUf5S0i3jk
 TOk6nae6DU94xBuboeqZJlmx2VfPY/fqzLWsX3HFHpnzRl4XvXL5o7cWguIxVcVO
 yU6bPgbAXyXSBennLWZxC3aQ2Ojikr3uxZQpUZTAPOW5hFINpCKCpqJBMxsb67wq
 rwI0cJhRl92mHpbe8qeNJhavqY5eviy9iPUaZrOW9P4yw1uqjTAjgsUc1ydiaZSV
 rOHeKBOgVfY/KBaNJKyKySfuL1MJ+DLcQqm9RlGpKNpFIeB0vvSf0gtmmqIAXIk=
 =kh2y
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:

 - x86 bugfixes: APIC, nested virtualization, IOAPIC

 - PPC bugfix: HPT guests on a POWER9 radix host

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits)
  KVM: Let KVM_SET_SIGNAL_MASK work as advertised
  KVM: VMX: Fix vmx->nested freeing when no SMI handler
  KVM: VMX: Fix rflags cache during vCPU reset
  KVM: X86: Fix softlockup when get the current kvmclock
  KVM: lapic: Fixup LDR on load in x2apic
  KVM: lapic: Split out x2apic ldr calculation
  KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts
  KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP
  KVM: x86: Fix CPUID function for word 6 (80000001_ECX)
  KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2
  KVM: x86: ioapic: Preserve read-only values in the redirection table
  KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
  KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq
  KVM: x86: ioapic: Don't fire level irq when Remote IRR set
  KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  KVM: x86: inject exceptions produced by x86_decode_insn
  KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs
  KVM: x86: fix em_fxstor() sleeping while in atomic
  KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
  KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
  ...
2017-11-30 08:15:19 -08:00
Dan Williams e4e40e0263 mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE
In response to compile breakage introduced by a series that added the
pud_write helper to x86, Stephen notes:

    did you consider using the other paradigm:

    In arch include files:
    #define pud_write       pud_write
    static inline int pud_write(pud_t pud)
     .....

    Then in include/asm-generic/pgtable.h:

    #ifndef pud_write
    tatic inline int pud_write(pud_t pud)
    {
            ....
    }
    #endif

    If you had, then the powerpc code would have worked ... ;-) and many
    of the other interfaces in include/asm-generic/pgtable.h are
    protected that way ...

Given that some architecture already define pmd_write() as a macro, it's
a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE.

Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-29 18:40:42 -08:00
Vaibhav Jain 7e4d423326 powerpc: Do not assign thread.tidr if already assigned
If set_thread_tidr() is called twice for same task_struct then it will
allocate a new tidr value to it leaving the previous value still
dangling in the vas_thread_ida table.

To fix this the patch changes set_thread_tidr() to check if a tidr
value is already assigned to the task_struct and if yes then returns
zero.

Fixes: ec233ede4c86("powerpc: Add support for setting SPRN_TIDR")
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
[mpe: Modify to return 0 in the success case, not the TID value]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-29 19:56:18 +11:00
Vaibhav Jain aca7573fde powerpc: Avoid signed to unsigned conversion in set_thread_tidr()
There is an unsafe signed to unsigned conversion in set_thread_tidr()
that may cause an error value to be assigned to SPRN_TIDR register and
used as thread-id.

The issue happens as assign_thread_tidr() returns an int and
thread.tidr is an unsigned-long. So a negative error code returned
from assign_thread_tidr() will fail the error check and gets assigned
as tidr as a large positive value.

To fix this the patch assigns the return value of assign_thread_tidr()
to a temporary int and assigns it to thread.tidr iff its '> 0'.

The patch shouldn't impact the calling convention of set_thread_tidr()
i.e all -ve return-values are error codes and a return value of '0'
indicates success.

Fixes: ec233ede4c86("powerpc: Add support for setting SPRN_TIDR")
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Christophe Lombard clombard@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-29 19:36:13 +11:00
Paolo Bonzini a63dd7480d PPC KVM fixes for 4.15
One commit here, that fixes a couple of bugs relating to the patch
 series that enables HPT guests to run on a radix host on POWER9
 systems.  This patch series went upstream in the 4.15 merge window,
 so no stable backport is required.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaGKSnAAoJEJ2a6ncsY3GfF7MIANLLhznEMrWq8jw4g95WsJU1
 MkDGwp8kIdhIOM9HD6JRskoJZB5Mws2BWlQ5PSaVFxO6v6eUgNLaRb/UBxC1r7gU
 1f9/8corY4BNkezSdJqTL7Xgp13KjTU726OwYAqCPEyCSPEc9ciMyeIgyZuv2dPa
 Pju+u4tnA+9JJyskgNL+/ybOOZwVat91VmNUVRq29zP6+zo1tmIDxrQchy6Bqui/
 7Wg298G+yjAkJ8ktQu69ACk+0oEBGUOcLUlraqGSr9auR+b0nJ1PAGCDRaONdwgE
 +X+OE+t+UC6rU+coUXMwO+Id0X7HMdsLQd3066ODEtD55g8MIVZ126Wt8xDmj5o=
 =GSTh
 -----END PGP SIGNATURE-----

Merge tag 'kvm-ppc-fixes-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master

PPC KVM fixes for 4.15

One commit here, that fixes a couple of bugs relating to the patch
series that enables HPT guests to run on a radix host on POWER9
systems.  This patch series went upstream in the 4.15 merge window,
so no stable backport is required.
2017-11-27 17:54:13 +01:00
Jan H. Schönherr 20b7035c66 KVM: Let KVM_SET_SIGNAL_MASK work as advertised
KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
"any unblocked signal received [...] will cause KVM_RUN to return with
-EINTR" and that "the signal will only be delivered if not blocked by
the original signal mask".

This, however, is only true, when the calling task has a signal handler
registered for a signal. If not, signal evaluation is short-circuited for
SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
returning or the whole process is terminated.

Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
to that in do_sigtimedwait() to avoid short-circuiting of signals.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-11-27 17:53:47 +01:00
Linus Torvalds 844056fd74 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:

 - The final conversion of timer wheel timers to timer_setup().

   A few manual conversions and a large coccinelle assisted sweep and
   the removal of the old initialization mechanisms and the related
   code.

 - Remove the now unused VSYSCALL update code

 - Fix permissions of /proc/timer_list. I still need to get rid of that
   file completely

 - Rename a misnomed clocksource function and remove a stale declaration

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  m68k/macboing: Fix missed timer callback assignment
  treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts
  timer: Remove redundant __setup_timer*() macros
  timer: Pass function down to initialization routines
  timer: Remove unused data arguments from macros
  timer: Switch callback prototype to take struct timer_list * argument
  timer: Pass timer_list pointer to callbacks unconditionally
  Coccinelle: Remove setup_timer.cocci
  timer: Remove setup_*timer() interface
  timer: Remove init_timer() interface
  treewide: setup_timer() -> timer_setup() (2 field)
  treewide: setup_timer() -> timer_setup()
  treewide: init_timer() -> setup_timer()
  treewide: Switch DEFINE_TIMER callbacks to struct timer_list *
  s390: cmm: Convert timers to use timer_setup()
  lightnvm: Convert timers to use timer_setup()
  drivers/net: cris: Convert timers to use timer_setup()
  drm/vc4: Convert timers to use timer_setup()
  block/laptop_mode: Convert timers to use timer_setup()
  net/atm/mpc: Avoid open-coded assignment of timer callback function
  ...
2017-11-25 08:37:16 -10:00
Linus Torvalds 83ada03196 powerpc fixes for 4.15 #2
A small batch of fixes, about 50% tagged for stable and the rest for recently
 merged code.
 
 There's one more fix for the >128T handling on hash. Once a process had
 requested a single mmap above 128T we would then always search above 128T. The
 correct behaviour is to consider the hint address in isolation for each mmap
 request.
 
 Then a couple of fixes for the IMC PMU, a missing EXPORT_SYMBOL in VAS, a fix
 for STRICT_KERNEL_RWX on 32-bit, and a fix to correctly identify P9 DD2.1 but in
 code that is currently not used by default.
 
 Thanks to:
   Aneesh Kumar K.V, Christophe Leroy, Madhavan Srinivasan, Sukadev Bhattiprolu.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJaF/VqAAoJEFHr6jzI4aWA994P/3NNXkSASJHjLrIlQAKXtmx9
 lrv1v+6MbPWhyB8Q8LVnnC3Ab2LTHnkccjq2Jw0bP0RQ86HF4mH7Sb7N5Wj0cG+M
 5NioikvGE057ncLfxVhesOK0C9Lhc7Zb+zphXZliYP76IGxwbxorJRepeZctVkyO
 KPMv4eaImdblVn71aoQQSlepON4+/rtiW2yo5u98uCqR+Ttds4J1fiDZ4TNrBYRP
 Ilh6DmA//CWvN+KsGT+brRd/PjEkxQKHyS8px3lxRl4cwCJucXPCik/Gn9t6OiMw
 3S6y1Mu8nrh4z+YepKv6APvl2DEwwXn8w9f85kn+QiE9Qp3Z/wckW9/4LT5FeuKE
 L8E3dKq2NzJ9oDs/20sVbBvVR7CUvBoyWytsXVkmmlC6sVReTrYAJ1UP9HnNvcF6
 be4zYUKusU83uG6saGgchRrPUrD31XKXw8Piv9EoWo1Uz7VgWCkxidclRNocgeDO
 k5VxYnRd9jPsv2pCzXH2YmuQAypGUh12IPTxEOnSt5uzXSXcamZJBLKp5fAJ/9dl
 jD6GlRQMX8JpNRJzxOBLly3CmwQBw2ekOuPLXI+M/ilks66AGK8lp4bg5cWwDGNe
 puzmRJ2mO3dnFlVUHBQ5LyX8ne4yunin1JZB1YQ4xm8yxZbGO2AdypEWMSkPKNPN
 fkrGPlwQ1JwFheMbHHLj
 =gv70
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "A small batch of fixes, about 50% tagged for stable and the rest for
  recently merged code.

  There's one more fix for the >128T handling on hash. Once a process
  had requested a single mmap above 128T we would then always search
  above 128T. The correct behaviour is to consider the hint address in
  isolation for each mmap request.

  Then a couple of fixes for the IMC PMU, a missing EXPORT_SYMBOL in
  VAS, a fix for STRICT_KERNEL_RWX on 32-bit, and a fix to correctly
  identify P9 DD2.1 but in code that is currently not used by default.

  Thanks to: Aneesh Kumar K.V, Christophe Leroy, Madhavan Srinivasan,
  Sukadev Bhattiprolu"

* tag 'powerpc-4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Fix Power9 DD2.1 logic in DT CPU features
  powerpc/perf: Fix IMC_MAX_PMU macro
  powerpc/perf: Fix pmu_count to count only nest imc pmus
  powerpc: Fix boot on BOOK3S_32 with CONFIG_STRICT_KERNEL_RWX
  powerpc/perf/imc: Use cpu_to_node() not topology_physical_package_id()
  powerpc/vas: Export chip_to_vas_id()
  powerpc/64s/slice: Use addr limit when computing slice mask
2017-11-24 19:40:12 -10:00
Michael Ellerman 2621e945fb powerpc/kexec: Fix kexec/kdump in P9 guest kernels
The code that cleans up the IAMR/AMOR before kexec'ing failed to
remember that when we're running as a guest AMOR is not writable, it's
hypervisor privileged.

They symptom is that the kexec stops before entering purgatory and
nothing else is seen on the console. If you examine the state of the
system all threads will be in the 0x700 program check handler.

Fix it by making the write to AMOR dependent on HV mode.

Fixes: 1e2a516e89 ("powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR")
Cc: stable@vger.kernel.org # v4.10+
Reported-by: Yilin Zhang <yilzhang@redhat.com>
Debugged-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-24 16:49:37 +11:00
Mahesh Salgaonkar a3961f824c powerpc/powernv: Fix kexec crashes caused by tlbie tracing
Rebooting into a new kernel with kexec fails in trace_tlbie() which is
called from native_hpte_clear(). This happens if the running kernel
has CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints
always execute few RCU checks regardless of whether tracing is on or
off. We are already in the last phase of kexec sequence in real mode
with HILE_BE set. At this point the RCU check ends up in
RCU_LOCKDEP_WARN and causes kexec to fail.

Fix this by not calling trace_tlbie() from native_hpte_clear().

mpe: It's not safe to call trace points at this point in the kexec
path, even if we could avoid the RCU checks/warnings. The only
solution is to not call them.

Fixes: 0428491cba ("powerpc/mm: Trace tlbie(l) instructions")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-23 23:10:14 +11:00
Paul Mackerras ded13fc11b KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts
This fixes two errors that prevent a guest using the HPT MMU from
successfully migrating to a POWER9 host in radix MMU mode, or resizing
its HPT when running on a radix host.

The first bug was that commit 8dc6cca556 ("KVM: PPC: Book3S HV:
Don't rely on host's page size information", 2017-09-11) missed two
uses of hpte_base_page_size(), one in the HPT rehashing code and
one in kvm_htab_write() (which is used on the destination side in
migrating a HPT guest).  Instead we use kvmppc_hpte_base_page_shift().
Having the shift count means that we can use left and right shifts
instead of multiplication and division in a few places.

Along the way, this adds a check in kvm_htab_write() to ensure that the
page size encoding in the incoming HPTEs is recognized, and if not
return an EINVAL error to userspace.

The second bug was that kvm_htab_write was performing some but not all
of the functions of kvmhv_setup_mmu(), resulting in the destination VM
being left in radix mode as far as the hardware is concerned.  The
simplest fix for now is make kvm_htab_write() call
kvmppc_setup_partition_table() like kvmppc_hv_setup_htab_rma() does.
In future it would be better to refactor the code more extensively
to remove the duplication.

Fixes: 8dc6cca556 ("KVM: PPC: Book3S HV: Don't rely on host's page size information")
Fixes: 7a84084c60 ("KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9")
Reported-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-11-23 14:17:54 +11:00
Michael Ellerman 4d6c51b107 powerpc/64s: Fix Power9 DD2.1 logic in DT CPU features
I got the logic wrong in the DT CPU features code when I added the
Power9 DD2.1 feature. We should be setting the bit if we detect a
DD2.1, not clearing it if we detect a DD2.0.

This code isn't actually exercised at the moment so nothing is
actually broken.

Fixes: 3ffa9d9e2a ("powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-22 23:17:01 +11:00
Madhavan Srinivasan 73ce9aec65 powerpc/perf: Fix IMC_MAX_PMU macro
IMC_MAX_PMU is used for static storage (per_nest_pmu_arr) which holds
nest pmu information. Current value for the macro is 32 based on
the initial number of nest pmu units supported by the nest microcode.
But going forward, microcode could support more nest units. Instead
of static storage, patch to fix the code to dynamically allocate an
array based on the number of nest imc units found in the device tree.

Fixes:8f95faaac56c1 ('powerpc/powernv: Detect and create IMC device')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-22 23:04:23 +11:00
Madhavan Srinivasan de34787f10 powerpc/perf: Fix pmu_count to count only nest imc pmus
"pmu_count" in opal_imc_counters_probe() is intended to hold
the number of successful nest imc pmu registerations. But
current code also counts other imc units like core_imc and
thread_imc. Patch add a check to count only nest imc pmus.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-22 23:04:23 +11:00
Christophe Leroy 252eb55816 powerpc: Fix boot on BOOK3S_32 with CONFIG_STRICT_KERNEL_RWX
On powerpc32, patch_instruction() is called by apply_feature_fixups()
which is called from early_init()

There is the following note in front of early_init():
 * Note that the kernel may be running at an address which is different
 * from the address that it was linked at, so we must use RELOC/PTRRELOC
 * to access static data (including strings).  -- paulus

Therefore, slab_is_available() cannot be called yet, and
text_poke_area must be addressed with PTRRELOC()

Fixes: 95902e6c88 ("powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32")
Cc: stable@vger.kernel.org # v4.14+
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-22 23:04:20 +11:00
Michael Ellerman f3f1dfd600 powerpc/perf/imc: Use cpu_to_node() not topology_physical_package_id()
init_imc_pmu() uses topology_physical_package_id() to detect the
node id of the processor it is on to get local memory, but that's
wrong, and can lead to crashes. Fix it to use cpu_to_node().

Fixes: 885dcd709b ("powerpc/perf: Add nest IMC PMU support")
Cc: stable@vger.kernel.org # v4.14+
Reported-By: Rob Lippert <rlippert@google.com>
Tested-By: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-22 12:24:46 +11:00
Kees Cook 86cb30ec07 treewide: setup_timer() -> timer_setup() (2 field)
This converts all remaining setup_timer() calls that use a nested field
to reach a struct timer_list. Coccinelle does not have an easy way to
match multiple fields, so a new script is needed to change the matches of
"&_E->_timer" into "&_E->_field1._timer" in all the rules.

spatch --very-quiet --all-includes --include-headers \
	-I ./arch/x86/include -I ./arch/x86/include/generated \
	-I ./include -I ./arch/x86/include/uapi \
	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
	--dir . \
	--cocci-file ~/src/data/timer_setup-2fields.cocci

@fix_address_of depends@
expression e;
@@

 setup_timer(
-&(e)
+&e
 , ...)

// Update any raw setup_timer() usages that have a NULL callback, but
// would otherwise match change_timer_function_usage, since the latter
// will update all function assignments done in the face of a NULL
// function initialization in setup_timer().
@change_timer_function_usage_NULL@
expression _E;
identifier _field1;
identifier _timer;
type _cast_data;
@@

(
-setup_timer(&_E->_field1._timer, NULL, _E);
+timer_setup(&_E->_field1._timer, NULL, 0);
|
-setup_timer(&_E->_field1._timer, NULL, (_cast_data)_E);
+timer_setup(&_E->_field1._timer, NULL, 0);
|
-setup_timer(&_E._field1._timer, NULL, &_E);
+timer_setup(&_E._field1._timer, NULL, 0);
|
-setup_timer(&_E._field1._timer, NULL, (_cast_data)&_E);
+timer_setup(&_E._field1._timer, NULL, 0);
)

@change_timer_function_usage@
expression _E;
identifier _field1;
identifier _timer;
struct timer_list _stl;
identifier _callback;
type _cast_func, _cast_data;
@@

(
-setup_timer(&_E->_field1._timer, _callback, _E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, &_callback, _E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, _callback, (_cast_data)_E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, &_callback, (_cast_data)_E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, (_cast_func)_callback, _E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, (_cast_func)&_callback, _E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, _callback, (_cast_data)_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, _callback, (_cast_data)&_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, &_callback, (_cast_data)_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, &_callback, (_cast_data)&_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, (_cast_func)_callback, (_cast_data)&_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, (_cast_func)&_callback, (_cast_data)&_E);
+timer_setup(&_E._field1._timer, _callback, 0);
|
 _E->_field1._timer@_stl.function = _callback;
|
 _E->_field1._timer@_stl.function = &_callback;
|
 _E->_field1._timer@_stl.function = (_cast_func)_callback;
|
 _E->_field1._timer@_stl.function = (_cast_func)&_callback;
|
 _E._field1._timer@_stl.function = _callback;
|
 _E._field1._timer@_stl.function = &_callback;
|
 _E._field1._timer@_stl.function = (_cast_func)_callback;
|
 _E._field1._timer@_stl.function = (_cast_func)&_callback;
)

// callback(unsigned long arg)
@change_callback_handle_cast
 depends on change_timer_function_usage@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
identifier _handle;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
(
	... when != _origarg
	_handletype *_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _field1._timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle =
-(void *)_origarg;
+from_timer(_handle, t, _field1._timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _field1._timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(void *)_origarg;
+from_timer(_handle, t, _field1._timer);
	... when != _origarg
)
 }

// callback(unsigned long arg) without existing variable
@change_callback_handle_cast_no_arg
 depends on change_timer_function_usage &&
                     !change_callback_handle_cast@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
+	_handletype *_origarg = from_timer(_origarg, t, _field1._timer);
+
	... when != _origarg
-	(_handletype *)_origarg
+	_origarg
	... when != _origarg
 }

// Avoid already converted callbacks.
@match_callback_converted
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
	    !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier t;
@@

 void _callback(struct timer_list *t)
 { ... }

// callback(struct something *handle)
@change_callback_handle_arg
 depends on change_timer_function_usage &&
	    !match_callback_converted &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
@@

 void _callback(
-_handletype *_handle
+struct timer_list *t
 )
 {
+	_handletype *_handle = from_timer(_handle, t, _field1._timer);
	...
 }

// If change_callback_handle_arg ran on an empty function, remove
// the added handler.
@unchange_callback_handle_arg
 depends on change_timer_function_usage &&
	    change_callback_handle_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
identifier t;
@@

 void _callback(struct timer_list *t)
 {
-	_handletype *_handle = from_timer(_handle, t, _field1._timer);
 }

// We only want to refactor the setup_timer() data argument if we've found
// the matching callback. This undoes changes in change_timer_function_usage.
@unchange_timer_function_usage
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg &&
	    !change_callback_handle_arg@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type change_timer_function_usage._cast_data;
@@

(
-timer_setup(&_E->_field1._timer, _callback, 0);
+setup_timer(&_E->_field1._timer, _callback, (_cast_data)_E);
|
-timer_setup(&_E._field1._timer, _callback, 0);
+setup_timer(&_E._field1._timer, _callback, (_cast_data)&_E);
)

// If we fixed a callback from a .function assignment, fix the
// assignment cast now.
@change_timer_function_assignment
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_func;
typedef TIMER_FUNC_TYPE;
@@

(
 _E->_field1._timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_field1._timer.function =
-&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_field1._timer.function =
-(_cast_func)_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_field1._timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._field1._timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._field1._timer.function =
-&_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._field1._timer.function =
-(_cast_func)_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._field1._timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
)

// Sometimes timer functions are called directly. Replace matched args.
@change_timer_function_calls
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression _E;
identifier change_timer_function_usage._field1;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_data;
@@

 _callback(
(
-(_cast_data)_E
+&_E->_field1._timer
|
-(_cast_data)&_E
+&_E._field1._timer
|
-_E
+&_E->_field1._timer
)
 )

// If a timer has been configured without a data argument, it can be
// converted without regard to the callback argument, since it is unused.
@match_timer_function_unused_data@
expression _E;
identifier _field1;
identifier _timer;
identifier _callback;
@@

(
-setup_timer(&_E->_field1._timer, _callback, 0);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, _callback, 0L);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E->_field1._timer, _callback, 0UL);
+timer_setup(&_E->_field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, _callback, 0);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, _callback, 0L);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_E._field1._timer, _callback, 0UL);
+timer_setup(&_E._field1._timer, _callback, 0);
|
-setup_timer(&_field1._timer, _callback, 0);
+timer_setup(&_field1._timer, _callback, 0);
|
-setup_timer(&_field1._timer, _callback, 0L);
+timer_setup(&_field1._timer, _callback, 0);
|
-setup_timer(&_field1._timer, _callback, 0UL);
+timer_setup(&_field1._timer, _callback, 0);
|
-setup_timer(_field1._timer, _callback, 0);
+timer_setup(_field1._timer, _callback, 0);
|
-setup_timer(_field1._timer, _callback, 0L);
+timer_setup(_field1._timer, _callback, 0);
|
-setup_timer(_field1._timer, _callback, 0UL);
+timer_setup(_field1._timer, _callback, 0);
)

@change_callback_unused_data
 depends on match_timer_function_unused_data@
identifier match_timer_function_unused_data._callback;
type _origtype;
identifier _origarg;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *unused
 )
 {
	... when != _origarg
 }

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21 15:57:09 -08:00
Kees Cook e99e88a9d2 treewide: setup_timer() -> timer_setup()
This converts all remaining cases of the old setup_timer() API into using
timer_setup(), where the callback argument is the structure already
holding the struct timer_list. These should have no behavioral changes,
since they just change which pointer is passed into the callback with
the same available pointers after conversion. It handles the following
examples, in addition to some other variations.

Casting from unsigned long:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, ptr);

and forced object casts:

    void my_callback(struct something *ptr)
    {
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

become:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
    ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

Direct function assignments:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
    ...
    }
    ...
    ptr->my_timer.function = my_callback;

have a temporary cast added, along with converting the args:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
    ...
    }
    ...
    ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

And finally, callbacks without a data assignment:

    void my_callback(unsigned long data)
    {
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, 0);

have their argument renamed to verify they're unused during conversion:

    void my_callback(struct timer_list *unused)
    {
    ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

The conversion is done with the following Coccinelle script:

spatch --very-quiet --all-includes --include-headers \
	-I ./arch/x86/include -I ./arch/x86/include/generated \
	-I ./include -I ./arch/x86/include/uapi \
	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
	--dir . \
	--cocci-file ~/src/data/timer_setup.cocci

@fix_address_of@
expression e;
@@

 setup_timer(
-&(e)
+&e
 , ...)

// Update any raw setup_timer() usages that have a NULL callback, but
// would otherwise match change_timer_function_usage, since the latter
// will update all function assignments done in the face of a NULL
// function initialization in setup_timer().
@change_timer_function_usage_NULL@
expression _E;
identifier _timer;
type _cast_data;
@@

(
-setup_timer(&_E->_timer, NULL, _E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E->_timer, NULL, (_cast_data)_E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, &_E);
+timer_setup(&_E._timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, (_cast_data)&_E);
+timer_setup(&_E._timer, NULL, 0);
)

@change_timer_function_usage@
expression _E;
identifier _timer;
struct timer_list _stl;
identifier _callback;
type _cast_func, _cast_data;
@@

(
-setup_timer(&_E->_timer, _callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
 _E->_timer@_stl.function = _callback;
|
 _E->_timer@_stl.function = &_callback;
|
 _E->_timer@_stl.function = (_cast_func)_callback;
|
 _E->_timer@_stl.function = (_cast_func)&_callback;
|
 _E._timer@_stl.function = _callback;
|
 _E._timer@_stl.function = &_callback;
|
 _E._timer@_stl.function = (_cast_func)_callback;
|
 _E._timer@_stl.function = (_cast_func)&_callback;
)

// callback(unsigned long arg)
@change_callback_handle_cast
 depends on change_timer_function_usage@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
identifier _handle;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
(
	... when != _origarg
	_handletype *_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
)
 }

// callback(unsigned long arg) without existing variable
@change_callback_handle_cast_no_arg
 depends on change_timer_function_usage &&
                     !change_callback_handle_cast@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
+	_handletype *_origarg = from_timer(_origarg, t, _timer);
+
	... when != _origarg
-	(_handletype *)_origarg
+	_origarg
	... when != _origarg
 }

// Avoid already converted callbacks.
@match_callback_converted
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
	    !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier t;
@@

 void _callback(struct timer_list *t)
 { ... }

// callback(struct something *handle)
@change_callback_handle_arg
 depends on change_timer_function_usage &&
	    !match_callback_converted &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
@@

 void _callback(
-_handletype *_handle
+struct timer_list *t
 )
 {
+	_handletype *_handle = from_timer(_handle, t, _timer);
	...
 }

// If change_callback_handle_arg ran on an empty function, remove
// the added handler.
@unchange_callback_handle_arg
 depends on change_timer_function_usage &&
	    change_callback_handle_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
identifier t;
@@

 void _callback(struct timer_list *t)
 {
-	_handletype *_handle = from_timer(_handle, t, _timer);
 }

// We only want to refactor the setup_timer() data argument if we've found
// the matching callback. This undoes changes in change_timer_function_usage.
@unchange_timer_function_usage
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg &&
	    !change_callback_handle_arg@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type change_timer_function_usage._cast_data;
@@

(
-timer_setup(&_E->_timer, _callback, 0);
+setup_timer(&_E->_timer, _callback, (_cast_data)_E);
|
-timer_setup(&_E._timer, _callback, 0);
+setup_timer(&_E._timer, _callback, (_cast_data)&_E);
)

// If we fixed a callback from a .function assignment, fix the
// assignment cast now.
@change_timer_function_assignment
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_func;
typedef TIMER_FUNC_TYPE;
@@

(
 _E->_timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-(_cast_func)_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-&_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-(_cast_func)_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
)

// Sometimes timer functions are called directly. Replace matched args.
@change_timer_function_calls
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression _E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_data;
@@

 _callback(
(
-(_cast_data)_E
+&_E->_timer
|
-(_cast_data)&_E
+&_E._timer
|
-_E
+&_E->_timer
)
 )

// If a timer has been configured without a data argument, it can be
// converted without regard to the callback argument, since it is unused.
@match_timer_function_unused_data@
expression _E;
identifier _timer;
identifier _callback;
@@

(
-setup_timer(&_E->_timer, _callback, 0);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0L);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0UL);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0L);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0UL);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0L);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0UL);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0L);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0UL);
+timer_setup(_timer, _callback, 0);
)

@change_callback_unused_data
 depends on match_timer_function_unused_data@
identifier match_timer_function_unused_data._callback;
type _origtype;
identifier _origarg;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *unused
 )
 {
	... when != _origarg
 }

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21 15:57:07 -08:00
Sukadev Bhattiprolu 62b49c4210 powerpc/vas: Export chip_to_vas_id()
Export the symbol chip_to_vas_id() to fix a build failure when
CONFIG_CRYPTO_DEV_NX_COMPRESS_POWERNV=m.

Fixes: d4ef61b5e8 ("powerpc/vas, nx-842: Define and use chip_to_vas_id()")
Reported-by: Haren Myneni <hbabu@us.ibm.com>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-21 21:02:26 +11:00
Aneesh Kumar K.V 7a06c66835 powerpc/64s/slice: Use addr limit when computing slice mask
While computing slice mask for the free area we need make sure we only
search in the addr limit applicable for this mmap. We update the
slb_addr_limit after we request for a mmap above 128TB. But the
following mmap request with hint addr below 128TB should still limit
its search to below 128TB. ie. we should not use slb_addr_limit to
compute slice mask in this case. Instead, we should derive high addr
limit based on the mmap hint addr value.

Fixes: f4ea6dcb08 ("powerpc/mm: Enable mappings above 128TB")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-20 19:28:25 +11:00
Linus Torvalds fa7f578076 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - a bit more MM

 - procfs updates

 - dynamic-debug fixes

 - lib/ updates

 - checkpatch

 - epoll

 - nilfs2

 - signals

 - rapidio

 - PID management cleanup and optimization

 - kcov updates

 - sysvipc updates

 - quite a few misc things all over the place

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
  EXPERT Kconfig menu: fix broken EXPERT menu
  include/asm-generic/topology.h: remove unused parent_node() macro
  arch/tile/include/asm/topology.h: remove unused parent_node() macro
  arch/sparc/include/asm/topology_64.h: remove unused parent_node() macro
  arch/sh/include/asm/topology.h: remove unused parent_node() macro
  arch/ia64/include/asm/topology.h: remove unused parent_node() macro
  drivers/pcmcia/sa1111_badge4.c: avoid unused function warning
  mm: add infrastructure for get_user_pages_fast() benchmarking
  sysvipc: make get_maxid O(1) again
  sysvipc: properly name ipc_addid() limit parameter
  sysvipc: duplicate lock comments wrt ipc_addid()
  sysvipc: unteach ids->next_id for !CHECKPOINT_RESTORE
  initramfs: use time64_t timestamps
  drivers/watchdog: make use of devm_register_reboot_notifier()
  kernel/reboot.c: add devm_register_reboot_notifier()
  kcov: update documentation
  Makefile: support flag -fsanitizer-coverage=trace-cmp
  kcov: support comparison operands collection
  kcov: remove pointless current != NULL check
  kernel/panic.c: add TAINT_AUX
  ...
2017-11-17 16:56:17 -08:00
Gargi Sharma 95846ecf9d pid: replace pid bitmap implementation with IDR API
Patch series "Replacing PID bitmap implementation with IDR API", v4.

This series replaces kernel bitmap implementation of PID allocation with
IDR API.  These patches are written to simplify the kernel by replacing
custom code with calls to generic code.

The following are the stats for pid and pid_namespace object files
before and after the replacement.  There is a noteworthy change between
the IDR and bitmap implementation.

Before
   text       data        bss        dec        hex    filename
   8447       3894         64      12405       3075    kernel/pid.o
After
   text       data        bss        dec        hex    filename
   3397        304          0       3701        e75    kernel/pid.o

Before
   text       data        bss        dec        hex    filename
   5692       1842        192       7726       1e2e    kernel/pid_namespace.o
After
   text       data        bss        dec        hex    filename
   2854        216         16       3086        c0e    kernel/pid_namespace.o

The following are the stats for ps, pstree and calling readdir on /proc
for 10,000 processes.

ps:
        With IDR API    With bitmap
real    0m1.479s        0m2.319s
user    0m0.070s        0m0.060s
sys     0m0.289s        0m0.516s

pstree:
        With IDR API    With bitmap
real    0m1.024s        0m1.794s
user    0m0.348s        0m0.612s
sys     0m0.184s        0m0.264s

proc:
        With IDR API    With bitmap
real    0m0.059s        0m0.074s
user    0m0.000s        0m0.004s
sys     0m0.016s        0m0.016s

This patch (of 2):

Replace the current bitmap implementation for Process ID allocation.
Functions that are no longer required, for example, free_pidmap(),
alloc_pidmap(), etc.  are removed.  The rest of the functions are
modified to use the IDR API.  The change was made to make the PID
allocation less complex by replacing custom code with calls to generic
API.

[gs051095@gmail.com: v6]
  Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com
[avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl]
  Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org
Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-17 16:10:03 -08:00
Linus Torvalds 5a3e0b196b File locking related changes for v4.15
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaDuoWAAoJEAAOaEEZVoIVXEQP/jQYoU9hgvEj8j3ZIgi56SDJ
 pR45w2zcJz2/uU43DEKyShyLgsuoBbJQ3l/gGBH/tl+xGm9NzB0gatoEu9GmKNYz
 /IN6/vUFnoIAUyD+iMZbpmsYKIkz0z2YJo261IfspAwIft/cvHJnYYGQrP9YXg9F
 c7bdDuANTKocdQigc4BQyOe3OfIBGfTwJhuakO+1yuZmGOVNyxEcdYbMM8FiTfc8
 +62kvQQ3t7WMqSbM8M0QdGcYQjG0EwcVAuV7COurLJIva7hUkVel32MVUjoFcf28
 BnRu2ztFJCubm1HA85twlJDtpeXbcMqrUl/CcwRMpwDaePd5GVB1h5iKqbZ51BZ1
 fWT2STmt+8hY2B5eiXoYEaG3B7ZRr+r0oroxqOxpiZ/m4AVeouF+gPGv+NV5zgvD
 NGWC0MdklIJ4xaC99NEeP6kBhz0M74VKymFCTeHkVg9m4TqDepNvitKed0qagw19
 uw8seei7TOTm4o117+l55NHmyfTHXFO4U0WLTJyeZcoEnUs0rOcHeqyy0RwCBMrK
 W2fJtdBLFr+tBIIrID4TnPhhYtSvIPjz+FpiRDobqhgvMva/PIvLGTWK4unrgIjG
 ZQ7YGnwWda8GjqKhgZacn/BSXyJzOAF9hJp0mz2ORaOxaMarEV55duiZufCvGuZw
 uUQWRCKuQX7Oi05i9jXp
 =fCeF
 -----END PGP SIGNATURE-----

Merge tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull file locking update from Jeff Layton:
 "A couple of fixes for a patch that went into v4.14, and the bug report
  just came in a few days ago.. It passes my (minimal) testing, and has
  been in linux-next for a few days now.

  I also would like to get my address changed in MAINTAINERS to clear
  that hurdle"

* tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
  fcntl: don't leak fd reference when fixup_compat_flock fails
  MAINTAINERS: s/jlayton@poochiereds.net/jlayton@kernel.org/
2017-11-17 13:21:58 -08:00
Linus Torvalds 93f30c73ec Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat and uaccess updates from Al Viro:

 - {get,put}_compat_sigset() series

 - assorted compat ioctl stuff

 - more set_fs() elimination

 - a few more timespec64 conversions

 - several removals of pointless access_ok() in places where it was
   followed only by non-__ variants of primitives

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
  coredump: call do_unlinkat directly instead of sys_unlink
  fs: expose do_unlinkat for built-in callers
  ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs()
  ipmi: get rid of pointless access_ok()
  pi433: sanitize ioctl
  cxlflash: get rid of pointless access_ok()
  mtdchar: get rid of pointless access_ok()
  r128: switch compat ioctls to drm_ioctl_kernel()
  selection: get rid of field-by-field copyin
  VT_RESIZEX: get rid of field-by-field copyin
  i2c compat ioctls: move to ->compat_ioctl()
  sched_rr_get_interval(): move compat to native, get rid of set_fs()
  mips: switch to {get,put}_compat_sigset()
  sparc: switch to {get,put}_compat_sigset()
  s390: switch to {get,put}_compat_sigset()
  ppc: switch to {get,put}_compat_sigset()
  parisc: switch to {get,put}_compat_sigset()
  get_compat_sigset()
  get rid of {get,put}_compat_itimerspec()
  io_getevents: Use timespec64 to represent timeouts
  ...
2017-11-17 11:54:55 -08:00
Linus Torvalds 974aa5630b First batch of KVM changes for 4.15
Common:
  - Python 3 support in kvm_stat
 
  - Accounting of slabs to kmemcg
 
 ARM:
  - Optimized arch timer handling for KVM/ARM
 
  - Improvements to the VGIC ITS code and introduction of an ITS reset
    ioctl
 
  - Unification of the 32-bit fault injection logic
 
  - More exact external abort matching logic
 
 PPC:
  - Support for running hashed page table (HPT) MMU mode on a host that
    is using the radix MMU mode;  single threaded mode on POWER 9 is
    added as a pre-requisite
 
  - Resolution of merge conflicts with the last second 4.14 HPT fixes
 
  - Fixes and cleanups
 
 s390:
  - Some initial preparation patches for exitless interrupts and crypto
 
  - New capability for AIS migration
 
  - Fixes
 
 x86:
  - Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs, and
    after-reset state
 
  - Refined dependencies for VMX features
 
  - Fixes for nested SMI injection
 
  - A lot of cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJaDayXAAoJEED/6hsPKofo/3UH/3HvlcHt+ADTkCU1/iiKAs+i
 0zngIOXIxgHDnV0ww6bV+Znww0BzTYgKCAXX76z603jdpDwG/pzQQcbLDF5ZoJnD
 sQtF10gZinWaRsHlfbLqjrHGL2pGDHO1UKBKLJ0bAIyORPZBxs7i+VmrY/blnr9c
 0wsybJ8RbvwAxjsDL5jeX/z4NehPupmKUc4Lf0eZdSHwVOf9sjn+MP6jJ0r2JcIb
 D+zddPBiLStzN97t4gZpQsrlj3LKrDS+6hY+1TjSvlh+yHKFVFh58VhLm4DuDeb5
 bYOAlWJ/gAWEzfvr5Ld+Nd7SqWWn/14logPkQ4gcU4BI/neAOzk4c6hJfCHl1nk=
 =593n
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Radim Krčmář:
 "First batch of KVM changes for 4.15

  Common:
   - Python 3 support in kvm_stat
   - Accounting of slabs to kmemcg

  ARM:
   - Optimized arch timer handling for KVM/ARM
   - Improvements to the VGIC ITS code and introduction of an ITS reset
     ioctl
   - Unification of the 32-bit fault injection logic
   - More exact external abort matching logic

  PPC:
   - Support for running hashed page table (HPT) MMU mode on a host that
     is using the radix MMU mode; single threaded mode on POWER 9 is
     added as a pre-requisite
   - Resolution of merge conflicts with the last second 4.14 HPT fixes
   - Fixes and cleanups

  s390:
   - Some initial preparation patches for exitless interrupts and crypto
   - New capability for AIS migration
   - Fixes

  x86:
   - Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs,
     and after-reset state
   - Refined dependencies for VMX features
   - Fixes for nested SMI injection
   - A lot of cleanups"

* tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits)
  KVM: s390: provide a capability for AIS state migration
  KVM: s390: clear_io_irq() requests are not expected for adapter interrupts
  KVM: s390: abstract conversion between isc and enum irq_types
  KVM: s390: vsie: use common code functions for pinning
  KVM: s390: SIE considerations for AP Queue virtualization
  KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup
  KVM: PPC: Book3S HV: Cosmetic post-merge cleanups
  KVM: arm/arm64: fix the incompatible matching for external abort
  KVM: arm/arm64: Unify 32bit fault injection
  KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET
  KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET
  KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared
  KVM: arm/arm64: vgic-its: New helper functions to free the caches
  KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device
  arm/arm64: KVM: Load the timer state when enabling the timer
  KVM: arm/arm64: Rework kvm_timer_should_fire
  KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate
  KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
  KVM: arm/arm64: Move phys_timer_emulate function
  KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps
  ...
2017-11-16 13:00:24 -08:00
Linus Torvalds 5b0e2cb020 powerpc updates for 4.15
Non-highlights:
 
  - Five fixes for the >128T address space handling, both to fix bugs in our
    implementation and to bring the semantics exactly into line with x86.
 
 Highlights:
 
  - Support for a new OPAL call on bare metal machines which gives us a true NMI
    (ie. is not masked by MSR[EE]=0) for debugging etc.
 
  - Support for Power9 DD2 in the CXL driver.
 
  - Improvements to machine check handling so that uncorrectable errors can be
    reported into the generic memory_failure() machinery.
 
  - Some fixes and improvements for VPHN, which is used under PowerVM to notify
    the Linux partition of topology changes.
 
  - Plumbing to enable TM (transactional memory) without suspend on some Power9
    processors (PPC_FEATURE2_HTM_NO_SUSPEND).
 
  - Support for emulating vector loads form cache-inhibited memory, on some
    Power9 revisions.
 
  - Disable the fast-endian switch "syscall" by default (behind a CONFIG), we
    believe it has never had any users.
 
  - A major rework of the API drivers use when initiating and waiting for long
    running operations performed by OPAL firmware, and changes to the
    powernv_flash driver to use the new API.
 
  - Several fixes for the handling of FP/VMX/VSX while processes are using
    transactional memory.
 
  - Optimisations of TLB range flushes when using the radix MMU on Power9.
 
  - Improvements to the VAS facility used to access coprocessors on Power9, and
    related improvements to the way the NX crypto driver handles requests.
 
  - Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.
 
 Thanks to:
   Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew Donnellan, Aneesh
   Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Breno Leitao,
   Christophe Leroy, Christophe Lombard, Cyril Bur, Frederic Barrat, Gautham R.
   Shenoy, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo Romero, Haren
   Myneni, Joel Stanley, Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami
   Hiramatsu, Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
   Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia Franco de
   Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee, Shriya, Stephen
   Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel Datwyler, Vaibhav Jain,
   Vaidyanathan Srinivasan, William A. Kennington III.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJaDXGuAAoJEFHr6jzI4aWAEqwP/0TA35KFAK6wqfkCf67z4q+O
 I+5piI4eDV4jdCakfoIN1JfjhQRULNePSoCHTccan30mu/bm30p69xtOLL2/h5xH
 Mhz/eDBAOo0lrT20nyZfYMW3FnM66wnNf++qJ0O+8L052r4WOB02J0k1uM1ST01D
 5Lb5mUoxRLRzCgKRYAYWJifn+IFPUB9NMsvMTym94krAFlIjIzMEQXhDoln+jJMr
 QmY5f1BTA/fLfXobn0zwoc/C1oa2PUtxd+rxbwGrLoZ6G843mMqUi90SMr5ybhXp
 RzepnBTj4by3vOsnk/X1mANyaZfLsunp75FwnjHdPzKrAS/TuPp8D/iSxxE/PzEq
 cLwJFBnFXSgQMefDErXxhHSDz2dAg5r14rsTpDcq2Ko8TPV4rPsuSfmbd9Txekb0
 yWHsjoJUBBMl2QcWqIHl+AlV8j1RklF6solcTBcGnH1CZJMfa05VKXV7xGEvOHa0
 RJ+/xPyR9KjoB/SUp++9Vmx/M6SwQYFOJlr3Zpg9LNtR8WpoPYu1E6eO+u1Hhzny
 eJqaNstH+i+VdY9eqszkAsEBh8o9M/+b+7Wx7TetvU+v368CbXtgFYs9qy2oZjPF
 t9sY/BHaHZ8eZ7I00an77a0fVV5B1PVASUtIz5CqkwGpMvX6Z6W2K/XUUFI61kuu
 E06HS6Ht8UPJAzrAPUMl
 =Rq81
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "A bit of a small release, I suspect in part due to me travelling for
  KS. But my backlog of patches to review is smaller than usual, so I
  think in part folks just didn't send as much this cycle.

  Non-highlights:

   - Five fixes for the >128T address space handling, both to fix bugs
     in our implementation and to bring the semantics exactly into line
     with x86.

  Highlights:

   - Support for a new OPAL call on bare metal machines which gives us a
     true NMI (ie. is not masked by MSR[EE]=0) for debugging etc.

   - Support for Power9 DD2 in the CXL driver.

   - Improvements to machine check handling so that uncorrectable errors
     can be reported into the generic memory_failure() machinery.

   - Some fixes and improvements for VPHN, which is used under PowerVM
     to notify the Linux partition of topology changes.

   - Plumbing to enable TM (transactional memory) without suspend on
     some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND).

   - Support for emulating vector loads form cache-inhibited memory, on
     some Power9 revisions.

   - Disable the fast-endian switch "syscall" by default (behind a
     CONFIG), we believe it has never had any users.

   - A major rework of the API drivers use when initiating and waiting
     for long running operations performed by OPAL firmware, and changes
     to the powernv_flash driver to use the new API.

   - Several fixes for the handling of FP/VMX/VSX while processes are
     using transactional memory.

   - Optimisations of TLB range flushes when using the radix MMU on
     Power9.

   - Improvements to the VAS facility used to access coprocessors on
     Power9, and related improvements to the way the NX crypto driver
     handles requests.

   - Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.

  Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew
  Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin
  Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard,
  Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven,
  Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley,
  Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu,
  Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
  Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia
  Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee,
  Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel
  Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A.
  Kennington III"

* tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits)
  powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
  powerpc/64s: Fix masking of SRR1 bits on instruction fault
  powerpc/64s: mm_context.addr_limit is only used on hash
  powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation
  powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary
  powerpc/64s/hash: Fix fork() with 512TB process address space
  powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation
  powerpc/64s/hash: Fix 512T hint detection to use >= 128T
  powerpc: Fix DABR match on hash based systems
  powerpc/signal: Properly handle return value from uprobe_deny_signal()
  powerpc/fadump: use kstrtoint to handle sysfs store
  powerpc/lib: Implement UACCESS_FLUSHCACHE API
  powerpc/lib: Implement PMEM API
  powerpc/powernv/npu: Don't explicitly flush nmmu tlb
  powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
  powerpc/powernv/idle: Round up latency and residency values
  powerpc/kprobes: refactor kprobe_lookup_name for safer string operations
  powerpc/kprobes: Blacklist emulate_update_regs() from kprobes
  powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace
  powerpc/kprobes: Disable preemption before invoking probe handler for optprobes
  ...
2017-11-16 12:47:46 -08:00
Linus Torvalds e60e1ee606 main drm pull request for v4.15
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaCm8RAAoJEAx081l5xIa+zX0QAJSm31kCG3vdw2CNiRx25L3q
 3hcsEOgAjVJ9FQVGKFWjzb8TK35tSqtNx5kWIj0VGaIfBE5Bdg5SLLgKKUYas8rY
 4LaphqICq2uxu2BNa2tpiar/sHhAnuozwQ4czpVWXzlaISnb9yYzRl7gMuyUVGkx
 +Gih5VUhLmQC0HsRTLJ3vaZQoUsLAl2gAjKcWa1bx57j2S+iKOPfsLaq7VYo+y1I
 Njc+iSGqMhJzRLXVkxL2lQKaslp7R38Bbh5K4Kvyjkm4Aq7zErOF6irpOXKMcrGl
 mwnr89vf1G9thjikrBaXpKnuvdbWYveoN/ORMlTdCfxkFnChHLnm3bd7NJ49RXDN
 Hv/Iq9YYjmZ9GTatxnx7lWtmXnZXC5he1yn1JAuz/yt7/0b/Wx+Mu/wEpBXYNFTd
 1AZdD586i+AmPo3yDkqH9nBu8JC0W0AnS9VZma4LVvZOP2UfJmj5Im1CLHItbGDN
 FnUCkwyD/lJUUk+WgT+w/GOMJgmFHDiFFl4tFtYVVjrUirpCFVguSKG9xuv6tT8P
 8iRsoP7RrcmDN9ojN2SEHwcpsAv3HnKkDv+9+GIbWnrGsSbCPq8Qm+JDSvf4h22I
 K5lwNpJrcpSKI+q10L7w2xliTBwb98sJkWGA/rssomrdBOWteGZAyqFRYAVgQ+mJ
 x/nJurIqQYh2KQN9+uLG
 =xVV2
 -----END PGP SIGNATURE-----

Merge tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux

Pull drm updates from Dave Airlie:
 "This is the main drm pull request for v4.15.

  Core:
   - Atomic object lifetime fixes
   - Atomic iterator improvements
   - Sparse/smatch fixes
   - Legacy kms ioctls to be interruptible
   - EDID override improvements
   - fb/gem helper cleanups
   - Simple outreachy patches
   - Documentation improvements
   - Fix dma-buf rcu races
   - DRM mode object leasing for improving VR use cases.
   - vgaarb improvements for non-x86 platforms.

  New driver:
   - tve200: Faraday Technology TVE200 block.

     This "TV Encoder" encodes a ITU-T BT.656 stream and can be found in
     the StorLink SL3516 (later Cortina Systems CS3516) as well as the
     Grain Media GM8180.

  New bridges:
   - SiI9234 support

  New panels:
   - S6E63J0X03, OTM8009A, Seiko 43WVF1G, 7" rpi touch panel, Toshiba
     LT089AC19000, Innolux AT043TN24

  i915:
   - Remove Coffeelake from alpha support
   - Cannonlake workarounds
   - Infoframe refactoring for DisplayPort
   - VBT updates
   - DisplayPort vswing/emph/buffer translation refactoring
   - CCS fixes
   - Restore GPU clock boost on missed vblanks
   - Scatter list updates for userptr allocations
   - Gen9+ transition watermarks
   - Display IPC (Isochronous Priority Control)
   - Private PAT management
   - GVT: improved error handling and pci config sanitizing
   - Execlist refactoring
   - Transparent Huge Page support
   - User defined priorities support
   - HuC/GuC firmware refactoring
   - DP MST fixes
   - eDP power sequencing fixes
   - Use RCU instead of stop_machine
   - PSR state tracking support
   - Eviction fixes
   - BDW DP aux channel timeout fixes
   - LSPCON fixes
   - Cannonlake PLL fixes

  amdgpu:
   - Per VM BO support
   - Powerplay cleanups
   - CI powerplay support
   - PASID mgr for kfd
   - SR-IOV fixes
   - initial GPU reset for vega10
   - Prime mmap support
   - TTM updates
   - Clock query interface for Raven
   - Fence to handle ioctl
   - UVD encode ring support on Polaris
   - Transparent huge page DMA support
   - Compute LRU pipe tweaks
   - BO flag to allow buffers to opt out of implicit sync
   - CTX priority setting API
   - VRAM lost infrastructure plumbing

  qxl:
   - fix flicker since atomic rework

  amdkfd:
   - Further improvements from internal AMD tree
   - Usermode events
   - Drop radeon support

  nouveau:
   - Pascal temperature sensor support
   - Improved BAR2 handling
   - MMU rework to support Pascal MMU

  exynos:
   - Improved HDMI/mixer support
   - HDMI audio interface support

  tegra:
   - Prep work for tegra186
   - Cleanup/fixes

  msm:
   - Preemption support for a5xx
   - Display fixes for 8x96 (snapdragon 820)
   - Async cursor plane fixes
   - FW loading rework
   - GPU debugging improvements

  vc4:
   - Prep for DSI panels
   - fix T-format tiling scanout
   - New madvise ioctl

  Rockchip:
   - LVDS support

  omapdrm:
   - omap4 HDMI CEC support

  etnaviv:
   - GPU performance counters groundwork

  sun4i:
   - refactor driver load + TCON backend
   - HDMI improvements
   - A31 support
   - Misc fixes

  udl:
   - Probe/EDID read fixes.

  tilcdc:
   - Misc fixes.

  pl111:
   - Support more variants

  adv7511:
   - Improve EDID handling.
   - HDMI CEC support

  sii8620:
   - Add remote control support"

* tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux: (1480 commits)
  drm/rockchip: analogix_dp: Use mutex rather than spinlock
  drm/mode_object: fix documentation for object lookups.
  drm/i915: Reorder context-close to avoid calling i915_vma_close() under RCU
  drm/i915: Move init_clock_gating() back to where it was
  drm/i915: Prune the reservation shared fence array
  drm/i915: Idle the GPU before shinking everything
  drm/i915: Lock llist_del_first() vs llist_del_all()
  drm/i915: Calculate ironlake intermediate watermarks correctly, v2.
  drm/i915: Disable lazy PPGTT page table optimization for vGPU
  drm/i915/execlists: Remove the priority "optimisation"
  drm/i915: Filter out spurious execlists context-switch interrupts
  drm/amdgpu: use irq-safe lock for kiq->ring_lock
  drm/amdgpu: bypass lru touch for KIQ ring submission
  drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories()
  drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs()
  drm/amd/powerplay: initialize a variable before using it
  drm/amd/powerplay: suppress KASAN out of bounds warning in vega10_populate_all_memory_levels
  drm/amd/amdgpu: fix evicted VRAM bo adjudgement condition
  drm/vblank: Tune drm_crtc_accurate_vblank_count() WARN down to a debug
  drm/rockchip: add CONFIG_OF dependency for lvds
  ...
2017-11-15 20:42:10 -08:00
Linus Torvalds 7c225c69f8 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2 updates

 - almost all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
  memory hotplug: fix comments when adding section
  mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
  mm: simplify nodemask printing
  mm,oom_reaper: remove pointless kthread_run() error check
  mm/page_ext.c: check if page_ext is not prepared
  writeback: remove unused function parameter
  mm: do not rely on preempt_count in print_vma_addr
  mm, sparse: do not swamp log with huge vmemmap allocation failures
  mm/hmm: remove redundant variable align_end
  mm/list_lru.c: mark expected switch fall-through
  mm/shmem.c: mark expected switch fall-through
  mm/page_alloc.c: broken deferred calculation
  mm: don't warn about allocations which stall for too long
  fs: fuse: account fuse_inode slab memory as reclaimable
  mm, page_alloc: fix potential false positive in __zone_watermark_ok
  mm: mlock: remove lru_add_drain_all()
  mm, sysctl: make NUMA stats configurable
  shmem: convert shmem_init_inodecache() to void
  Unify migrate_pages and move_pages access checks
  mm, pagevec: rename pagevec drained field
  ...
2017-11-15 19:42:40 -08:00
Mel Gorman 2d4894b5d2 mm: remove cold parameter from free_hot_cold_page*
Most callers users of free_hot_cold_page claim the pages being released
are cache hot.  The exception is the page reclaim paths where it is
likely that enough pages will be freed in the near future that the
per-cpu lists are going to be recycled and the cache hotness information
is lost.  As no one really cares about the hotness of pages being
released to the allocator, just ditch the parameter.

The APIs are renamed to indicate that it's no longer about hot/cold
pages.  It should also be less confusing as there are subtle differences
between them.  __free_pages drops a reference and frees a page when the
refcount reaches zero.  free_hot_cold_page handled pages whose refcount
was already zero which is non-obvious from the name.  free_unref_page
should be more obvious.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

[mgorman@techsingularity.net: add pages to head, not tail]
  Link: http://lkml.kernel.org/r/20171019154321.qtpzaeftoyyw4iey@techsingularity.net
Link: http://lkml.kernel.org/r/20171018075952.10627-8-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:06 -08:00
Levin, Alexander (Sasha Levin) 75f296d93b kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK
Convert all allocations that used a NOTRACK flag to stop using it.

Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Kirill A. Shutemov b4e98d9ac7 mm: account pud page tables
On a machine with 5-level paging support a process can allocate
significant amount of memory and stay unnoticed by oom-killer and memory
cgroup.  The trick is to allocate a lot of PUD page tables.  We don't
account PUD page tables, only PMD and PTE.

We already addressed the same issue for PMD page tables, see commit
dc6c9a35b6 ("mm: account pmd page tables to the process").
Introduction of 5-level paging brings the same issue for PUD page
tables.

The patch expands accounting to PUD level.

[kirill.shutemov@linux.intel.com: s/pmd_t/pud_t/]
  Link: http://lkml.kernel.org/r/20171004074305.x35eh5u7ybbt5kar@black.fi.intel.com
[heiko.carstens@de.ibm.com: s390/mm: fix pud table accounting]
  Link: http://lkml.kernel.org/r/20171103090551.18231-1-heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20171002080427.3320-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Linus Torvalds 1b6115fbe3 pci-v4.15-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJaDFIdAAoJEFmIoMA60/r8Jg4P/3IrmMNVnpqmYEZ7lRSW7UQ3
 8jtupbzIkbPsIAEhbJ7xqO7zKx85j6Og+ZSOv4a8u/tS6cd1aVZu2PpWsTkacez0
 7nLGVCSL3HZi5qcrtOvb2Pmke18SUKSPxVYSgS2ajQavB1oKaY03FbHDWyWidCZx
 qxkeGZOiUDw5kSGkQWyks1WgB0dd76rVbPcrKKEJueGgrdSm+EdgdDv8eT6bZInZ
 uMrCmSjNYTQP0KASCJJvgYOtJbdwvP6NuQTxzOlU2G+H2SqsLRjsz4UUR8FF06T5
 cndpgpG3QSAZLx7wCeWTvRorTEYORzKMoyw/AUjhiGbRep9Zw0aKNvCC99E6xjyD
 FECrk6kCrqZs7l+LVXK4SwpBXCVjNgRoFAHBEKF2X3/SWUkUhHXZHCVvMQB8LQiS
 2p8VRoYWw2aCLkHCGynuzToUrD2P2Pjxe5n/13aYVJkyBNfQqqZ3l2YHiZdpDO3j
 rgG6RW0WCrpZxfb/0WAbPnQ2qpZAwDPO6hOW7dIfTZabFVXRIkBvNq53by/0MxvP
 jyOcMTsq2l8y46f3VgNPUAHj0f52HwfZA3PQRPh+MQDz5385BJklDRWtfVM7cQx9
 IoeGkq1zLLvpOh63he/jnnRELxDvNVcxND8lOkenJlObj9kK63hUEcXg/zEMS4w3
 oetLw9TqE32Jb7GfpVSw
 =j4L3
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

  - detach driver before tearing down procfs/sysfs (Alex Williamson)

  - disable PCIe services during shutdown (Sinan Kaya)

  - fix ASPM oops on systems with no Root Ports (Ard Biesheuvel)

  - fix ASPM LTR_L1.2_THRESHOLD programming (Bjorn Helgaas)

  - fix ASPM Common_Mode_Restore_Time computation (Bjorn Helgaas)

  - fix portdrv MSI/MSI-X vector allocation (Dongdong Liu, Bjorn
    Helgaas)

  - report non-fatal AER errors only to the affected endpoint (Gabriele
    Paoloni)

  - distribute bus numbers, MMIO, and I/O space among hotplug bridges to
    allow more devices to be hot-added (Mika Westerberg)

  - fix pciehp races during initialization and surprise link down (Mika
    Westerberg)

  - handle surprise-removed devices in PME handling (Qiang)

  - support resizable BARs for large graphics devices (Christian König)

  - expose SR-IOV offset, stride, and VF device ID via sysfs (Filippo
    Sironi)

  - create SR-IOV virtfn/physfn sysfs links before attaching driver
    (Stuart Hayes)

  - fix SR-IOV "ARI Capable Hierarchy" restore issue (Tony Nguyen)

  - enforce Kconfig IOV/REALLOC dependency (Sascha El-Sharkawy)

  - avoid slot reset if bridge itself is broken (Jan Glauber)

  - clean up pci_reset_function() path (Jan H. Schönherr)

  - make pci_map_rom() fail if the option ROM is invalid (Changbin Du)

  - convert timers to timer_setup() (Kees Cook)

  - move PCI_QUIRKS to PCI bus Kconfig menu (Randy Dunlap)

  - constify pci_dev_type and intel_mid_pci_ops (Bhumika Goyal)

  - remove unnecessary pci_dev, pci_bus, resource, pcibios_set_master()
    declarations (Bjorn Helgaas)

  - fix endpoint framework overflows and BUG()s (Dan Carpenter)

  - fix endpoint framework issues (Kishon Vijay Abraham I)

  - avoid broken Cavium CN8xxx bus reset behavior (David Daney)

  - extend Cavium ACS capability quirks (Vadim Lomovtsev)

  - support Synopsys DesignWare RC in ECAM mode (Ard Biesheuvel)

  - turn off dra7xx clocks cleanly on shutdown (Keerthy)

  - fix Faraday probe error path (Wei Yongjun)

  - support HiSilicon STB SoC PCIe host controller (Jianguo Sun)

  - fix Hyper-V interrupt affinity issue (Dexuan Cui)

  - remove useless ACPI warning for Hyper-V pass-through devices (Vitaly
    Kuznetsov)

  - support multiple MSI on iProc (Sandor Bodo-Merle)

  - support Layerscape LS1012a and LS1046a PCIe host controllers (Hou
    Zhiqiang)

  - fix Layerscape default error response (Minghuan Lian)

  - support MSI on Tango host controller (Marc Gonzalez)

  - support Tegra186 PCIe host controller (Manikanta Maddireddy)

  - use generic accessors on Tegra when possible (Thierry Reding)

  - support V3 Semiconductor PCI host controller (Linus Walleij)

* tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (85 commits)
  PCI/ASPM: Add L1 Substates definitions
  PCI/ASPM: Reformat ASPM register definitions
  PCI/ASPM: Use correct capability pointer to program LTR_L1.2_THRESHOLD
  PCI/ASPM: Account for downstream device's Port Common_Mode_Restore_Time
  PCI: xgene: Rename xgene_pcie_probe_bridge() to xgene_pcie_probe()
  PCI: xilinx: Rename xilinx_pcie_link_is_up() to xilinx_pcie_link_up()
  PCI: altera: Rename altera_pcie_link_is_up() to altera_pcie_link_up()
  PCI: Fix kernel-doc build warning
  PCI: Fail pci_map_rom() if the option ROM is invalid
  PCI: Move pci_map_rom() error path
  PCI: Move PCI_QUIRKS to the PCI bus menu
  alpha/PCI: Make pdev_save_srm_config() static
  PCI: Remove unused declarations
  PCI: Remove redundant pci_dev, pci_bus, resource declarations
  PCI: Remove redundant pcibios_set_master() declarations
  PCI/PME: Handle invalid data when reading Root Status
  PCI: hv: Use effective affinity mask
  PCI: pciehp: Do not clear Presence Detect Changed during initialization
  PCI: pciehp: Fix race condition handling surprise link down
  PCI: Distribute available resources to hotplug-capable bridges
  ...
2017-11-15 15:01:28 -08:00
Linus Torvalds 1be2172e96 Modules updates for v4.15
Summary of modules changes for the 4.15 merge window:
 
 - Treewide module_param_call() cleanup, fix up set/get function
   prototype mismatches, from Kees Cook
 
 - Minor code cleanups
 
 Signed-off-by: Jessica Yu <jeyu@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJaDCyzAAoJEMBFfjjOO8FyaYQP/AwHBy6XmwwVlWDP4BqIF6hL
 Vhy3ccVLYEORvePv68tWSRPUz5n6+1Ebqanmwtkw6i8l+KwxY2SfkZql09cARc33
 2iBE4bHF98iWQmnJbF6me80fedY9n5bZJNMQKEF9VozJWwTMOTQFTCfmyJRDBmk9
 iidQj6M3idbSUOYIJjvc40VGx5NyQWSr+FFfqsz1rU5iLGRGEvA3I2/CDT0oTuV6
 D4MmFxzE2Tv/vIMa2GzKJ1LGScuUfSjf93Lq9Kk0cG36qWao8l930CaXyVdE9WJv
 bkUzpf3QYv/rDX6QbAGA0cada13zd+dfBr8YhchclEAfJ+GDLjMEDu04NEmI6KUT
 5lP0Xw0xYNZQI7bkdxDMhsj5jaz/HJpXCjPCtZBnSEKiL4OPXVMe+pBHoCJ2/yFN
 6M716XpWYgUviUOdiE+chczB5p3z4FA6u2ykaM4Tlk0btZuHGxjcSWwvcIdlPmjm
 kY4AfDV6K0bfEBVguWPJicvrkx44atqT5nWbbPhDwTSavtsuRJLb3GCsHedx7K8h
 ZO47lCQFAWCtrycK1HYw+oupNC3hYWQ0SR42XRdGhL1bq26C+1sei1QhfqSgA9PQ
 7CwWH4UTOL9fhtrzSqZngYOh9sjQNFNefqQHcecNzcEjK2vjrgQZvRNWZKHSwaFs
 fbGX8juZWP4ypbK+irTB
 =c8vb
 -----END PGP SIGNATURE-----

Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull module updates from Jessica Yu:
 "Summary of modules changes for the 4.15 merge window:

   - treewide module_param_call() cleanup, fix up set/get function
     prototype mismatches, from Kees Cook

   - minor code cleanups"

* tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: Do not paper over type mismatches in module_param_call()
  treewide: Fix function prototypes for module_param_call()
  module: Prepare to convert all module_param_call() prototypes
  kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
2017-11-15 13:46:33 -08:00
Linus Torvalds 9682b3dea2 Merge branch 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual rocket-science from trivial tree for 4.15"

* 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  MAINTAINERS: relinquish kconfig
  MAINTAINERS: Update my email address
  treewide: Fix typos in Kconfig
  kfifo: Fix comments
  init/Kconfig: Fix module signing document location
  misc: ibmasm: Return error on error path
  HID: logitech-hidpp: fix mistake in printk, "feeback" -> "feedback"
  MAINTAINERS: Correct path to uDraw PS3 driver
  tracing: Fix doc mistakes in trace sample
  tracing: Kconfig text fixes for CONFIG_HWLAT_TRACER
  MIPS: Alchemy: Remove reverted CONFIG_NETLINK_MMAP from db1xxx_defconfig
  mm/huge_memory.c: fixup grammar in comment
  lib/xz: Add fall-through comments to a switch statement
2017-11-15 10:14:11 -08:00
Jeff Layton 4d2dc2cc76 fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
Currently, we're capping the values too low in the F_GETLK64 case. The
fields in that structure are 64-bit values, so we shouldn't need to do
any sort of fixup there.

Make sure we check that assumption at build time in the future however
by ensuring that the sizes we're copying will fit.

With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it.

Fixes: 94073ad77f (fs/locks: don't mess with the address limit in compat_fcntl64)
Reported-by: Vitaly Lipatov <lav@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
2017-11-15 08:08:36 -05:00
Michael Ellerman 3ffa9d9e2a powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
Recently we added a CPU feature for Power9 DD2.0, to capture the fact
that some workarounds are required only on Power9 DD1 and DD2.0 but
not DD2.1 or later.

Then in commit 9d2f510a66 ("powerpc/64s/idle: avoid POWER9 DD1 and
DD2.0 ERAT workaround on DD2.1") and commit e3646330cf
"powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on
DD2.1") we changed CPU_FTR_SECTIONs to check for DD1 or DD20, eg:

  BEGIN_FTR_SECTION
          PPC_INVALIDATE_ERAT
  END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 | CPU_FTR_POWER9_DD20)

Unfortunately although this reads as "if set DD1 or DD2.0", the or is
a bitwise or and actually generates a mask of both bits. The code that
does the feature patching then checks that the value of the CPU
features masked with that mask are equal to the mask.

So the end result is we're checking for DD1 and DD20 being set, which
never happens. Yes the API is terrible.

Removing the ERAT workaround on DD2.0 results in random SEGVs, the
system tends to boot, but things randomly die including sometimes
dhclient, udev etc.

To fix the problem and hopefully avoid it in future, we remove the
DD2.0 CPU feature and instead add a DD2.1 (or later) feature. This
allows us to easily express that the workarounds are required if DD2.1
is not set.

At some point we will drop the DD1 workarounds entirely and some of
this can be cleaned up.

Fixes: 9d2f510a66 ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 ERAT workaround on DD2.1")
Fixes: e3646330cf ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on DD2.1")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-15 14:25:42 +11:00
Linus Torvalds 37cb8e1f8e DeviceTree for 4.15:
- kbuild cleanups and improvements for dtbs
 
 - Code clean-up of overlay code and fixing for some long standing memory
   leak and race condition in applying overlays
 
 - Improvements to DT memory usage making sysfs/kobjects optional and
   skipping unflattening of disabled nodes. This is part of kernel
   tinification efforts.
 
 - Final piece of removing storing the full path for every DT node. The
   prerequisite conversion of printk's to use device_node format
   specifier happened in 4.14.
 
 - Sync with current upstream dtc. This brings additional checks to dtb
   compiling.
 
 - Binding doc tree wide removal of leading 0s from examples
 
 - RTC binding documentation adding missing devices and some
   consolidation of duplicated bindings
 
 - Vendor prefix documentation for nutsboard, Silicon Storage Technology,
   shimafuji, Tecon Microprocessor Technologies, DH electronics GmbH,
   Opal Kelly, and Next Thing
 -----BEGIN PGP SIGNATURE-----
 
 iQItBAABCAAXBQJaCwaSEBxyb2JoQGtlcm5lbC5vcmcACgkQ+vtdtY28YcNzeA/8
 C8uQhSsX2+UQZvFzcEA8KQAMGT3kYdrcf+gidRKwCEUWg1qscUEpTb3n3Rm5NUbU
 RPD1s6GSlh6fJCMHDTQ6Tti/T59L7nZa2/AIGmUishGu4x4q1o18AobpFJmYP/EM
 SJPwnmm5RV9WcZFao1y+sY3Xtn8DStxHO4cS+dyF5/EvPN9D8nbLJfu7bgTBAZww
 HktIMB9kx+GTipRQZBvBwXoy5MJjthIZub4XwzesA4tGananj4cXlc0xaVxpdYy3
 5bO6q5F7cbrZ2uyrF+oIChpCENK4VaXh80m0WHc8EzaG++shzEkR4he1vYkwnV+I
 OYo4vsUg9dP8rBksUG1eYhS8fJKPvEBRNP7ETT5utVBy5I/tDEbo/crmQZRTIDIC
 hZbhcdZlISZj0DzkMK2ZHQV9UYtRWzXrJbZHFIPP12GCyvXVxYJUIWb9iYnUYSon
 KugygsFSpZHMWmfAhemw5/ctJZ19qhM5UIl2KZk5tMBHAf466ILmZjg0me6fYkOp
 eADfwHJ1dLMdK79CVMHSfp+vArcZXp35B16c3sWpJB36Il97Mc/9siEufCL4GKX7
 IBBnQBlbpSBKBejWVyI7Ip/Xp5u4qAQD+ZMJ9oLqBRqfWerHbDuOERlEOgwGqJYr
 9v4HvP7V8eVUvAdqXka4EBfCyAgUzXDAxG2Dfmv9vGU=
 =jgpN
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull DeviceTree updates from Rob Herring:
 "A bigger diffstat than usual with the kbuild changes and a tree wide
  fix in the binding documentation.

  Summary:

   - kbuild cleanups and improvements for dtbs

   - Code clean-up of overlay code and fixing for some long standing
     memory leak and race condition in applying overlays

   - Improvements to DT memory usage making sysfs/kobjects optional and
     skipping unflattening of disabled nodes. This is part of kernel
     tinification efforts.

   - Final piece of removing storing the full path for every DT node.
     The prerequisite conversion of printk's to use device_node format
     specifier happened in 4.14.

   - Sync with current upstream dtc. This brings additional checks to
     dtb compiling.

   - Binding doc tree wide removal of leading 0s from examples

   - RTC binding documentation adding missing devices and some
     consolidation of duplicated bindings

   - Vendor prefix documentation for nutsboard, Silicon Storage
     Technology, shimafuji, Tecon Microprocessor Technologies, DH
     electronics GmbH, Opal Kelly, and Next Thing"

* tag 'devicetree-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (55 commits)
  dt-bindings: usb: add #phy-cells to usb-nop-xceiv
  dt-bindings: Remove leading zeros from bindings notation
  kbuild: handle dtb-y and CONFIG_OF_ALL_DTBS natively in Makefile.lib
  MIPS: dts: remove bogus bcm96358nb4ser.dtb from dtb-y entry
  kbuild: clean up *.dtb and *.dtb.S patterns from top-level Makefile
  .gitignore: move *.dtb and *.dtb.S patterns to the top-level .gitignore
  .gitignore: sort normal pattern rules alphabetically
  dt-bindings: add vendor prefix for Next Thing Co.
  scripts/dtc: Update to upstream version v1.4.5-6-gc1e55a5513e9
  of: dynamic: fix memory leak related to properties of __of_node_dup
  of: overlay: make pr_err() string unique
  of: overlay: pr_err from return NOTIFY_OK to overlay apply/remove
  of: overlay: remove unneeded check for NULL kbasename()
  of: overlay: remove a dependency on device node full_name
  of: overlay: simplify applying symbols from an overlay
  of: overlay: avoid race condition between applying multiple overlays
  of: overlay: loosen overly strict phandle clash check
  of: overlay: expand check of whether overlay changeset can be removed
  of: overlay: detect cases where device tree may become corrupt
  of: overlay: minor restructuring
  ...
2017-11-14 18:25:40 -08:00
Linus Torvalds e37e0ee019 A couple of dma-mapping updates:
- turn dma_cache_sync into a dma_map_ops instance and remove
    implementation that purely are dead because the architecture
    doesn't support noncoherent allocations
  - add a flag for busses that need DMA configuration (Robin Murphy)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAloLSrYLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOMuQ//XXD94uNPYavrgXzGsAtg+I+LEm+xyk4T0dX5fxfj
 amXX49MHoGemjsBgzJlkQMMFqwDEdkKyEuFnEuy6OeowYCyD6zW0MJ3MwP9OosNJ
 PNTdGZIfSvxPYEW8cR9AdK3iQ2loMBZnYhd+O/oVjSugULLW2DNa7r2VRktcCKoh
 8Ob/8gL6Y9xEYJBRszhrBwKTa/hU8IThxxozBFzN7I3LIKyFboSTcwXGLAHow43g
 4anCTjWTaDcoU2JwY6UTRKRRTV+gD0ZRcsZfd8lNNb5rtMVZkBVOHbF14SMAmw1r
 kSgRcU3+WIFPhK/8wBYqtGZZGnOgFBTHVeqow3AdS728pBWlWl8niTK0DiIgCd3m
 qzScF6SqfN1bCZkZAy8FUV2l0DPYKS6lvyNkf00Eb2W/f6LEqAcjCi2QDDxRfaw+
 Vm97nPUiM+uXNy/6KtAy6ChdprSqx12/edXPp7Y3H2rS/+Dmr6exeix+wb7QUN8W
 JI7ZRHo4JLaJZk/XrZtGX/6jnN1Jo7vfApQOmYDY7kE1iGtOU/LQQj8gcZRVQxML
 4soN6ivSmZX2k03LabWHpYQ8QiyCSYChLC+Az7rQH47LDLeu1IdTJu6orpXpaxyo
 ymzEWlHbmF7mE66X4g/Up/eAYk2YLUA3rKLGVjAIaWDBzHftSFg5EaAnqMADC1G2
 hSo=
 =ALJf
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - turn dma_cache_sync into a dma_map_ops instance and remove
   implementation that purely are dead because the architecture doesn't
   support noncoherent allocations

 - add a flag for busses that need DMA configuration (Robin Murphy)

* tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: turn dma_cache_sync into a dma_map_ops method
  sh: make dma_cache_sync a no-op
  xtensa: make dma_cache_sync a no-op
  unicore32: make dma_cache_sync a no-op
  powerpc: make dma_cache_sync a no-op
  mn10300: make dma_cache_sync a no-op
  microblaze: make dma_cache_sync a no-op
  ia64: make dma_cache_sync a no-op
  frv: make dma_cache_sync a no-op
  x86: make dma_cache_sync a no-op
  floppy: consolidate the dummy fd_cacheflush definition
  drivers: flag buses which demand DMA configuration
2017-11-14 16:54:12 -08:00
Bjorn Helgaas 9ceb09cce1 Merge branch 'pci/virtualization' into next
* pci/virtualization:
  PCI: Document reset method return values
  PCI: Detach driver before procfs & sysfs teardown on device remove
  PCI: Apply Cavium ThunderX ACS quirk to more Root Ports
  PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF
  PCI: Restore ARI Capable Hierarchy before setting numVFs
  PCI: Create SR-IOV virtfn/physfn links before attaching driver
  PCI: Expose SR-IOV offset, stride, and VF device ID via sysfs
  PCI: Cache the VF device ID in the SR-IOV structure
  PCI: Add Kconfig PCI_IOV dependency for PCI_REALLOC_ENABLE_AUTO
  PCI: Remove unused function __pci_reset_function()
  PCI: Remove reset argument from pci_iov_{add,remove}_virtfn()
2017-11-14 12:11:26 -06:00
Michael Ellerman 475b581ff5 powerpc/64s: Fix masking of SRR1 bits on instruction fault
On 64-bit Book3s, when we take an instruction fault the reason for the
fault may be reported in SRR1. For data faults the reason is reported
in DSISR (Data Storage Instruction Status Register).

The reasons reported in each do not necessarily correspond, so we mask
the SRR1 bits before copying them to the DSISR, which is then used by
the page fault code.

Prior to commit b4c001dc44 ("powerpc/mm: Use symbolic constants for
filtering SRR1 bits on ISIs") we used a hard-coded mask of 0x58200000,
which corresponds to:

  DSISR_NOHPTE		0x40000000 /* no translation found */
  DSISR_NOEXEC_OR_G	0x10000000 /* exec of no-exec or guarded */
  DSISR_PROTFAULT	0x08000000 /* protection fault */
  DSISR_KEYFAULT	0x00200000 /* Storage Key fault */

That commit added a #define for the mask, DSISR_SRR1_MATCH_64S, but
incorrectly used a different similarly named DSISR_BAD_FAULT_64S.

This had the effect of changing the mask to 0xa43a0000, which omits
everything but DSISR_KEYFAULT.

Luckily this had no visible effect, because in practice we hardly use
the DSISR bits. The lack of DSISR_NOHPTE means a TLB flush
optimisation was missed in the native HPTE code, and DSISR_NOEXEC_OR_G
and DSISR_PROTFAULT are both only used to trigger rare warnings.

So we got lucky, but let's fix it. The new value only has bits between
17 and 30 set, so we can continue to use andis.

Fixes: b4c001dc44 ("powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-14 15:48:47 +11:00
Linus Torvalds 2bcc673101 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Yet another big pile of changes:

   - More year 2038 work from Arnd slowly reaching the point where we
     need to think about the syscalls themself.

   - A new timer function which allows to conditionally (re)arm a timer
     only when it's either not running or the new expiry time is sooner
     than the armed expiry time. This allows to use a single timer for
     multiple timeout requirements w/o caring about the first expiry
     time at the call site.

   - A new NMI safe accessor to clock real time for the printk timestamp
     work. Can be used by tracing, perf as well if required.

   - A large number of timer setup conversions from Kees which got
     collected here because either maintainers requested so or they
     simply got ignored. As Kees pointed out already there are a few
     trivial merge conflicts and some redundant commits which was
     unavoidable due to the size of this conversion effort.

   - Avoid a redundant iteration in the timer wheel softirq processing.

   - Provide a mechanism to treat RTC implementations depending on their
     hardware properties, i.e. don't inflict the write at the 0.5
     seconds boundary which originates from the PC CMOS RTC to all RTCs.
     No functional change as drivers need to be updated separately.

   - The usual small updates to core code clocksource drivers. Nothing
     really exciting"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
  timers: Add a function to start/reduce a timer
  pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
  timer: Prepare to change all DEFINE_TIMER() callbacks
  netfilter: ipvs: Convert timers to use timer_setup()
  scsi: qla2xxx: Convert timers to use timer_setup()
  block/aoe: discover_timer: Convert timers to use timer_setup()
  ide: Convert timers to use timer_setup()
  drbd: Convert timers to use timer_setup()
  mailbox: Convert timers to use timer_setup()
  crypto: Convert timers to use timer_setup()
  drivers/pcmcia: omap1: Fix error in automated timer conversion
  ARM: footbridge: Fix typo in timer conversion
  drivers/sgi-xp: Convert timers to use timer_setup()
  drivers/pcmcia: Convert timers to use timer_setup()
  drivers/memstick: Convert timers to use timer_setup()
  drivers/macintosh: Convert timers to use timer_setup()
  hwrng/xgene-rng: Convert timers to use timer_setup()
  auxdisplay: Convert timers to use timer_setup()
  sparc/led: Convert timers to use timer_setup()
  mips: ip22/32: Convert timers to use timer_setup()
  ...
2017-11-13 17:56:58 -08:00
Linus Torvalds d6ec9d9a4d Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Ingo Molnar:
 "Note that in this cycle most of the x86 topics interacted at a level
  that caused them to be merged into tip:x86/asm - but this should be a
  temporary phenomenon, hopefully we'll back to the usual patterns in
  the next merge window.

  The main changes in this cycle were:

  Hardware enablement:

   - Add support for the Intel UMIP (User Mode Instruction Prevention)
     CPU feature. This is a security feature that disables certain
     instructions such as SGDT, SLDT, SIDT, SMSW and STR. (Ricardo Neri)

     [ Note that this is disabled by default for now, there are some
       smaller enhancements in the pipeline that I'll follow up with in
       the next 1-2 days, which allows this to be enabled by default.]

   - Add support for the AMD SEV (Secure Encrypted Virtualization) CPU
     feature, on top of SME (Secure Memory Encryption) support that was
     added in v4.14. (Tom Lendacky, Brijesh Singh)

   - Enable new SSE/AVX/AVX512 CPU features: AVX512_VBMI2, GFNI, VAES,
     VPCLMULQDQ, AVX512_VNNI, AVX512_BITALG. (Gayatri Kammela)

  Other changes:

   - A big series of entry code simplifications and enhancements (Andy
     Lutomirski)

   - Make the ORC unwinder default on x86 and various objtool
     enhancements. (Josh Poimboeuf)

   - 5-level paging enhancements (Kirill A. Shutemov)

   - Micro-optimize the entry code a bit (Borislav Petkov)

   - Improve the handling of interdependent CPU features in the early
     FPU init code (Andi Kleen)

   - Build system enhancements (Changbin Du, Masahiro Yamada)

   - ... plus misc enhancements, fixes and cleanups"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (118 commits)
  x86/build: Make the boot image generation less verbose
  selftests/x86: Add tests for the STR and SLDT instructions
  selftests/x86: Add tests for User-Mode Instruction Prevention
  x86/traps: Fix up general protection faults caused by UMIP
  x86/umip: Enable User-Mode Instruction Prevention at runtime
  x86/umip: Force a page fault when unable to copy emulated result to user
  x86/umip: Add emulation code for UMIP instructions
  x86/cpufeature: Add User-Mode Instruction Prevention definitions
  x86/insn-eval: Add support to resolve 16-bit address encodings
  x86/insn-eval: Handle 32-bit address encodings in virtual-8086 mode
  x86/insn-eval: Add wrapper function for 32 and 64-bit addresses
  x86/insn-eval: Add support to resolve 32-bit address encodings
  x86/insn-eval: Compute linear address in several utility functions
  resource: Fix resource_size.cocci warnings
  X86/KVM: Clear encryption attribute when SEV is active
  X86/KVM: Decrypt shared per-cpu variables when SEV is active
  percpu: Introduce DEFINE_PER_CPU_DECRYPTED
  x86: Add support for changing memory encryption attribute in early boot
  x86/io: Unroll string I/O when SEV is active
  x86/boot: Add early boot support when running with SEV active
  ...
2017-11-13 14:13:48 -08:00