This commit broke boot on systems with an uncompressed kernel image,
namely systems using a cuImage. On such systems the compressed boot
image (boot wrapper, uncompressed kernel image, ..) is decompressed
by u-boot already, therefore the boot wrapper code sees an
uncompressed kernel image.
The old decompression code silently assumed an uncompressed kernel
image if it found no valid gzip signature, whilst the new code
bailed out in this case.
Fix this by re-introducing such a fallback if no valid compressed
image is found.
Fixes: 1b7898ee27 ("Use the pre-boot decompression API")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If a cxl adapter faults on an invalid address for a kernel context, we
may enter copro_calculate_slb() with a NULL mm pointer (kernel
context) and an effective address which looks like a user
address. Which will cause a crash when dereferencing mm. It is clearly
an AFU bug, but there's no reason to crash either. So return an error,
so that cxl can ack the interrupt with an address error.
Fixes: 73d16a6e0e ("powerpc/cell: Move data segment faulting code out of cell platform")
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
extract as much possible uncertainty from a running system at boot time as
possible, hoping to capitalize on any possible variation in CPU operation
(due to runtime data differences, hardware differences, SMP ordering,
thermal timing variation, cache behavior, etc).
At the very least, this plugin is a much more comprehensive example for
how to manipulate kernel code using the gcc plugin internals.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Kees Cook <kees@outflux.net>
iQIcBAABCgAGBQJX/BAFAAoJEIly9N/cbcAmzW8QALFbCs7EFFkML+M/M/9d8zEk
1QbUs/z8covJTTT1PjSdw7JUrAMulI3S00owpcQVd/PcWjRPU80QwfsXBgIB0tvC
Kub2qxn6Oaf+kTB646zwjFgjdCecw/USJP+90nfcu2+LCnE8ReclKd1aUee+Bnhm
iDEUyH2ONIoWq6ta2Z9sA7+E4y2ZgOlmW0iga3Mnf+OcPtLE70fWPoe5E4g9DpYk
B+kiPDrD9ql5zsHaEnKG1ldjiAZ1L6Grk8rGgLEXmbOWtTOFmnUhR+raK5NA/RCw
MXNuyPay5aYPpqDHFm+OuaWQAiPWfPNWM3Ett4k0d9ZWLixTcD1z68AciExwk7aW
SEA8b1Jwbg05ZNYM7NJB6t6suKC4dGPxWzKFOhmBicsh2Ni5f+Az0BQL6q8/V8/4
8UEqDLuFlPJBB50A3z5ngCVeYJKZe8Bg/Swb4zXl6mIzZ9darLzXDEV6ystfPXxJ
e1AdBb41WC+O2SAI4l64yyeswkGo3Iw2oMbXG5jmFl6wY/xGp7dWxw7gfnhC6oOh
afOT54p2OUDfSAbJaO0IHliWoIdmE5ZYdVYVU9Ek+uWyaIwcXhNmqRg+Uqmo32jf
cP5J9x2kF3RdOcbSHXmFp++fU+wkhBtEcjkNpvkjpi4xyA47IWS7lrVBBebrCq9R
pa/A7CNQwibIV6YD8+/p
=1dUK
-----END PGP SIGNATURE-----
Merge tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc plugins update from Kees Cook:
"This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot
time as possible, hoping to capitalize on any possible variation in
CPU operation (due to runtime data differences, hardware differences,
SMP ordering, thermal timing variation, cache behavior, etc).
At the very least, this plugin is a much more comprehensive example
for how to manipulate kernel code using the gcc plugin internals"
* tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
latent_entropy: Mark functions with __latent_entropy
gcc-plugins: Add latent_entropy plugin
Pull kbuild updates from Michal Marek:
- EXPORT_SYMBOL for asm source by Al Viro.
This does bring a regression, because genksyms no longer generates
checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
working on a patch to fix this.
Plus, we are talking about functions like strcpy(), which rarely
change prototypes.
- Fixes for PPC fallout of the above by Stephen Rothwell and Nick
Piggin
- fixdep speedup by Alexey Dobriyan.
- preparatory work by Nick Piggin to allow architectures to build with
-ffunction-sections, -fdata-sections and --gc-sections
- CONFIG_THIN_ARCHIVES support by Stephen Rothwell
- fix for filenames with colons in the initramfs source by me.
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
initramfs: Escape colons in depfile
ppc: there is no clear_pages to export
powerpc/64: whitelist unresolved modversions CRCs
kbuild: -ffunction-sections fix for archs with conflicting sections
kbuild: add arch specific post-link Makefile
kbuild: allow archs to select link dead code/data elimination
kbuild: allow architectures to use thin archives instead of ld -r
kbuild: Regenerate genksyms lexer
kbuild: genksyms fix for typeof handling
fixdep: faster CONFIG_ search
ia64: move exports to definitions
sparc32: debride memcpy.S a bit
[sparc] unify 32bit and 64bit string.h
sparc: move exports to definitions
ppc: move exports to definitions
arm: move exports to definitions
s390: move exports to definitions
m68k: move exports to definitions
alpha: move exports to actual definitions
x86: move exports to actual definitions
...
Pull libata updates from Tejun Heo:
- Write same support added
- Minor ahci MSIX irq handling updates
- Non-critical SCSI command translation fixes
- Controller specific changes
* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: qoriq: Revert "ahci: qoriq: Disable NCQ on ls2080a SoC"
libata: remove <asm-generic/libata-portmap.h>
libata: remove unused definitions from <asm/libata-portmap.h>
pata_at91: Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
ata: Replace BUG() with BUG_ON().
ata: sata_mv: Replacing dma_pool_alloc and memset with a single call dma_pool_zalloc.
libata: Some drives failing on SCT Write Same
ahci: use pci_alloc_irq_vectors
libata: SCT Write Same handle ATA_DFLAG_PIO
libata: SCT Write Same / DSM Trim
libata: Add support for SCT Write Same
libata: Safely overwrite attached page in WRITE SAME xlat
ahci: also use a per-port lock for the multi-MSIX case
ARM: dts: STiH407-family: Add ports-implemented property in sata nodes
ahci: st: Add ports-implemented property in support
ahci: qoriq: enable snoopable sata read and write
ahci: qoriq: adjust sata parameter
libata-scsi: fix MODE SELECT translation for Control mode page
libata-scsi: use u8 array to store mode page copy
Add support for the DMA_ATTR_NO_WARN attribute on powerpc iommu code.
Link: http://lkml.kernel.org/r/1470092390-25451-3-git-send-email-mauricfo@linux.vnet.ibm.com
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 2b4e3ad8f5 ("powerpc/mm/hash64: Don't test for machine type
to detect HEA special case") we changed the logic in might_have_hea()
to check FW_FEATURE_SPLPAR rather than machine_is(pseries).
However the check was incorrectly negated, leading to crashes on
machines with HEA adapters, such as:
mm: Hashing failure ! EA=0xd000080080004040 access=0x800000000000000e current=NetworkManager
trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003cc033e701ae
Unable to handle kernel paging request for data at address 0xd000080080004040
Call Trace:
.ehea_create_cq+0x148/0x340 [ehea] (unreliable)
.ehea_up+0x258/0x1200 [ehea]
.ehea_open+0x44/0x1a0 [ehea]
...
Fix it by removing the negation.
Fixes: 2b4e3ad8f5 ("powerpc/mm/hash64: Don't test for machine type to detect HEA special case")
Cc: stable@vger.kernel.org # v4.8+
Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Debugging a data corruption issue with virtio-net/vhost-net led to
the observation that __copy_tofrom_user was occasionally returning
a value 16 larger than it should. Since the return value from
__copy_tofrom_user is the number of bytes not copied, this means
that __copy_tofrom_user can occasionally return a value larger
than the number of bytes it was asked to copy. In turn this can
cause higher-level copy functions such as copy_page_to_iter_iovec
to corrupt memory by copying data into the wrong memory locations.
It turns out that the failing case involves a fault on the store
at label 79, and at that point the first unmodified byte of the
destination is at R3 + 16. Consequently the exception handler
for that store needs to add 16 to R3 before using it to work out
how many bytes were not copied, but in this one case it was not
adding the offset to R3. To fix it, this moves the label 179 to
the point where we add 16 to R3. I have checked manually all the
exception handlers for the loads and stores in this code and the
rest of them are correct (it would be excellent to have an
automated test of all the exception cases).
This bug has been present since this code was initially
committed in May 2002 to Linux version 2.5.20.
Cc: stable@vger.kernel.org
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
power4_fixup_nap is called from the "common" handlers, not the virt/real
handlers, therefore it should itself be a common handler. Placing it
down in the trampoline space caused it to go out of reach of its
callers, requiring a trampoline inserted at the start of the text
section, which breaks the fixed section address calculations.
Fixes: da2bc4644c ("powerpc/64s: Add new exception vector macros")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This commit fixes a stack corruption in the pseries specific code dealing
with the huge pages.
In __pSeries_lpar_hugepage_invalidate() the buffer used to pass arguments
to the hypervisor is not large enough. This leads to a stack corruption
where a previously saved register could be corrupted leading to unexpected
result in the caller, like the following panic:
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: virtio_balloon ip_tables x_tables autofs4
virtio_blk 8139too virtio_pci virtio_ring 8139cp virtio
CPU: 11 PID: 1916 Comm: mmstress Not tainted 4.8.0 #76
task: c000000005394880 task.stack: c000000005570000
NIP: c00000000027bf6c LR: c00000000027bf64 CTR: 0000000000000000
REGS: c000000005573820 TRAP: 0300 Not tainted (4.8.0)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 84822884 XER: 20000000
CFAR: c00000000010a924 DAR: 420000000014e5e0 DSISR: 40000000 SOFTE: 1
GPR00: c00000000027bf64 c000000005573aa0 c000000000e02800 c000000004447964
GPR04: c00000000404de18 c000000004d38810 00000000042100f5 00000000f5002104
GPR08: e0000000f5002104 0000000000000001 042100f5000000e0 00000000042100f5
GPR12: 0000000000002200 c00000000fe02c00 c00000000404de18 0000000000000000
GPR16: c1ffffffffffe7ff 00003fff62000000 420000000014e5e0 00003fff63000000
GPR20: 0008000000000000 c0000000f7014800 0405e600000000e0 0000000000010000
GPR24: c000000004d38810 c000000004447c10 c00000000404de18 c000000004447964
GPR28: c000000005573b10 c000000004d38810 00003fff62000000 420000000014e5e0
NIP [c00000000027bf6c] zap_huge_pmd+0x4c/0x470
LR [c00000000027bf64] zap_huge_pmd+0x44/0x470
Call Trace:
[c000000005573aa0] [c00000000027bf64] zap_huge_pmd+0x44/0x470 (unreliable)
[c000000005573af0] [c00000000022bbd8] unmap_page_range+0xcf8/0xed0
[c000000005573c30] [c00000000022c2d4] unmap_vmas+0x84/0x120
[c000000005573c80] [c000000000235448] unmap_region+0xd8/0x1b0
[c000000005573d80] [c0000000002378f0] do_munmap+0x2d0/0x4c0
[c000000005573df0] [c000000000237be4] SyS_munmap+0x64/0xb0
[c000000005573e30] [c000000000009560] system_call+0x38/0x108
Instruction dump:
fbe1fff8 fb81ffe0 7c7f1b78 7ca32b78 7cbd2b78 f8010010 7c9a2378 f821ffb1
7cde3378 4bfffea9 7c7b1b79 41820298 <e87f0000> 48000130 7fa5eb78 7fc4f378
Most of the time, the bug is surfacing in a caller up in the stack from
__pSeries_lpar_hugepage_invalidate() which is quite confusing.
This bug is pending since v3.11 but was hidden if a caller of the
caller of __pSeries_lpar_hugepage_invalidate() has pushed the corruped
register (r18 in this case) in the stack and is not using it until
restoring it. GCC 6.2.0 seems to raise it more frequently.
This commit also change the definition of the parameter buffer in
pSeries_lpar_flush_hash_range() to rely on the global define
PLPAR_HCALL9_BUFSIZE (no functional change here).
Fixes: 1a5272866f ("powerpc: Optimize hugepage invalidate")
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Freescale updates from Scott:
"Highlights include qbman support (a prerequisite for datapath drivers
such as ethernet), a PCI DMA fix+improvement, reset handler changes, more
8xx optimizations, and some cleanups and fixes."
Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning
This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot time as
possible, hoping to capitalize on any possible variation in CPU operation
(due to runtime data differences, hardware differences, SMP ordering,
thermal timing variation, cache behavior, etc).
At the very least, this plugin is a much more comprehensive example for
how to manipulate kernel code using the gcc plugin internals.
The need for very-early boot entropy tends to be very architecture or
system design specific, so this plugin is more suited for those sorts
of special cases. The existing kernel RNG already attempts to extract
entropy from reliable runtime variation, but this plugin takes the idea to
a logical extreme by permuting a global variable based on any variation
in code execution (e.g. a different value (and permutation function)
is used to permute the global based on loop count, case statement,
if/then/else branching, etc).
To do this, the plugin starts by inserting a local variable in every
marked function. The plugin then adds logic so that the value of this
variable is modified by randomly chosen operations (add, xor and rol) and
random values (gcc generates separate static values for each location at
compile time and also injects the stack pointer at runtime). The resulting
value depends on the control flow path (e.g., loops and branches taken).
Before the function returns, the plugin mixes this local variable into
the latent_entropy global variable. The value of this global variable
is added to the kernel entropy pool in do_one_initcall() and _do_fork(),
though it does not credit any bytes of entropy to the pool; the contents
of the global are just used to mix the pool.
Additionally, the plugin can pre-initialize arrays with build-time
random contents, so that two different kernel builds running on identical
hardware will not have the same starting values.
Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: expanded commit message and code comments]
Signed-off-by: Kees Cook <keescook@chromium.org>
Pull crypto updates from Herbert Xu:
"Here is the crypto update for 4.9:
API:
- The crypto engine code now supports hashes.
Algorithms:
- Allow keys >= 2048 bits in FIPS mode for RSA.
Drivers:
- Memory overwrite fix for vmx ghash.
- Add support for building ARM sha1-neon in Thumb2 mode.
- Reenable ARM ghash-ce code by adding import/export.
- Reenable img-hash by adding import/export.
- Add support for multiple cores in omap-aes.
- Add little-endian support for sha1-powerpc.
- Add Cavium HWRNG driver for ThunderX SoC"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (137 commits)
crypto: caam - treat SGT address pointer as u64
crypto: ccp - Make syslog errors human-readable
crypto: ccp - clean up data structure
crypto: vmx - Ensure ghash-generic is enabled
crypto: testmgr - add guard to dst buffer for ahash_export
crypto: caam - Unmap region obtained by of_iomap
crypto: sha1-powerpc - little-endian support
crypto: gcm - Fix IV buffer size in crypto_gcm_setkey
crypto: vmx - Fix memory corruption caused by p8_ghash
crypto: ghash-generic - move common definitions to a new header file
crypto: caam - fix sg dump
hwrng: omap - Only fail if pm_runtime_get_sync returns < 0
crypto: omap-sham - shrink the internal buffer size
crypto: omap-sham - add support for export/import
crypto: omap-sham - convert driver logic to use sgs for data xmit
crypto: omap-sham - change the DMA threshold value to a define
crypto: omap-sham - add support functions for sg based data handling
crypto: omap-sham - rename sgl to sgl_tmp for deprecation
crypto: omap-sham - align algorithms on word offset
crypto: omap-sham - add context export/import stubs
...
Merge updates from Andrew Morton:
- fsnotify updates
- ocfs2 updates
- all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits)
console: don't prefer first registered if DT specifies stdout-path
cred: simpler, 1D supplementary groups
CREDITS: update Pavel's information, add GPG key, remove snail mail address
mailmap: add Johan Hovold
.gitattributes: set git diff driver for C source code files
uprobes: remove function declarations from arch/{mips,s390}
spelling.txt: "modeled" is spelt correctly
nmi_backtrace: generate one-line reports for idle cpus
arch/tile: adopt the new nmi_backtrace framework
nmi_backtrace: do a local dump_stack() instead of a self-NMI
nmi_backtrace: add more trigger_*_cpu_backtrace() methods
min/max: remove sparse warnings when they're nested
Documentation/filesystems/proc.txt: add more description for maps/smaps
mm, proc: fix region lost in /proc/self/smaps
proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
proc: add LSM hook checks to /proc/<tid>/timerslack_ns
proc: relax /proc/<tid>/timerslack_ns capability requirements
meminfo: break apart a very long seq_printf with #ifdefs
seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
proc: faster /proc/*/status
...
Highlights:
- Major rework of Book3S 64-bit exception vectors (Nicholas Piggin)
- Use gas sections for arranging exception vectors et. al.
- Large set of TM cleanups and selftests (Cyril Bur)
- Enable transactional memory (TM) lazily for userspace (Cyril Bur)
- Support for XZ compression in the zImage wrapper (Oliver O'Halloran)
- Add support for bpf constant blinding (Naveen N. Rao)
- Beginnings of upstream support for PA Semi Nemo motherboards (Darren Stevens)
Fixes:
- Ensure .mem(init|exit).text are within _stext/_etext (Michael Ellerman)
- xmon: Don't use ld on 32-bit (Michael Ellerman)
- vdso64: Use double word compare on pointers (Anton Blanchard)
- powerpc/nvram: Fix an incorrect partition merge (Pan Xinhui)
- powerpc: Fix usage of _PAGE_RO in hugepage (Christophe Leroy)
- powerpc/mm: Update FORCE_MAX_ZONEORDER range to allow hugetlb w/4K (Aneesh Kumar K.V)
- Fix memory leak in queue_hotplug_event() error path (Andrew Donnellan)
- Replay hypervisor maintenance interrupt first (Nicholas Piggin)
Cleanups & features:
- Sparse fixes/cleanups (Daniel Axtens)
- Preserve CFAR value on SLB miss caused by access to bogus address (Paul Mackerras)
- Radix MMU fixups for POWER9 (Aneesh Kumar K.V)
- Support for setting used_(vsr|vr|spe) in sigreturn path (for CRIU) (Simon Guo)
- Optimise syscall entry for virtual, relocatable case (Nicholas Piggin)
- Optimise MSR handling in exception handling (Nicholas Piggin)
- Support for kexec with Radix MMU (Benjamin Herrenschmidt)
- powernv EEH fixes (Russell Currey)
- Suprise PCI hotplug support for powernv (Gavin Shan)
- Endian/sparse fixes for powernv PCI (Gavin Shan)
- Defconfig updates (Anton Blanchard)
- Various performance optimisations (Anton Blanchard)
- Align hot loops of memset() and backwards_memcpy()
- During context switch, check before setting mm_cpumask
- Remove static branch prediction in atomic{, 64}_add_unless
- Only disable HAVE_EFFICIENT_UNALIGNED_ACCESS on POWER7 little endian
- Set default CPU type to POWER8 for little endian builds
- KVM: PPC: Book3S HV: Migrate pinned pages out of CMA (Balbir Singh)
- cxl: Flush PSL cache before resetting the adapter (Frederic Barrat)
- cxl: replace loop with for_each_child_of_node(), remove unneeded of_node_put() (Andrew Donnellan)
- Fix HV facility unavailable to use correct handler (Nicholas Piggin)
- Remove unnecessary syscall trampoline (Nicholas Piggin)
- fadump: Fix build break when CONFIG_PROC_VMCORE=n (Michael Ellerman)
- Quieten EEH message when no adapters are found (Anton Blanchard)
- powernv: Add PHB register dump debugfs handle (Russell Currey)
- Use kprobe blacklist for exception handlers & asm functions (Nicholas Piggin)
- Document the syscall ABI (Nicholas Piggin)
- MAINTAINERS: Update cxl maintainers (Michael Neuling)
- powerpc: Remove all usages of NO_IRQ (Michael Ellerman)
Minor cleanups:
- Andrew Donnellan, Christophe Leroy, Colin Ian King, Cyril Bur, Frederic Barrat,
Pan Xinhui, PrasannaKumar Muralidharan, Rui Teng, Simon Guo.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX9x5ZAAoJEFHr6jzI4aWAWQ0P+gOhdtayMsRY0k0dzPmYaFr0
Ha5v968RJaNIyGGM9ARJg8h27PGMaSlBp/9zaYdk1G7xfv/DMR0uq8d8l5pjy/Zw
Jm72WE4PEX/zAcQxry6Y2fDdumO09crTBA/W0hM1UZzqu0bcVUfD+E51ZFYWW7yh
fyhT2YnlucxIcT34pxsLqwTIiZYG4xgN3+YGo0wohY1D1GHE3UZ7SXIglb49yM6v
ZeXrL7SOdERR1w88rC+g99P/cWng5HDS0wPLUbxGT5KIpoOSXOs7EbZwFqQBUy5O
37PB07K5dDyUbrm++l5lUigldF3W1OZQBN5+n8PciulxxwFX84pllTlAxv1p60JR
piEKZ8pl023IF7zMGatUG9qcNOcnbxdMsAhoEhlcFi9ulM/yLzbmRTKVfDYm+O/J
UI+YtcbsgdyOXMdGXCqdpeBNuuypgLG/g7gC8bnk3taS0LUUZLcXtRNuE4tcPJJe
v8FnszaLkjAi83Lmzt3fgZo7DI1RIPwDSw6fY+nBrxCRfEPRVx3f7KhmUXvSeol5
Ln9xpk4AtyQt1RHhckxXwWSUgvXVg2ltmz7ElqK4sQ9mO/D2ZIs6R6fPY4VlJLc4
/2yIV4RLIsbHmdv9IbJ8PBp0VTugSNdicZ904QiAHSZQv/i1mgYuXw3tjR6kuy9f
bKOzNJTwLV1WUsOlUpiq
=Jnn8
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- Major rework of Book3S 64-bit exception vectors (Nicholas Piggin)
- Use gas sections for arranging exception vectors et. al.
- Large set of TM cleanups and selftests (Cyril Bur)
- Enable transactional memory (TM) lazily for userspace (Cyril Bur)
- Support for XZ compression in the zImage wrapper (Oliver
O'Halloran)
- Add support for bpf constant blinding (Naveen N. Rao)
- Beginnings of upstream support for PA Semi Nemo motherboards
(Darren Stevens)
Fixes:
- Ensure .mem(init|exit).text are within _stext/_etext (Michael
Ellerman)
- xmon: Don't use ld on 32-bit (Michael Ellerman)
- vdso64: Use double word compare on pointers (Anton Blanchard)
- powerpc/nvram: Fix an incorrect partition merge (Pan Xinhui)
- powerpc: Fix usage of _PAGE_RO in hugepage (Christophe Leroy)
- powerpc/mm: Update FORCE_MAX_ZONEORDER range to allow hugetlb w/4K
(Aneesh Kumar K.V)
- Fix memory leak in queue_hotplug_event() error path (Andrew
Donnellan)
- Replay hypervisor maintenance interrupt first (Nicholas Piggin)
Various performance optimisations (Anton Blanchard):
- Align hot loops of memset() and backwards_memcpy()
- During context switch, check before setting mm_cpumask
- Remove static branch prediction in atomic{, 64}_add_unless
- Only disable HAVE_EFFICIENT_UNALIGNED_ACCESS on POWER7 little
endian
- Set default CPU type to POWER8 for little endian builds
Cleanups & features:
- Sparse fixes/cleanups (Daniel Axtens)
- Preserve CFAR value on SLB miss caused by access to bogus address
(Paul Mackerras)
- Radix MMU fixups for POWER9 (Aneesh Kumar K.V)
- Support for setting used_(vsr|vr|spe) in sigreturn path (for CRIU)
(Simon Guo)
- Optimise syscall entry for virtual, relocatable case (Nicholas
Piggin)
- Optimise MSR handling in exception handling (Nicholas Piggin)
- Support for kexec with Radix MMU (Benjamin Herrenschmidt)
- powernv EEH fixes (Russell Currey)
- Suprise PCI hotplug support for powernv (Gavin Shan)
- Endian/sparse fixes for powernv PCI (Gavin Shan)
- Defconfig updates (Anton Blanchard)
- KVM: PPC: Book3S HV: Migrate pinned pages out of CMA (Balbir Singh)
- cxl: Flush PSL cache before resetting the adapter (Frederic Barrat)
- cxl: replace loop with for_each_child_of_node(), remove unneeded
of_node_put() (Andrew Donnellan)
- Fix HV facility unavailable to use correct handler (Nicholas
Piggin)
- Remove unnecessary syscall trampoline (Nicholas Piggin)
- fadump: Fix build break when CONFIG_PROC_VMCORE=n (Michael
Ellerman)
- Quieten EEH message when no adapters are found (Anton Blanchard)
- powernv: Add PHB register dump debugfs handle (Russell Currey)
- Use kprobe blacklist for exception handlers & asm functions
(Nicholas Piggin)
- Document the syscall ABI (Nicholas Piggin)
- MAINTAINERS: Update cxl maintainers (Michael Neuling)
- powerpc: Remove all usages of NO_IRQ (Michael Ellerman)
Minor cleanups:
- Andrew Donnellan, Christophe Leroy, Colin Ian King, Cyril Bur,
Frederic Barrat, Pan Xinhui, PrasannaKumar Muralidharan, Rui Teng,
Simon Guo"
* tag 'powerpc-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (156 commits)
powerpc/bpf: Add support for bpf constant blinding
powerpc/bpf: Implement support for tail calls
powerpc/bpf: Introduce accessors for using the tmp local stack space
powerpc/fadump: Fix build break when CONFIG_PROC_VMCORE=n
powerpc: tm: Enable transactional memory (TM) lazily for userspace
powerpc/tm: Add TM Unavailable Exception
powerpc: Remove do_load_up_transact_{fpu,altivec}
powerpc: tm: Rename transct_(*) to ck(\1)_state
powerpc: tm: Always use fp_state and vr_state to store live registers
selftests/powerpc: Add checks for transactional VSXs in signal contexts
selftests/powerpc: Add checks for transactional VMXs in signal contexts
selftests/powerpc: Add checks for transactional FPUs in signal contexts
selftests/powerpc: Add checks for transactional GPRs in signal contexts
selftests/powerpc: Check that signals always get delivered
selftests/powerpc: Add TM tcheck helpers in C
selftests/powerpc: Allow tests to extend their kill timeout
selftests/powerpc: Introduce GPR asm helper header file
selftests/powerpc: Move VMX stack frame macros to header file
selftests/powerpc: Rework FPU stack placement macros and move to header file
selftests/powerpc: Check for VSX preservation across userspace preemption
...
When doing an nmi backtrace of many cores, most of which are idle, the
output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the interrupted
PC to see if it lies within that section.
This commit suitably tags x86 and tile idle routines, and only adds in
the minimal framework for other architectures.
Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
Tested-by: Petr Mladek <pmladek@suse.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This came to light when implementing native 64-bit atomics for ARCv2.
The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
to check whether atomic64_dec_if_positive() is available. It seems it
was needed when not every arch defined it. However as of current code
the Kconfig option seems needless
- for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a
generic definition of API is present lib/atomic64.c
- arches with native 64-bit atomics select it in arch/*/Kconfig and
define the API in their headers
So I see no point in keeping the Kconfig option
Compile tested for:
- blackfin (CONFIG_GENERIC_ATOMIC64)
- x86 (!CONFIG_GENERIC_ATOMIC64)
- ia64
Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ming Lin <ming.l@ssi.samsung.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently significant amount of memory is reserved only in kernel booted
to capture kernel dump using the fa_dump method.
Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialize
only certain size memory per node. The certain size takes into account
the dentry and inode cache sizes. Currently the cache sizes are
calculated based on the total system memory including the reserved
memory. However such a kernel when booting the same kernel as fadump
kernel will not be able to allocate the required amount of memory to
suffice for the dentry and inode caches. This results in crashes like
Hence only implement arch_reserved_kernel_pages() for CONFIG_FA_DUMP
configurations. The amount reserved will be reduced while calculating
the large caches and will avoid crashes like the below on large systems
such as 32 TB systems.
Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes)
vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes
swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC)
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3
Call Trace:
dump_stack+0xb0/0xf0 (unreliable)
warn_alloc_failed+0x114/0x160
__vmalloc_node_range+0x304/0x340
__vmalloc+0x6c/0x90
alloc_large_system_hash+0x1b8/0x2c0
inode_init+0x94/0xe4
vfs_caches_init+0x8c/0x13c
start_kernel+0x50c/0x578
start_here_common+0x20/0xa8
Link: http://lkml.kernel.org/r/1472476010-4709-4-git-send-email-srikar@linux.vnet.ibm.com
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Suggested-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All architectures:
Move `make kvmconfig` stubs from x86; use 64 bits for debugfs stats.
ARM:
Important fixes for not using an in-kernel irqchip; handle SError
exceptions and present them to guests if appropriate; proxying of GICV
access at EL2 if guest mappings are unsafe; GICv3 on AArch32 on ARMv8;
preparations for GICv3 save/restore, including ABI docs; cleanups and
a bit of optimizations.
MIPS:
A couple of fixes in preparation for supporting MIPS EVA host kernels;
MIPS SMP host & TLB invalidation fixes.
PPC:
Fix the bug which caused guests to falsely report lockups; other minor
fixes; a small optimization.
s390:
Lazy enablement of runtime instrumentation; up to 255 CPUs for nested
guests; rework of machine check deliver; cleanups and fixes.
x86:
IOMMU part of AMD's AVIC for vmexit-less interrupt delivery; Hyper-V
TSC page; per-vcpu tsc_offset in debugfs; accelerated INS/OUTS in
nVMX; cleanups and fixes.
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJX9iDrAAoJEED/6hsPKofoOPoIAIUlgojkb9l2l1XVDgsXdgQL
sRVhYSVv7/c8sk9vFImrD5ElOPZd+CEAIqFOu45+NM3cNi7gxip9yftUVs7wI5aC
eDZRWm1E4trDZLe54ZM9ThcqZzZZiELVGMfR1+ZndUycybwyWzafpXYsYyaXp3BW
hyHM3qVkoWO3dxBWFwHIoO/AUJrWYkRHEByKyvlC6KPxSdBPSa5c1AQwMCoE0Mo4
K/xUj4gBn9eMelNhg4Oqu/uh49/q+dtdoP2C+sVM8bSdquD+PmIeOhPFIcuGbGFI
B+oRpUhIuntN39gz8wInJ4/GRSeTuR2faNPxMn4E1i1u4LiuJvipcsOjPfe0a18=
=fZRB
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"All architectures:
- move `make kvmconfig` stubs from x86
- use 64 bits for debugfs stats
ARM:
- Important fixes for not using an in-kernel irqchip
- handle SError exceptions and present them to guests if appropriate
- proxying of GICV access at EL2 if guest mappings are unsafe
- GICv3 on AArch32 on ARMv8
- preparations for GICv3 save/restore, including ABI docs
- cleanups and a bit of optimizations
MIPS:
- A couple of fixes in preparation for supporting MIPS EVA host
kernels
- MIPS SMP host & TLB invalidation fixes
PPC:
- Fix the bug which caused guests to falsely report lockups
- other minor fixes
- a small optimization
s390:
- Lazy enablement of runtime instrumentation
- up to 255 CPUs for nested guests
- rework of machine check deliver
- cleanups and fixes
x86:
- IOMMU part of AMD's AVIC for vmexit-less interrupt delivery
- Hyper-V TSC page
- per-vcpu tsc_offset in debugfs
- accelerated INS/OUTS in nVMX
- cleanups and fixes"
* tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (140 commits)
KVM: MIPS: Drop dubious EntryHi optimisation
KVM: MIPS: Invalidate TLB by regenerating ASIDs
KVM: MIPS: Split kernel/user ASID regeneration
KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
KVM: arm/arm64: vgic: Don't flush/sync without a working vgic
KVM: arm64: Require in-kernel irqchip for PMU support
KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register
KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL
KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie
KVM: PPC: BookE: Fix a sanity check
KVM: PPC: Book3S HV: Take out virtual core piggybacking code
KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread
ARM: gic-v3: Work around definition of gic_write_bpr1
KVM: nVMX: Fix the NMI IDT-vectoring handling
KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive
KVM: nVMX: Fix reload apic access page warning
kvmconfig: add virtio-gpu to config fragment
config: move x86 kvm_guest.config to a common location
arm64: KVM: Remove duplicating init code for setting VMID
ARM: KVM: Support vgic-v3
...
Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
programs. This can be achieved either by:
(1) retaining the stack setup by the first eBPF program and having all
subsequent eBPF programs re-using it, or,
(2) by unwinding/tearing down the stack and having each eBPF program
deal with its own stack as it sees fit.
To ensure that this does not create loops, there is a limit to how many
tail calls can be done (currently 32). This requires the JIT'ed code to
maintain a count of the number of tail calls done so far.
Approach (1) is simple, but requires every eBPF program to have (almost)
the same prologue/epilogue, regardless of whether they need it. This is
inefficient for small eBPF programs which may not sometimes need a
prologue at all. As such, to minimize impact of tail call
implementation, we use approach (2) here which needs each eBPF program
in the chain to use its own prologue/epilogue. This is not ideal when
many tail calls are involved and when all the eBPF programs in the chain
have similar prologue/epilogue. However, the impact is restricted to
programs that do tail calls. Individual eBPF programs are not affected.
We maintain the tail call count in a fixed location on the stack and
updated tail call count values are passed in through this. The very
first eBPF program in a chain sets this up to 0 (the first 2
instructions). Subsequent tail calls skip the first two eBPF JIT
instructions to maintain the count. For programs that don't do tail
calls themselves, the first two instructions are NOPs.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
While at it, ensure that the location of the local save area is
consistent whether or not we setup our own stackframe. This property is
utilised in the next patch that adds support for tail calls.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The fadump code calls vmcore_cleanup() which only exists if
CONFIG_PROC_VMCORE=y. We don't want to depend on CONFIG_PROC_VMCORE,
because it's user selectable, so just wrap the call in an #ifdef.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently the MSR TM bit is always set if the hardware is TM capable.
This adds extra overhead as it means the TM SPRS (TFHAR, TEXASR and
TFAIR) must be swapped for each process regardless of if they use TM.
For processes that don't use TM the TM MSR bit can be turned off
allowing the kernel to avoid the expensive swap of the TM registers.
A TM unavailable exception will occur if a thread does use TM and the
kernel will enable MSR_TM and leave it so for some time afterwards.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If the kernel disables transactional memory (TM) and userspace still
tries TM related actions (TM instructions or TM SPR accesses) TM aware
hardware will cause the kernel to take a facility unavailable
exception.
Add checks for the exception being caused by illegal TM access in
userspace.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Rewrite comment entirely, bugs in it are mine]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Make the structures being used for checkpointed state named
consistently with the pt_regs/ckpt_regs.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is currently an inconsistency as to how the entire CPU register
state is saved and restored when a thread uses transactional memory
(TM).
Using transactional memory results in the CPU having duplicated
(almost) all of its register state. This duplication results in a set
of registers which can be considered 'live', those being currently
modified by the instructions being executed and another set that is
frozen at a point in time.
On context switch, both sets of state have to be saved and (later)
restored. These two states are often called a variety of different
things. Common terms for the state which only exists after the CPU has
entered a transaction (performed a TBEGIN instruction) in hardware are
'transactional' or 'speculative'.
Between a TBEGIN and a TEND or TABORT (or an event that causes the
hardware to abort), regardless of the use of TSUSPEND the
transactional state can be referred to as the live state.
The second state is often to referred to as the 'checkpointed' state
and is a duplication of the live state when the TBEGIN instruction is
executed. This state is kept in the hardware and will be rolled back
to on transaction failure.
Currently all the registers stored in pt_regs are ALWAYS the live
registers, that is, when a thread has transactional registers their
values are stored in pt_regs and the checkpointed state is in
ckpt_regs. A strange opposite is true for fp_state/vr_state. When a
thread is non transactional fp_state/vr_state holds the live
registers. When a thread has initiated a transaction fp_state/vr_state
holds the checkpointed state and transact_fp/transact_vr become the
structure which holds the live state (at this point it is a
transactional state).
This method creates confusion as to where the live state is, in some
circumstances it requires extra work to determine where to put the
live state and prevents the use of common functions designed (probably
before TM) to save the live state.
With this patch pt_regs, fp_state and vr_state all represent the
same thing and the other structures [pending rename] are for
checkpointed state.
Acked-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Much of the signal code takes a pt_regs on which it operates. Over
time the signal code has needed to know more about the thread than
what pt_regs can supply, this information is obtained as needed by
using 'current'.
This approach is not strictly incorrect however it does mean that
there is now a hard requirement that the pt_regs being passed around
does belong to current, this is never checked. A safer approach is for
the majority of the signal functions to take a task_struct from which
they can obtain pt_regs and any other information they need. The
caveat that the task_struct they are passed must be current doesn't go
away but can more easily be checked for.
Functions called from outside powerpc signal code are passed a pt_regs
and they can confirm that the pt_regs is that of current and pass
current to other functions, furthurmore, powerpc signal functions can
check that the task_struct they are passed is the same as current
avoiding possible corruption of current (or the task they are passed)
if this assertion ever fails.
CC: paulus@samba.org
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
After a thread is reclaimed from its active or suspended transactional
state the checkpointed state exists on CPU, this state (along with the
live/transactional state) has been saved in its entirety by the
reclaiming process.
There exists a sequence of events that would cause the kernel to call
one of enable_kernel_fp(), enable_kernel_altivec() or
enable_kernel_vsx() after a thread has been reclaimed. These functions
save away any user state on the CPU so that the kernel can use the
registers. Not only is this saving away unnecessary at this point, it
is actually incorrect. It causes a save of the checkpointed state to
the live structures within the thread struct thus destroying the true
live state for that thread.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
msr_check_and_set() always performs a mfmsr() to determine if it needs
to perform an mtmsr(), as mfmsr() can be a costly operation
msr_check_and_set() could return the MSR now on the CPU to avoid
callers of msr_check_and_set having to make their own mfmsr() call.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
giveup_all() causes FPU/VMX/VSX facilities to be disabled in a threads
MSR. If the thread performing the giveup was transactional, the kernel
must record which facilities were in use before the giveup as the
thread must have these facilities re-enabled on return to userspace.
>From process.c:
/*
* This is called if we are on the way out to userspace and the
* TIF_RESTORE_TM flag is set. It checks if we need to reload
* FP and/or vector state and does so if necessary.
* If userspace is inside a transaction (whether active or
* suspended) and FP/VMX/VSX instructions have ever been enabled
* inside that transaction, then we have to keep them enabled
* and keep the FP/VMX/VSX state loaded while ever the transaction
* continues. The reason is that if we didn't, and subsequently
* got a FP/VMX/VSX unavailable interrupt inside a transaction,
* we don't know whether it's the same transaction, and thus we
* don't know which of the checkpointed state and the transactional
* state to use.
*/
Calling check_if_tm_restore_required() will set TIF_RESTORE_TM and
save the MSR if needed.
Fixes: c208505 ("powerpc: create giveup_all()")
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Comment from arch/powerpc/kernel/process.c:967:
If userspace is inside a transaction (whether active or
suspended) and FP/VMX/VSX instructions have ever been enabled
inside that transaction, then we have to keep them enabled
and keep the FP/VMX/VSX state loaded while ever the transaction
continues. The reason is that if we didn't, and subsequently
got a FP/VMX/VSX unavailable interrupt inside a transaction,
we don't know whether it's the same transaction, and thus we
don't know which of the checkpointed state and the ransactional
state to use.
restore_math() restore_fp() and restore_altivec() currently may not
restore the registers. It doesn't appear that this is more serious
than a performance penalty. If the math registers aren't restored the
userspace thread will still be run with the facility disabled.
Userspace will not be able to read invalid values. On the first access
it will take an facility unavailable exception and the kernel will
detected an active transaction, at which point it will abort the
transaction. There is the possibility for a pathological case
preventing any progress by transactions, however, transactions
are never guaranteed to make progress.
Fixes: 70fe3d9 ("powerpc: Restore FPU/VEC/VSX if previously used")
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This fixes warning reported from sparse:
pci-ioda.c:451:49: warning: incorrect type in argument 2 (different base types)
Fixes: 262af557dd ("powerpc/powernv: Enable M64 aperatus for PHB3")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This fixes the warning reported from sparse:
eeh-powernv.c:875:23: warning: constant 0x8000000000000000 is so big it is unsigned long
Fixes: ebe2253127 ("powerpc/powernv: Support PCI slot ID")
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The hub diag-data type is filled with big-endian data by OPAL call
opal_pci_get_hub_diag_data(). We need convert it to CPU-endian value
before using it. The issue is reported by sparse as pointed by Michael
Ellerman:
eeh-powernv.c:1309:21: warning: restricted __be16 degrades to integer
This converts hub diag-data type to CPU-endian before using it in
pnv_eeh_get_and_dump_hub_diag().
Fixes: 2a485ad7c8 ("powerpc/powernv: Drop PHB operation next_error()")
Cc: stable@vger.kernel.org # v4.1+
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The PE number (@frozen_pe_no), filled by opal_pci_next_error() is in
big-endian format. It should be converted to CPU-endian before it is
passed to opal_pci_eeh_freeze_clear() when clearing the frozen state if
the PE is invalid one. As Michael Ellerman pointed out, the issue is
also detected by sparse:
eeh-powernv.c:1541:41: warning: incorrect type in argument 2 (different base types)
This passes CPU-endian PE number to opal_pci_eeh_freeze_clear() and it
should be part of commit <0f36db77643b> ("powerpc/eeh: Fix wrong printed
PE number"), which was merged to 4.3 kernel.
Fixes: 71b540adff ("powerpc/powernv: Don't escalate non-existing frozen PE")
Cc: stable@vger.kernel.org # v4.3+
Suggested-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We supported POWER7 CPUs for bootstrapping little endian, but the
target was always POWER8. Now that POWER7 specific issues are
impacting performance, change the default target to POWER8.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER8 handles unaligned accesses in little endian mode, but commit
0b5e6661ac ("powerpc: Don't set HAVE_EFFICIENT_UNALIGNED_ACCESS on
little endian builds") disabled it for all.
The issue with unaligned little endian accesses is specific to POWER7,
so update the Kconfig check to match. Using the stat() testcase from
commit a75c380c71 ("powerpc: Enable DCACHE_WORD_ACCESS on ppc64le"),
performance improves 15% on POWER8.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
I see quite a lot of static branch mispredictions on a simple
web serving workload. The issue is in __atomic_add_unless(), called
from _atomic_dec_and_lock(). There is no obvious common case, so it
is better to let the hardware predict the branch.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
During context switch, switch_mm() sets our current CPU in mm_cpumask.
We can avoid this atomic sequence in most cases by checking before
setting the bit.
Testing on a POWER8 using our context switch microbenchmark:
tools/testing/selftests/powerpc/benchmarks/context_switch \
--process --no-fp --no-altivec --no-vector
Performance improves 2%.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
No real need for this to be pr_warn(), reduce it to pr_info().
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We are starting to see i40e adapters in recent machines, so enable
it in our configs.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Change a few devices and filesystems that are seldom used any more
from built in to modules. This reduces our vmlinux about 500kB.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When we issue a system reset, every CPU in the box prints an Oops,
including a backtrace. Each of these can be quite large (over 4kB)
and we may end up wrapping the ring buffer and losing important
information.
Bump the base size from 128kB to 256kB and the per CPU size from
4kB to 8kB.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We see big improvements with the VMX crypto functions (often 10x or more),
so enable it as a module.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Align the hot loops in our assembly implementation of memset()
and backwards_memcpy().
backwards_memcpy() is called from tcp_v4_rcv(), so we might
want to optimise this a little more.
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull CPU hotplug updates from Thomas Gleixner:
"Yet another batch of cpu hotplug core updates and conversions:
- Provide core infrastructure for multi instance drivers so the
drivers do not have to keep custom lists.
- Convert custom lists to the new infrastructure. The block-mq custom
list conversion comes through the block tree and makes the diffstat
tip over to more lines removed than added.
- Handle unbalanced hotplug enable/disable calls more gracefully.
- Remove the obsolete CPU_STARTING/DYING notifier support.
- Convert another batch of notifier users.
The relayfs changes which conflicted with the conversion have been
shipped to me by Andrew.
The remaining lot is targeted for 4.10 so that we finally can remove
the rest of the notifiers"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
cpufreq: Fix up conversion to hotplug state machine
blk/mq: Reserve hotplug states for block multiqueue
x86/apic/uv: Convert to hotplug state machine
s390/mm/pfault: Convert to hotplug state machine
mips/loongson/smp: Convert to hotplug state machine
mips/octeon/smp: Convert to hotplug state machine
fault-injection/cpu: Convert to hotplug state machine
padata: Convert to hotplug state machine
cpufreq: Convert to hotplug state machine
ACPI/processor: Convert to hotplug state machine
virtio scsi: Convert to hotplug state machine
oprofile/timer: Convert to hotplug state machine
block/softirq: Convert to hotplug state machine
lib/irq_poll: Convert to hotplug state machine
x86/microcode: Convert to hotplug state machine
sh/SH-X3 SMP: Convert to hotplug state machine
ia64/mca: Convert to hotplug state machine
ARM/OMAP/wakeupgen: Convert to hotplug state machine
ARM/shmobile: Convert to hotplug state machine
arm64/FP/SIMD: Convert to hotplug state machine
...
This was not done before the big patches because I only noticed
them afterwards. It has become much easier to see which handlers
are branched to from which exception vectors now, and to see
exactly what vector space is being used for what.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Simple substitution. This is possible now that both parts of the OOL
initial handler get linked into their correct location.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This is not an exception handler as such, it's called from
local_irq_enable(), not exception entry.
Also clean up some now redundant comments at the end of the
consolidation series.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use assembler sections of fixed size and location to arrange the 64-bit
Book3S exception vector code (64-bit Book3E also uses it in head_64.S
for 0x0..0x100).
This allows better flexibility in arranging exception code and hiding
unimportant details behind macros.
Gas sections can be a bit painful to use this way, mainly because the
assembler does not know where they will be finally linked. Taking
absolute addresses requires a bit of trickery for example, but it can
be hidden behind macros for the most part.
Generated code is mostly the same except locations, offsets, alignments.
The "+ 0x2" is only required for the trap number / kvm exit number,
which gets loaded as a constant into a register.
Previously, code also used + 0x2 for label names, but we changed to
using "H" to distinguish HV case for that. Remove the last vestiges
of that.
__after_prom_start is taking absolute address of a label in another
fixed section. Newer toolchains seemed to compile this okay, but older
ones do not. FIXED_SYMBOL_ABS_ADDR is more foolproof, it just takes an
additional line to define.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With a subsequent patch to put text into different sections,
(_end - _stext) can no longer be computed at link time to determine
the end of the copy. Instead, calculate it at runtime with
(copy_to_here - _stext) + (_end - copy_to_here).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move exception handler alignment directives into the head-64.h macros,
beause they will no longer work in-place after the next patch. This
slightly changes functions that have alignments applied and therefore
code generation, which is why it was not done initially (see earlier
patch).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Create arch/powerpc/include/asm/head-64.h with macros that specify
an exception vector (name, type, location), which will be used to
label and lay out exceptions into the object file.
Naming is moved out of exception-64s.h, which is used to specify the
implementation of exception handlers.
objdump of generated code in exception vectors is unchanged except for
names. Alignment directives scattered around are annoying, but done
this way so that disassembly can verify identical instruction
generation before and after patch. These get cleaned up in future
patch.
We change the way KVMTEST works, explicitly passing EXC_HV or EXC_STD
rather than overloading the trap number. This removes the need to have
SOFTEN values for the overloaded trap numbers, eg. 0x502.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The driver does not handle endianness properly when loading the input
data.
Signed-off-by: Marcelo Cerri <marcelo.cerri@canonical.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
__kernel_get_syscall_map() and __kernel_clock_getres() use cmpli to
check if the passed in pointer is non zero. cmpli maps to a 32 bit
compare on binutils, so we ignore the top 32 bits.
A simple test case can be created by passing in a bogus pointer with
the bottom 32 bits clear. Using a clk_id that is handled by the VDSO,
then one that is handled by the kernel shows the problem:
printf("%d\n", clock_getres(CLOCK_REALTIME, (void *)0x100000000));
printf("%d\n", clock_getres(CLOCK_BOOTTIME, (void *)0x100000000));
And we get:
0
-1
The bigger issue is if we pass a valid pointer with the bottom 32 bits
clear, in this case we will return success but won't write any data
to the pointer.
I stumbled across this issue because the LLVM integrated assembler
doesn't accept cmpli with 3 arguments. Fix this by converting them to
cmpldi.
Fixes: a7f290dad3 ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel")
Cc: stable@vger.kernel.org # v2.6.15+
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When PCI Device pass-through is enabled via VFIO, KVM-PPC will
pin pages using get_user_pages_fast(). One of the downsides of
the pinning is that the page could be in CMA region. The CMA
region is used for other allocations like the hash page table.
Ideally we want the pinned pages to be from non CMA region.
This patch (currently only for KVM PPC with VFIO) forcefully
migrates the pages out (huge pages are omitted for the moment).
There are more efficient ways of doing this, but that might
be elaborate and might impact a larger audience beyond just
the kvm ppc implementation.
The magic is in new_iommu_non_cma_page() which allocates the
new page from a non CMA region.
I've tested the patches lightly at my end. The full solution
requires migration of THP pages in the CMA region. That work
will be done incrementally on top of this.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Merged via powerpc tree as that's where the changes are]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This supports PCI surprise hotplug. The design is highlighted as
below:
* The PCI slot's surprise hotplug capability is exposed through
device node property "ibm,slot-surprise-pluggable", meaning
PCI surprise hotplug will be disabled if skiboot doesn't support
it yet.
* The interrupt because of presence or link state change is raised
on surprise hotplug event. One event is allocated and queued to
the PCI slot for workqueue to pick it up and process in serialized
fashion. The code flow for surprise hotplug is same to that for
managed hotplug except: the affected PEs are put into frozen state
to avoid unexpected EEH error reporting in surprise hot remove path.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This unfreezes PE when it's initialized because the PE might be put
into frozen state in the last hot remove path. It's not harmful to
do so if the PE is already in unfrozen state.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This exports eeh_pe_state_mark(). It will be used to mark the surprise
hot removed PE as isolated to avoid unexpected EEH error reporting in
surprise remove path.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This exports @confirm_error_lock so that eeh_serialize_{lock, unlock}()
can be used to freeze the affected PE in PCI surprise hot remove path.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Function eeh_pe_set_option() is used to apply the requested options
(enable, disable, unfreeze) in EEH virtualization path. The semantics
of this function isn't complete until freezing is supported.
This allows to freeze the indicated PE. The new semantics is going to
be used in PCI surprise hot remove path, to freeze removed PCI devices
(PE) to avoid unexpected EEH error reporting.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When issuing PHB reset, OPAL API opal_pci_poll() is called to drive
the state machine in OPAL forward. However, we needn't always call
the function under some circumstances like reset deassert.
This avoids calling opal_pci_poll() when OPAL_SUCCESS is returned
from opal_pci_reset(). Except the overhead introduced by additional
one unnecessary OPAL call, I didn't run into real issue because of
this.
Reported-by: Pridhiviraj Paidipeddi <ppaiddipe@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch adds an option to use XZ compression for the kernel image.
Currently this is only enabled for 64-bit Book3S targets, which is
roughly equivalent to the platforms that use the kernel's zImage
wrapper, and that have been tested.
The bulk of the 32-bit platforms and 64-bit BookE use uboot images,
which relies on uboot implementing XZ. In future we can enable XZ
support for those targets once someone has tested it.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This modifies the wrapper script so that the -Z option takes an argument
to specify the compression type. It can either be 'gz', 'xz' or 'none'.
The legazy --no-gzip and -z options are still supported and will set the
compression to none and gzip respectively, but they are not documented.
Only XZ -6 is used for compression rather than XZ -9. Using compression
levels higher than 6 requires the decompressor to build a large (64MB)
dictionary when decompressing and some environments cannot satisfy such
large allocations (e.g. POWER 6 LPAR partition firmware).
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This code is no longer used and can be removed.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently the powerpc boot wrapper has its own wrapper around zlib to
handle decompressing gzipped kernels. The kernel decompressor library
functions now provide a generic interface that can be used in the
pre-boot environment. This allows boot wrappers to easily support
different compression algorithms. This patch converts the wrapper to use
this new API, but does not add support for using new algorithms.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Most architectures allow the compression algorithm used to produced the
vmlinuz image to be selected as a kernel config option. In preperation
for supporting algorithms other than gzip in the powerpc boot wrapper
the makefile needs to be modified to use these config options.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The powerpc boot wrapper is potentially compiled with a separate
toolchain and/or toolchain flags than the rest of the kernel. The usual
case is a 64-bit big endian kernel builds a 32-bit big endian wrapper.
The main problem with this is that the wrapper does not have access to
the kernel headers (without a lot of gross hacks). To get around this
the required headers are copied into the build directory via several sed
scripts which rewrite problematic includes. This patch moves these
fixups out of the makefile into a separate .sed script file to clean up
makefile slightly.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Reword first paragraph of change log a little]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.
CURRENT_TIME is also not y2038 safe.
This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.
Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The MMCR2 register is available twice, one time with number 785
(privileged access), and one time with number 769 (unprivileged,
but it can be disabled completely). In former times, the Linux
kernel was using the unprivileged register 769 only, but since
commit 8dd75ccb57 ("powerpc: Use privileged SPR number
for MMCR2"), it uses the privileged register 785 instead.
The KVM-PR code then of course also switched to use the SPR 785,
but this is causing older guest kernels to crash, since these
kernels still access 769 instead. So to support older kernels
with KVM-PR again, we have to support register 769 in KVM-PR, too.
Fixes: 8dd75ccb57
Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
On POWER8E and POWER8NVL, KVM-PR does not announce support for
64kB page sizes and 1TB segments yet. Looks like this has just
been forgotton so far, since there is no reason why this should
be different to the normal POWER8 CPUs.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Remove duplicate setting of the the "B" field when doing a tlbie(l).
In compute_tlbie_rb(), the "B" field is set again just before
returning the rb value to be used for tlbie(l).
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
We use logical negate where bitwise negate was intended. It means that
we never return -EINVAL here.
Fixes: ce11e48b7f ('KVM: PPC: E500: Add userspace debug stub support')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This takes out the code that arranges to run two (or more) virtual
cores on a single subcore when possible, that is, when both vcores
are from the same VM, the VM is configured with one CPU thread per
virtual core, and all the per-subcore registers have the same value
in each vcore. Since the VTB (virtual timebase) is a per-subcore
register, and will almost always differ between vcores, this code
is disabled on POWER8 machines, meaning that it is only usable on
POWER7 machines (which don't have VTB). Given the tiny number of
POWER7 machines which have firmware that allows them to run HV KVM,
the benefit of simplifying the code outweighs the loss of this
feature on POWER7 machines.
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
POWER8 has one virtual timebase (VTB) register per subcore, not one
per CPU thread. The HV KVM code currently treats VTB as a per-thread
register, which can lead to spurious soft lockup messages from guests
which use the VTB as the time source for the soft lockup detector.
(CPUs before POWER8 did not have the VTB register.)
For HV KVM, this fixes the problem by making only the primary thread
in each virtual core save and restore the VTB value. With this,
the VTB state becomes part of the kvmppc_vcore structure. This
also means that "piggybacking" of multiple virtual cores onto one
subcore is not possible on POWER8, because then the virtual cores
would share a single VTB register.
PR KVM emulates a VTB register, which is per-vcpu because PR KVM
has no notion of CPU threads or SMT. For PR KVM we move the VTB
state into the kvmppc_vcpu_book3s struct.
Cc: stable@vger.kernel.org # v3.14+
Reported-by: Thomas Huth <thuth@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
- powernv/pci: Fix m64 checks for SR-IOV and window alignment from Russell Currey
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX50CvAAoJEFHr6jzI4aWAqJAP/0/0D8YGwOuIoYD2GmfoasKR
TFbbuhX3xnfdiRG6w/sFBI3oh7icCw7hC+Qj1lNu9D3L/UkxOTBny+W07KvWzX44
Yu74nEHgq3mVrRAU4McztbKIUBK2zagGwwCcGZXZl/uQI1ylvmmpcH3xClQzF+oA
xKk8eB1OW2Ay6+y+FkSuyBHHSfww6QCk7ERPqaStCW9Uy+dDBjIwStLQuOpAhN/o
Z9K+JwpPJ8qgw1Pe9pvrD5MjcM0hR+tUZm6LklZCCk89feqlwcrz9cpOrmTdGuF+
n1iacpDaFf6IOlhI+6ImrT15llTgSk/nu9GNIRFDwOjVCuGy5aDQBtWuRFiVNggp
vkZWFSl594Jn5H9/s6MpMXygSl36NMKgM/ZKvUsEAe6mF0Kb9pZRB7b/aV+ajkCQ
rkQCe0KKSF6+D3wu3SmMe0NTc3/GkgxZN0lTnqUaB5PSRqwvVwurXugnAKr7arhj
JSu9/QSeOxNI5ytDF1Nf9/RN0DT+L1w0vun083DupyJkG1hrjzm9kI0lACQTr/QX
TxAWXGjiTsUOeM4pfNzqaJE4fNUc0TIc41jgWMx9qXzbKjhijgKEPtmyDMz93GVY
hFXyRAMsWUOsQGP5tiLFYG0PkNsmCDIwca+yg47EicBQGTpEsGLYUBRvIILYNBKI
ULl0yMLZWekl1rzthDdB
=35xQ
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull one more powerpc fix from Michael Ellerman:
"powernv/pci: Fix m64 checks for SR-IOV and window alignment from
Russell Currey"
* tag 'powerpc-4.8-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv/pci: Fix m64 checks for SR-IOV and window alignment
Enable the drivers on the powerpc arch.
Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
User space DTLB miss represent approximatly 90% of TLB misses
so make it the shortest path.
Also remove an unneccessary double jump in FixupDAR
Before this patch, we spend 3.3 TB ticks in the handler for each
user address miss and 3.4 TB ticks for each kernel address miss
After this patch, we send 3.0 TB ticks in the handler for each
user address miss and 3.9 TB ticks for each kernel address miss
Taking into account that user misses represent 90% of the total,
this patch provides an improvement of approx. 9%
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
When all options are activated, there is not enough space for the
DTLBMiss handlers that handles IMMR area and linear RAM pages in
the exception area once we have added hugepage handling.
So lets move them after .0x2000
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
During a machine check, the 8xx provides indication of
whether the check is due to data or instruction access, so
let's display it.
Lets also move 8xx specific handling into the new handler.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
When the watchdog is in NMI mode, the system reset interrupt is
generated when the watchdog counter expires.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
This allows PCI devices that can only address (e.g.) 36 or 40 bit DMA to
use direct DMA, at the cost of not being able to DMA to non-RAM addresses
(this doesn't affect MSIs as there is a separate dedicated window for
that) which we wouldn't have been able to do anyway if the RAM size didn't
trigger the creation of the second inbound window.
It also fixes an off-by-one error that set dma_direct_ops on PCI devices
whose dma mask could address all the space below the DMA offset
(previously 40 bits), but not the window that starts at the DMA offset.
Signed-off-by: Scott Wood <oss@buserror.net>
Cc: Tillmann Heidsieck <theidsieck@leenox.de>
Tested-by: Tillmann Heidsieck <theidsieck@leenox.de>
The 8xx has two special registers called EID (External Interrupt
Disable) and EIE (External Interrupt Enable) for clearing/setting
EE in MSR. It avoids the three instructions set mfmsr/ori/mtmsr or
mfmsr/rlwinm/mtmsr and it avoids using a general register.
We just have to write something in the special register to change MSR EE
bit. So we write r0 into the register, regardless of r0 value.
Writing to one of those two special registers also set the MSR RI bit,
but this bit is only unset during beginning of exception prolog and end
of exception epilog. When executing C-functions MSR RI is always set.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
Factor out the common codes of setup arch functions to a separate
function. It does make no sense to print a board specific info
in setup arch functions, so use a more general one.
For ASP8347E board, there is no pci device node. So it is safe to
invoke mpc83xx_setup_pci() in its setup arch function even there is
no such invocation in its original setup arch function.
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>
Commit 0e6e01ff69 ("CPM/QE: use genalloc to manage CPM/QE muram")
has changed the way muram is managed.
genalloc uses kmalloc(), hence requires the SLAB to be up and running.
On powerpc 8xx, cpm_reset() is called early during startup.
cpm_reset() then calls cpm_muram_init() before SLAB is available,
hence the following Oops.
cpm_reset() cannot be called during initcalls because the CPM is
needed for console.
This patch removes the call to cpm_muram_init() from cpm_reset().
cpm_muram_init() will be called from a new function called cpm_init()
which is declared as subsys_initcall, unless cpm_muram_alloc() is
called earlier for the serial console in which case cpm_muram_init()
will be called from there.
The reason for calling it from two places is that some drivers
(e.g. i2c-cpm) need some of the initialisations done by
cpm_muram_init() but don't call cpm_muram_alloc(). The console
driver calls cpm_muram_alloc() but some platforms might not use
the CPM serial ports for console.
[ 0.000000] Unable to handle kernel paging request for data at address 0x00000008
[ 0.000000] Faulting instruction address: 0xc01acce0
[ 0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.000000] PREEMPT CMPC885
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.14-g0886ed8 #5
[ 0.000000] task: c05183e0 ti: c0536000 task.ti: c0536000
[ 0.000000] NIP: c01acce0 LR: c0011068 CTR: 00000000
[ 0.000000] REGS: c0537e50 TRAP: 0300 Not tainted (4.4.14-s3k-dev-g0886ed8-svn)
[ 0.000000] MSR: 00001032 <ME,IR,DR,RI> CR: 28044428 XER: 00000000
[ 0.000000] DAR: 00000008 DSISR: c0000000
GPR00: c0011068 c0537f00 c05183e0 00000000 00009000 ffffffff 00000bc0 ffffffff
GPR08: ff003000 ff00b000 ff003bbf 00000000 22044422 100d43a8 00000000 07ff94e8
GPR16: 00000000 07bb5d70 00000000 07ff81f4 07ff81f4 07ff81f4 00000000 00000000
GPR24: 07ffb3a0 07fe7628 c0550000 c7ffa190 c0540000 ff003bbf 00000000 00000001
[ 0.000000] NIP [c01acce0] gen_pool_add_virt+0x14/0xdc
[ 0.000000] LR [c0011068] cpm_muram_init+0xd4/0x18c
[ 0.000000] Call Trace:
[ 0.000000] [c0537f00] [00000200] 0x200 (unreliable)
[ 0.000000] [c0537f20] [c0011068] cpm_muram_init+0xd4/0x18c
[ 0.000000] [c0537f70] [c0494684] cpm_reset+0xb4/0xc8
[ 0.000000] [c0537f90] [c0494c64] cmpc885_setup_arch+0x10/0x30
[ 0.000000] [c0537fa0] [c0493cd4] setup_arch+0x130/0x168
[ 0.000000] [c0537fb0] [c04906bc] start_kernel+0x88/0x380
[ 0.000000] [c0537ff0] [c0002224] start_here+0x38/0x98
[ 0.000000] Instruction dump:
[ 0.000000] 91430010 91430014 80010014 83e1000c 7c0803a6 38210010 4e800020 7c0802a6
[ 0.000000] 9421ffe0 bf61000c 90010024 7c7e1b78 <80630008> 7c9c2378 7cc31c30 3863001f
[ 0.000000] ---[ end trace dc8fa200cb88537f ]---
fixes: 0e6e01ff69 ("CPM/QE: use genalloc to manage CPM/QE muram")
Cc: stable@vger.linux.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[scottwood: Removed some string changes unrelated to bugfix]
Signed-off-by: Scott Wood <oss@buserror.net>
Use of_property_read_bool to check for the existence of a property.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression e1,e2;
statement S2,S1;
@@
- if (of_get_property(e1,e2,NULL))
+ if (of_property_read_bool(e1,e2))
S1 else S2
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Scott Wood <oss@buserror.net>
Convert fsl_rstcr_restart into a function to be registered with
register_reset_handler().
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
[scottwood: Converted mvme7100 as well]
Signed-off-by: Scott Wood <oss@buserror.net>
Call out to all restart handlers that were added via
register_restart_handler() API when restarting the machine.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>
Factor out a small bit of common code in machine_restart(),
machine_power_off() and machine_halt().
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>
Halt callback in struct machdep_calls is declared with __noreturn
attribute, so omitting that attribute in gpio_halt_cb()'s signatrue
results in compilation error.
Change the signature to address the problem as well as change the code
of the function to avoid ever returning from the function.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>
Select PHYLIB only if NETDEVICES is enabled and MDIO_BITBANG only if
PHYLIB is present to avoid warnings from Kconfig.
To prevent undefined references during linking register MDIO driver only
if CONFIG_MDIO_BITBANG is enabled.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>
PHYLIB depends on NETDEVICES, so to avoid unmet dependencies warning
from Kconfig it needs to be selected conditionally.
Also add checks if PHYLIB is built-in to avoid undefined references to
PHYLIB's symbols.
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Signed-off-by: Scott Wood <oss@buserror.net>
The same logic appears twice and should probably be pulled out into a
function.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Rui Teng <rui.teng@linux.vnet.ibm.com>
[mpe: Rename to tm_flush_hash_page() and move comment into the function]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CLR_TOP32() is defined as blank. Last useful instance of CLR_TOP32()
was removed by commit 40ef8cbc6d ("powerpc: Get 64-bit configs to
compile with ARCH=powerpc") in 2005.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On some CPUs like the 8xx, _PAGE_RW hence _PAGE_WRITE is defined
as 0 and _PAGE_RO has to be set when a page is not writable
_PAGE_RO is defined by default in pte-common.h, however BOOK3S/64
doesn't include that file so _PAGE_RO has to be defined explicitly
in book3s/64/pgtable.h
Fixes: a7b9f671f2 ("powerpc32: adds handling of _PAGE_RO")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In eeh_handle_special_event(), eeh_pe_bus_get() is called before calling
eeh_report_failure() on every device under a PE. If a PE was missing a
bus for some reason, the error would occur before reporting failure, even
though eeh_report_failure() doesn't require a bus.
Fix this by moving the bus retrieval and error check after the
eeh_report_failure() calls.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When the PE used in pnv_eeh_reset() is that of a VF,
pnv_eeh_reset_vf_pe() is used. Unlike the other reset functions called
in pnv_eeh_reset(), the VF reset doesn't require a bus, and if a bus was
missing the function would error out before resetting the VF PE.
To avoid this, reorder the VF reset function to occur before finding and
checking the bus.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
eeh_pe_bus_get() can return NULL if a PCI bus isn't found for a given PE.
Some callers don't check this, and can cause a null pointer dereference
under certain circumstances.
Fix this by checking NULL everywhere eeh_pe_bus_get() is called.
Fixes: 8a6b1bc70d ("powerpc/eeh: EEH core to handle special event")
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When we originally added the ability to split the exception vectors from
the kernel (commit 1f6a93e4c3 ("powerpc: Make it possible to move the
interrupt handlers away from the kernel" 2008-09-15)), the LOAD_HANDLER() macro
used an addi instruction to compute the offset of the common handler
from the kernel base address.
Using addi meant the handler had to be within 32K of the kernel base
address, due to the addi instruction taking a signed immediate value.
That necessitated creating a trampoline for the system call handler,
because system_call_common (in entry64.S) is not linked within 32K of
the kernel base address.
Later in commit 61e2390ede ("powerpc: Make load_hander handle upto 64k
offset" 2012-11-15) we changed LOAD_HANDLER to take a 64K offset, by
changing it to use ori.
Although system_call_common is not in head_64.S or exceptions-64s.S, it
is included in head-y, which causes it to be linked early in the kernel
text, so in practice it ends up below 64K. Additionally if it can't be
placed below 64K the linker will fail to build with a "relocation
truncated to fit" error.
So remove the trampoline.
Newer toolchains are able to work out that the ori in LOAD_HANDLER only
takes a 16 bit offset, and so they generate a 16 bit relocation. Older
toolchains (binutils 2.22 at least) are not so smart, so we have to add
the @l annotation to tell the assembler to generate a 16 bit relocation.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The 0xf80 hv_facility_unavailable trampoline branches to the 0xf60
handler. This works because they both do the same thing, but it should
be fixed.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On EEH events the kernel will print a dump of relevant registers.
If EEH is unavailable (i.e. CONFIG_EEH is disabled, a new platform
doesn't have EEH support, etc) this information isn't readily available.
Add a new debugfs handler to trigger a PHB register dump, so that this
information can be made available on demand.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The only difference is now the TCE table check which doesn't need
to be ifdef'ed out, it will basically do nothing on BookE (it is
only useful for ancient IBM machines).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently we turn the MMU off after copying the image, and we make
sure there is no overlap between the hash table and the target pages
in that case.
That doesn't work for Radix however. In that case, the page tables
are scattered and we can't really enforce that the target of the
image isn't overlapping one of them.
So instead, let's turn the MMU off before copying the image in radix
mode. Thankfully, in radix mode, even under a hypervisor, we know we
don't have the same kind of RMA limitations that hash mode has.
While at it, also turn the MMU off early when using hash in non-LPAR
mode, that way we can get rid of the collision check completely.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Just using the hash ops won't work anymore since radix will have
NULL in there. Instead create an mmu_cleanup_all() function which
will do the right thing based on the MMU mode.
For Radix, for now I clear UPRT and the PTCR, effectively switching
back to Radix with no partition table setup.
Currently set it to NULL on BookE thought it might be a good idea
to wipe the TLB there (Scott ?)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With Radix, it can be NULL even on !BOOKE these days so replace
the ifdef with a NULL check which is cleaner anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: 9445aa1a30 ("ppc: move exports to definitions")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michal Marek <mmarek@suse.com>
These are a symptom of CRC generation failure in generic
build code, and not powerpc specific.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: 9445aa1a30 ("ppc: move exports to definitions")
Signed-off-by: Michal Marek <mmarek@suse.com>
Commit 5958d19a14 checks for prefetchable m64 BARs by comparing the
addresses instead of using resource flags. This broke SR-IOV as the m64
check in pnv_pci_ioda_fixup_iov_resources() fails.
The condition in pnv_pci_window_alignment() also changed to checking
only IORESOURCE_MEM_64 instead of both IORESOURCE_MEM_64 and
IORESOURCE_PREFETCH.
Revert these cases to the previous behaviour, adding a new helper function
to do so. This is named pnv_pci_is_m64_flags() to make it clear this
function is only looking at resource flags and should not be relied on for
non-SRIOV resources.
Fixes: 5958d19a14 ("Fix incorrect PE reservation attempt on some 64-bit BARs")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
NO_IRQ has been == 0 on powerpc for just over ten years (since commit
0ebfff1491 ("[POWERPC] Add new interrupt mapping core and change
platforms to use it")). It's also 0 on most other arches.
Although it's fairly harmless, every now and then it causes confusion
when a driver is built on powerpc and another arch which doesn't define
NO_IRQ. There's at least 6 definitions of NO_IRQ in drivers/, at least
some of which are to work around that problem.
So we'd like to remove it. This is fairly trivial in the arch code, we
just convert:
if (irq == NO_IRQ) to if (!irq)
if (irq != NO_IRQ) to if (irq)
irq = NO_IRQ; to irq = 0;
return NO_IRQ; to return 0;
And a few other odd cases as well.
At least for now we keep the #define NO_IRQ, because there is driver
code that uses NO_IRQ and the fixes to remove those will go via other
trees.
Note we also change some occurrences in PPC sound drivers, drivers/ps3,
and drivers/macintosh.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If we fail to allocate work, we don't end up using hp_errlog_copy. Free it
in the error path.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When we merge two contiguous partitions whose signatures are marked
NVRAM_SIG_FREE, We need update prev's length and checksum, then write it
to nvram, not cur's. So lets fix this mistake now.
Also use memset instead of strncpy to set the partition's name. It's
more readable if we want to fill up with duplicate chars .
Fixes: fa2b4e54d4 ("powerpc/nvram: Improve partition removal")
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If kmemdup fails, We need kfree *buff* first then return -ENOMEM.
Otherwise there is a memory leak.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
mtmsrd with L=1 only affects MSR_EE and MSR_RI bits, and we always
know what state those bits are, so the kernel MSR does not need to be
loaded when modifying them.
mtmsrd is often in the critical execution path, so avoiding dependency
on even L1 load is noticable. On a POWER8 this saves about 3 cycles
from the syscall path, and possibly a few from other exception returns
(not measured).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The mflr r10 instruction was left over from when the code used LR to
branch to system_call_entry from the exception handler. That was
changed by commit 6a404806df ("powerpc: Avoid link stack corruption in
MMU on syscall entry path") to use the count register. The value is
never used now, so mflr can be removed, and r10 can be used for storage
rather than spilling to the SPR scratch register.
The scratch register spill causes a long pipeline stall due to the SPR
read after write. This change brings getppid syscall cost from 406 to
376 cycles on POWER8. getppid for non-relocatable case is 371 cycles.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For hugetlb to work with 4K page size, we need MAX_ORDER to be 13 or
more. When switching from a 64K page size to 4K linux page size using
make oldconfig, we end up with a CONFIG_FORCE_MAX_ZONEORDER value of 9.
This results in a 16M hugepage beiing considered as a gigantic huge page
which in turn results in failure to setup hugepages if gigantic hugepage
support is not enabled.
This also results in kernel crash with 4K radix configuration. We
hit the below BUG_ON on radix:
kernel BUG at mm/huge_memory.c:364!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048 NUMA PowerNV
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc1-00006-gbae9cc6 #1
task: c0000000f1af8000 task.stack: c0000000f1aec000
NIP: c000000000c5fa0c LR: c000000000c5f9d8 CTR: c000000000c5f9a4
REGS: c0000000f1aef920 TRAP: 0700 Not tainted (4.8.0-rc1-00006-gbae9cc6)
MSR: 9000000102029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE,TM[E]> CR: 24000844 XER: 00000000
CFAR: c000000000c5f9e0 SOFTE: 1
....
NIP [c000000000c5fa0c] hugepage_init+0x68/0x238
LR [c000000000c5f9d8] hugepage_init+0x34/0x238
Fixes: a7ee539584 ("powerpc/Kconfig: Update config option based on page size")
Cc: stable@vger.kernel.org # v4.7+
Reported-by: Santhosh <santhog4@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The HMI (Hypervisor Maintenance Interrupt) is defined by the
architecture to be higher priority than other maskable interrupts, so
replay it first, as a best-effort to replay according to hardware
priorities.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In our linker script we open code the list of text sections, because we
need to include the __ftr_alt sections, which are arch-specific.
This means we can't use TEXT_TEXT as defined in vmlinux.lds.h, and so we
don't have the MEM_KEEP() logic for memory hotplug sections.
If we build the kernel with the gold linker, and with CONFIG_MEMORY_HOTPLUG=y,
we see that functions marked __meminit can end up outside of the
_stext/_etext range, and also outside of _sinittext/_einittext, eg:
c000000000000000 T _stext
c0000000009e0000 A _etext
c0000000009e3f18 T hash__vmemmap_create_mapping
c000000000ca0000 T _sinittext
c000000000d00844 T _einittext
This causes them to not be recognised as text by is_kernel_text(), and
prevents them being patched by jump_label (and presumably ftrace/kprobes
etc.).
Fix it by adding MEM_KEEP() directives, mirroring what TEXT_TEXT does.
This isn't a problem when CONFIG_MEMORY_HOTPLUG=n, because we use the
standard INIT_TEXT_SECTION() and EXIT_TEXT macros from vmlinux.lds.h.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently the _GLOBAL() macro unilaterally sets the assembler section to
".text" at the start of the macro. This is rude as the caller may be
using a different section.
So let the caller decide which section to emit the code into. On big
endian we do need to switch to the ".opd" section to emit the OPD, but
do that with pushsection/popsection, thereby leaving the original
section intact.
I verified that the order of all entries in System.map is unchanged
after this patch. The actual addresses shift around slightly so you
can't just diff the System.map.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Rather than forcing the whole function into the ".kprobes.text" section,
just add the symbol's address to the kprobe blacklist.
This also lets us drop the three versions of the_KPROBE macro, in
exchange for just one version of _ASM_NOKPROBE_SYMBOL - which is a good
cleanup.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently we mark the C implementations of some exception handlers as
__kprobes. This has the effect of putting them in the ".kprobes.text"
section, which separates them from the rest of the text.
Instead we can use the blacklist macros to add the symbols to a
blacklist which kprobes will check. This allows the linker to move
exception handler functions close to callers and avoids trampolines in
larger kernels.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Reword change log a bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes for code merged this cycle:
- Fix restore of SPRs upon wake up from hypervisor state loss from Gautham R. Shenoy
- Fix the state of root PE from Gavin Shan
- Detach from PE on releasing PCI device from Gavin Shan
- Fix size of NUM_CPU_FTR_KEYS on 32-bit
- Fix missed TCE invalidations that should fallback to OPAL
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX3QI8AAoJEFHr6jzI4aWAmDEP/3k8Tuk30d+QhfVm++N18cZ7
EFdpO25m+kJH83PVc3Lri6sj2z6/Bpm6Ib2V3gynExB9SxUCcAJXHqwioLkL9/PW
aUiotxsPvlpfBFAjNbk2myB8JSbc/+8yJaojKYWqwX796bjUdRkI7rmXtfrjmX6U
uhQQ9nvKNxThwY5eedMH9PCJ89BzgLefrExHUD171iR43qfaouLkUn/Ba+UIhC5m
pepwePCTXHEPm8e328hYVSNEmqWRgL+UN2EUZKqXjITNtDSHCdwGTF8iifwTku54
g/rrta8CgFD4x5chTROnOhJMkTD9MRoneVR8nE4QD6yMHj9k1huL8J8wlfnG/zbB
Ym6MNKBYbGPMAoYfbxAcvWr/7XL+szNoR+p+VWl+rgf2Z08dQaI4zNiB3aimCs1g
7yWW649Gd4gXyNygfeMCDWGZbVhQdQIHcNrcAKFuIRvkn3iPZ0cPa5GYxZ7o/32B
oKAtZMsufGN0eC21hbLaRkyeYPdqEjyk+T734t05cfBCvScWkHeBapnX9gYOoCqZ
ok7b8wXVqVFXZ+FSZ8Ec7YquUHBhHECpqofMgB6d9DqbWPlubwiA3g4YnjrpFDC7
u4a4bVKVZy8fk3w7+2ibkIdud35zL0LqkB2ZNhOn3IYM/yBD0zgUs+bIqruTKZ2+
AYapeGjmf+SBD3ytGtab
=MG5t
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes for code merged this cycle:
- Fix restore of SPRs upon wake up from hypervisor state loss from
Gautham R Shenoy
- Fix the state of root PE from Gavin Shan
- Detach from PE on releasing PCI device from Gavin Shan
- Fix size of NUM_CPU_FTR_KEYS on 32-bit
- Fix missed TCE invalidations that should fallback to OPAL"
* tag 'powerpc-4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv/pci: Fix missed TCE invalidations that should fallback to OPAL
powerpc/powernv: Detach from PE on releasing PCI device
powerpc/powernv: Fix the state of root PE
powerpc/kernel: Fix size of NUM_CPU_FTR_KEYS on 32-bit
powerpc/powernv: Fix restore of SPRs upon wake up from hypervisor state loss
Two stubs are added:
o kvm_arch_has_vcpu_debugfs(): must return true if the arch
supports creating debugfs entries in the vcpu debugfs dir
(which will be implemented by the next commit)
o kvm_arch_create_vcpu_debugfs(): code that creates debugfs
entries in the vcpu debugfs dir
For x86, this commit introduces a new file to avoid growing
arch/x86/kvm/x86.c even more.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In commit f0228c4130 ("powerpc/powernv/pci: Fallback to OPAL for TCE
invalidations"), we added logic to fallback to OPAL for doing TCE
invalidations if we can't do it in Linux.
Ben sent a v2 of the patch, containing these additional call sites, but
I had already applied v1 and didn't notice. So fix them now.
Fixes: f0228c4130 ("powerpc/powernv/pci: Fallback to OPAL for TCE invalidations")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The PCI hotplug can be part of EEH error recovery. The @pdn and
the device's PE number aren't removed and added afterwords. The
PE number in @pdn should be set to an invalid one. Otherwise, the
PE's device count is decreased on removing devices while failing
to be increased on adding devices. It leads to unbalanced PE's
device count and make normal PCI hotplug path broken.
Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull uaccess fixes from Al Viro:
"Fixes for broken uaccess primitives - mostly lack of proper zeroing
in copy_from_user()/get_user()/__get_user(), but for several
architectures there's more (broken clear_user() on frv and
strncpy_from_user() on hexagon)"
* 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
avr32: fix copy_from_user()
microblaze: fix __get_user()
microblaze: fix copy_from_user()
m32r: fix __get_user()
blackfin: fix copy_from_user()
sparc32: fix copy_from_user()
sh: fix copy_from_user()
sh64: failing __get_user() should zero
score: fix copy_from_user() and friends
score: fix __get_user/get_user
s390: get_user() should zero on failure
ppc32: fix copy_from_user()
parisc: fix copy_from_user()
openrisc: fix copy_from_user()
nios2: fix __get_user()
nios2: copy_from_user() should zero the tail of destination
mn10300: copy_from_user() should zero on access_ok() failure...
mn10300: failing __get_user() and get_user() should zero
mips: copy_from_user() must zero the destination on access_ok() failure
ARC: uaccess: get_user to zero out dest in cause of fault
...
The PE for root bus (root PE) can be removed because of PCI hot
remove in EEH recovery path for fenced PHB error. We need update
@phb->root_pe_populated accordingly so that the root PE can be
populated again in forthcoming PCI hot add path. Also, the PE
shouldn't be destroyed as it's global and reserved resource.
Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
should clear on access_ok() failures. Also remove the useless
range truncation logics.
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Normally, when MSR[VSX/VR/SPE] bits == 1, the used_vsr/used_vr/used_spe
bit have already been set. However when loading a signal frame from user
space we need to explicitly set used_vsr/used_vr/used_spe to make them
consistent with the MSR bits from the signal frame.
For example, CRIU application, who utilizes sigreturn to restore
checkpointed process, will lead to the case where MSR[VSX] bit is active
in signal frame, but used_vsr bit is not set in the kernel. (the same
applies to VR/SPE).
This patch fixes this by always setting used_* bit when MSR related bits
are active in signal frame and we are doing sigreturn.
Based on a proposal by Benh.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
[mpe: Massage change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The ckpt_regs usage in gpr32_set_common/gpr32_get_common() will lead to
following cppcheck error at ifndef CONFIG_PPC_TRANSACTIONAL_MEM case:
[arch/powerpc/kernel/ptrace.c:2062]:
(error) Uninitialized variable: ckpt_regs
[arch/powerpc/kernel/ptrace.c:2130]:
(error) Uninitialized variable: ckpt_regs
The problem is due to gpr32_set_common() used ckpt_regs variable which
only makes sense at #ifdef CONFIG_PPC_TRANSACTIONAL_MEM.
This patch fix this issue by passing in "regs" parameter instead.
Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The message is missing a \n, add it. Switch to pr_warn(), it's shorter
and less ugly.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Power9 DD1 requires to update the hid0 register when switching from
hash to radix.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER9 DD1 requires pte to be marked invalid (V=0) before updating
it with the new value. This makes this distinction for the different
revisions.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
POWER9 DD1 uses RTS - 28 for the RTS value but other revisions use
RTS - 31. This makes this distinction for the different revisions
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The of_node for the SB600 (io-bridge) has its device_type set to
'io-bridge' Set it to 'isa' so that it can be found by
isa_bridge_find_early() instead of using patches in the kernel.
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The device tree on the Nemo passes all of the i8259 interrupts with
numbers between 212 and 222, and points their interrupt-parent property
to the pasemi-opic, requiring custom patches to the kernel. Fix the
values so that they can be controlled by the generic ppc i8259 code.
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
[mpe: Rework deeply nested if and boundary checks]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add config option for the Nemo motherboard used in the Amigaone X1000.
This is a custom PASemi board with an AMD SB600 southbridge, and needs
some patches to it device tree. This option will be used to build these
into the kernel
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Trivial fix to spelling mistake in dev_warn message and remove
extraneous trailing whitespace at end of the message.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 2578bfae84 ("[POWERPC] Create and use CONFIG_WORD_SIZE") added
CONFIG_WORD_SIZE, and suggests that other arches were going to do
likewise.
But that never happened, powerpc is the only architecture which uses it.
So switch to using a simple make variable, BITS, like x86, sh, sparc and
tile. It is also easier to spell and simpler, avoiding any confusion
about whether it's defined due to ordering of make vs kconfig.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some of the rules in the boot Makefile use @ to hide the command, this
means "make V=1" doesn't show them, which is confusing.
So use the Kbuild standard $(Q) which means KBUILD_VERBOSE=1 or V=1 will
work as expected.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In fact it makes no sense at all to have this defined on little endian
builds. Since we disabled the 32-bit VDSO on little endian, we don't
build any 32-bit code when building a little endian kernel.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The LOAD_HANDLER macro requires that you have previously loaded "reg"
with PACAKBASE. Although that gives callers flexibility to get PACAKBASE
in some interesting way, none of the callers actually do that. So fold
the load of PACAKBASE into the macro, making it simpler for callers to
use correctly.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The comment for LOAD_HANDLER() was wrong. The part about kdump has not
been true since 1f6a93e4c3 ("powerpc: Make it possible to move the
interrupt handlers away from the kernel").
Describe how it currently works, and combine the two separate comments
into one.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, if userspace or the kernel accesses a completely bogus address,
for example with any of bits 46-59 set, we first take an SLB miss interrupt,
install a corresponding SLB entry with VSID 0, retry the instruction, then
take a DSI/ISI interrupt because there is no HPT entry mapping the address.
However, by the time of the second interrupt, the Come-From Address Register
(CFAR) has been overwritten by the rfid instruction at the end of the SLB
miss interrupt handler. Since bogus accesses can often be caused by a
function return after the stack has been overwritten, the CFAR value would
be very useful as it could indicate which function it was whose return had
led to the bogus address.
This patch adds code to create a full exception frame in the SLB miss handler
in the case of a bogus address, rather than inserting an SLB entry with a
zero VSID field. Then we call a new slb_miss_bad_addr() function in C code,
which delivers a signal for a user access or creates an oops for a kernel
access. In the latter case the oops message will show the CFAR value at the
time of the access.
In the case of the radix MMU, a segment miss interrupt indicates an access
outside the ranges mapped by the page tables. Previously this was handled
by the code for an unrecoverable SLB miss (one with MSR[RI] = 0), which is
not really correct. With this patch, we now handle these interrupts with
slb_miss_bad_addr(), which is much more consistent.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In commit 31cdd0c39c ("powerpc/xmon: Fix SPR read/write commands and
add command to dump SPRs") I added two uses of the "ld" instruction in
spr_access.S. "ld" is a 64-bit instruction, so shouldn't be used on
32-bit CPUs.
Replace it with PPC_LL which is a macro that gives us either "ld" or
"lwz" depending on whether we're 64 or 32-bit.
Fixes: 31cdd0c39c ("powerpc/xmon: Fix SPR read/write commands and add command to dump SPRs")
Cc: stable@vger.kernel.org # v4.7+
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Another set of things that are only called from assembler and so need
prototypes to keep sparse happy.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Firmware Assisted Dump is a facility to dump kernel core with assistance
from firmware. As part of this process the kernel ELF ABI version is
stored in the core file.
Currently fadump.h defines this to 0 if it is not already defined. This
clashes with a define in elf.h which sets it based on the current task -
not based on the kernel's ELF ABI version.
Use the compiler-provided #define _CALL_ELF which tells us the ELF ABI
version of the kernel to set e_flags, this matches what binutils does.
Remove the definition in fadump.h, which becomes unused.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Squash a bunch of sparse warnings by making things static.
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Adjust jump labels according to the current Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The number of CPU feature keys is meant to map 1:1 to the number of CPU
feature flags defined in cputable.h, and the latter must fit in an
unsigned long.
In commit 4db7327194 ("powerpc: Add option to use jump label for
cpu_has_feature()"), I incorrectly defined NUM_CPU_FTR_KEYS to 64.
There should be no real adverse consequences of this bug, other than us
allocating too many keys.
Fix it by using BITS_PER_LONG.
Fixes: 4db7327194 ("powerpc: Add option to use jump label for cpu_has_feature()")
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
pnv_wakeup_tb_loss() currently expects cr4 to be "eq" if the CPU is
waking up from a complete hypervisor state loss. Hence, it currently
restores the SPR contents only if cr4 is "eq".
However, after commit bcef83a00d ("powerpc/powernv: Add platform
support for stop instruction"), on ISA v3.0 CPUs, the function
pnv_restore_hyp_resource() sets cr4 to contain the result of the
comparison between the state the CPU has woken up from and the first
deep stop state before calling pnv_wakeup_tb_loss().
Thus if the CPU woke up from a state that is deeper than the first
deep stop state, cr4 will have "gt" set and hence, pnv_wakeup_tb_loss()
will fail to restore the SPRs on waking up from such a state.
Fix the code in pnv_wakeup_tb_loss() to restore the SPR states when cr4
is "eq" or "gt".
Fixes: bcef83a00d ("powerpc/powernv: Add platform support for stop instruction")
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Shreyas B. Prabhu <shreyasbp@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
* Replace the specification of a data structure by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kcalloc".
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
This issue was detected also by using the Coccinelle software.
* Replace the specification of data structures by pointer dereferences
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The local variable "g2h_bitmap" will be set to an appropriate value
a bit later. Thus omit the explicit initialisation at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The kfree() function was called in two cases by the
kvm_vcpu_ioctl_config_tlb() function during error handling
even if the passed data structure element contained a null pointer.
* Split a condition check for memory allocation failures.
* Adjust jump targets according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".
This issue was detected by using the Coccinelle software.
* Replace the specification of a data type by a pointer dereference
to make the corresponding size determination a bit safer according to
the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Add VCPU stat counters to track affinity for passthrough
interrupts.
pthru_all: Counts all passthrough interrupts whose IRQ mappings are
in the kvmppc_passthru_irq_map structure.
pthru_host: Counts all cached passthrough interrupts that were injected
from the host through kvm_set_irq (i.e. not handled in
real mode).
pthru_bad_aff: Counts how many cached passthrough interrupts have
bad affinity (receiving CPU is not running VCPU that is
the target of the virtual interrupt in the guest).
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
When a guest has a PCI pass-through device with an interrupt, it
will direct the interrupt to a particular guest VCPU. In fact the
physical interrupt might arrive on any CPU, and then get
delivered to the target VCPU in the emulated XICS (guest interrupt
controller), and eventually delivered to the target VCPU.
Now that we have code to handle device interrupts in real mode
without exiting to the host kernel, there is an advantage to having
the device interrupt arrive on the same sub(core) as the target
VCPU is running on. In this situation, the interrupt can be
delivered to the target VCPU without any exit to the host kernel
(using a hypervisor doorbell interrupt between threads if
necessary).
This patch aims to get passed-through device interrupts arriving
on the correct core by setting the interrupt server in the real
hardware XICS for the interrupt to the first thread in the (sub)core
where its target VCPU is running. We do this in the real-mode H_EOI
code because the H_EOI handler already needs to look at the
emulated ICS state for the interrupt (whereas the H_XIRR handler
doesn't), and we know we are running in the target VCPU context
at that point.
We set the server CPU in hardware using an OPAL call, regardless of
what the IRQ affinity mask for the interrupt says, and without
updating the affinity mask. This amounts to saying that when an
interrupt is passed through to a guest, as a matter of policy we
allow the guest's affinity for the interrupt to override the host's.
This is inspired by an earlier patch from Suresh Warrier, although
none of this code came from that earlier patch.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
When a passthrough IRQ is handled completely within KVM real
mode code, it has to also update the IRQ stats since this
does not go through the generic IRQ handling code.
However, the per CPU kstat_irqs field is an allocated (not static)
field and so cannot be directly accessed in real mode safely.
The function this_cpu_inc_rm() is introduced to safely increment
per CPU fields (currently coded for unsigned integers only) that
are allocated and could thus be vmalloced also.
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Add a module parameter kvm_irq_bypass for kvm_hv.ko to
disable IRQ bypass for passthrough interrupts. The default
value of this tunable is 1 - that is enable the feature.
Since the tunable is used by built-in kernel code, we use
the module_param_cb macro to achieve this.
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Dump the passthrough irqmap structure associated with a
guest as part of /sys/kernel/debug/powerpc/kvm-xics-*.
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
In existing real mode ICP code, when updating the virtual ICP
state, if there is a required action that cannot be completely
handled in real mode, as for instance, a VCPU needs to be woken
up, flags are set in the ICP to indicate the required action.
This is checked when returning from hypercalls to decide whether
the call needs switch back to the host where the action can be
performed in virtual mode. Note that if h_ipi_redirect is enabled,
real mode code will first try to message a free host CPU to
complete this job instead of returning the host to do it ourselves.
Currently, the real mode PCI passthrough interrupt handling code
checks if any of these flags are set and simply returns to the host.
This is not good enough as the trap value (0x500) is treated as an
external interrupt by the host code. It is only when the trap value
is a hypercall that the host code searches for and acts on unfinished
work by calling kvmppc_xics_rm_complete.
This patch introduces a special trap BOOK3S_INTERRUPT_HV_RM_HARD
which is returned by KVM if there is unfinished business to be
completed in host virtual mode after handling a PCI passthrough
interrupt. The host checks for this special interrupt condition
and calls into the kvmppc_xics_rm_complete, which is made an
exported function for this reason.
[paulus@ozlabs.org - moved logic to set r12 to BOOK3S_INTERRUPT_HV_RM_HARD
in book3s_hv_rmhandlers.S into the end of kvmppc_check_wake_reason.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Currently, KVM switches back to the host to handle any external
interrupt (when the interrupt is received while running in the
guest). This patch updates real-mode KVM to check if an interrupt
is generated by a passthrough adapter that is owned by this guest.
If so, the real mode KVM will directly inject the corresponding
virtual interrupt to the guest VCPU's ICS and also EOI the interrupt
in hardware. In short, the interrupt is handled entirely in real
mode in the guest context without switching back to the host.
In some rare cases, the interrupt cannot be completely handled in
real mode, for instance, a VCPU that is sleeping needs to be woken
up. In this case, KVM simply switches back to the host with trap
reason set to 0x500. This works, but it is clearly not very efficient.
A following patch will distinguish this case and handle it
correctly in the host. Note that we can use the existing
check_too_hard() routine even though we are not in a hypercall to
determine if there is unfinished business that needs to be
completed in host virtual mode.
The patch assumes that the mapping between hardware interrupt IRQ
and virtual IRQ to be injected to the guest already exists for the
PCI passthrough interrupts that need to be handled in real mode.
If the mapping does not exist, KVM falls back to the default
existing behavior.
The KVM real mode code reads mappings from the mapped array in the
passthrough IRQ map without taking any lock. We carefully order the
loads and stores of the fields in the kvmppc_irq_map data structure
using memory barriers to avoid an inconsistent mapping being seen by
the reader. Thus, although it is possible to miss a map entry, it is
not possible to read a stale value.
[paulus@ozlabs.org - get irq_chip from irq_map rather than pimap,
pulled out powernv eoi change into a separate patch, made
kvmppc_read_intr get the vcpu from the paca rather than being
passed in, rewrote the logic at the end of kvmppc_read_intr to
avoid deep indentation, simplified logic in book3s_hv_rmhandlers.S
since we were always restoring SRR0/1 anyway, get rid of the cached
array (just use the mapped array), removed the kick_all_cpus_sync()
call, clear saved_xirr PACA field when we handle the interrupt in
real mode, fix compilation with CONFIG_KVM_XICS=n.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Sparse checking revealed that it is no longer used. The last usage was
removed in commit 2e19458312 ("[POWERPC] Cell interrupt rework") in
2006.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
- Don't alias user region to other regions below PAGE_OFFSET from Paul Mackerras
- Fix again csum_partial_copy_generic() on 32-bit from Christophe Leroy
- Fix corrupted PE allocation bitmap on releasing PE from Gavin Shan
Fixes for code merged this cycle:
- Fix crash on releasing compound PE from Gavin Shan
- Fix processor numbers in OPAL ICP from Benjamin Herrenschmidt
- Fix little endian build with CONFIG_KEXEC=n from Thiago Jung Bauermann
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJX0qKlAAoJEFHr6jzI4aWAu30QAK88plLxJ2z5Lyf7axdHf7P0
NfM6xQLyuUQ1xP3ksUB4/4eps3jINJ0bZRd4mMJ3fO/YgNsBACayB9GeE478tMbp
1KmK94qpLk1u4BUxXNtsWJLEuzVqAPr2cGh6jddmkPXGCUx1MFatEVNVJupX+Vt9
sJsmhLatUucZEQI6r4sK5wDOdLYIQgcgTIWW5qHH7jyJDKLGyJbNPtmQhbMWU0a5
zBwD+paecJSGTJEVjd3UwBic+oXt8chwiZkaHLu4Rh6JQ0yVRL4If4EYCodHIpDR
H7b0P9De9W6a+IWLjVDMhYKq9rBjjgZwcjMplkO7gBE2P+v/NGzbfORJtNXeOgKE
/RSWufpTbpiGyUzP1Lr/j0O59ZoijRGBK8zuha5FtsTlhl909ifc6KuHO5aqHY9r
I5o7ws+hSBM1u9cf0Bl011P4uToYzy1auMsZsjDW2SdDEFtJ+WK+0I2vp+M9Jv73
/F48n/EWUuul5oS2Uar+V2AUADpnYPRi50OR1zVJxdJSM8bZFue4brBFfx1bI/2/
jmK87hxNwJtYT45KiuEXr2FWMiB1iNHHxL/OEwWbitf2MfRjq8+LHbdt9FxOSj3/
+8cw3f1zyEjNsvH380HhkUBZknKmD7z8V5Ko5Dx5h8cuRlL+QEW2GnW+1NN7VMoQ
T7QTHRR4ziHSKdzAIlTe
=q0Jo
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes marked for stable:
- Don't alias user region to other regions below PAGE_OFFSET from
Paul Mackerras
- Fix again csum_partial_copy_generic() on 32-bit from Christophe
Leroy
- Fix corrupted PE allocation bitmap on releasing PE from Gavin Shan
Fixes for code merged this cycle:
- Fix crash on releasing compound PE from Gavin Shan
- Fix processor numbers in OPAL ICP from Benjamin Herrenschmidt
- Fix little endian build with CONFIG_KEXEC=n from Thiago Jung
Bauermann"
* tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
powerpc/32: Fix again csum_partial_copy_generic()
powerpc/powernv: Fix corrupted PE allocation bitmap on releasing PE
powerpc/powernv: Fix crash on releasing compound PE
powerpc/xics/opal: Fix processor numbers in OPAL ICP
powerpc/pseries: Fix little endian build with CONFIG_KEXEC=n
Add the irq_bypass_add_producer and irq_bypass_del_producer
functions. These functions get called whenever a GSI is being
defined for a guest. They create/remove the mapping between
host real IRQ numbers and the guest GSI.
Add the following helper functions to manage the
passthrough IRQ map.
kvmppc_set_passthru_irq()
Creates a mapping in the passthrough IRQ map that maps a host
IRQ to a guest GSI. It allocates the structure (one per guest VM)
the first time it is called.
kvmppc_clr_passthru_irq()
Removes the passthrough IRQ map entry given a guest GSI.
The passthrough IRQ map structure is not freed even when the
number of mapped entries goes to zero. It is only freed when
the VM is destroyed.
[paulus@ozlabs.org - modified to use is_pnv_opal_msi() rather than
requiring all passed-through interrupts to use the same irq_chip;
changed deletion so it zeroes out the r_hwirq field rather than
copying the last entry down and decrementing the number of entries.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This patch introduces an IRQ mapping structure, the
kvmppc_passthru_irqmap structure that is to be used
to map the real hardware IRQ in the host with the virtual
hardware IRQ (gsi) that is injected into a guest by KVM for
passthrough adapters.
Currently, we assume a separate IRQ mapping structure for
each guest. Each kvmppc_passthru_irqmap has a mapping arrays,
containing all defined real<->virtual IRQs.
[paulus@ozlabs.org - removed irq_chip field from struct
kvmppc_passthru_irqmap; changed parameter for
kvmppc_get_passthru_irqmap from struct kvm_vcpu * to struct
kvm *, removed small cached array.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Select IRQ_BYPASS_MANAGER for PPC when CONFIG_KVM is set.
Add the PPC producer functions for add and del producer.
[paulus@ozlabs.org - Moved new functions from book3s.c to powerpc.c
so booke compiles; added kvm_arch_has_irq_bypass implementation.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Modify kvmppc_read_intr to make it a C function. Because it is called
from kvmppc_check_wake_reason, any of the assembler code that calls
either kvmppc_read_intr or kvmppc_check_wake_reason now has to assume
that the volatile registers might have been modified.
This also adds in the optimization of clearing saved_xirr in the case
where we completely handle and EOI an IPI. Without this, the next
device interrupt will require two trips through the host interrupt
handling code.
[paulus@ozlabs.org - made kvmppc_check_wake_reason create a stack frame
when it is calling kvmppc_read_intr, which means we can set r12 to
the trap number (0x500) after the call to kvmppc_read_intr, instead
of using r31. Also moved the deliver_guest_interrupt label so as to
restore XER and CTR, plus other minor tweaks.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This merges the topic branch 'kvm-ppc-infrastructure' into kvm-ppc-next
so that I can then apply further patches that need the changes in the
kvm-ppc-infrastructure branch.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
hmi.c functions are unused unless sibling_subcore_state is nonzero, and
that in turn happens only if KVM is in use. So move the code to
arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
rather than CONFIG_PPC_BOOK3S_64. The sibling_subcore_state is also
included in struct paca_struct only if KVM is supported by the kernel.
Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm-ppc@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This adds a new function pnv_opal_pci_msi_eoi() which does the part of
end-of-interrupt (EOI) handling of an MSI which involves doing an
OPAL call. This function can be called in real mode. This doesn't
just export pnv_ioda2_msi_eoi() because that does a call to
icp_native_eoi(), which does not work in real mode.
This also adds a function, is_pnv_opal_msi(), which KVM can call to
check whether an interrupt is one for which we should be calling
pnv_opal_pci_msi_eoi() when we need to do an EOI.
[paulus@ozlabs.org - split out the addition of pnv_opal_pci_msi_eoi()
from Suresh's patch "KVM: PPC: Book3S HV: Handle passthrough
interrupts in guest"; added is_pnv_opal_msi(); wrote description.]
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Add simple cache inhibited accessors for memory mapped I/O.
Unlike the accessors built from the DEF_MMIO_* macros, these
don't include any hardware memory barriers, callers need to
manage memory barriers on their own. These can only be called
in hypervisor real mode.
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
[paulus@ozlabs.org - added line to comment]
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This replaces a 2-D search through an array with a simple 8-bit table
lookup for determining the actual and/or base page size for a HPT entry.
The encoding in the second doubleword of the HPTE is designed to encode
the actual and base page sizes without using any more bits than would be
needed for a 4k page number, by using between 1 and 8 low-order bits of
the RPN (real page number) field to encode the page sizes. A single
"large page" bit in the first doubleword indicates that these low-order
bits are to be interpreted like this.
We can determine the page sizes by using the low-order 8 bits of the RPN
to look up a 256-entry table. For actual page sizes less than 1MB, some
of the upper bits of these 8 bits are going to be real address bits, but
we can cope with that by replicating the entries for those smaller page
sizes.
While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
functions from a KVM-specific header to a header for 64-bit HPT systems,
since this computation doesn't have anything specifically to do with KVM.
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
In commit c60ac5693c ("powerpc: Update kernel VSID range", 2013-03-13)
we lost a check on the region number (the top four bits of the effective
address) for addresses below PAGE_OFFSET. That commit replaced a check
that the top 18 bits were all zero with a check that bits 46 - 59 were
zero (performed for all addresses, not just user addresses).
This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx
and we will insert a valid SLB entry for it. The VSID used will be the
same as if the top 4 bits were 0, but the page size will be some random
value obtained by indexing beyond the end of the mm_ctx_high_slices_psize
array in the paca. If that page size is the same as would be used for
region 0, then userspace just has an alias of the region 0 space. If the
page size is different, then no HPTE will be found for the access, and
the process will get a SIGSEGV (since hash_page_mm() will refuse to create
a HPTE for the bogus address).
The access beyond the end of the mm_ctx_high_slices_psize can be at most
5.5MB past the array, and so will be in RAM somewhere. Since the access
is a load performed in real mode, it won't fault or crash the kernel.
At most this bug could perhaps leak a little bit of information about
blocks of 32 bytes of memory located at offsets of i * 512kB past the
paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11.
Fixes: c60ac5693c ("powerpc: Update kernel VSID range")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 7aef413656 ("powerpc32: rewrite csum_partial_copy_generic()
based on copy_tofrom_user()") introduced a bug when destination address
is odd and len is lower than cacheline size.
In that case the resulting csum value doesn't have to be rotated one
byte because the cache-aligned copy part is skipped so no alignment
is performed.
Fixes: 7aef413656 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()")
Cc: stable@vger.kernel.org # v4.6+
Reported-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In pnv_ioda_free_pe(), the PE object (including the associated PE
number) is cleared before resetting the corresponding bit in the
PE allocation bitmap. It means PE#0 is always released to the bitmap
wrongly.
This fixes above issue by caching the PE number before the PE object
is cleared.
Fixes: 1e9167726c ("powerpc/powernv: Use PE instead of number during setup and release"
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
vcpu stats are used to collect information about a vcpu which can be viewed
in the debugfs. For example halt_attempted_poll and halt_successful_poll
are used to keep track of the number of times the vcpu attempts to and
successfully polls. These stats are currently not used on powerpc.
Implement incrementation of the halt_attempted_poll and
halt_successful_poll vcpu stats for powerpc. Since these stats are summed
over all the vcpus for all running guests it doesn't matter which vcpu
they are attributed to, thus we choose the current runner vcpu of the
vcore.
Also add new vcpu stats: halt_poll_success_ns, halt_poll_fail_ns and
halt_wait_ns to be used to accumulate the total time spend polling
successfully, polling unsuccessfully and waiting respectively, and
halt_successful_wait to accumulate the number of times the vcpu waits.
Given that halt_poll_success_ns, halt_poll_fail_ns and halt_wait_ns are
expressed in nanoseconds it is necessary to represent these as 64-bit
quantities, otherwise they would overflow after only about 4 seconds.
Given that the total time spend either polling or waiting will be known and
the number of times that each was done, it will be possible to determine
the average poll and wait times. This will give the ability to tune the kvm
module parameters based on the calculated average wait and poll times.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
vms and vcpus have statistics associated with them which can be viewed
within the debugfs. Currently it is assumed within the vcpu_stat_get() and
vm_stat_get() functions that all of these statistics are represented as
u32s, however the next patch adds some u64 vcpu statistics.
Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly.
Since vcpu statistics are per vcpu, they will only be updated by a single
vcpu at a time so this shouldn't present a problem on 32-bit machines
which can't atomically increment 64-bit numbers. However vm statistics
could potentially be updated by multiple vcpus from that vm at a time.
To avoid the overhead of atomics make all vm statistics ulong such that
they are 64-bit on 64-bit systems where they can be atomically incremented
and are 32-bit on 32-bit systems which may not be able to atomically
increment 64-bit numbers. Modify vm_stat_get() to expect ulongs.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
This patch introduces new halt polling functionality into the kvm_hv kernel
module. When a vcore is idle it will poll for some period of time before
scheduling itself out.
When all of the runnable vcpus on a vcore have ceded (and thus the vcore is
idle) we schedule ourselves out to allow something else to run. In the
event that we need to wake up very quickly (for example an interrupt
arrives), we are required to wait until we get scheduled again.
Implement halt polling so that when a vcore is idle, and before scheduling
ourselves, we poll for vcpus in the runnable_threads list which have
pending exceptions or which leave the ceded state. If we poll successfully
then we can get back into the guest very quickly without ever scheduling
ourselves, otherwise we schedule ourselves out as before.
There exists generic halt_polling code in virt/kvm_main.c, however on
powerpc the polling conditions are different to the generic case. It would
be nice if we could just implement an arch specific kvm_check_block()
function, but there is still other arch specific things which need to be
done for kvm_hv (for example manipulating vcore states) which means that a
separate implementation is the best option.
Testing of this patch with a TCP round robin test between two guests with
virtio network interfaces has found a decrease in round trip time of ~15us
on average. A performance gain is only seen when going out of and
back into the guest often and quickly, otherwise there is no net benefit
from the polling. The polling interval is adjusted such that when we are
often scheduled out for long periods of time it is reduced, and when we
often poll successfully it is increased. The rate at which the polling
interval increases or decreases, and the maximum polling interval, can
be set through module parameters.
Based on the implementation in the generic kvm module by Wanpeng Li and
Paolo Bonzini, and on direction from Paul Mackerras.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The struct kvmppc_vcore is a structure used to store various information
about a virtual core for a kvm guest. The runnable_threads element of the
struct provides a list of all of the currently runnable vcpus on the core
(those in the KVMPPC_VCPU_RUNNABLE state). The previous implementation of
this list was a linked_list. The next patch requires that the list be able
to be iterated over without holding the vcore lock.
Reimplement the runnable_threads list in the kvmppc_vcore struct as an
array. Implement function to iterate over valid entries in the array and
update access sites accordingly.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The next commit will introduce a member to the kvmppc_vcore struct which
references MAX_SMT_THREADS which is defined in kvm_book3s_asm.h, however
this file isn't included in kvm_host.h directly. Thus compiling for
certain platforms such as pmac32_defconfig and ppc64e_defconfig with KVM
fails due to MAX_SMT_THREADS not being defined.
Move the struct kvmppc_vcore definition to kvm_book3s.h which explicitly
includes kvm_book3s_asm.h.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Instead of having each caller of check_object_size() need to remember to
check for a const size parameter, move the check into check_object_size()
itself. This actually matches the original implementation in PaX, though
this commit cleans up the now-redundant builtin_const() calls in the
various architectures.
Signed-off-by: Kees Cook <keescook@chromium.org>
Install the callbacks via the state machine.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: rt@linutronix.de
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160818125731.27256-17-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Install the callbacks via the state machine.
I assume here that the powermac has two CPUs and so only one can go up
or down at a time. The variable smp_core99_host_open is here to ensure
that we do not try to open or close the i2c host twice if something goes
wrong and we invoke the prepare or online callback twice due to
rollback.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: rt@linutronix.de
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160818125731.27256-16-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The compound PE is created to accommodate the devices attached to
one specific PCI bus that consume multiple M64 segments. The compound
PE is made up of one master PE and possibly multiple slave PEs. The
slave PEs should be destroyed when releasing the master PE. A kernel
crash happens when derferencing @pe->pdev on releasing the slave PE
in pnv_ioda_deconfigure_pe().
# echo 0 > /sys/bus/pci/slots/C7/power
iommu: Removing device 0000:01:00.1 from group 0
iommu: Removing device 0000:01:00.0 from group 0
Unable to handle kernel paging request for data at address 0x00000010
Faulting instruction address: 0xc00000000005d898
cpu 0x1: Vector: 300 (Data Access) at [c000000fe8217620]
pc: c00000000005d898: pnv_ioda_release_pe+0x288/0x610
lr: c00000000005dbdc: pnv_ioda_release_pe+0x5cc/0x610
sp: c000000fe82178a0
msr: 9000000000009033
dar: 10
dsisr: 40000000
current = 0xc000000fe815ab80
paca = 0xc00000000ff00400 softe: 0 irq_happened: 0x01
pid = 2709, comm = sh
Linux version 4.8.0-rc5-gavin-00006-g745efdb (gwshan@gwshan) \
(gcc version 4.9.3 (Buildroot 2016.02-rc2-00093-g5ea3bce) ) #586 SMP \
Tue Sep 6 13:37:29 AEST 2016
enter ? for help
[c000000fe8217940] c00000000005d684 pnv_ioda_release_pe+0x74/0x610
[c000000fe82179e0] c000000000034460 pcibios_release_device+0x50/0x70
[c000000fe8217a10] c0000000004aba80 pci_release_dev+0x50/0xa0
[c000000fe8217a40] c000000000704898 device_release+0x58/0xf0
[c000000fe8217ac0] c000000000470510 kobject_release+0x80/0xf0
[c000000fe8217b00] c000000000704dd4 put_device+0x24/0x40
[c000000fe8217b20] c0000000004af94c pci_remove_bus_device+0x12c/0x150
[c000000fe8217b60] c000000000034244 pci_hp_remove_devices+0x94/0xd0
[c000000fe8217ba0] c0000000004ca444 pnv_php_disable_slot+0x64/0xb0
[c000000fe8217bd0] c0000000004c88c0 power_write_file+0xa0/0x190
[c000000fe8217c50] c0000000004c248c pci_slot_attr_store+0x3c/0x60
[c000000fe8217c70] c0000000002d6494 sysfs_kf_write+0x94/0xc0
[c000000fe8217cb0] c0000000002d50f0 kernfs_fop_write+0x180/0x260
[c000000fe8217d00] c0000000002334a0 __vfs_write+0x40/0x190
[c000000fe8217d90] c000000000234738 vfs_write+0xc8/0x240
[c000000fe8217de0] c000000000236250 SyS_write+0x60/0x110
[c000000fe8217e30] c000000000009524 system_call+0x38/0x108
It fixes the kernel crash by bypassing releasing resources (DMA,
IO and memory segments, PELTM) because there are no resources assigned
to the slave PE.
Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Reported-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When using the OPAL ICP backend we incorrectly pass Linux CPU numbers
rather than HW CPU numbers to OPAL.
Fixes: d74361881f ("powerpc/xics: Add ICP OPAL backend")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On ppc64le, builds with CONFIG_KEXEC=n fail with:
arch/powerpc/platforms/pseries/setup.c: In function ‘pseries_big_endian_exceptions’:
arch/powerpc/platforms/pseries/setup.c:403:13: error: implicit declaration of function ‘kdump_in_progress’
if (rc && !kdump_in_progress())
This is because pseries/setup.c includes <linux/kexec.h>, but
kdump_in_progress() is defined in <asm/kexec.h>. This is a problem
because the former only includes the latter if CONFIG_KEXEC_CORE=y.
Fix it by including <asm/kexec.h> directly, as is done in powernv/setup.c.
Fixes: d3cbff1b5a ("powerpc: Put exception configuration in a common place")
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Userspace can begin and suspend a transaction within the signal
handler which means they might enter sys_rt_sigreturn() with the
processor in suspended state.
sys_rt_sigreturn() wants to restore process context (which may have
been in a transaction before signal delivery). To do this it must
restore TM SPRS. To achieve this, any transaction initiated within the
signal frame must be discarded in order to be able to restore TM SPRs
as TM SPRs can only be manipulated non-transactionally..
>From the PowerPC ISA:
TM Bad Thing Exception [Category: Transactional Memory]
An attempt is made to execute a mtspr targeting a TM register in
other than Non-transactional state.
Not doing so results in a TM Bad Thing:
[12045.221359] Kernel BUG at c000000000050a40 [verbose debug info unavailable]
[12045.221470] Unexpected TM Bad Thing exception at c000000000050a40 (msr 0x201033)
[12045.221540] Oops: Unrecoverable exception, sig: 6 [#1]
[12045.221586] SMP NR_CPUS=2048 NUMA PowerNV
[12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm
uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4 ses enclosure
scsi_transport_sas bnx2x ipr mdio libcrc32c
[12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted 4.7.0 #34
[12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti: c0000000fceb4000
[12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR: 0000000000000000
[12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700 Not tainted (4.7.0)
[12045.222418] MSR: 9000000300201033 <SF,HV,ME,IR,DR,RI,LE,TM[SE]> CR: 28444280 XER: 20000000
[12045.222625] CFAR: c0000000000163b8 SOFTE: 0 PACATMSCRATCH: 900000014280f033
GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100 c0000000fce390d0
GPR04: 900000034280f033 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 b000000000001033 0000000000000001 0000000000000000
GPR12: 0000000000000000 c000000002926400 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470 0000000000000000
GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001 c0000000fce390d0
[12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c
[12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0
[12045.223630] Call Trace:
[12045.223655] [c0000000fceb7d80] [c000000000026e74] sys_rt_sigreturn+0x494/0x6c0
[12045.223738] [c0000000fceb7e30] [c0000000000092e0] system_call+0x38/0x108
[12045.223806] Instruction dump:
[12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
[12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020
[12045.224074] ---[ end trace cb8002ee240bae76 ]---
It isn't clear exactly if there is really a use case for userspace
returning with a suspended transaction, however, doing so doesn't (on
its own) constitute a bad frame. As such, this patch simply discards
the transactional state of the context calling the sigreturn and
continues.
Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Acked-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In a situation, where Linux kernel gets notified about duplicate error log
from OPAL, it is been observed that kernel fails to remove sysfs entries
(/sys/firmware/opal/elog/0xXXXXXXXX) of such error logs. This is because,
we currently search the error log/dump kobject in the kset list via
'kset_find_obj()' routine. Which eventually increment the reference count
by one, once it founds the kobject.
So, unless we decrement the reference count by one after it found the kobject,
we would not be able to release the kobject properly later.
This patch adds the 'kobject_put()' which was missing earlier.
Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
tabort_syscall runs with RI=1, so a nested recoverable machine
check will load the paca into r13 and overwrite what we loaded
it with, because exceptions returning to privileged mode do not
restore r13.
Fixes: b4b56f9eca (powerpc/tm: Abort syscalls in active transactions)
Cc: stable@vger.kernel.org
Signed-off-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As discussed recently on the kvm mailing list, David Gibson's
intention in commit 178a787502 ("vfio: Enable VFIO device for
powerpc", 2016-02-01) was to have the KVM VFIO device built in
on all powerpc platforms. This patch adds the "select KVM_VFIO"
statement that makes this happen.
Currently, arch/powerpc/kvm/Makefile doesn't include vfio.o for
the 64-bit kvm module, because the list of objects doesn't use
the $(common-objs-y) list. The reason it doesn't is because we
don't necessarily want coalesced_mmio.o or emulate.o (for example
if HV KVM is the only target), and common-objs-y includes both.
Since this is confusing, this patch adjusts the definitions so that
we now use $(common-objs-y) in the list for the 64-bit kvm.ko
module, emulate.o is removed from common-objs-y and added in the
places that need it, and the inclusion of coalesced_mmio.o now
depends on CONFIG_KVM_MMIO.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Storing this value will help prevent unwinders from getting out of sync
with the function graph tracer ret_stack. Now instead of needing a
stateful iterator, they can compare the return address pointer to find
the right ret_stack entry.
Note that an array of 50 ftrace_ret_stack structs is allocated for every
task. So when an arch implements this, it will add either 200 or 400
bytes of memory usage per task (depending on whether it's a 32-bit or
64-bit platform).
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a95cfcc39e8f26b89a430c56926af0bb217bc0a1.1471607358.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
hmi.c functions are unused unless sibling_subcore_state is nonzero, and
that in turn happens only if KVM is in use. So move the code to
arch/powerpc/kvm/, putting it under CONFIG_KVM_BOOK3S_HV_POSSIBLE
rather than CONFIG_PPC_BOOK3S_64. The sibling_subcore_state is also
included in struct paca_struct only if KVM is supported by the kernel.
Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm-ppc@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
MCE must not use PACA_EXGEN. When a general exception enables MSR_RI,
that means SPRN_SRR[01] and SPRN_SPRG are no longer used. However the
PACA save area is still in use.
Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When booting from an OpenFirmware which supports it, we use the
"ibm,client-architecture-support" firmware call to communicate
our capabilities to firmware.
The format of the structure we pass to firmware is specified in
PAPR (Power Architecture Platform Requirements), or the public version
LoPAPR (Linux on Power Architecture Platform Reference).
Referring to table 244 in LoPAPR v1.1, option vector 5 contains a 4 byte
field at bytes 17-20 for the "Platform Facilities Enable". This is
followed by a 1 byte field at byte 21 for "Sub-Processor Represenation
Level".
Comparing to the code, there we have the Platform Facilities
options (OV5_PFO_*) at byte 17, but we fail to pad that field out to its
full width of 4 bytes. This means the OV5_SUB_PROCESSORS option is
incorrectly placed at byte 18.
Fix it by adding zero bytes for bytes 18, 19, 20, and comment the bytes
to hopefully make it clearer in future.
As far as I'm aware nothing actually consumes this value at this time,
so the effect of this bug is nil in practice.
It does mean we've been incorrectly setting bit 15 of the "Platform
Facilities Enable" option for the past ~3 1/2 years, so we should avoid
allocating that bit to anything else in future.
Fixes: df77c79920 ("powerpc/pseries: Update ibm,architecture.vec for PAPR 2.7/POWER8")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We observed a kernel oops when running a PPC guest with config NR_CPUS=4
and qemu option "-smp cores=1,threads=8":
[ 30.634781] Unable to handle kernel paging request for data at
address 0xc00000014192eb17
[ 30.636173] Faulting instruction address: 0xc00000000003e5cc
[ 30.637069] Oops: Kernel access of bad area, sig: 11 [#1]
[ 30.637877] SMP NR_CPUS=4 NUMA pSeries
[ 30.638471] Modules linked in:
[ 30.638949] CPU: 3 PID: 27 Comm: migration/3 Not tainted
4.7.0-07963-g9714b26 #1
[ 30.640059] task: c00000001e29c600 task.stack: c00000001e2a8000
[ 30.640956] NIP: c00000000003e5cc LR: c00000000003e550 CTR:
0000000000000000
[ 30.642001] REGS: c00000001e2ab8e0 TRAP: 0300 Not tainted
(4.7.0-07963-g9714b26)
[ 30.643139] MSR: 8000000102803033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22004084 XER: 00000000
[ 30.644583] CFAR: c000000000009e98 DAR: c00000014192eb17 DSISR: 40000000 SOFTE: 0
GPR00: c00000000140a6b8 c00000001e2abb60 c0000000016dd300 0000000000000003
GPR04: 0000000000000000 0000000000000004 c0000000016e5920 0000000000000008
GPR08: 0000000000000004 c00000014192eb17 0000000000000000 0000000000000020
GPR12: c00000000140a6c0 c00000000ffffc00 c0000000000d3ea8 c00000001e005680
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 c00000001e6b3a00 0000000000000000 0000000000000001
GPR24: c00000001ff85138 c00000001ff85130 000000001eb6f000 0000000000000001
GPR28: 0000000000000000 c0000000017014e0 0000000000000000 0000000000000018
[ 30.653882] NIP [c00000000003e5cc] __cpu_disable+0xcc/0x190
[ 30.654713] LR [c00000000003e550] __cpu_disable+0x50/0x190
[ 30.655528] Call Trace:
[ 30.655893] [c00000001e2abb60] [c00000000003e550] __cpu_disable+0x50/0x190 (unreliable)
[ 30.657280] [c00000001e2abbb0] [c0000000000aca0c] take_cpu_down+0x5c/0x100
[ 30.658365] [c00000001e2abc10] [c000000000163918] multi_cpu_stop+0x1a8/0x1e0
[ 30.659617] [c00000001e2abc60] [c000000000163cc0] cpu_stopper_thread+0xf0/0x1d0
[ 30.660737] [c00000001e2abd20] [c0000000000d8d70] smpboot_thread_fn+0x290/0x2a0
[ 30.661879] [c00000001e2abd80] [c0000000000d3fa8] kthread+0x108/0x130
[ 30.662876] [c00000001e2abe30] [c000000000009968] ret_from_kernel_thread+0x5c/0x74
[ 30.664017] Instruction dump:
[ 30.664477] 7bde1f24 38a00000 787f1f24 3b600001 39890008 7d204b78 7d05e214 7d0b07b4
[ 30.665642] 796b1f24 7d26582a 7d204a14 7d29f214 <7d4048a8> 7d4a3878 7d4049ad 40c2fff4
[ 30.666854] ---[ end trace 32643b7195717741 ]---
The reason of this is that in __cpu_disable(), when we try to set the
cpu_sibling_mask or cpu_core_mask of the sibling CPUs of the disabled
one, we don't check whether the current configuration employs those
sibling CPUs(hw threads). And if a CPU is not employed by a
configuration, the percpu structures cpu_{sibling,core}_mask are not
allocated, therefore accessing those cpumasks will result in problems as
above.
This patch fixes this problem by adding an addition check on whether the
id is no less than nr_cpu_ids in the sibling CPU iteration code.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
These files were only including module.h for exception table
related functions. We've now separated that content out into its
own file "extable.h" so now move over to that and avoid all the
extra header content in module.h that we don't really need to compile
these files.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Unsigned type is always non-negative, so the loop could not end in case
condition is never true.
The problem has been detected using semantic patch
scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch leverages 'struct pci_host_bridge' from the PCI subsystem
in order to free the pci_controller only after the last reference to
its devices is dropped (avoiding an oops in pcibios_release_device()
if the last reference is dropped after pcibios_free_controller()).
The patch relies on pci_host_bridge.release_fn() (and .release_data),
which is called automatically by the PCI subsystem when the root bus
is released (i.e., the last reference is dropped). Those fields are
set via pci_set_host_bridge_release() (e.g. in the platform-specific
implementation of pcibios_root_bridge_prepare()).
It introduces the 'pcibios_free_controller_deferred()' .release_fn()
and it expects .release_data to hold a pointer to the pci_controller.
The function implictly calls 'pcibios_free_controller()', so an user
must *NOT* explicitly call it if using the new _deferred() callback.
The functionality is enabled for pseries (although it isn't platform
specific, and may be used by cxl).
Details on not-so-elegant design choices:
- Use 'pci_host_bridge.release_data' field as pointer to associated
'struct pci_controller' so *not* to 'pci_bus_to_host(bridge->bus)'
in pcibios_free_controller_deferred().
That's because pci_remove_root_bus() sets 'host_bridge->bus = NULL'
(so, if the last reference is released after pci_remove_root_bus()
runs, which eventually reaches pcibios_free_controller_deferred(),
that would hit a null pointer dereference).
The cxl/vphb.c code calls pci_remove_root_bus(), and the cxl folks
are interested in this fix.
Test-case #1 (hold references)
# ls -ld /sys/block/sd* | grep -m1 0021:01:00.0
<...> /sys/block/sdaa -> ../devices/pci0021:01/0021:01:00.0/<...>
# ls -ld /sys/block/sd* | grep -m1 0021:01:00.1
<...> /sys/block/sdab -> ../devices/pci0021:01/0021:01:00.1/<...>
# cat >/dev/sdaa & pid1=$!
# cat >/dev/sdab & pid2=$!
# drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
Validating PHB DLPAR capability...yes.
[ 594.306719] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
[ 594.306738] pci_hp_remove_devices: Removing 0021:01:00.0...
...
[ 598.236381] pci_hp_remove_devices: Removing 0021:01:00.1...
...
[ 611.972077] pci_bus 0021:01: busn_res: [bus 01-ff] is released
[ 611.972140] rpadlpar_io: slot PHB 33 removed
# kill -9 $pid1
# kill -9 $pid2
[ 632.918088] pcibios_free_controller_deferred: domain 33, dynamic 1
Test-case #2 (don't hold references)
# drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
Validating PHB DLPAR capability...yes.
[ 916.357363] pci_hp_remove_devices: PCI: Removing devices on bus 0021:01
[ 916.357386] pci_hp_remove_devices: Removing 0021:01:00.0...
...
[ 920.566527] pci_hp_remove_devices: Removing 0021:01:00.1...
...
[ 933.955873] pci_bus 0021:01: busn_res: [bus 01-ff] is released
[ 933.955977] pcibios_free_controller_deferred: domain 33, dynamic 1
[ 933.955999] rpadlpar_io: slot PHB 33 removed
Suggested-By: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> # cxl
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The field "owner" is set by the core.
Thus delete an unneeded initialisation.
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The field "owner" is set by the core.
Thus delete an unneeded initialisation.
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Powerpc builds may fail with the following build error.
Error log:
In file included from ./arch/powerpc/include/asm/mmu_context.h:11:0,
from ./include/linux/mmu_context.h:4,
from mm/mmu_context.c:8:
./arch/powerpc/include/asm/cputhreads.h: In function 'get_tensr':
./arch/powerpc/include/asm/cputhreads.h:101:2: error:
implicit declaration of function 'cpu_has_feature'
The problem can be triggered by configuring ppc64e_defconfig and selecting
CONFIG_TICK_CPU_ACCOUNTING instead of CONFIG_VIRT_CPU_ACCOUNTING_NATIVE.
Fixes: b92a226e52 ("powerpc: Move cpu_has_feature() to a separate file")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It doesn't make sense to create irqfds for a VM that doesn't have
in-kernel interrupt controller emulation. There is an existing
interface for architecture code to tell the irqfd code whether or
not any interrupt controller has been initialized, called
kvm_arch_intc_initialized(), so let's implement that for powerpc.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
It turns out that if userspace creates a pseries-type VM without
in-kernel XICS (interrupt controller) emulation, and then connects
an eventfd to the VM as an irqfd, and the eventfd gets signalled,
that the code will try to deliver an interrupt via the non-existent
XICS object and crash the host kernel with a NULL pointer dereference.
To fix this, we check for the presence of the XICS object before
trying to deliver the interrupt, and return with an error if not.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
PPC splits debugfs initialization from creation of the xics device to
unlock the newly taken kvm lock earlier.
s390 prevents userspace from triggering two WARN_ON_ONCE.
MIPS fixes several issues in the management of TLB faults (Cc: stable).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCAAGBQJXrx2ZAAoJEED/6hsPKofoo/4H/jra5NNxvpo09LWlXTwGXxBH
cwcfDZSiOFxgvWztKJOIjPI4ETL3mnZvb9SFWBZZh1U0kfZ/TGiWouwaDNlBkPYj
I3YHuPI7if+yUOmJlI3N2hWa0Wo0qiMqIjKT0pQVSLLdK/CVE+xGyS+qtXTNXHQn
pFdKlYr//7OwQEY0ow1yj5VnsFrXB1JWFyB/+N5zaCfbCaQVyZAL7rj8SUbC/32W
CiNhrvatzierKIfPerWw8DvvBKhCgWaRuLl0W+uMncrC9Qepcx9moM2beD1txK2I
iHor1TDxUPifGQONfWMAlw87FluzHF4vQ5nN2jyTi8TT+CEfZpZ43Q+DY7okD4w=
=NQP9
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"KVM:
- lock kvm_device list to prevent corruption on device creation.
PPC:
- split debugfs initialization from creation of the xics device to
unlock the newly taken kvm lock earlier.
s390:
- prevent userspace from triggering two WARN_ON_ONCE.
MIPS:
- fix several issues in the management of TLB faults (Cc: stable)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
MIPS: KVM: Propagate kseg0/mapped tlb fault errors
MIPS: KVM: Fix gfn range check in kseg0 tlb faults
MIPS: KVM: Add missing gfn range check
MIPS: KVM: Fix mapped fault broken commpage handling
KVM: Protect device ops->create and list_add with kvm->lock
KVM: PPC: Move xics_debugfs_init out of create
KVM: s390: reset KVM_REQ_MMU_RELOAD if mapping the prefix failed
KVM: s390: set the prefix initially properly
- powerpc/vdso: Fix build rules to rebuild vdsos correctly from Nicholas Piggin
- powerpc/ptrace: Fix coredump since ptrace TM changes from Cyril Bur
- powerpc/32: Fix csum_partial_copy_generic() from Christophe Leroy
- cxl: Set psl_fir_cntl to production environment value from Frederic Barrat
- powerpc/eeh: Switch to conventional PCI address output in EEH log from Guilherme G. Piccoli
- cxl: Use fixed width predefined types in data structure. from Philippe Bergheaud
- powerpc/vdso: Add missing include file from Guenter Roeck
- powerpc: Fix unused function warning 'lmb_to_memblock' from Alastair D'Silva
- powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again from Alexey Kardashevskiy
- powerpc/cell: Add missing error code in spufs_mkgang() from Dan Carpenter
- crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading from Anton Blanchard
- powerpc/pasemi: Fix coherent_dma_mask for dma engine from Darren Stevens
Benjamin Herrenschmidt:
- powerpc/32: Fix crash during static key init
- powerpc: Update obsolete comment in setup_32.c about early_init()
- powerpc: Print the kernel load address at the end of prom_init()
- powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
- powerpc/xics: Properly set Edge/Level type and enable resend
Mahesh Salgaonkar:
- powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
- powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
- powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
- powerpc/powernv: Load correct TOC pointer while waking up from winkle.
Andrew Donnellan:
- cxl: Fix sparse warnings
- cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
Michael Ellerman:
- selftests/powerpc: Specify we expect to build with std=gnu99
- powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
- powerpc/pci: Fix endian bug in fixed PHB numbering
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXrZFAAAoJEFHr6jzI4aWAxacQALPfu/kbKJFhwX8dnbzaCwHe
1bTZHE4bkkxfS5JrghbiLZHUeoCZucDhGGlZSPOEb5VA9lkEX3OJJRQDng754Pit
u3pwt0SLmAxBn9BgTZy/5g5U6KMGptzJcSsKVEtZs17PKpqhPNELMm5EmGhJmNHH
Ksycw4FhVrsjDm5n7s4IqUhsh0Z9QPOOxxb5rVgdBBxmLHz5a1FJSSCFan5WW3PT
QNiMfg58NdBBOFbDQJWLiWXrfPPUMhXfPxHGGArXPEsa+7l5yXaygCSv5KyUBJMt
sDxn6XZMuYzzvg4j8uc9mkDWNWiyxcxBJ6+/Hm5xf9vvpxzHAM1M8j9xqpaCHjeg
b0fsWqVeLD+DuAVqh6rUgUERbsfUtuKXRSB+NR0hHWd7GLx707FIr3i1AAvjDODC
qwcZg9mkcAbKAIOAmsk9aAB60jl7aENiz+bTvLYMHDhIbb+st94jajdaG7MSVn5z
M9FFbRKmRHTW0Qoop1VuseyO9C+Lmb+ksIhBHeYaNDaJ5lzk0NwJltCNd4ybnL6h
i+AFxuhN0uyT6OJOPqTR07+9p+k04LOSYPZR34rclKQ3Z+sQiYQAmwLMHasN6uBk
dZxJUxmeio5J/0BXLGKLYFnaNpHnq3EQm9vdt6spn1kidmm+bOeICB8UW8AairqC
8HasF1QrjZihmoBoXgul
=gw2z
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some powerpc fixes for 4.8:
Misc:
- powerpc/vdso: Fix build rules to rebuild vdsos correctly from Nicholas Piggin
- powerpc/ptrace: Fix coredump since ptrace TM changes from Cyril Bur
- powerpc/32: Fix csum_partial_copy_generic() from Christophe Leroy
- cxl: Set psl_fir_cntl to production environment value from Frederic Barrat
- powerpc/eeh: Switch to conventional PCI address output in EEH log from Guilherme G. Piccoli
- cxl: Use fixed width predefined types in data structure. from Philippe Bergheaud
- powerpc/vdso: Add missing include file from Guenter Roeck
- powerpc: Fix unused function warning 'lmb_to_memblock' from Alastair D'Silva
- powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again from Alexey Kardashevskiy
- powerpc/cell: Add missing error code in spufs_mkgang() from Dan Carpenter
- crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading from Anton Blanchard
- powerpc/pasemi: Fix coherent_dma_mask for dma engine from Darren Stevens
Benjamin Herrenschmidt:
- powerpc/32: Fix crash during static key init
- powerpc: Update obsolete comment in setup_32.c about early_init()
- powerpc: Print the kernel load address at the end of prom_init()
- powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
- powerpc/xics: Properly set Edge/Level type and enable resend
Mahesh Salgaonkar:
- powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
- powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
- powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
- powerpc/powernv: Load correct TOC pointer while waking up from winkle.
Andrew Donnellan:
- cxl: Fix sparse warnings
- cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
Michael Ellerman:
- selftests/powerpc: Specify we expect to build with std=gnu99
- powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
- powerpc/pci: Fix endian bug in fixed PHB numbering"
* tag 'powerpc-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (26 commits)
selftests/powerpc: Specify we expect to build with std=gnu99
powerpc/vdso: Fix build rules to rebuild vdsos correctly
powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
powerpc/32: Fix crash during static key init
powerpc: Update obsolete comment in setup_32.c about early_init()
powerpc: Print the kernel load address at the end of prom_init()
powerpc/ptrace: Fix coredump since ptrace TM changes
powerpc/32: Fix csum_partial_copy_generic()
cxl: Set psl_fir_cntl to production environment value
powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
powerpc/pci: Fix endian bug in fixed PHB numbering
powerpc/eeh: Switch to conventional PCI address output in EEH log
cxl: Fix sparse warnings
cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
cxl: Use fixed width predefined types in data structure.
powerpc/vdso: Add missing include file
powerpc: Fix unused function warning 'lmb_to_memblock'
powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
...
KVM devices were manipulating list data structures without any form of
synchronization, and some implementations of the create operations also
suffered from a lack of synchronization.
Now when we've split the xics create operation into create and init, we
can hold the kvm->lock mutex while calling the create operation and when
manipulating the devices list.
The error path in the generic code gets slightly ugly because we have to
take the mutex again and delete the device from the list, but holding
the mutex during anon_inode_getfd or releasing/locking the mutex in the
common non-error path seemed wrong.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
As we are about to hold the kvm->lock during the create operation on KVM
devices, we should move the call to xics_debugfs_init into its own
function, since holding a mutex over extended amounts of time might not
be a good idea.
Introduce an init operation on the kvm_device_ops struct which cannot
fail and call this, if configured, after the device has been created.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
When using if_changed, we need to add FORCE as a dependency (see
Documentation/kbuild/makefiles.txt) otherwise we don't get command line
change checking amongst other things. This has resulted in vdsos not
being rebuilt when switching between big and little endian.
The vdso64/32ld commands have to be changed around to avoid pulling
FORCE into the linker command line (code copied from x86).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When we introduced the little endian support, we added the endian flags
to CC directly using override. I don't know the history of why we did
that, I suspect no one does.
Although this mostly works, it has one bug, which is that CROSS32CC
doesn't get -mbig-endian. That means when the compiler is little endian
by default and the user is building big endian, vdso32 is incorrectly
compiled as little endian and the kernel fails to build.
Instead we can add the endian flags to cflags-y/aflags-y, and then
append those to KBUILD_CFLAGS/KBUILD_AFLAGS.
This has the advantage of being 1) less ugly, 2) the documented way of
adding flags in the arch Makefile and 3) it fixes building vdso32 with a
LE toolchain.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We cannot do those initializations from apply_feature_fixups() as
this function runs in a very restricted environment on 32-bit where
the kernel isn't running at its linked address and the PTRRELOC()
macro must be used for any global accesss.
Instead, split them into a separtate steup_feature_keys() function
which is called in a more suitable spot on ppc32.
Fixes: 309b315b6e ("powerpc: Call jump_label_init() in apply_feature_fixups()")
Reported-and-tested-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We don't identify the machine type anymore...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This makes it easier to debug crashes that happen very early before
the kernel takes over Open Firmware by allowing us to relate the OF
reported crashing addresses to offsets within the kernel.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 8d460f6156 ("powerpc/process: Add the function
flush_tmregs_to_thread") added flush_tmregs_to_thread() and included
the assumption that it would only be called for a task which is not
current.
Although this is correct for ptrace, when generating a core dump, some
of the routines which call flush_tmregs_to_thread() are called. This
leads to a WARNing such as:
Not expecting ptrace on self: TM regs may be incorrect
------------[ cut here ]------------
WARNING: CPU: 123 PID: 7727 at arch/powerpc/kernel/process.c:1088 flush_tmregs_to_thread+0x78/0x80
CPU: 123 PID: 7727 Comm: libvirtd Not tainted 4.8.0-rc1-gcc6x-g61e8a0d #1
task: c000000fe631b600 task.stack: c000000fe63b0000
NIP: c00000000001a1a8 LR: c00000000001a1a4 CTR: c000000000717780
REGS: c000000fe63b3420 TRAP: 0700 Not tainted (4.8.0-rc1-gcc6x-g61e8a0d)
MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28004222 XER: 20000000
...
NIP [c00000000001a1a8] flush_tmregs_to_thread+0x78/0x80
LR [c00000000001a1a4] flush_tmregs_to_thread+0x74/0x80
Call Trace:
flush_tmregs_to_thread+0x74/0x80 (unreliable)
vsr_get+0x64/0x1a0
elf_core_dump+0x604/0x1430
do_coredump+0x5fc/0x1200
get_signal+0x398/0x740
do_signal+0x54/0x2b0
do_notify_resume+0x98/0xb0
ret_from_except_lite+0x70/0x74
So fix flush_tmregs_to_thread() to detect the case where it is called on
current, and a transaction is active, and in that case flush the TM regs
to the thread_struct.
This patch also moves flush_tmregs_to_thread() into ptrace.c as it is
only called from that file.
Fixes: 8d460f6156 ("powerpc/process: Add the function flush_tmregs_to_thread")
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 7aef413656 ("powerpc32: rewrite csum_partial_copy_generic()
based on copy_tofrom_user()") introduced a bug when destination
address is odd and initial csum is not null
In that (rare) case the initial csum value has to be rotated one byte
as well as the resulting value is
This patch also fixes related comments
Fixes: 7aef413656 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The generic allocation code may sometimes decide to assign a prefetchable
64-bit BAR to the M32 window. In fact it may also decide to allocate
a 64-bit non-prefetchable BAR to the M64 one ! So using the resource
flags as a test to decide which window was used for PE allocation is
just wrong and leads to insane PE numbers.
Instead, compare the addresses to figure it out.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rename the function as agreed by Ben & Gavin]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When machine check occurs with MSR(RI=0), it means MC interrupt is
unrecoverable and kernel goes down to panic path. But the console
message still shows it as recovered. This patch fixes the MCE console
messages.
Fixes: 36df96f8ac ("powerpc/book3s: Decode and save machine check event.")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The recent commit 63a72284b1 ("powerpc/pci: Assign fixed PHB number
based on device-tree properties"), added code to read a 64-bit property
from the device tree, and if not found read a 32-bit property (reg).
There was a bug in the 32-bit case, on big endian machines, due to the
use of the 64-bit value to read the 32-bit property. The cast of &prop
means we end up writing to the high 32-bit of prop, leaving the low
32-bits containing whatever junk was on the stack.
If that junk value was non-zero, and < MAX_PHBS, we would end up using
it as the PHB id. This results in users seeing what appear to be random
PHB ids.
Fix it by reading into a u32 property and then assigning that to the
u64 value, letting the CPU do the correct conversions for us.
Fixes: 63a72284b1 ("powerpc/pci: Assign fixed PHB number based on device-tree properties")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This is a very minor/trivial fix for the output of PCI address on EEH
logs. The PCI address on "OF node" field currently is using ":" as a
separator for the function, but the usual separator is ".". This patch
changes the separator to dot, so the PCI address is printed as usual.
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some powerpc builds fail with the following buld error.
In file included from ./arch/powerpc/include/asm/mmu_context.h:11:0,
from arch/powerpc/kernel/vdso.c:28:
arch/powerpc/include/asm/cputhreads.h: In function 'get_tensr':
arch/powerpc/include/asm/cputhreads.h:101:2: error:
implicit declaration of function 'cpu_has_feature'
Fixes: b92a226e52 ("powerpc: Move cpu_has_feature() to a separate file")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch fixes the following warning:
arch/powerpc/platforms/pseries/hotplug-memory.c:323:29: error: 'lmb_to_memblock' defined but not used [-Werror=unused-function]
static struct memory_block *lmb_to_memblock(struct of_drconf_cell *lmb)
^~~~~~~~~~~~~~~
The only consumer of this function is 'dlpar_remove_lmb', which is
enabled with CONFIG_MEMORY_HOTREMOVE, so move it into the same
ifdef block.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The current implementation of MCE early handling modifies CR0/1 registers
without saving its old values. Fix this by moving early check for
powersaving mode to machine_check_handle_early().
The power architecture 2.06 or later allows the possibility of getting
machine check while in nap/sleep/winkle. The last bit of HSPRG0 is set
to 1, if thread is woken up from winkle. Hence, clear the last bit of
HSPRG0 (r13) before MCE handler starts using it as paca pointer.
Also, the current code always puts the thread into nap state irrespective
of whatever idle state it woke up from. Fix that by looking at
paca->thread_idle_state and put the thread back into same state where it
came from.
Fixes: 1c51089f77 ("powerpc/book3s: Return from interrupt if coming from evil context.")
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h so that MCE handler changes
in subsequent patch can use it.
No functionality change.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The function pnv_restore_hyp_resource() loads the TOC into r2 from
the invalid PACA pointer before fixing r13 value. This do not affect
POWER ISA 3.0 but it does have an impact on POWER ISA 2.07 or less
leading CPU to get stuck forever.
login: [ 471.830433] Processor 120 is stuck.
This can be easily reproducible using following steps:
- Turn off SMT
$ ppc64_cpu --smt=off
- offline/online any online cpu (Thread 0 of any core which is online)
$ echo 0 > /sys/devices/system/cpu/cpu<num>/online
$ echo 1 > /sys/devices/system/cpu/cpu<num>/online
For POWER ISA 2.07 or less, the last bit of HSPRG0 is set indicating
that thread is waking up from winkle. Hence, the last bit of HSPRG0(r13)
needs to be clear before accessing it as PACA to avoid loading invalid
values from invalid PACA pointer.
Fix this by loading TOC after r13 register is corrected.
Fixes: bcef83a00d ("powerpc/powernv: Add platform support for stop instruction")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit fd141d1a99 ("powerpc/powernv/pci: Rework accessing the TCE
invalidate register") broke TCE invalidation on IODA2/PHB3 for real
mode.
This makes invalidate work again.
Fixes: fd141d1a99 ("powerpc/powernv/pci: Rework accessing the TCE invalidate register")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We should return -ENOMEM if alloc_spu_gang() fails.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This sets the type of the interrupt appropriately. We set it as follow:
- If not mapped from the device-tree, we use edge. This is the case
of the virtual interrupts and PCI MSIs for example.
- If mapped from the device-tree and #interrupt-cells is 2 (PAPR
compliant), we use the second cell to set the appropriate type
- If mapped from the device-tree and #interrupt-cells is 1 (current
OPAL on P8 does that), we assume level sensitive since those are
typically going to be the PSI LSIs which are level sensitive.
Additionally, we mark the interrupts requested via the opal_interrupts
property all level. This is a bit fishy but the best we can do until we
fix OPAL to properly expose them with a complete descriptor. It is also
correct for the current HW anyway as OPAL interrupts are currently PCI
error and PSI interrupts which are level.
Finally now that edge interrupts are properly identified, we can enable
CONFIG_HARDIRQS_SW_RESEND which will make the core re-send them if
they occur while masked, which some drivers rely upon.
This fixes issues with lost interrupts on some Mellanox adapters.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch utilises the GENERIC_CPU_AUTOPROBE infrastructure
to automatically load the crc32c-vpmsum module if the CPU supports
it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 817820b022 ("powerpc/iommu: Support "hybrid" iommu/direct DMA
ops for coherent_mask < dma_mask) adds a check of coherent_dma_mask for
dma allocations.
Unfortunately current PASemi code does not set this value for the DMA
engine, which ends up with the default value of 0xffffffff, the result
is on a PASemi system with >2Gb ram and iommu enabled the the onboard
ethernet stops working due to an inability to allocate memory. Add an
initialisation to pci_dma_dev_setup_pasemi().
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cleanups:
- huge cleanup of rtc-generic and char/genrtc this allowed to cleanup rtc-cmos,
rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
- move mn10300 to rtc-cmos
Subsystem:
- fix wakealarms after hibernate
- multiples fixes for rctest
- simplify implementations of .read_alarm
New drivers:
- Maxim MAX6916
Drivers:
- ds1307: fix weekday
- m41t80: add wakeup support
- pcf85063: add support for PCF85063A variant
- rv8803: extend i2c fix and other fixes
- s35390a: fix alarm reading, this fixes instant reboot after shutdown for QNAP
TS-41x
- s3c: clock fixes
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJXokhIAAoJENiigzvaE+LCZqQP+wWzintN/N1u3dKiVB7iSdwq
+S/jAXD9wW8OK9PI60/YUGRYeUXmZW9t4XYg1VKCxU9KpVC17LgOtDyXD8BufP1V
uREJEzZw9O7zCCjeHp/ICFjBkc62Net6ZDOO+ZyXPNfddpS1Xq1uUgXLZc/202UR
ID/kewu0pJRDnoxyqznWn9+8D33w/ygXs2slY2Ive0ONtjdgxGcsj2rNbb2RYn2z
OP7br3lLg7qkFh4TtXb61eh/9GYIk6wzP/CrX5l/jH4SjQnrIk5g/X/Cd1qQ/qso
JZzFoonOKvIp5Gw/+fZ9NP3YFcnkoRMv4NjZV8PAmsYLds+ibRiBcoB8u6FmiJV7
WW5uopgPkfCGN5BV3+QHwJDVe+WlgnlzaT5zPUCcP5KWusDts4fWIgzP7vrtAzf4
3OJLrgSGdBeOqWnJD21nxKUD27JOseX7D+BFtwxR4lMsXHqlHJfETpZ8gts1ZGH3
2U353j/jkZvGWmc6dMcuxOXT2K4VqpYeIIqs0IcLu6hM9crtR89zPR2Iu1AilfDW
h2NroF+Q//SgMMzWoTEG6Tn7RAc7MthgA/tRCFZF9CBMzNs988w0CTHnKsIHmjpU
UKkMeJGAC9YrPYIcqrg0oYsmLUWXc8JuZbGJBnei3BzbaMTlcwIN9qj36zfq6xWc
TMLpbWEoIsgFIZMP/hAP
=rpGB
-----END PGP SIGNATURE-----
Merge tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"RTC for 4.8
Cleanups:
- huge cleanup of rtc-generic and char/genrtc this allowed to cleanup
rtc-cmos, rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
- move mn10300 to rtc-cmos
Subsystem:
- fix wakealarms after hibernate
- multiples fixes for rctest
- simplify implementations of .read_alarm
New drivers:
- Maxim MAX6916
Drivers:
- ds1307: fix weekday
- m41t80: add wakeup support
- pcf85063: add support for PCF85063A variant
- rv8803: extend i2c fix and other fixes
- s35390a: fix alarm reading, this fixes instant reboot after
shutdown for QNAP TS-41x
- s3c: clock fixes"
* tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (65 commits)
rtc: rv8803: Clear V1F when setting the time
rtc: rv8803: Stop the clock while setting the time
rtc: rv8803: Always apply the I²C workaround
rtc: rv8803: Fix read day of week
rtc: rv8803: Remove the check for valid time
rtc: rv8803: Kconfig: Indicate rx8900 support
rtc: asm9260: remove .owner field for driver
rtc: at91sam9: Fix missing spin_lock_init()
rtc: m41t80: add suspend handlers for alarm IRQ
rtc: m41t80: make it a real error message
rtc: pcf85063: Add support for the PCF85063A device
rtc: pcf85063: fix year range
rtc: hym8563: in .read_alarm set .tm_sec to 0 to signal minute accuracy
rtc: explicitly set tm_sec = 0 for drivers with minute accurancy
rtc: s3c: Add s3c_rtc_{enable/disable}_clk in s3c_rtc_setfreq()
rtc: s3c: Remove unnecessary call to disable already disabled clock
rtc: abx80x: use devm_add_action_or_reset()
rtc: m41t80: use devm_add_action_or_reset()
rtc: fix a typo and reduce three empty lines to one
rtc: s35390a: improve two comments in .set_alarm
...
Fixes:
- Fix early access to cpu_spec relocation from Benjamin Herrenschmidt
- Fix incorrect event codes in power9-event-list from Madhavan Srinivasan
- Move register_process_table() out of ppc_md from Michael Ellerman
Use jump_label for [cpu|mmu]_has_feature() from Aneesh Kumar K.V, Kevin Hao and Michael Ellerman:
- Add mmu_early_init_devtree() from Michael Ellerman
- Move disable_radix handling into mmu_early_init_devtree() from Michael Ellerman
- Do hash device tree scanning earlier from Michael Ellerman
- Do radix device tree scanning earlier from Michael Ellerman
- Do feature patching before MMU init from Michael Ellerman
- Check features don't change after patching from Michael Ellerman
- Make MMU_FTR_RADIX a MMU family feature from Aneesh Kumar K.V
- Convert mmu_has_feature() to returning bool from Michael Ellerman
- Convert cpu_has_feature() to returning bool from Michael Ellerman
- Define radix_enabled() in one place & use static inline from Michael Ellerman
- Add early_[cpu|mmu]_has_feature() from Michael Ellerman
- Convert early cpu/mmu feature check to use the new helpers from Aneesh Kumar K.V
- jump_label: Make it possible for arches to invoke jump_label_init() earlier from Kevin Hao
- Call jump_label_init() in apply_feature_fixups() from Aneesh Kumar K.V
- Remove mfvtb() from Kevin Hao
- Move cpu_has_feature() to a separate file from Kevin Hao
- Add kconfig option to use jump labels for cpu/mmu_has_feature() from Michael Ellerman
- Add option to use jump label for cpu_has_feature() from Kevin Hao
- Add option to use jump label for mmu_has_feature() from Kevin Hao
- Catch usage of cpu/mmu_has_feature() before jump label init from Aneesh Kumar K.V
- Annotate jump label assembly from Michael Ellerman
TLB flush enhancements from Aneesh Kumar K.V:
- radix: Implement tlb mmu gather flush efficiently
- Add helper for finding SLBE LLP encoding
- Use hugetlb flush functions
- Drop multiple definition of mm_is_core_local
- radix: Add tlb flush of THP ptes
- radix: Rename function and drop unused arg
- radix/hugetlb: Add helper for finding page size
- hugetlb: Add flush_hugetlb_tlb_range
- remove flush_tlb_page_nohash
Add new ptrace regsets from Anshuman Khandual and Simon Guo:
- elf: Add powerpc specific core note sections
- Add the function flush_tmregs_to_thread
- Enable in transaction NT_PRFPREG ptrace requests
- Enable in transaction NT_PPC_VMX ptrace requests
- Enable in transaction NT_PPC_VSX ptrace requests
- Adapt gpr32_get, gpr32_set functions for transaction
- Enable support for NT_PPC_CGPR
- Enable support for NT_PPC_CFPR
- Enable support for NT_PPC_CVMX
- Enable support for NT_PPC_CVSX
- Enable support for TM SPR state
- Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
- Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
- Enable support for EBB registers
- Enable support for Performance Monitor registers
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXpGaLAAoJEFHr6jzI4aWA9aYP/1AqmRPJ9D0XVUJWT+FVABUK
LESESoVFF4Hug1j1F8Synhg5o4SzD2t45iGKbclYaFthOIyovMg7Wr1KSu4hQ0go
rPuQfpXDNQ8jKdDX8hbPXKUxrNRBNfqJGFo5E7mO6wN9AJ9d1LVwQ+jKAva29Tqs
LaAlMbQNbeObPNzOl73B73iew3aozr+mXjBqv82lqvgYknBD2CLf24xGG3eNIbq5
ZZk4LPC8pdkaxnajnzRFzqwiyPWzao0yfpVRKh52TKHBQF/prR/KACb6zUuja/61
krOfegUKob14OYrehjs6X8XNRLnILRI0u1H5bmj7eVEiY/usyNzE93SMHZM3Wdau
sQF/Au4OLNXj0ZQdNBtzRsZRyp1d560Gsj+lQGBoPd4hfIWkFYHvxzxsUSdqv4uA
MWDMwN0Vvfk0cpprsabsWNevkaotYYBU00px5hF/e5ZUc9/x/xYUVMgPEDr0QZLr
cHJo9/Pjk4u/0g4lj+2y1LLl/0tNEZZg69O6bvffPAPVSS4/P4y/bKKYd4I0zL99
Ykp91mSmkl70F3edgOSFqyda2gN2l2Ekb/i081YGXheFy1rbD29Vxv82BOVog4KY
ibvOqp38WDzCVk5OXuCRvBl0VudLKGJYdppU1nXg4KgrTZXHeCAC0E+NzUsgOF4k
OMvQ+5drVxrno+Hw8FVJ
=0Q8E
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc updates from Michael Ellerman:
"These were delayed for various reasons, so I let them sit in next a
bit longer, rather than including them in my first pull request.
Fixes:
- Fix early access to cpu_spec relocation from Benjamin Herrenschmidt
- Fix incorrect event codes in power9-event-list from Madhavan Srinivasan
- Move register_process_table() out of ppc_md from Michael Ellerman
Use jump_label use for [cpu|mmu]_has_feature():
- Add mmu_early_init_devtree() from Michael Ellerman
- Move disable_radix handling into mmu_early_init_devtree() from Michael Ellerman
- Do hash device tree scanning earlier from Michael Ellerman
- Do radix device tree scanning earlier from Michael Ellerman
- Do feature patching before MMU init from Michael Ellerman
- Check features don't change after patching from Michael Ellerman
- Make MMU_FTR_RADIX a MMU family feature from Aneesh Kumar K.V
- Convert mmu_has_feature() to returning bool from Michael Ellerman
- Convert cpu_has_feature() to returning bool from Michael Ellerman
- Define radix_enabled() in one place & use static inline from Michael Ellerman
- Add early_[cpu|mmu]_has_feature() from Michael Ellerman
- Convert early cpu/mmu feature check to use the new helpers from Aneesh Kumar K.V
- jump_label: Make it possible for arches to invoke jump_label_init() earlier from Kevin Hao
- Call jump_label_init() in apply_feature_fixups() from Aneesh Kumar K.V
- Remove mfvtb() from Kevin Hao
- Move cpu_has_feature() to a separate file from Kevin Hao
- Add kconfig option to use jump labels for cpu/mmu_has_feature() from Michael Ellerman
- Add option to use jump label for cpu_has_feature() from Kevin Hao
- Add option to use jump label for mmu_has_feature() from Kevin Hao
- Catch usage of cpu/mmu_has_feature() before jump label init from Aneesh Kumar K.V
- Annotate jump label assembly from Michael Ellerman
TLB flush enhancements from Aneesh Kumar K.V:
- radix: Implement tlb mmu gather flush efficiently
- Add helper for finding SLBE LLP encoding
- Use hugetlb flush functions
- Drop multiple definition of mm_is_core_local
- radix: Add tlb flush of THP ptes
- radix: Rename function and drop unused arg
- radix/hugetlb: Add helper for finding page size
- hugetlb: Add flush_hugetlb_tlb_range
- remove flush_tlb_page_nohash
Add new ptrace regsets from Anshuman Khandual and Simon Guo:
- elf: Add powerpc specific core note sections
- Add the function flush_tmregs_to_thread
- Enable in transaction NT_PRFPREG ptrace requests
- Enable in transaction NT_PPC_VMX ptrace requests
- Enable in transaction NT_PPC_VSX ptrace requests
- Adapt gpr32_get, gpr32_set functions for transaction
- Enable support for NT_PPC_CGPR
- Enable support for NT_PPC_CFPR
- Enable support for NT_PPC_CVMX
- Enable support for NT_PPC_CVSX
- Enable support for TM SPR state
- Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
- Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
- Enable support for EBB registers
- Enable support for Performance Monitor registers"
* tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (48 commits)
powerpc/mm: Move register_process_table() out of ppc_md
powerpc/perf: Fix incorrect event codes in power9-event-list
powerpc/32: Fix early access to cpu_spec relocation
powerpc/ptrace: Enable support for Performance Monitor registers
powerpc/ptrace: Enable support for EBB registers
powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
powerpc/ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
powerpc/ptrace: Enable support for TM SPR state
powerpc/ptrace: Enable support for NT_PPC_CVSX
powerpc/ptrace: Enable support for NT_PPC_CVMX
powerpc/ptrace: Enable support for NT_PPC_CFPR
powerpc/ptrace: Enable support for NT_PPC_CGPR
powerpc/ptrace: Adapt gpr32_get, gpr32_set functions for transaction
powerpc/ptrace: Enable in transaction NT_PPC_VSX ptrace requests
powerpc/ptrace: Enable in transaction NT_PPC_VMX ptrace requests
powerpc/ptrace: Enable in transaction NT_PRFPREG ptrace requests
powerpc/process: Add the function flush_tmregs_to_thread
elf: Add powerpc specific core note sections
powerpc/mm: remove flush_tlb_page_nohash
powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range
...
We should set the error code here rather than incorrectly returning 0.
Otherwise static checkers complain.
Link: http://lkml.kernel.org/r/20160804053525.GM775@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The stringify_in_c() macro may not be included. Make the dependency
explicit.
Link: http://lkml.kernel.org/r/564720c5328edd53c9d56db325be7215440eec3e.1467837322.git.jbaron@akamai.com
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Joe Perches <joe@perches.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer. Thus the pointer can point to const data.
However the attributes do not have to be a bitfield. Instead unsigned
long will do fine:
1. This is just simpler. Both in terms of reading the code and setting
attributes. Instead of initializing local attributes on the stack
and passing pointer to it to dma_set_attr(), just set the bits.
2. It brings safeness and checking for const correctness because the
attributes are passed by value.
Semantic patches for this change (at least most of them):
virtual patch
virtual context
@r@
identifier f, attrs;
@@
f(...,
- struct dma_attrs *attrs
+ unsigned long attrs
, ...)
{
...
}
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
and
// Options: --all-includes
virtual patch
virtual context
@r@
identifier f, attrs;
type t;
@@
t f(..., struct dma_attrs *attrs);
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We want to initialise register_process_table() before ppc_md is setup,
so that it can be called as part of MMU init (at least on Radix ATM).
That no longer works because probe_machine() requires that ppc_md be
empty before it's called, and we now do probe_machine() much later.
So make register_process_table a global for now. It will probably move
into a mmu_radix_ops struct at some point in the future.
This was broken by me when applying commit 7025776ed1 "powerpc/mm:
Move hash table ops to a separate structure" due to conflicts with other
patches.
Fixes: 7025776ed1 ("powerpc/mm: Move hash table ops to a separate structure")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
These have been changed in the hardware, update Linux's version.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 9402c68461 ("powerpc: Factor do_feature_fixup calls")
introduced a subtle bug on 32-bit. When reading the cpu spec from the
global, we not only need to do a pointer relocation on the global
address but also on the pointer we read from it.
This fixes crashes reported on MPC5200 based machines.
Fixes: 9402c68461 ("powerpc: Factor do_feature_fixup calls")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Merge yet more updates from Andrew Morton:
- the rest of ocfs2
- various hotfixes, mainly MM
- quite a bit of misc stuff - drivers, fork, exec, signals, etc.
- printk updates
- firmware
- checkpatch
- nilfs2
- more kexec stuff than usual
- rapidio updates
- w1 things
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (111 commits)
ipc: delete "nr_ipc_ns"
kcov: allow more fine-grained coverage instrumentation
init/Kconfig: add clarification for out-of-tree modules
config: add android config fragments
init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
relay: add global mode support for buffer-only channels
init: allow blacklisting of module_init functions
w1:omap_hdq: fix regression
w1: add helper macro module_w1_family
w1: remove need for ida and use PLATFORM_DEVID_AUTO
rapidio/switches: add driver for IDT gen3 switches
powerpc/fsl_rio: apply changes for RIO spec rev 3
rapidio: modify for rev.3 specification changes
rapidio: change inbound window size type to u64
rapidio/idt_gen2: fix locking warning
rapidio: fix error handling in mbox request/release functions
rapidio/tsi721_dma: advance queue processing from transfer submit call
rapidio/tsi721: add messaging mbox selector parameter
rapidio/tsi721: add PCIe MRRS override parameter
rapidio/tsi721_dma: add channel mask and queue size parameters
...
Current definition of map_inb() mport operations callback uses u32 type
to specify required inbound window (IBW) size. This is limiting factor
because existing hardware - tsi721 and fsl_rio, both support IBW size up
to 16GB.
Changing type of size parameter to u64 to allow IBW size configurations
larger than 4GB.
[alexandre.bounine@idt.com: remove compiler warning about size of constant]
Link: http://lkml.kernel.org/r/20160802184856.2566-1-alexandre.bounine@idt.com
Link: http://lkml.kernel.org/r/1469125134-16523-11-git-send-email-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In general, there's no need for the "restore sigmask" flag to live in
ti->flags. alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.
Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.
Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.
Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For pure bool function's return value, bool is a little better more or
less than int.
Link: http://lkml.kernel.org/r/1469331815-2026-1-git-send-email-chengang@emindsoft.com.cn
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There was only one use of __initdata_refok and __exit_refok
__init_refok was used 46 times against 82 for __ref.
Those definitions are obsolete since commit 312b1485fb ("Introduce new
section reference annotations tags: __ref, __refdata, __refconst")
This patch removes the following compatibility definitions and replaces
them treewide.
/* compatibility defines */
#define __init_refok __ref
#define __initdata_refok __refdata
#define __exit_refok __ref
I can also provide separate patches if necessary.
(One patch per tree and check in 1 month or 2 to remove old definitions)
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>