The function eeh_pe_state_mark() can take a combination of multiple
EEH PE states as its argument. The patch fixes the condition used to
check whether EEH_PE_ISOLATED is included.
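As an illustration of why a combined state argument needs a mask test,
here is a minimal user-space sketch. The flag values and helper names
are assumptions made for this example, and the equality test is only one
plausible form of such a bug, not necessarily the original code.

```c
#include <stdbool.h>

/* Hypothetical flag values, chosen only for illustration. */
#define DEMO_PE_ISOLATED   (1 << 0)
#define DEMO_PE_RECOVERING (1 << 1)

/* Fragile check: true only when ISOLATED is the sole bit set. */
static bool demo_isolated_by_equality(int state)
{
	return state == DEMO_PE_ISOLATED;
}

/* Mask check: true whenever the ISOLATED bit is part of the argument. */
static bool demo_isolated_by_mask(int state)
{
	return (state & DEMO_PE_ISOLATED) != 0;
}
```

With a combined argument such as DEMO_PE_ISOLATED | DEMO_PE_RECOVERING,
only the mask check reports the isolated bit.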
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Michael points out that __get_SP() is a pretty horrible
function name. Let's give it a better name.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Li Zhong points out an issue with our current __get_SP()
implementation. If ftrace function tracing is enabled (ie -pg
profiling using _mcount) we spill a stack frame on 64bit all the
time.
If a function calls __get_SP() and later calls a function that is
tail call optimised, we will pop the stack frame and the value
returned by __get_SP() is no longer valid. An example from Li can
be found in save_stack_trace -> save_context_stack:
c0000000000432c0 <.save_stack_trace>:
c0000000000432c0: mflr r0
c0000000000432c4: std r0,16(r1)
c0000000000432c8: stdu r1,-128(r1) <-- stack frame for _mcount
c0000000000432cc: std r3,112(r1)
c0000000000432d0: bl <._mcount>
c0000000000432d4: nop
c0000000000432d8: mr r4,r1 <-- __get_SP()
c0000000000432dc: ld r5,632(r13)
c0000000000432e0: ld r3,112(r1)
c0000000000432e4: li r6,1
c0000000000432e8: addi r1,r1,128 <-- pop stack frame
c0000000000432ec: ld r0,16(r1)
c0000000000432f0: mtlr r0
c0000000000432f4: b <.save_context_stack> <-- tail call optimized
save_context_stack ends up with a stack pointer below the current
one, and it is likely to be scribbled over.
Fix this by making __get_SP() a function which returns the
caller's stack frame. Also replace inline assembly which grabs
the stack pointer in save_stack_trace and show_stack with
__get_SP().
This also fixes an issue with perf_arch_fetch_caller_regs().
It currently unwinds the stack once, which will skip a
valid stack frame on a leaf function. With the __get_SP() fixes
in this patch, we never need to unwind the stack frame to get
to the first interesting frame.
We have to export __get_SP() because perf_arch_fetch_caller_regs()
(which is used in modules) calls it from a header file.
Reported-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In the HMI interrupt handler we don't touch SRR0/SRR1; instead we touch
HSRR0/HSRR1. Hence we don't need to clear the MSR_RI bit.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Freescale updates from Scott (27 commits):
"Highlights include DMA32 zone support (SATA, USB, etc now works on 64-bit
FSL kernels), MSI changes, 8xx optimizations and cleanup, t104x board
support, and PrPMC PCI enumeration."
pci_bus_find_capability() is already declared in pci.h, so there is no
need to declare it again.
This patch removes the duplicate declaration.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add printk levels to some places in the powerpc port.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is no need for yet another copy of the command line, just
use boot_command_line like everyone else.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use pr_fmt to give some context to the error messages in the
module code, and convert open coded debug printk to pr_debug.
Use pr_err for error messages.
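The pr_fmt idiom can be sketched in user space as follows. The macros
below are simplified stand-ins for the kernel's, and DEMO_MODNAME, the
message text, and the demo_* helpers are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical module name; the kernel would normally supply its own. */
#define DEMO_MODNAME "demo_module"

/* Every format string passed through pr_fmt() gains the prefix. */
#define pr_fmt(fmt) DEMO_MODNAME ": " fmt

/* Simplified stand-in for the kernel's pr_err(); prints to stderr. */
#define pr_err(fmt, ...) fprintf(stderr, pr_fmt(fmt), ##__VA_ARGS__)

/* Format a message into buf exactly as pr_err() would print it. */
static int demo_format_err(char *buf, size_t len, const char *symbol)
{
	return snprintf(buf, len, pr_fmt("unknown symbol %s\n"), symbol);
}

/* Emit the same message through the pr_err() stand-in. */
static void demo_report(const char *symbol)
{
	pr_err("unknown symbol %s\n", symbol);
}
```

Because the prefix is applied at compile time by string concatenation,
call sites only carry the message itself.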
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As Michael suggested, add the hex prefix ("0x") to the output of the
EEH PE state sysfs entry (/sys/bus/pci/devices/xxx/eeh_pe_state),
since it is always informative to users.
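A minimal sketch of printing a state word with a hex prefix. The helper
name and the exact width are assumptions, not the sysfs code itself.

```c
#include <stdio.h>

/* Format a state word with an explicit "0x" prefix, zero padded. */
static int demo_show_state(char *buf, size_t len, unsigned int state)
{
	return snprintf(buf, len, "0x%08x\n", state);
}
```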
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The dma_get_required_mask() function is used by some drivers to
query the platform about what DMA mask is needed to cover all of
memory. This is a bit of a strange semantic when we have to choose
between IOMMU translation or bypass, but essentially what it means
is "what DMA mask will give the best performance".
Currently, our IOMMU backend always returns a 32-bit mask here; we
don't do anything special when a bypass window is available. This
causes some drivers to choose a 32-bit mask, thinking it is more
efficient, and thus lose the ability to use the bypass window. The
problem was reported against the driver of the following device:
0004:03:00.0 0107: 1000:0087 (rev 05)
0004:03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios \
Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
This patch adds an override of that function which instead returns a
64-bit mask whenever a bypass window is available, so that drivers
prefer this configuration.
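The policy described above can be sketched in user space like this. The
struct, its field, and the function name are hypothetical; only the
"64-bit mask when bypass is available, 32-bit otherwise" rule comes from
the description.

```c
#include <stdbool.h>
#include <stdint.h>

/* A mask with the low n bits set (same idea as the kernel's DMA_BIT_MASK). */
#define DEMO_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical per-device state; the field name is an assumption. */
struct demo_dev {
	bool bypass_available;
};

/* Advertise a 64-bit mask when a bypass window exists, else 32-bit. */
static uint64_t demo_get_required_mask(const struct demo_dev *dev)
{
	if (dev->bypass_available)
		return DEMO_DMA_BIT_MASK(64);
	return DEMO_DMA_BIT_MASK(32);
}
```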
Reported-by: Murali N. Iyer <mniyer@us.ibm.com>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
PEs can be nested. The current implementation doesn't dump PCI config
space for subordinate devices of child PEs. However, a frozen PE could
be caused by the subordinate devices of its child PEs.
The patch dumps PCI config space for all subordinate devices of the
problematic PE.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When EEH functionality is enabled on passed-through devices (PE)
with VFIO, the devices in the PE would be removed permanently
from the guest side. In that case, the PE remains in frozen state.
When returning the PE to the host, or restarting the guest, we
had a mechanism to unfreeze the PE by clearing the PESTA/B frozen
bits. However, that's not enough for some adapters, such as those
in the "lspci" output below. Those adapters require a hot reset
on the parent bus to bring their firmware back to a workable
state. Otherwise, those adapters won't be operative, and the host
(in the returning case) or the guest will always fail to load the
drivers for those adapters.
0000:01:00.0 Ethernet controller: Emulex Corporation OneConnect \
10Gb NIC (be3) (rev 02)
0000:01:00.0 0200: 19a2:0710 (rev 02)
0001:03:00.0 Ethernet controller: Emulex Corporation OneConnect \
NIC (Lancer) (rev 10)
0001:03:00.0 0200: 10df:e220 (rev 10)
The patch adds a mechanism to emulate EEH recovery (a hot reset
on the parent PCI bus) at 3 gates to fix the issue: opening or
releasing one adapter of the PE, and enabling EEH functionality on
one adapter of the PE.
Reported-by: Murilo Fossa Vicentini <muvic@br.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
A PE may be owned by userland, which may request that a PE reset be
done on the host side. During the reset, we should drop PCI
config accesses to the PE with the help of the flag EEH_PE_RESET.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The function pcibios_set_pcie_reset_state() can be used to do a PCI
reset. PCI config access during the reset usually causes unexpected
EEH errors. In order to avoid them, the patch blocks PCI config
access during reset with the help of the flag EEH_PE_RESET, similar
to what we do in the EEH PE reset path.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch uses eeh_unfreeze_pe() to replace the logic clearing
frozen IO and DMA, in order to simplify the code.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When a PE is passed through to a guest, it may be in frozen
state. The driver for the pass-through devices on the guest side
then can't be loaded successfully, as reported. We already had one
gate in eeh_dev_open() to clear the PE frozen state accordingly,
but that's not enough because the function is called only once,
at QEMU startup.
The patch adds another gate in eeh_pe_set_option() so that the
PE frozen state can be cleared at QEMU restart time.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The function eeh_pci_enable() is called to apply various requests
to one particular PE: Enabling EEH, Disabling EEH, Enabling IO,
Enabling DMA, Freezing PE. When enabling IO or DMA on one specific
PE, we need to check that IO or DMA isn't already enabled. But
the condition used for the check isn't completely correct, because
a PE could be in DMA frozen state with a workable IO path, or vice
versa.
The patch fixes the improper condition.
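A minimal sketch of checking each path separately, so that a PE with
frozen DMA but working IO (or vice versa) is handled correctly. The flag
values, the request enum, and the helper are all hypothetical and do not
reproduce the kernel's code.

```c
#include <stdbool.h>

/* Hypothetical state bits: each path is tracked independently. */
#define DEMO_STATE_MMIO_ACTIVE (1 << 0)
#define DEMO_STATE_DMA_ACTIVE  (1 << 1)

enum demo_request { DEMO_THAW_MMIO, DEMO_THAW_DMA };

/* Return true when the requested path is still frozen and must be enabled. */
static bool demo_needs_thaw(int state, enum demo_request req)
{
	switch (req) {
	case DEMO_THAW_MMIO:
		return !(state & DEMO_STATE_MMIO_ACTIVE);
	case DEMO_THAW_DMA:
		return !(state & DEMO_STATE_DMA_ACTIVE);
	}
	return false;
}
```

Testing both bits together for either request would skip the thaw, or
perform it needlessly, whenever exactly one path is frozen.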
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The problem was reported by Carol: when an mlx4 adapter is passed to
a guest, an EEH error can be recovered successfully. When the device
is returned to the host, the driver (mlx4_core.ko) can't be loaded
because of error number -5 (-EIO) returned from
mlx4_get_ownership(), which hits an offlined PCI device.
The root cause is that we failed to put the affected devices into
normal state when clearing the PE isolated state right after PE reset.
The patch fixes the above issue by putting the affected devices into
normal state when clearing PE isolated state in eeh_pe_state_clear().
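The fix can be pictured with the following user-space sketch. The
structure layout, flag values, and channel-state enum are assumptions for
illustration; only the rule "clearing the isolated state also returns the
devices to normal state" is taken from the description.

```c
#include <stddef.h>

/* Hypothetical flag values. */
#define DEMO_PE_ISOLATED   (1 << 0)
#define DEMO_PE_RECOVERING (1 << 1)

enum demo_channel_state { DEMO_CHANNEL_NORMAL, DEMO_CHANNEL_OFFLINE };

/* Hypothetical PE with a small fixed array of device states. */
struct demo_pe {
	int state;
	enum demo_channel_state dev_state[4];
	size_t ndevs;
};

/* Clear the given state bits; if ISOLATED is among them, also bring
 * every device of the PE back to the normal channel state. */
static void demo_pe_state_clear(struct demo_pe *pe, int state)
{
	size_t i;

	pe->state &= ~state;
	if (!(state & DEMO_PE_ISOLATED))
		return;

	for (i = 0; i < pe->ndevs; i++)
		pe->dev_state[i] = DEMO_CHANNEL_NORMAL;
}
```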
Cc: stable@vger.kernel.org
Reported-by: Carol L. Soto <clsoto@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When passing through a device, its PE might have been put into frozen
state. One obvious example: the passed PE is forced offline after
hitting the maximum allowed number of EEH errors in userland.
In that case, the frozen state won't be cleared and the PE is then
returned to the host, which might not have a chance to detect and
recover from it.
The patch adds more checks when passing through a device and clears
the PE frozen state if necessary.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The PCI devices that have been passed through are enabled before
reset, so we need to restore them to the enabled state after reset.
Otherwise, MMIO accesses might be issued to disabled devices after
reset and cause exceptional recursive EEH errors.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch adds one more option (EEH_OPT_FREEZE_PE) to the set_option()
method to proactively freeze a PE. It will be issued before resetting
a passed-through PE to drop MMIO access during reset, because such
access always contributes to recursive EEH errors.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch adds the sysfs entry "eeh_pe_state". Reading it returns
the PE's state, while writing to it clears the frozen state. It's
used to check or clear the PE frozen state from userland for
debugging purposes.
The patch also replaces printk(KERN_WARNING ...) with pr_warn() in
eeh_sysfs.c.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
eeh_check_failure() is used to check the frozen state of the PE which
owns the indicated I/O address. The argument "val" of the function
isn't used. The patch drops it and returns the frozen state of the
PE as expected.
Cc: Vishal Mansur <vmansur@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
At boot we display a bunch of low level settings which can be useful to
know, and can help to spot bugs when things are fundamentally
misconfigured.
At the moment they are very widely spaced, so that we can accommodate
the line:
ppc64_caches.dcache_line_size = 0xYY
But we only print that line when the cache line size is not 128, ie.
almost never, so it just makes the display look odd usually.
The ppc64_caches prefix is redundant so remove it, which means we can
align things a bit closer for the common case. While we're there
replace the last use of camelCase (physicalMemorySize), and use
phys_mem_size.
Before:
Starting Linux PPC64 #104 SMP Wed Aug 6 18:41:34 EST 2014
-----------------------------------------------------
ppc64_pft_size                = 0x1a
physicalMemorySize            = 0x200000000
ppc64_caches.dcache_line_size = 0xf0
ppc64_caches.icache_line_size = 0xf0
htab_address                  = 0xdeadbeef
htab_hash_mask                = 0x7ffff
physical_start                = 0xf000bar
-----------------------------------------------------
After:
Starting Linux PPC64 #103 SMP Wed Aug 6 18:38:04 EST 2014
-----------------------------------------------------
ppc64_pft_size    = 0x1a
phys_mem_size     = 0x200000000
dcache_line_size  = 0xf0
icache_line_size  = 0xf0
htab_address      = 0xdeadbeef
htab_hash_mask    = 0x7ffff
physical_start    = 0xf000bar
-----------------------------------------------------
This patch is final, no bike shedding ;)
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As Nish suggested, it makes more sense to init the numa node information
for present cpus at boot time, which also avoids the WARN_ON(1) in
numa_setup_cpu().
With this change, we also need to change the smp_prepare_cpus() to set up
numa information only on present cpus.
For those possible, but not present cpus, their numa information
will be set up after they are started, as the original code did before commit
2fabf084b6.
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In commit e6a6928c3e "of/fdt: Convert FDT functions to use libfdt",
the kernel stopped supporting old flat device tree formats. The minimum
supported version is now 0x10.
There was a checking function added, early_init_dt_verify(), but it's
not called on powerpc.
The result is, if you boot with an old flat device tree, the kernel will
fail to parse it correctly, think you have no memory etc. and hilarity
ensues.
We can't really fix it, but we can at least catch the fact that the
device tree is in an unsupported format and panic(). We can't call
BUG(), it's too early.
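For illustration, a standalone version check on a flat device tree blob
might look like the sketch below. It reads the big-endian version field
at byte offset 20 of the FDT header directly; the helper names are
hypothetical, and the kernel's own check goes through libfdt rather than
code like this.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FDT_MIN_VERSION    0x10 /* minimum version named above */
#define DEMO_FDT_VERSION_OFFSET 20   /* sixth 32-bit field of the header */

/* Read a big-endian 32-bit value regardless of host endianness. */
static uint32_t demo_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* True if the blob's header reports a version we can parse. */
static bool demo_fdt_version_supported(const unsigned char *blob)
{
	return demo_be32(blob + DEMO_FDT_VERSION_OFFSET) >= DEMO_FDT_MIN_VERSION;
}
```

A caller this early in boot would panic() on a false result rather than
try to continue with a tree it cannot parse.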
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On PowerNV platforms, when a CPU is offline, we put it into nap mode.
It's possible that the CPU wakes up from nap mode while it is still
offline due to a stray IPI. A misdirected device interrupt could also
potentially cause it to wake up. In that circumstance, we need to clear
the interrupt so that the CPU can go back to nap mode.
In the past the clearing of the interrupt was accomplished by briefly
enabling interrupts and allowing the normal interrupt handling code
(do_IRQ() etc.) to handle the interrupt. This has the problem that
this code calls irq_enter() and irq_exit(), which call functions such
as account_system_vtime() which use RCU internally. Use of RCU is not
permitted on offline CPUs and will trigger errors if RCU checking is
enabled.
To avoid calling into any generic code which might use RCU, we adopt
a different method of clearing interrupts on offline CPUs. Since we
are on the PowerNV platform, we know that the system interrupt
controller is a XICS being driven directly (i.e. not via hcalls) by
the kernel. Hence this adds a new icp_native_flush_interrupt()
function to the native-mode XICS driver and arranges to call that
when an offline CPU is woken from nap. This new function reads the
interrupt from the XICS. If it is an IPI, it clears the IPI; if it
is a device interrupt, it prints a warning and disables the source.
Then it does the end-of-interrupt processing for the interrupt.
The other thing that briefly enabling interrupts did was to check and
clear the irq_happened flag in this CPU's PACA. Therefore, after
flushing the interrupt from the XICS, we also clear all bits except
the PACA_IRQ_HARD_DIS (interrupts are hard disabled) bit from the
irq_happened flag. The PACA_IRQ_HARD_DIS flag is set by power7_nap()
and is left set to indicate that interrupts are hard disabled. This
means we then have to ignore that flag in power7_nap(), which is
reasonable since it doesn't indicate that any interrupt event needs
servicing.
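The masking step on irq_happened can be sketched as follows. The bit
values and the names other than PACA_IRQ_HARD_DIS are assumptions for
illustration; the point is only that a single AND keeps the hard-disabled
bit and drops every other pending-event bit.

```c
#include <stdint.h>

/* Bit values assumed for illustration. */
#define DEMO_PACA_IRQ_HARD_DIS 0x01
#define DEMO_PACA_IRQ_DBELL    0x02
#define DEMO_PACA_IRQ_EE       0x04

/* Keep only the "hard disabled" bit; forget all pending events. */
static uint8_t demo_clear_pending(uint8_t irq_happened)
{
	return irq_happened & DEMO_PACA_IRQ_HARD_DIS;
}
```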
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
of_device_ids (i.e. compatible strings and the respective data) are not
supposed to change at runtime. All functions working with of_device_ids
provided by <linux/of.h> work with const of_device_ids. This allows
marking all struct of_device_id const, too.
While touching these lines, also put the __init annotation at the right
position where necessary.
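A user-space sketch of a const match table and a lookup that only needs
const access. The struct is a simplified stand-in for struct
of_device_id, and the compatible strings and helper name are made up for
this example.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct of_device_id. */
struct demo_device_id {
	const char *compatible;
	const void *data;
};

/* The table never changes at runtime, so it can be const. */
static const struct demo_device_id demo_ids[] = {
	{ .compatible = "vendor,device-a", .data = "a" },
	{ .compatible = "vendor,device-b", .data = "b" },
	{ NULL, NULL } /* sentinel terminates the table */
};

/* Walk the table until the sentinel; return the matching entry or NULL. */
static const struct demo_device_id *
demo_match(const struct demo_device_id *ids, const char *compat)
{
	for (; ids->compatible; ids++)
		if (strcmp(ids->compatible, compat) == 0)
			return ids;
	return NULL;
}
```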
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix a number of places where global functions were not including
their prototype. This ensures the prototype and the function match.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Simplify things considerably by moving all the ppc32 specific
symbol exports into its own file.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Move the lib symbol exports closer to their function definitions
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Branching takes two cycles on MPC8xx. Let's duplicate the two instructions
and avoid the branching.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
By XORing the upper part of the instruction code, we get a value that can
directly be verified with the second test and we can remove the first test.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
r10 and r3 are only used inside the FixupDAR function. So let's save them
inside that function only.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Since commit 2321f33790, dirty handling is no longer
done here. So we fix the comment.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Since commit 2321f33790, r10 is not used anymore
after FixupDAR. There is therefore no need to set it up with the value of DAR.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
SCRATCH0 and SCRATCH1 are only used in exception prologs where no other
exception can happen. There is therefore no need to preserve them across
TLB handlers; we can use them there as in other exceptions. One of the
advantages is that they do not suffer from the CPU6 errata, unlike the
M_TW register.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Since commit 469d62be92, SPRG2 is used as a
scratch register just like SPRG0 and SPRG1. So declare it as such and fix
the comment, which has not been valid since that commit.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
A DMA zone is still needed with swiotlb, for coherent allocations.
This doesn't affect platforms that don't use swiotlb or that don't call
swiotlb_detect_4g().
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
FSL PCI cannot directly address the whole lower 4 GiB due to
conflicts with PCICSRBAR and outbound windows, and thus
max_direct_dma_addr is less than 4GiB. Honor that limit in
dma_direct_alloc_coherent().
Note that setting the DMA mask to 31 bits is not an option, since many
PCI drivers would fail if we reject 32-bit DMA in dma_supported(), and
we have no control over the setting of coherent_dma_mask if
dma_supported() returns true.
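One way to picture honoring the limit is to clamp the highest usable
page frame number, as in the sketch below. The function name, the 4 KiB
page size, and the exact arithmetic are assumptions for illustration and
are not taken from the patch.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Return the exclusive upper pfn bound for a coherent allocation.
 * A max_direct_dma_addr of 0 means "no extra limit". */
static uint64_t demo_pfn_limit(uint64_t coherent_dma_mask,
			       uint64_t max_direct_dma_addr)
{
	uint64_t pfn = (coherent_dma_mask >> DEMO_PAGE_SHIFT) + 1;
	uint64_t cap = max_direct_dma_addr >> DEMO_PAGE_SHIFT;

	if (max_direct_dma_addr && cap < pfn)
		pfn = cap;
	return pfn;
}
```

With a 32-bit coherent mask and a direct-DMA ceiling below 4 GiB, the
ceiling wins, so allocations stay within what the PCI controller can
actually address while the mask itself remains 32 bits.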
Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>