do_device_not_available() is the handler for #NM and it declares that
it takes a unsigned long and calls math_emu(), which takes a long
argument and surprisingly expects the stack frame starting at the zero
argument would match struct math_emu_info, which isn't true regardless
of configuration in the current code.
This patch makes do_device_not_available() take struct pt_regs like
other exception handlers and initialize struct math_emu_info with
pointer to it and pass pointer to the math_emu_info to math_emulate()
like normal C functions do. This way, unless gcc makes a copy of
struct pt_regs in do_device_not_available(), the register frame is
correctly accessed regardless of kernel configuration or compiler
used.
This doesn't fix all math_emu problems but it at least gets it
somewhat working.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
* Come on, struct info? s/struct info/struct math_emu_info/
* Use struct pt_regs and kernel_vm86_regs instead of defining its own
register frame structure.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: dump the correct %gs into a.out core dump
aout_dump_thread() read %gs but didn't include it in core dump. Fix
it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: find right nr_irqs_gsi on some systems.
One test-system has gap between gsi's:
[ 0.000000] ACPI: IOAPIC (id[0x04] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 4, version 0, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: IOAPIC (id[0x05] address[0xfeafd000] gsi_base[48])
[ 0.000000] IOAPIC[1]: apic_id 5, version 0, address 0xfeafd000, GSI 48-54
[ 0.000000] ACPI: IOAPIC (id[0x06] address[0xfeafc000] gsi_base[56])
[ 0.000000] IOAPIC[2]: apic_id 6, version 0, address 0xfeafc000, GSI 56-62
...
[ 0.000000] nr_irqs_gsi: 38
So nr_irqs_gsi is not right. some irq for MSI will overwrite with io_apic.
need to get that with acpi_probe_gsi when acpi io_apic is used
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For Intel 7400 series CPUs, the recommendation is to use a clflush on the
monitored address just before monitor and mwait pair [1].
This clflush makes sure that there are no false wakeups from mwait when the
monitored address was recently written to.
[1] "MONITOR/MWAIT Recommendations for Intel Xeon Processor 7400 series"
section in specification update document of 7400 series
http://download.intel.com/design/xeon/specupdt/32033601.pdf
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup and bug fix
Use the linker to create symbols for certain per-cpu variables
that are offset by __per_cpu_load. This allows the removal of
the runtime fixup of the GDT pointer, which fixes a bug with
resume reported by Jiri Slaby.
Reported-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On an x86 system which doesn't support global mappings,
__supported_pte_mask has _PAGE_GLOBAL clear, to make sure it never
appears in the PTE. pfn_pte() and so on will enforce it with:
static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
{
return __pte((((phys_addr_t)page_nr << PAGE_SHIFT) |
pgprot_val(pgprot)) & __supported_pte_mask);
}
However, we overload _PAGE_GLOBAL with _PAGE_PROTNONE on non-present
ptes to distinguish them from swap entries. However, applying
__supported_pte_mask indiscriminately will clear the bit and corrupt the
pte.
I guess the best fix is to only apply __supported_pte_mask to present
ptes. This seems like the right solution to me, as it means we can
completely ignore the issue of overlaps between the present pte bits and
the non-present pte-as-swap entry use of the bits.
__supported_pte_mask contains the set of flags we support on the
current hardware. We also use bits in the pte for things like
logically present ptes with no permissions, and swap entries for
swapped out pages. We should only apply __supported_pte_mask to
present ptes, because otherwise we may destroy other information being
stored in the ptes.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: cleanup
In __put_user_size() macro errret is used for error value.
But if size is 8, errret isn't passed to__put_user_asm_u64().
This behavior is inconsistent.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: Bug fix
A hunk went missing in the original patch, and callee-save callsites were
not marked as returning the upper 32-bit of result, causing Badness.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: bugfix
In the 32-bit calling convention, %eax:%edx is used to return 64-bit
values. Don't save and restore %edx around wrapped functions, or they
can't return a full 64-bit result.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: split out a function, no functional change
Xen needs to be able to access percpu data from very early on. For
various reasons, it cannot also load the gdt at that time. It does,
however, have a pefectly functional gdt at that point, so there's no
pressing need to reload the gdt.
Split the function to load the segment registers off, so Xen can call
it directly.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup, prepare for xen boot fix.
Xen needs to call this function very early to setup the GDT and
per-cpu segments. Remove the call to smp_processor_id() and just
pass in the cpu number.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: Fix build when CONFIG_PARAVIRT_DEBUG is enabled
Fix missed convertion to using callee-saved calls for pud_val, which
causes a compile error when CONFIG_PARAVIRT_DEBUG is enabled.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Impact: Optimization
In the native case, pte_val, make_pte, etc are all just identity
functions, so there's no need to clobber a lot of registers over them.
(This changes the 32-bit callee-save calling convention to return both
EAX and EDX so functions can return 64-bit values.)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Optimization
Functions with the callee save calling convention clobber many fewer
registers than the normal C calling convention. Implement variants of
PVOP_V?CALL* accordingly. This only bothers with functions up to 3
args, since functions with more args may as well use the normal
calling convention.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Optimization
One of the problems with inserting a pile of C calls where previously
there were none is that the register pressure is greatly increased.
The C calling convention says that the caller must expect a certain
set of registers may be trashed by the callee, and that the callee can
use those registers without restriction. This includes the function
argument registers, and several others.
This patch seeks to alleviate this pressure by introducing wrapper
thunks that will do the register saving/restoring, so that the
callsite doesn't need to worry about it, but the callee function can
be conventional compiler-generated code. In many cases (particularly
performance-sensitive cases) the callee will be in assembler anyway,
and need not use the compiler's calling convention.
Standard calling convention is:
arguments return scratch
x86-32 eax edx ecx eax ?
x86-64 rdi rsi rdx rcx rax r8 r9 r10 r11
The thunk preserves all argument and scratch registers. The return
register is not preserved, and is available as a scratch register for
unwrapped callee code (and of course the return value).
Wrapped function pointers are themselves wrapped in a struct
paravirt_callee_save structure, in order to get some warning from the
compiler when functions with mismatched calling conventions are used.
The most common paravirt ops, both statically and dynamically, are
interrupt enable/disable/save/restore, so handle them first. This is
particularly easy since their calls are handled specially anyway.
XXX Deal with VMI. What's their calling convention?
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Optimization
Each asm paravirt-ops call says what registers are available for
clobbering. This patch makes use of this to selectively save/restore
registers around each pvops call. In many cases this significantly
shrinks code size.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Impact: Optimization
Several paravirt ops implementations simply return their arguments,
the most obvious being the make_pte/pte_val class of operations on
native.
On 32-bit, the identity function is literally a no-op, as the calling
convention uses the same registers for the first argument and return.
On 64-bit, it can be implemented with a single "mov".
This patch adds special identity functions for 32 and 64 bit argument,
and machinery to recognize them and replace them with either nops or a
mov as appropriate.
At the moment, the only users for the identity functions are the
pagetable entry conversion functions.
The result is a measureable improvement on pagetable-heavy benchmarks
(2-3%, reducing the pvops overhead from 5 to 2%).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
fix the following 'make headers_check' warnings:
usr/include/asm/swab.h:4: include of <linux/types.h> is preferred over <asm/types.h>
usr/include/asm/swab.h:7: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warning:
usr/include/asm/sigcontext32.h:20: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warnings:
usr/include/asm/sigcontext.h:5: include of <linux/types.h> is preferred over <asm/types.h>
usr/include/asm/sigcontext.h:24: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warnings:
usr/include/asm/ptrace-abi.h:86: include of <linux/types.h> is preferred over <asm/types.h>
usr/include/asm/ptrace-abi.h:93: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warning:
usr/include/asm/mtrr.h:61: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warnings:
usr/include/asm/mce.h:7: include of <linux/types.h> is preferred over <asm/types.h>
usr/include/asm/mce.h:29: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warnings:
usr/include/asm/kvm.h:9: include of <linux/types.h> is preferred over <asm/types.h>
usr/include/asm/kvm.h:16: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
fix the following 'make headers_check' warning:
usr/include/asm/e820.h:44: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Move DMA-mapping.txt to Documentation/PCI/.
DMA-mapping.txt was supposed to be moved from Documentation/ to
Documentation/PCI/. The 00-INDEX files in those two directories
were updated, along with a few other text files, but the file
itself somehow escaped being moved, so move it and update more
text files and source files with its new location.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case of !CONFIG_X86_WP_WORKS_OK, __put_user_size_ex() is not defined.
Add macros for !CONFIG_X86_WP_WORKS_OK case.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: sync 32 and 64-bit code
Merge load_gs_base() into switch_to_new_gdt(). Load the GDT and
per-cpu state for the boot cpu when its new area is set up.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: optimization
mb() generates an mfence instruction, which is not needed here. Only
a compiler barrier is needed, and that is handled by the memory clobber
in the wrmsrl function.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
Rename init_gdt() to setup_percpu_segment(), and move it to
setup_percpu.c.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: Code movement, no functional change.
Move setup_cpu_local_masks() to kernel/cpu/common.c, where the
masks are defined.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: Code movement, no functional change.
Move the 64-bit NUMA code from setup_percpu.c to numa_64.c
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits)
xen: unitialised return value in xenbus_write_transaction
x86: fix section mismatch warning
x86: unmask CPUID levels on Intel CPUs, fix
x86: work around PAGE_KERNEL_WC not getting WC in iomap_atomic_prot_pfn.
x86: use standard PIT frequency
xen: handle highmem pages correctly when shrinking a domain
x86, mm: fix pte_free()
xen: actually release memory when shrinking domain
x86: unmask CPUID levels on Intel CPUs
x86: add MSR_IA32_MISC_ENABLE bits to <asm/msr-index.h>
x86: fix PTE corruption issue while mapping RAM using /dev/mem
x86: mtrr fix debug boot parameter
x86: fix page attribute corruption with cpa()
Revert "x86: signal: change type of paramter for sys_rt_sigreturn()"
x86: use early clobbers in usercopy*.c
x86: remove kernel_physical_mapping_init() from init section
fix: crash: IP: __bitmap_intersects+0x48/0x73
cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
work_on_cpu: Use our own workqueue.
work_on_cpu: don't try to get_online_cpus() in work_on_cpu.
...
the RDC and ELAN platforms use slighly different PIT clocks, resulting in
a timex.h hack that changes PIT_TICK_RATE during build time. But if a
tester enables any of these platform support .config options, the PIT
will be miscalibrated on standard PC platforms.
So use one frequency - in a subsequent patch we'll add a quirk to allow
x86 platforms to define different PIT frequencies.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: introduce new uaccess exception handling framework
Introduce {get|put}_user_try and {get|put}_user_catch as new uaccess exception
handling framework.
{get|put}_user_try begins exception block and {get|put}_user_catch(err) ends
the block and gets err if an exception occured in {get|put}_user_ex() in the
block. The exception is stored thread_info->uaccess_err.
The example usage of this framework is below;
int func()
{
int err = 0;
get_user_try {
get_user_ex(...);
get_user_ex(...);
:
} get_user_catch(err);
return err;
}
Note: get_user_ex() is not clear the value when an exception occurs, it's
different from the behavior of __get_user(), but I think it doesn't matter.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
On -rt we were seeing spurious bad page states like:
Bad page state in process 'firefox'
page:c1bc2380 flags:0x40000000 mapping:c1bc2390 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
Pid: 503, comm: firefox Not tainted 2.6.26.8-rt13 #3
[<c043d0f3>] ? printk+0x14/0x19
[<c0272d4e>] bad_page+0x4e/0x79
[<c0273831>] free_hot_cold_page+0x5b/0x1d3
[<c02739f6>] free_hot_page+0xf/0x11
[<c0273a18>] __free_pages+0x20/0x2b
[<c027d170>] __pte_alloc+0x87/0x91
[<c027d25e>] handle_mm_fault+0xe4/0x733
[<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
[<c043f680>] ? rt_mutex_down_read_trylock+0x57/0x63
[<c0218875>] do_page_fault+0x36f/0x88a
This is the case where a concurrent fault already installed the PTE and
we get to free the newly allocated one.
This is due to pgtable_page_ctor() doing the spin_lock_init(&page->ptl)
which is overlaid with the {private, mapping} struct.
union {
struct {
unsigned long private;
struct address_space *mapping;
};
spinlock_t ptl;
struct kmem_cache *slab;
struct page *first_page;
};
Normally the spinlock is small enough to not stomp on page->mapping, but
PREEMPT_RT=y has huge 'spin'locks.
But lockdep kernels should also be able to trigger this splat, as the
lock tracking code grows the spinlock to cover page->mapping.
The obvious fix is calling pgtable_page_dtor() like the regular pte free
path __pte_free_tlb() does.
It seems all architectures except x86 and nm10300 already do this, and
nm10300 doesn't seem to use pgtable_page_ctor(), which suggests it
doesn't do SMP or simply doesnt do MMU at all or something.
Signed-off-by: Peter Zijlstra <a.p.zijlsta@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Impact: better code generation and removal of unused field for 32bit
In general, use the 64-bit version.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: cleanup
APIC definitions aren't needed here. Remove the include and fix
up the fallout.
tj: added include to mce_intel_64.c.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Impact: bogus irq_cpustat field removed
idle_timestamp is left over from the removed irqbalance code.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>