linux_old1

Commit Graph

Author	SHA1	Message	Date
Kirill A. Shutemov	e180377f1a	thp: change split_huge_page_pmd() interface Pass vma instead of mm and add address parameter. In most cases we already have vma on the stack. We provides split_huge_page_pmd_mm() for few cases when we have mm, but not vma. This change is preparation to huge zero pmd splitting implementation. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-12 17:38:31 -08:00
Linus Torvalds	9977d9b379	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull big execve/kernel_thread/fork unification series from Al Viro: "All architectures are converted to new model. Quite a bit of that stuff is actually shared with architecture trees; in such cases it's literally shared branch pulled by both, not a cherry-pick. A lot of ugliness and black magic is gone (-3KLoC total in this one): - kernel_thread()/kernel_execve()/sys_execve() redesign. We don't do syscalls from kernel anymore for either kernel_thread() or kernel_execve(): kernel_thread() is essentially clone(2) with callback run before we return to userland, the callbacks either never return or do successful do_execve() before returning. kernel_execve() is a wrapper for do_execve() - it doesn't need to do transition to user mode anymore. As a result kernel_thread() and kernel_execve() are arch-independent now - they live in kernel/fork.c and fs/exec.c resp. sys_execve() is also in fs/exec.c and it's completely architecture-independent. - daemonize() is gone, along with its parts in fs/.c - struct pt_regs is no longer passed to do_fork/copy_process/ copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump. - sys_fork()/sys_vfork()/sys_clone() unified; some architectures still need wrappers (ones with callee-saved registers not saved in pt_regs on syscall entry), but the main part of those suckers is in kernel/fork.c now." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits) do_coredump(): get rid of pt_regs argument print_fatal_signal(): get rid of pt_regs argument ptrace_signal(): get rid of unused arguments get rid of ptrace_signal_deliver() arguments new helper: signal_pt_regs() unify default ptrace_signal_deliver flagday: kill pt_regs argument of do_fork() death to idle_regs() don't pass regs to copy_process() flagday: don't pass regs to copy_thread() bfin: switch to generic vfork, get rid of pointless wrappers xtensa: switch to generic clone() openrisc: switch to use of generic fork and clone unicore32: switch to generic clone(2) score: switch to generic fork/vfork/clone c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone() take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h mn10300: switch to generic fork/vfork/clone h8300: switch to generic fork/vfork/clone tile: switch to generic clone() ... Conflicts: arch/microblaze/include/asm/Kbuild	2012-12-12 12:22:13 -08:00
Linus Torvalds	1ebaf4f4e6	Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer update from Ingo Molnar: "This tree includes HPET fixes and also implements a calibration-free, TSC match driven APIC timer interrupt mode: 'TSC deadline mode' supported in SandyBridge and later CPUs." * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: hpet: Fix inverted return value check in arch_setup_hpet_msi() x86: hpet: Fix masking of MSI interrupts x86: apic: Use tsc deadline for oneshot when available	2012-12-11 20:01:33 -08:00
Linus Torvalds	743aa456c1	Merge branch 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull "Nuke 386-DX/SX support" from Ingo Molnar: "This tree removes ancient-386-CPUs support and thus zaps quite a bit of complexity: 24 files changed, 56 insertions(+), 425 deletions(-) ... which complexity has plagued us with extra work whenever we wanted to change SMP primitives, for years. Unfortunately there's a nostalgic cost: your old original 386 DX33 system from early 1991 won't be able to boot modern Linux kernels anymore. Sniff." I'm not sentimental. Good riddance. * 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, 386 removal: Document Nx586 as a 386 and thus unsupported x86, cleanups: Simplify sync_core() in the case of no CPUID x86, 386 removal: Remove CONFIG_X86_POPAD_OK x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK x86, 386 removal: Remove CONFIG_INVLPG x86, 386 removal: Remove CONFIG_BSWAP x86, 386 removal: Remove CONFIG_XADD x86, 386 removal: Remove CONFIG_CMPXCHG x86, 386 removal: Remove CONFIG_M386 from Kconfig	2012-12-11 19:59:32 -08:00
Linus Torvalds	a05a4e24dc	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 topology discovery improvements from Ingo Molnar: "These changes improve topology discovery on AMD CPUs. Right now this feeds information displayed in /sys/devices/system/cpu/cpuX/cache/indexY/* - but in the future we could use this to set up a better scheduling topology." * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cacheinfo: Base cache sharing info on CPUID 0x8000001d on AMD x86, cacheinfo: Make use of CPUID 0x8000001d for cache information on AMD x86, cacheinfo: Determine number of cache leafs using CPUID 0x8000001d on AMD x86: Add cpu_has_topoext	2012-12-11 19:58:29 -08:00
Linus Torvalds	e9a5a91971	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Small cleanups." * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Fix the error of using "const" in gen-insn-attr-x86.awk x86, apic: Cleanup cfg->domain setup for legacy interrupts x86: Remove dead hlt_use_halt code	2012-12-11 19:57:56 -08:00
Linus Torvalds	74b8423345	Merge branch 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 BSP hotplug changes from Ingo Molnar: "This tree enables CPU#0 (the boot processor) to be onlined/offlined on x86, just like any other CPU. Enabled on Intel CPUs for now. Allowing this required the identification and fixing of latent CPU#0 assumptions (such as CPU#0 initializations, etc.) in the x86 architecture code, plus the identification of barriers to BSP-offlining, such as active PIC interrupts which can only be serviced on the BSP. It's behind a default-off option, and there's a debug option that allows the automatic testing of this feature. The motivation of this feature is to allow and prepare for true CPU-hotplug hardware support: recent changes to MCE support enable us to detect a deteriorating but not yet hard-failing L1/L2 cache on a CPU that could be soft-unplugged - or a failing L3 cache on a multi-socket system. Note that true hardware hot-plug is not yet fully enabled by this, because that requires a special platform wakeup sequence to be sent to the freshly powered up CPU#0. Future patches for this are planned, once such a platform exists. Chicken and egg" * 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, topology: Debug CPU0 hotplug x86/i387.c: Initialize thread xstate only on CPU0 only once x86, hotplug: Handle retrigger irq by the first available CPU x86, hotplug: The first online processor saves the MTRR state x86, hotplug: During CPU0 online, enable x2apic, set_numa_node. x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI x86-32, hotplug: Add start_cpu0() entry point to head_32.S x86-64, hotplug: Add start_cpu0() entry point to head_64.S kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback x86, hotplug, suspend: Online CPU0 for suspend or hibernate x86, hotplug: Support functions for CPU0 online/offline x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it x86, Kconfig: Add config switch for CPU0 hotplug doc: Add x86 CPU0 online/offline feature	2012-12-11 19:56:33 -08:00
Linus Torvalds	5074474737	Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 boot changes from Ingo Molnar: "Two small changes: a cleanup and allow CONFIG_X86_MPPARSE to be turned off on SFI as well." * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch/x86/Kconfig: Allow turning off CONFIG_X86_MPPARSE when either ACPI or SFI is present x86/boot/doc: Fix grammar and typo in boot.txt	2012-12-11 19:55:58 -08:00
Linus Torvalds	0019fab355	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm changes from Ingo Molnar: "Two fixlets and a cleanup." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_32: Return actual stack when requesting sp from regs x86: Don't clobber top of pt_regs in nested NMI x86/asm: Clean up copy_page_*() comments and code	2012-12-11 19:55:20 -08:00
Linus Torvalds	090f8ccba3	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Lots of activity: 211 files changed, 8328 insertions(+), 4116 deletions(-) most of it on the tooling side. Main changes: * ftrace enhancements and fixes from Steve Rostedt. * uprobes fixes, cleanups and preparation for the ARM port from Oleg Nesterov. * UAPI fixes, from David Howels - prepares the arch/x86 UAPI transition * Separate perf tests into multiple objects, one per test, from Jiri Olsa. * Make hardware event translations available in sysfs, from Jiri Olsa. * Fixes to /proc/pid/maps parsing, preparatory to supporting data maps, from Namhyung Kim * Implement ui_progress for GTK, from Namhyung Kim * Add framework for automated perf_event_attr tests, where tools with different command line options will be run from a 'perf test', via python glue, and the perf syscall will be intercepted to verify that the perf_event_attr fields set by the tool are those expected, from Jiri Olsa * Add a 'link' method for hists, so that we can have the leader with buckets for all the entries in all the hists. This new method is now used in the default 'diff' output, making the sum of the 'baseline' column be 100%, eliminating blind spots. * libtraceevent fixes for compiler warnings trying to make perf it build on some distros, like fedora 14, 32-bit, some of the warnings really pointed to real bugs. * Add a browser for 'perf script' and make it available from the report and annotate browsers. It does filtering to find the scripts that handle events found in the perf.data file used. From Feng Tang * perf inject changes to allow showing where a task sleeps, from Andrew Vagin. * Makefile improvements from Namhyung Kim. * Add --pre and --post command hooks in 'stat', from Peter Zijlstra. * Don't stop synthesizing threads when one vanishes, this is for the existing threads when we start a tool like trace. * Use sched:sched_stat_runtime to provide a thread summary, this produces the same output as the 'trace summary' subcommand of tglx's original "trace" tool. * Support interrupted syscalls in 'trace' * Add an event duration column and filter in 'trace'. * There are references to the man pages in some tools, so try to build Documentation when installing, warning the user if that is not possible, from Borislav Petkov. * Give user better message if precise is not supported, from David Ahern. * Try to find cross-built objdump path by using the session environment information in the perf.data file header, from Irina Tirdea, original patch and idea by Namhyung Kim. * Diplays more output on features check for make V=1, so that one can figure out what is happening by looking at gcc output, etc. From Jiri Olsa. * Add on_exit implementation for systems without one, e.g. Android, from Bernhard Rosenkraenzer. * Only process events for vcpus of interest, helps handling large number of events, from David Ahern. * Cross compilation fixes for Android, from Irina Tirdea. * Add documentation on compiling for Android, from Irina Tirdea. * perf diff improvements from Jiri Olsa. * Target (task/user/cpu/syswide) handling improvements, from Namhyung Kim. * Add support in 'trace' for tracing workload given by command line, from Namhyung Kim. * ... and much more." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits) uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race perf evsel: Introduce is_group_member method perf powerpc: Use uapi/unistd.h to fix build error tools: Pass the target in descend tools: Honour the O= flag when tool build called from a higher Makefile tools: Define a Makefile function to do subdir processing perf ui: Always compile browser setup code perf ui: Add ui_progress__finish() perf ui gtk: Implement ui_progress functions perf ui: Introduce generic ui_progress helper perf ui tui: Move progress.c under ui/tui directory perf tools: Add basic event modifier sanity check perf tools: Omit group members from perf_evlist__disable/enable perf tools: Ensure single disable call per event in record comand perf tools: Fix 'disabled' attribute config for record command perf tools: Fix attributes for '{}' defined event groups perf tools: Use sscanf for parsing /proc/pid/maps perf tools: Add gtk.<command> config option for launching GTK browser perf tools: Fix compile error on NO_NEWT=1 build perf hists: Initialize all of he->stat with zeroes ...	2012-12-11 18:14:31 -08:00
Linus Torvalds	37ea95a959	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU update from Ingo Molnar: "The major features of this tree are: 1. A first version of no-callbacks CPUs. This version prohibits offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y. Relaxing this constraint is in progress, but not yet ready for prime time. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/724. 2. Changes to SRCU that allows statically initialized srcu_struct structures. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/296. 3. Restructuring of RCU's debugfs output. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/341. 4. Additional CPU-hotplug/RCU improvements, posted to LKML at https://lkml.org/lkml/2012/10/30/327. Note that the commit eliminating __stop_machine() was judged to be too-high of risk, so is deferred to 3.9. 5. Changes to RCU's idle interface, most notably a new module parameter that redirects normal grace-period operations to their expedited equivalents. These were posted to LKML at https://lkml.org/lkml/2012/10/30/739. 6. Additional diagnostics for RCU's CPU stall warning facility, posted to LKML at https://lkml.org/lkml/2012/10/30/315. The most notable change reduces the default RCU CPU stall-warning time from 60 seconds to 21 seconds, so that it once again happens sooner than the softlockup timeout. 7. Documentation updates, which were posted to LKML at https://lkml.org/lkml/2012/10/30/280. A couple of late-breaking changes were posted at https://lkml.org/lkml/2012/11/16/634 and https://lkml.org/lkml/2012/11/16/547. 8. Miscellaneous fixes, which were posted to LKML at https://lkml.org/lkml/2012/10/30/309. 9. Finally, a fix for an lockdep-RCU splat was posted to LKML at https://lkml.org/lkml/2012/11/7/486." * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) context_tracking: New context tracking susbsystem sched: Mark RCU reader in sched_show_task() rcu: Separate accounting of callbacks from callback-free CPUs rcu: Add callback-free CPUs rcu: Add documentation for the new rcuexp debugfs trace file rcu: Update documentation for TREE_RCU debugfs tracing rcu: Reduce default RCU CPU stall warning timeout rcu: Fix TINY_RCU rcu_is_cpu_rrupt_from_idle check rcu: Clarify memory-ordering properties of grace-period primitives rcu: Add new rcutorture module parameters to start/end test messages rcu: Remove list_for_each_continue_rcu() rcu: Fix batch-limit size problem rcu: Add tracing for synchronize_sched_expedited() rcu: Remove old debugfs interfaces and also RCU flavor name rcu: split 'rcuhier' to each flavor rcu: split 'rcugp' to each flavor rcu: split 'rcuboost' to each flavor rcu: split 'rcubarrier' to each flavor rcu: Fix tracing formatting rcu: Remove the interface "rcudata.csv" ...	2012-12-11 18:10:49 -08:00
Linus Torvalds	608ff1a210	Merge branch 'akpm' (Andrew's patchbomb) Merge misc updates from Andrew Morton: "About half of most of MM. Going very early this time due to uncertainty over the coreautounifiednumasched things. I'll send the other half of most of MM tomorrow. The rest of MM awaits a slab merge from Pekka." * emailed patches from Andrew Morton: (71 commits) memory_hotplug: ensure every online node has NORMAL memory memory_hotplug: handle empty zone when online_movable/online_kernel mm, memory-hotplug: dynamic configure movable memory and portion memory drivers/base/node.c: cleanup node_state_attr[] bootmem: fix wrong call parameter for free_bootmem() avr32, kconfig: remove HAVE_ARCH_BOOTMEM mm: cma: remove watermark hacks mm: cma: skip watermarks check for already isolated blocks in split_free_page() mm, oom: fix race when specifying a thread as the oom origin mm, oom: change type of oom_score_adj to short mm: cleanup register_node() mm, mempolicy: remove duplicate code mm/vmscan.c: try_to_freeze() returns boolean mm: introduce putback_movable_pages() virtio_balloon: introduce migration primitives to balloon pages mm: introduce compaction and migration for ballooned pages mm: introduce a common interface for balloon pages mobility mm: redefine address_space.assoc_mapping mm: adjust address_space_operations.migratepage() return code arch/sparc/kernel/sys_sparc_64.c: s/COLOUR/COLOR/ ...	2012-12-11 18:05:37 -08:00
Michel Lespinasse	cdc1734495	mm: use vm_unmapped_area() in hugetlbfs on i386 architecture Update the i386 hugetlb_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. [akpm@linux-foundation.org: fix build] Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-11 17:22:25 -08:00
Michel Lespinasse	7d02505965	mm: fix cache coloring on x86_64 architecture Fix the x86-64 cache alignment code to take pgoff into account. Use the x86 and MIPS cache alignment code as the basis for a generic cache alignment function. The old x86 code will always align the mmap to aliasing boundaries, even if the program mmaps the file with a non-zero pgoff. If program A mmaps the file with pgoff 0, and program B mmaps the file with pgoff 1. The old code would align the mmaps, resulting in misaligned pages: A: 0123 B: 123 After this patch, they are aligned so the pages line up: A: 0123 B: 123 Proposed by Rik van Riel. Signed-off-by: Michel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-11 17:22:25 -08:00
Michel Lespinasse	f99024729e	mm: use vm_unmapped_area() on x86_64 architecture Update the x86_64 arch_get_unmapped_area[_topdown] functions to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-11 17:22:25 -08:00
Andi Kleen	42d7395feb	mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB There was some desire in large applications using MAP_HUGETLB or SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on others. This is useful together with NUMA policy: use 2MB interleaving on some mappings, but 1GB on local mappings. This patch extends the IPC/SHM syscall interfaces slightly to allow specifying the page size. It borrows some upper bits in the existing flag arguments and allows encoding the log of the desired page size in addition to the *_HUGETLB flag. When 0 is specified the default size is used, this makes the change fully compatible. Extending the internal hugetlb code to handle this is straight forward. Instead of a single mount it just keeps an array of them and selects the right mount based on the specified page size. When no page size is specified it uses the mount of the default page size. The change is not visible in /proc/mounts because internal mounts don't appear there. It also has very little overhead: the additional mounts just consume a super block, but not more memory when not used. I also exported the new flags to the user headers (they were previously under __KERNEL__). Right now only symbols for x86 and some other architecture for 1GB and 2MB are defined. The interface should already work for all other architectures though. Only architectures that define multiple hugetlb sizes actually need it (that is currently x86, tile, powerpc). However tile and powerpc have user configurable hugetlb sizes, so it's not easy to add defines. A program on those architectures would need to query sysfs and use the appropiate log2. [akpm@linux-foundation.org: cleanups] [rientjes@google.com: fix build] [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-11 17:22:25 -08:00
Gleb Natapov	58b7825bc3	KVM: emulator: fix real mode segment checks in address linearization In real mode CS register is writable, so do not #GP on write. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-11 21:00:28 -02:00
Gleb Natapov	0b26b588d9	VMX: remove unneeded enable_unrestricted_guest check If enable_unrestricted_guest is true vmx->rmode.vm86_active will always be false. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-11 21:00:28 -02:00
Gleb Natapov	a4d3326c2d	KVM: VMX: fix DPL during entry to protected mode On CPUs without support for unrestricted guests DPL cannot be smaller than RPL for data segments during guest entry, but this state can occurs if a data segment selector changes while vcpu is in real mode to a value with lowest two bits != 00. Fix that by forcing DPL == RPL on transition to protected mode. This is a regression introduced by `c865c43de6`. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-11 21:00:27 -02:00
Linus Torvalds	c6bd5bcc49	TTY/Serial merge for 3.8-rc1 Here's the big tty/serial tree set of changes for 3.8-rc1. Contained in here is a bunch more reworks of the tty port layer from Jiri and bugfixes from Alan, along with a number of other tty and serial driver updates by the various driver authors. Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the TTY layer, which is much appreciated by me. All of these have been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlDHhgwACgkQMUfUDdst+ynI6wCcC+YeBwncnoWHvwLAJOwAZpUL bysAn28o780/lOsTzp3P1Qcjvo69nldo =hN/g -----END PGP SIGNATURE----- Merge tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull TTY/Serial merge from Greg Kroah-Hartman: "Here's the big tty/serial tree set of changes for 3.8-rc1. Contained in here is a bunch more reworks of the tty port layer from Jiri and bugfixes from Alan, along with a number of other tty and serial driver updates by the various driver authors. Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the TTY layer, which is much appreciated by me. All of these have been in the linux-next tree for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" Fixed up some trivial conflicts in the staging tree, due to the fwserial driver having come in both ways (but fixed up a bit in the serial tree), and the ioctl handling in the dgrp driver having been done slightly differently (staging tree got that one right, and removed both TIOCGSOFTCAR and TIOCSSOFTCAR). * tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (146 commits) staging: sb105x: fix potential NULL pointer dereference in mp_chars_in_buffer() staging/fwserial: Remove superfluous free staging/fwserial: Use WARN_ONCE when port table is corrupted staging/fwserial: Destruct embedded tty_port on teardown staging/fwserial: Fix build breakage when !CONFIG_BUG staging: fwserial: Add TTY-over-Firewire serial driver drivers/tty/serial/serial_core.c: clean up HIGH_BITS_OFFSET usage staging: dgrp: dgrp_tty.c: Audit the return values of get/put_user() staging: dgrp: dgrp_tty.c: Remove the TIOCSSOFTCAR ioctl handler from dgrp driver serial: ifx6x60: Add modem power off function in the platform reboot process serial: mxs-auart: unmap the scatter list before we copy the data serial: mxs-auart: disable the Receive Timeout Interrupt when DMA is enabled serial: max310x: Setup missing "can_sleep" field for GPIO tty/serial: fix ifx6x60.c declaration warning serial: samsung: add devicetree properties for non-Exynos SoCs serial: samsung: fix potential soft lockup during uart write tty: vt: Remove redundant null check before kfree. tty/8250 Add check for pci_ioremap_bar failure tty/8250 Add support for Commtech's Fastcom Async-335 and Fastcom Async-PCIe cards tty/8250 Add XR17D15x devices to the exar_handle_irq override ...	2012-12-11 14:08:47 -08:00
Zhang Yanfei	0ca0d818cb	x86/kexec: crash_vmclear_local_vmcss needs __rcu This removes the sparse warning: arch/x86/kernel/crash.c:49:32: sparse: incompatible types in comparison expression (different address spaces) Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-12-11 19:55:23 -02:00
Linus Torvalds	bad73c5aa0	ACPI and power management updates for 3.8-rc1 * Introduction of device PM QoS flags. * ACPI device power management update allowing subsystems other than PCI to use it more easily. * ACPI device enumeration rework allowing additional kinds of devices to be enumerated via ACPI. From Mika Westerberg, Adrian Hunter, Mathias Nyman, Andy Shevchenko, and Rafael J. Wysocki. * ACPICA update to version 20121018 from Bob Moore and Lv Zheng. * ACPI memory hotplug update from Wen Congyang and Yasuaki Ishimatsu. * Introduction of acpi_handle_<level>() messaging macros and ACPI-based CPU hot-remove support from Toshi Kani. * ACPI EC updates from Feng Tang. * cpufreq updates from Viresh Kumar, Fabio Baltieri and others. * cpuidle changes to quickly notice governor prediction failure from Youquan Song. * Support for using multiple cpuidle drivers at the same time and cpuidle cleanups from Daniel Lezcano. * devfreq updates from Nishanth Menon and others. * cpupower update from Thomas Renninger. * Fixes and small cleanups all over the place. -- -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJQxxevAAoJEKhOf7ml8uNsHacQAK2xoQozDddPBAaTCf1OWW/G J4E2qMn+gy4AtFmQ0/xeZVvvylOCn9eu/kv3QJ/bFlVoUsqTgfPwQBjT6HjF1Acn 7yVFdr9H/tn2wi9nW2Gv6Tl2Hr/kFLpo0f5Kd40Z+P8SV5sKX5ktJcVpJ/I/P4Vh Qw2nWtj7hQktZDERzgG4ZpMbQToNhbLhbKWB9ad3/XQSSA9JkfgvBFgrTEGHcZD5 bwsggjNdOAWNGeDdzRsQSDn0Alld5ILVdSJ5xKimO1O70WvKc7fqA1IdYRIeKL90 yz4bcoYKzl9iktlw8+x5o1U9mrc8TFV5p4+zV+t5Z6pzS/J3kWvnsW4zu9sCrxFv Wic3SKyiem7s2dxrYyj4ZXAci3GK4ouRTrCLqk7/00tEGdwAQD1ZNfsUJp6jKayz FvtZUgItcOyrlQ6B4nh951OY6dI3AUYJ2NuWWNr5NZkgVAvQGV8zTGOImbeVeL2+ pMiw14zScO3ylYilVcjTKDDMj2sDZ68mw5PIcbmksvWsCLo26jDBVDtLVmtYWyd4 ek3WnOrQZr0R3agvOLLssMKXompvpP+N4Klf4rV+GtqGsWtHryYKys2Laju9FwFj yYLchxYlxhGTzqq8LjF90HDL0TWpPe6cPi+B5ow9g/SXLexbMKNQGhv3Jovm2yR3 j54tKBWy7e9AAYEDPirX =6OEP -----END PGP SIGNATURE----- Merge tag 'pm+acpi-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: - Introduction of device PM QoS flags. - ACPI device power management update allowing subsystems other than PCI to use it more easily. - ACPI device enumeration rework allowing additional kinds of devices to be enumerated via ACPI. From Mika Westerberg, Adrian Hunter, Mathias Nyman, Andy Shevchenko, and Rafael J. Wysocki. - ACPICA update to version 20121018 from Bob Moore and Lv Zheng. - ACPI memory hotplug update from Wen Congyang and Yasuaki Ishimatsu. - Introduction of acpi_handle_<level>() messaging macros and ACPI-based CPU hot-remove support from Toshi Kani. - ACPI EC updates from Feng Tang. - cpufreq updates from Viresh Kumar, Fabio Baltieri and others. - cpuidle changes to quickly notice governor prediction failure from Youquan Song. - Support for using multiple cpuidle drivers at the same time and cpuidle cleanups from Daniel Lezcano. - devfreq updates from Nishanth Menon and others. - cpupower update from Thomas Renninger. - Fixes and small cleanups all over the place. * tag 'pm+acpi-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (196 commits) mmc: sdhci-acpi: enable runtime-pm for device HID INT33C6 ACPI: add Haswell LPSS devices to acpi_platform_device_ids list ACPI: add documentation about ACPI 5 enumeration pnpacpi: fix incorrect TEST_ALPHA() test ACPI / PM: Fix header of acpi_dev_pm_detach() in acpi.h ACPI / video: ignore BIOS initial backlight value for HP Folio 13-2000 ACPI : do not use Lid and Sleep button for S5 wakeup ACPI / PNP: Do not crash due to stale pointer use during system resume ACPI / video: Add "Asus UL30VT" to ACPI video detect blacklist ACPI: do acpisleep dmi check when CONFIG_ACPI_SLEEP is set spi / ACPI: add ACPI enumeration support gpio / ACPI: add ACPI support PM / devfreq: remove compiler error with module governors (2) cpupower: IvyBridge (0x3a and 0x3e models) support cpupower: Provide -c param for cpupower monitor to schedule process on all cores cpupower tools: Fix warning and a bug with the cpu package count cpupower tools: Fix malloc of cpu_info structure cpupower tools: Fix issues with sysfs_topology_read_file cpupower tools: Fix minor warnings cpupower tools: Update .gitignore for files created in the debug directories ...	2012-12-11 12:45:35 -08:00
Peter Zijlstra	cbee9f88ec	mm: numa: Add fault driven placement and migration NOTE: This patch is based on "sched, numa, mm: Add fault driven placement and migration policy" but as it throws away all the policy to just leave a basic foundation I had to drop the signed-offs-by. This patch creates a bare-bones method for setting PTEs pte_numa in the context of the scheduler that when faulted later will be faulted onto the node the CPU is running on. In itself this does nothing useful but any placement policy will fundamentally depend on receiving hints on placement from fault context and doing something intelligent about it. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:42:45 +00:00
Andrea Arcangeli	be3a728427	mm: numa: pte_numa() and pmd_numa() Implement pte_numa and pmd_numa. We must atomically set the numa bit and clear the present bit to define a pte_numa or pmd_numa. Once a pte or pmd has been set as pte_numa or pmd_numa, the next time a thread touches a virtual address in the corresponding virtual range, a NUMA hinting page fault will trigger. The NUMA hinting page fault will clear the NUMA bit and set the present bit again to resolve the page fault. The expectation is that a NUMA hinting page fault is used as part of a placement policy that decides if a page should remain on the current node or migrated to a different node. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:36 +00:00
Andrea Arcangeli	dbe4d2035a	mm: numa: define _PAGE_NUMA The objective of _PAGE_NUMA is to be able to trigger NUMA hinting page faults to identify the per NUMA node working set of the thread at runtime. Arming the NUMA hinting page fault mechanism works similarly to setting up a mprotect(PROT_NONE) virtual range: the present bit is cleared at the same time that _PAGE_NUMA is set, so when the fault triggers we can identify it as a NUMA hinting page fault. _PAGE_NUMA on x86 shares the same bit number of _PAGE_PROTNONE (but it could also use a different bitflag, it's up to the architecture to decide). It would be confusing to call the "NUMA hinting page faults" as "do_prot_none faults". They're different events and _PAGE_NUMA doesn't alter the semantics of mprotect(PROT_NONE) in any way. Sharing the same bitflag with _PAGE_PROTNONE in fact complicates things: it requires us to ensure the code paths executed by _PAGE_PROTNONE remains mutually exclusive to the code paths executed by _PAGE_NUMA at all times, to avoid _PAGE_NUMA and _PAGE_PROTNONE to step into each other toes. Because we want to be able to set this bitflag in any established pte or pmd (while clearing the present bit at the same time) without losing information, this bitflag must never be set when the pte and pmd are present, so the bitflag picked for _PAGE_NUMA usage, must not be used by the swap entry format. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:28:35 +00:00
Rik van Riel	2c3cf556b2	x86/mm: Introduce pte_accessible() We need pte_present to return true for _PAGE_PROTNONE pages, to indicate that the pte is associated with a page. However, for TLB flushing purposes, we would like to know whether the pte points to an actually accessible page. This allows us to skip remote TLB flushes for pages that are not actually accessible. Fill in this method for x86 and provide a safe (but slower) method on other architectures. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixed-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-66p11te4uj23gevgh4j987ip@git.kernel.org [ Added Linus's review fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-12-11 14:28:34 +00:00
Rik van Riel	e4a1cc56e4	x86: mm: drop TLB flush from ptep_set_access_flags Intel has an architectural guarantee that the TLB entry causing a page fault gets invalidated automatically. This means we should be able to drop the local TLB invalidation. Because of the way other areas of the page fault code work, chances are good that all x86 CPUs do this. However, if someone somewhere has an x86 CPU that does not invalidate the TLB entry causing a page fault, this one-liner should be easy to revert. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com>	2012-12-11 14:28:33 +00:00
Rik van Riel	0f9a921cf9	x86: mm: only do a local tlb flush in ptep_set_access_flags() The function ptep_set_access_flags() is only ever invoked to set access flags or add write permission on a PTE. The write bit is only ever set together with the dirty bit. Because we only ever upgrade a PTE, it is safe to skip flushing entries on remote TLBs. The worst that can happen is a spurious page fault on other CPUs, which would flush that TLB entry. Lazily letting another CPU incur a spurious page fault occasionally is (much!) cheaper than aggressively flushing everybody else's TLB. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michel Lespinasse <walken@google.com> Cc: Ingo Molnar <mingo@kernel.org>	2012-12-11 14:28:33 +00:00
Bjorn Helgaas	1cb73f8c47	Merge branch 'pci/mjg-pci-roms-from-efi' into next * pci/mjg-pci-roms-from-efi: PCI: Use phys_addr_t for physical ROM address	2012-12-10 16:20:12 -07:00
Cong Ding	28a7938922	x86: Fix the error of using "const" in gen-insn-attr-x86.awk The original version code causes following sparse warnings: arch/x86/lib/inat-tables.c:1080:25: warning: duplicate const arch/x86/lib/inat-tables.c:1095:25: warning: duplicate const arch/x86/lib/inat-tables.c:1118:25: warning: duplicate const for the variables inat_escape_tables, inat_group_tables, and inat_avx_tables in the code generated by gen-insn-attr-x86.awk. The author Masami Hiramutsu says here is to make both the value pointed by the pointers and the pointers itself read-only, so we move the "const" to be after the "*". Signed-off-by: Cong Ding <dinggnu@gmail.com> Link: http://lkml.kernel.org/r/20121209082103.GA9181@gmail.com Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-12-10 10:31:24 -08:00
Bjorn Helgaas	dbd3fc3345	PCI: Use phys_addr_t for physical ROM address Use phys_addr_t rather than "void *" for physical memory address. This removes casts and fixes a "cast from pointer to integer of different size" warning on ppc44x_defconfig. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2012-12-10 11:24:42 -07:00
Ingo Molnar	cc1b39dbf9	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core Pull ftrace updates from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-12-08 15:54:35 +01:00
Ingo Molnar	7e0dd574cd	Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core Pull uprobes fixes, cleanups and preparation for the ARM port from Oleg Nesterov. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-12-08 15:51:10 +01:00
Ingo Molnar	f0b9abfb04	Merge branch 'linus' into perf/core Conflicts: tools/perf/Makefile tools/perf/builtin-test.c tools/perf/perf.h tools/perf/tests/parse-events.c tools/perf/util/evsel.h Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-12-08 15:25:06 +01:00
Bjorn Helgaas	3ced69f8bb	Merge branch 'pci/daniel-numachip' into next * pci/daniel-numachip: x86/PCI: Add NumaChip remote PCI support	2012-12-07 14:27:33 -07:00
Daniel J Blueman	f9726bfd4b	x86/PCI: Add NumaChip remote PCI support Add NumaChip-specific PCI access mechanism via MMCONFIG cycles, but preventing access to AMD Northbridges which shouldn't respond. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2012-12-07 14:24:32 -07:00
Olaf Hering	4319eb4fc7	x86 mtrr: fix comment typo in mtrr_bp_init Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-12-07 11:19:19 +01:00
Bjorn Helgaas	72e1e868ca	Merge branch 'pci/mjg-pci-roms-from-efi' into next * pci/mjg-pci-roms-from-efi: x86: Use PCI setup data PCI: Add support for non-BAR ROMs PCI: Add pcibios_add_device EFI: Stash ROMs if they're not in the PCI BAR	2012-12-06 14:37:32 -07:00
Zhang Yanfei	8f536b7697	KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump The vmclear function will be assigned to the callback function pointer when loading kvm-intel module. And the bitmap indicates whether we should do VMCLEAR operation in kdump. The bits in the bitmap are set/unset according to different conditions. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-06 18:26:57 +02:00
Zhang Yanfei	f23d1f4a11	x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary This patch provides a way to VMCLEAR VMCSs related to guests on all cpus before executing the VMXOFF when doing kdump. This is used to ensure the VMCSs in the vmcore updated and non-corrupted. Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-06 18:25:36 +02:00
Nadia Yvette Chambers	6d49e352ae	propagate name change to comments in kernel source I've legally changed my name with New York State, the US Social Security Administration, et al. This patch propagates the name change and change in initials and login to comments in the kernel source as well. Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-12-06 10:39:54 +01:00
Jussi Kivilinna	044ab52578	crypto: cast5/cast6 - move lookup tables to shared module CAST5 and CAST6 both use same lookup tables, which can be moved shared module 'cast_common'. Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2012-12-06 17:16:26 +08:00
Xiao Guangrong	c219346325	KVM: MMU: optimize for set_spte There are two cases we need to adjust page size in set_spte: 1): the one is other vcpu creates new sp in the window between mapping_level() and acquiring mmu-lock. 2): the another case is the new sp is created by itself (page-fault path) when guest uses the target gfn as its page table. In current code, set_spte drop the spte and emulate the access for these case, it works not good: - for the case 1, it may destroy the mapping established by other vcpu, and do expensive instruction emulation. - for the case 2, it may emulate the access even if the guest is accessing the page which not used as page table. There is a example, 0~2M is used as huge page in guest, in this huge page, only page 3 used as page table, then guest read/writes on other pages can cause instruction emulation. Both of these cases can be fixed by allowing guest to retry the access, it will refault, then we can establish the mapping by using small page Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-06 09:11:25 +02:00
Matthew Garrett	f9a37be0f0	x86: Use PCI setup data EFI can provide PCI ROMs out of band via boot services, which may not be available after boot. Add support for using the data handed off to us by the boot stub or bootloader. [bhelgaas: added Seth's boot_params section mismatch fix] [bhelgaas: drop "boot_params.hdr.version < 0x0209" test] Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Seth Forshee <seth.forshee@canonical.com>	2012-12-05 14:38:26 -07:00
Matthew Garrett	dd5fc854de	EFI: Stash ROMs if they're not in the PCI BAR EFI provides support for providing PCI ROMs via means other than the ROM BAR. This support vanishes after we've exited boot services, so add support for stashing copies of the ROMs in setup_data if they're not otherwise available. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Seth Forshee <seth.forshee@canonical.com>	2012-12-05 14:33:26 -07:00
Julian Stecklina	66f7b72e11	KVM: x86: Make register state after reset conform to specification VMX behaves now as SVM wrt to FPU initialization. Code has been moved to generic code path. General-purpose registers are now cleared on reset and INIT. SVM code properly initializes EDX. Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-05 18:00:07 +02:00
Zhang Xiantao	2b3c5cbc0d	kvm: don't use bit24 for detecting address-specific invalidation capability Bit24 in VMX_EPT_VPID_CAP_MASI is not used for address-specific invalidation capability reporting, so remove it from KVM to avoid conflicts in future. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-05 16:35:48 +02:00
Zhang Xiantao	0307b7b8c2	kvm: remove unnecessary bit checking for ept violation Bit 6 in EPT vmexit's exit qualification is not defined in SDM, so remove it. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-05 16:35:21 +02:00
Ingo Molnar	630e1e0bcd	Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Conflicts: arch/x86/kernel/ptrace.c Pull the latest RCU tree from Paul E. McKenney: " The major features of this series are: 1. A first version of no-callbacks CPUs. This version prohibits offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y. Relaxing this constraint is in progress, but not yet ready for prime time. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/724, and are at branch rcu/nocb. 2. Changes to SRCU that allows statically initialized srcu_struct structures. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/296, and are at branch rcu/srcu. 3. Restructuring of RCU's debugfs output. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/341, and are at branch rcu/tracing. 4. Additional CPU-hotplug/RCU improvements, posted to LKML at https://lkml.org/lkml/2012/10/30/327, and are at branch rcu/hotplug. Note that the commit eliminating __stop_machine() was judged to be too-high of risk, so is deferred to 3.9. 5. Changes to RCU's idle interface, most notably a new module parameter that redirects normal grace-period operations to their expedited equivalents. These were posted to LKML at https://lkml.org/lkml/2012/10/30/739, and are at branch rcu/idle. 6. Additional diagnostics for RCU's CPU stall warning facility, posted to LKML at https://lkml.org/lkml/2012/10/30/315, and are at branch rcu/stall. The most notable change reduces the default RCU CPU stall-warning time from 60 seconds to 21 seconds, so that it once again happens sooner than the softlockup timeout. 7. Documentation updates, which were posted to LKML at https://lkml.org/lkml/2012/10/30/280, and are at branch rcu/doc. A couple of late-breaking changes were posted at https://lkml.org/lkml/2012/11/16/634 and https://lkml.org/lkml/2012/11/16/547. 8. Miscellaneous fixes, which were posted to LKML at https://lkml.org/lkml/2012/10/30/309, along with a late-breaking change posted at Fri, 16 Nov 2012 11:26:25 -0800 with message-ID <20121116192625.GA447@linux.vnet.ibm.com>, but which lkml.org seems to have missed. These are at branch rcu/fixes. 9. Finally, a fix for an lockdep-RCU splat was posted to LKML at https://lkml.org/lkml/2012/11/7/486. This is at rcu/next. " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-12-03 06:27:05 +01:00
Jan Kiszka	45e3cc7d9f	KVM: x86: Fix uninitialized return code This is a regression caused by `18595411a7`. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>	2012-12-02 17:37:04 +02:00
Linus Torvalds	b3c3a9cf2a	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fix from Ingo Molnar: "Fix leaking RCU extended quiescent state, which might trigger warnings and mess up the extended quiescent state tracking logic into thinking that we are in "RCU user mode" while we aren't." * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Fix unrecovered RCU user mode in syscall_trace_leave()	2012-12-01 13:08:36 -08:00
Linus Torvalds	455e987c0c	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This is mostly about unbreaking architectures that took the UAPI changes in the v3.7 cycle, plus misc fixes." * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf kvm: Fix building perf kvm on non x86 arches perf kvm: Rename perf_kvm to perf_kvm_stat perf: Make perf build for x86 with UAPI disintegration applied perf powerpc: Use uapi/unistd.h to fix build error tools: Pass the target in descend tools: Honour the O= flag when tool build called from a higher Makefile tools: Define a Makefile function to do subdir processing x86: Export asm/{svm.h,vmx.h,perf_regs.h} perf tools: Fix strbuf_addf() when the buffer needs to grow perf header: Fix numa topology printing perf, powerpc: Fix hw breakpoints returning -ENOSPC	2012-12-01 13:07:48 -08:00
Konrad Rzeszutek Wilk	6a7ed40511	Merge branch 'arm-privcmd-for-3.8' of git://xenbits.xen.org/people/ianc/linux into stable/for-linus-3.8 * 'arm-privcmd-for-3.8' of git://xenbits.xen.org/people/ianc/linux: xen: arm: implement remap interfaces needed for privcmd mappings. xen: correctly use xen_pfn_t in remap_domain_mfn_range. xen: arm: enable balloon driver xen: balloon: allow PVMMU interfaces to be compiled out xen: privcmd: support autotranslated physmap guests. xen: add pages parameter to xen_remap_domain_mfn_range Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-30 17:07:59 -05:00
Olaf Hering	a7be94ac8d	xen/PVonHVM: fix compile warning in init_hvm_pv_info After merging the xen-two tree, today's linux-next build (x86_64 allmodconfig) produced this warning: arch/x86/xen/enlighten.c: In function 'init_hvm_pv_info': arch/x86/xen/enlighten.c:1617:16: warning: unused variable 'ebx' [-Wunused-variable] arch/x86/xen/enlighten.c:1617:11: warning: unused variable 'eax' [-Wunused-variable] Introduced by commit `9d02b43dee` ("xen PVonHVM: use E820_Reserved area for shared_info"). Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-30 17:02:55 -05:00
Vincent Palatin	644c154186	x86, fpu: Avoid FPU lazy restore after suspend When a cpu enters S3 state, the FPU state is lost. After resuming for S3, if we try to lazy restore the FPU for a process running on the same CPU, this will result in a corrupted FPU context. Ensure that "fpu_owner_task" is properly invalided when (re-)initializing a CPU, so nobody will try to lazy restore a state which doesn't exist in the hardware. Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off, by doing thousands of suspend/resume cycles with 4 processes doing FPU operations running. Without the patch, a process is killed after a few hundreds cycles by a SIGFPE. Cc: Duncan Laurie <dlaurie@chromium.org> Cc: Olof Johansson <olofj@chromium.org> Cc: <stable@kernel.org> v3.4+ # for 3.4 need to replace this_cpu_write by percpu_write Signed-off-by: Vincent Palatin <vpalatin@chromium.org> Link: http://lkml.kernel.org/r/1354306532-1014-1-git-send-email-vpalatin@chromium.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-30 13:48:05 -08:00
Will Auld	ba904635d4	KVM: x86: Emulate IA32_TSC_ADJUST MSR CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported Basic design is to emulate the MSR by allowing reads and writes to a guest vcpu specific location to store the value of the emulated MSR while adding the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This is of course as long as the "use TSC counter offsetting" VM-execution control is enabled as well as the IA32_TSC_ADJUST control. However, because hardware will only return the TSC + IA32_TSC_ADJUST + vmsc tsc_offset for a guest process when it does and rdtsc (with the correct settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one of these three locations. The argument against storing it in the actual MSR is performance. This is likely to be seldom used while the save/restore is required on every transition. IA32_TSC_ADJUST was created as a way to solve some issues with writing TSC itself so that is not an option either. The remaining option, defined above as our solution has the problem of returning incorrect vmcs tsc_offset values (unless we intercept and fix, not done here) as mentioned above. However, more problematic is that storing the data in vmcs tsc_offset will have a different semantic effect on the system than does using the actual MSR. This is illustrated in the following example: The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest process performs a rdtsc. In this case the guest process will get TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics as seen by the guest do not and hence this will not cause a problem. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-30 18:29:30 -02:00
Will Auld	8fe8ab46be	KVM: x86: Add code to track call origin for msr assignment In order to track who initiated the call (host or guest) to modify an msr value I have changed function call parameters along the call path. The specific change is to add a struct pointer parameter that points to (index, data, caller) information rather than having this information passed as individual parameters. The initial use for this capability is for updating the IA32_TSC_ADJUST msr while setting the tsc value. It is anticipated that this capability is useful for other tasks. Signed-off-by: Will Auld <will.auld@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-30 18:26:12 -02:00
Frederic Weisbecker	91d1aa43d3	context_tracking: New context tracking susbsystem Create a new subsystem that probes on kernel boundaries to keep track of the transitions between level contexts with two basic initial contexts: user or kernel. This is an abstraction of some RCU code that use such tracking to implement its userspace extended quiescent state. We need to pull this up from RCU into this new level of indirection because this tracking is also going to be used to implement an "on demand" generic virtual cputime accounting. A necessary step to shutdown the tick while still accounting the cputime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: fix whitespace error and email address. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-11-30 11:40:07 -08:00
Xiao Guangrong	5a560f8b5e	KVM: VMX: fix memory order between loading vmcs and clearing vmcs vmcs->cpu indicates whether it exists on the target cpu, -1 means the vmcs does not exist on any vcpu If vcpu load vmcs with vmcs.cpu = -1, it can be directly added to cpu's percpu list. The list can be corrupted if the cpu prefetch the vmcs's list before reading vmcs->cpu. Meanwhile, we should remove vmcs from the list before making vmcs->vcpu == -1 be visible Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-29 21:14:46 -02:00
H. Peter Anvin	11af32b69e	x86, 386 removal: Document Nx586 as a 386 and thus unsupported Per Alan Cox, Nx586 did not support WP in supervisor mode, making it a 386 by Linux kernel standards. As such, it is too unsupported now. Reported-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Link: http://lkml.kernel.org/r/20121128205203.05868eab@pyramind.ukuu.org.uk Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-29 13:28:39 -08:00
H. Peter Anvin	45c39fb0cc	x86, cleanups: Simplify sync_core() in the case of no CPUID Simplify the implementation of sync_core() for the case where we may not have the CPUID instruction available. [ v2: stylistic cleanup of the #else clause per suggestion by Borislav Petkov. ] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-9-git-send-email-hpa@linux.intel.com Cc: Borislav Petkov <bp@alien8.de>	2012-11-29 13:25:39 -08:00
H. Peter Anvin	e3228cf454	x86, 386 removal: Remove CONFIG_X86_POPAD_OK The check_popad() routine tested for a 386-specific bug, and never actually did anything useful with it anyway other than print a message. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-8-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:03 -08:00
H. Peter Anvin	a5c2a893db	x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK All 486+ CPUs support WP in supervisor mode, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-7-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:03 -08:00
H. Peter Anvin	094ab1db7c	x86, 386 removal: Remove CONFIG_INVLPG All 486+ CPUs support INVLPG, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-6-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:02 -08:00
H. Peter Anvin	e5bb8ad862	x86, 386 removal: Remove CONFIG_BSWAP All 486+ CPUs support BSWAP, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-5-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:02 -08:00
H. Peter Anvin	7ac468b130	x86, 386 removal: Remove CONFIG_XADD All 486+ CPUs support XADD, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-4-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:02 -08:00
H. Peter Anvin	d55c5a93db	x86, 386 removal: Remove CONFIG_CMPXCHG All 486+ CPUs support CMPXCHG, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-3-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:01 -08:00
H. Peter Anvin	eb068e7810	x86, 386 removal: Remove CONFIG_M386 from Kconfig Remove the CONFIG_M386 symbol from Kconfig so that it cannot be selected. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-2-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:01 -08:00
Rafael J. Wysocki	d4c091f13d	Merge branch 'acpi-general' * acpi-general: (38 commits) ACPI / thermal: _TMP and _CRT/_HOT/_PSV/_ACx dependency fix ACPI: drop unnecessary local variable from acpi_system_write_wakeup_device() ACPI: Fix logging when no pci_irq is allocated ACPI: Update Dock hotplug error messages ACPI: Update Container hotplug error messages ACPI: Update Memory hotplug error messages ACPI: Update CPU hotplug error messages ACPI: Add acpi_handle_<level>() interfaces ACPI: remove use of __devexit ACPI / PM: Add Sony Vaio VPCEB1S1E to nonvs blacklist. ACPI / battery: Correct battery capacity values on Thinkpads Revert "ACPI / x86: Add quirk for "CheckPoint P-20-00" to not use bridge _CRS_ info" ACPI: create _SUN sysfs file ACPI / memhotplug: bind the memory device when the driver is being loaded ACPI / memhotplug: don't allow to eject the memory device if it is being used ACPI / memhotplug: free memory device if acpi_memory_enable_device() failed ACPI / memhotplug: fix memory leak when memory device is unbound from acpi_memhotplug ACPI / memhotplug: deal with eject request in hotplug queue ACPI / memory-hotplug: add memory offline code to acpi_memory_device_remove() ACPI / memory-hotplug: call acpi_bus_trim() to remove memory device ... Conflicts: include/linux/acpi.h (two additions at the end of the same file)	2012-11-29 21:43:06 +01:00
Rafael J. Wysocki	bcacbdbdc8	Merge branch 'acpi-enumeration' * acpi-enumeration: ACPI: remove unnecessary INIT_LIST_HEAD ACPI / platform: include missed header into acpi_platform.c platform / ACPI: Attach/detach ACPI PM during probe/remove/shutdown mmc: sdhci-acpi: add SDHCI ACPI driver ACPI: add SDHCI to ACPI platform devices ACPI / PNP: skip ACPI device nodes associated with physical nodes already i2c / ACPI: add ACPI enumeration support ACPI / platform: Initialize ACPI handles of platform devices in advance ACPI / driver core: Introduce struct acpi_dev_node and related macros ACPI: Allow ACPI handles of devices to be initialized in advance ACPI / resources: Use AE_CTRL_TERMINATE to terminate resources walks ACPI: Centralized processing of ACPI device resources ACPI / platform: Use common ACPI device resource parsing routines ACPI: Move device resources interpretation code from PNP to ACPI core ACPI / platform: use ACPI device name instead of _HID._UID ACPI: Add support for platform bus type ACPI / ia64: Export acpi_[un]register_gsi() ACPI / x86: Export acpi_[un]register_gsi() ACPI: Provide generic functions for matching ACPI device nodes driver core / ACPI: Move ACPI support to core device and driver types	2012-11-29 21:41:25 +01:00
Ian Campbell	f832da068b	xen: arm: implement remap interfaces needed for privcmd mappings. We use XENMEM_add_to_physmap_range which is the preferred interface for foreign mappings. Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-29 14:00:19 +00:00
Ian Campbell	7892f6928d	xen: correctly use xen_pfn_t in remap_domain_mfn_range. For Xen on ARM a PFN is 64 bits so we need to use the appropriate type here. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v2: include the necessary header, Reported-by: Fengguang Wu <fengguang.wu@intel.com> ]	2012-11-29 12:59:19 +00:00
Ian Campbell	c2374bf57e	xen: balloon: allow PVMMU interfaces to be compiled out The ARM platform has no concept of PVMMU and therefor no HYPERVISOR_update_va_mapping et al. Allow this code to be compiled out when not required. In some similar situations (e.g. P2M) we have defined dummy functions to avoid this, however I think we can/should draw the line at dummying out actual hypercalls. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-29 12:59:13 +00:00
Ian Campbell	9a032e393a	xen: add pages parameter to xen_remap_domain_mfn_range Also introduce xen_unmap_domain_mfn_range. These are the parts of Mukesh's "xen/pvh: Implement MMU changes for PVH" which are also needed as a baseline for ARM privcmd support. The original patch was: Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> This derivative is also: Signed-off-by: Ian Campbell <ian.campbell@citrix.com>	2012-11-29 12:57:36 +00:00
Al Viro	4f4202fe5a	unify default ptrace_signal_deliver Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-29 00:01:23 -05:00
Al Viro	18c26c27ae	death to idle_regs() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-28 23:43:42 -05:00
Al Viro	afa86fc426	flagday: don't pass regs to copy_thread() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-28 23:43:42 -05:00
Al Viro	24465a40ba	take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h now it can be done... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-28 23:43:27 -05:00
Al Viro	1d4b4b2994	x86, um: switch to generic fork/vfork/clone Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-28 22:13:44 -05:00
Al Viro	71613c3b87	get rid of pt_regs argument of ->load_binary() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-28 21:53:38 -05:00
Al Viro	6b94631f9e	consolidate sys_execve() prototype Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-28 21:53:35 -05:00
Xiao Guangrong	e6c7d32172	KVM: VMX: fix invalid cpu passed to smp_call_function_single In loaded_vmcs_clear, loaded_vmcs->cpu is the fist parameter passed to smp_call_function_single, if the target cpu is downing (doing cpu hot remove), loaded_vmcs->cpu can become -1 then -1 is passed to smp_call_function_single It can be triggered when vcpu is being destroyed, loaded_vmcs_clear is called in the preemptionable context Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-28 22:04:58 -02:00
Gleb Natapov	859f8450d8	KVM: use is_idle_task() instead of idle_cpu() to decide when to halt in async_pf As Frederic pointed idle_cpu() may return false even if async fault happened in the idle task if wake up is pending. In this case the code will try to put idle task to sleep. Fix this by using is_idle_task() to check for idle task. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-28 21:30:13 -02:00
Konrad Rzeszutek Wilk	394b40f62d	xen/acpi: Move the xen_running_on_version_or_later function. As on ia64 builds we get: include/xen/interface/version.h: In function 'xen_running_on_version_or_later': include/xen/interface/version.h:76: error: implicit declaration of function 'HYPERVISOR_xen_version' We can later on make this function exportable if there are modules using part of it. For right now the only two users are built-in. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-28 14:39:31 -05:00
Marcelo Tosatti	d98d07ca7e	KVM: x86: update pvclock area conditionally, on cpu migration As requested by Glauber, do not update kvmclock area on vcpu->pcpu migration, in case the host has stable TSC. This is to reduce cacheline bouncing. Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:15 -02:00
Marcelo Tosatti	b48aa97e38	KVM: x86: require matched TSC offsets for master clock With master clock, a pvclock clock read calculates: ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ] Where 'rdtsc' is the host TSC. system_timestamp and tsc_timestamp are unique, one tuple per VM: the "master clock". Given a host with synchronized TSCs, its obvious that guest TSC must be matched for the above to guarantee monotonicity. Allow master clock usage only if guest TSCs are synchronized. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:15 -02:00
Marcelo Tosatti	42897d866b	KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initialization TSC initialization will soon make use of online_vcpus. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:14 -02:00
Marcelo Tosatti	d828199e84	KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag KVM added a global variable to guarantee monotonicity in the guest. One of the reasons for that is that the time between 1. ktime_get_ts(&timespec); 2. rdtscll(tsc); Is variable. That is, given a host with stable TSC, suppose that two VCPUs read the same time via ktime_get_ts() above. The time required to execute 2. is not the same on those two instances executing in different VCPUS (cache misses, interrupts...). If the TSC value that is used by the host to interpolate when calculating the monotonic time is the same value used to calculate the tsc_timestamp value stored in the pvclock data structure, and a single <system_timestamp, tsc_timestamp> tuple is visible to all vcpus simultaneously, this problem disappears. See comment on top of pvclock_update_vm_gtod_copy for details. Monotonicity is then guaranteed by synchronicity of the host TSCs and guest TSCs. Set TSC stable pvclock flag in that case, allowing the guest to read clock from userspace. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:13 -02:00
Marcelo Tosatti	16e8d74d2d	KVM: x86: notifier for clocksource changes Register a notifier for clocksource change event. In case the host switches to clock other than TSC, disable master clock usage. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:12 -02:00
Marcelo Tosatti	886b470cb1	KVM: x86: pass host_tsc to read_l1_tsc Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc(). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:11 -02:00
Marcelo Tosatti	51c19b4f59	x86: vdso: pvclock gettime support Improve performance of time system calls when using Linux pvclock, by reading time info from fixmap visible copy of pvclock data. Originally from Jeremy Fitzhardinge. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:11 -02:00
Marcelo Tosatti	3dc4f7cfb7	x86: kvm guest: pvclock vsyscall support Hook into generic pvclock vsyscall code, with the aim to allow userspace to have visibility into pvclock data. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:10 -02:00
Marcelo Tosatti	71056ae22d	x86: pvclock: generic pvclock vsyscall initialization Originally from Jeremy Fitzhardinge. Introduce generic, non hypervisor specific, pvclock initialization routines. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:09 -02:00
Marcelo Tosatti	189e11731a	x86: pvclock: add note about rdtsc barriers As noted by Gleb, not advertising SSE2 support implies no RDTSC barriers. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:08 -02:00
Marcelo Tosatti	2697902be8	x86: pvclock: introduce helper to read flags Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:08 -02:00
Marcelo Tosatti	dce2db0a35	x86: pvclock: create helper for pvclock data retrieval Originally from Jeremy Fitzhardinge. So code can be reused. Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:07 -02:00
Marcelo Tosatti	42b5637d69	x86: pvclock: remove pvclock_shadow_time Originally from Jeremy Fitzhardinge. We can copy the information directly from "struct pvclock_vcpu_time_info", remove pvclock_shadow_time. Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:06 -02:00
Marcelo Tosatti	b01578de45	x86: pvclock: make sure rdtsc doesnt speculate out of region Originally from Jeremy Fitzhardinge. pvclock_get_time_values, which contains the memory barriers will be removed by next patch. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:06 -02:00
Marcelo Tosatti	7069ed6763	x86: kvmclock: allocate pvclock shared memory area We want to expose the pvclock shared memory areas, which the hypervisor periodically updates, to userspace. For a linear mapping from userspace, it is necessary that entire page sized regions are used for array of pvclock structures. There is no such guarantee with per cpu areas, therefore move to memblock_alloc based allocation. Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:05 -02:00
Marcelo Tosatti	78c0337a38	KVM: x86: retain pvclock guest stopped bit in guest memory Otherwise its possible for an unrelated KVM_REQ_UPDATE_CLOCK (such as due to CPU migration) to clear the bit. Noticed by Paolo Bonzini. Reviewed-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-27 23:29:05 -02:00
H. Peter Anvin	6662c34fa9	x86-32: Unbreak booting on some 486 clones There appear to have been some 486 clones, including the "enhanced" version of Am486, which have CPUID but not CR4. These 486 clones had only the FPU flag, if any, unlike the Intel 486s with CPUID, which also had VME and therefore needed CR4. Therefore, look at the basic CPUID flags and require at least one bit other than bit 0 before we modify CR4. Thanks to Christian Ludloff of sandpile.org for confirming this as a problem. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-27 09:26:33 -08:00
H. Peter Anvin	cb7cb2864e	x86, kvm: Remove incorrect redundant assembly constraint In __emulate_1op_rax_rdx, we use "+a" and "+d" which are input/output constraints, and then use "a" and "d" as input constraints. This is incorrect, but happens to work on some versions of gcc. However, it breaks gcc with -O0 and icc, and may break on future versions of gcc. Reported-and-tested-by: Melanie Blower <melanie.blower@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/B3584E72CFEBED439A3ECA9BCE67A4EF1B17AF90@FMSMSX107.amr.corp.intel.com Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com>	2012-11-26 15:52:48 -08:00
Suresh Siddha	29c574c0ab	x86, apic: Cleanup cfg->domain setup for legacy interrupts Issues that need to be handled: * Handle PIC interrupts on any CPU irrespective of the apic mode * In the apic lowest priority logical flat delivery mode, be prepared to handle the interrupt on any CPU irrespective of what the IO-APIC RTE says. * Because of above, when the IO-APIC starts handling the legacy PIC interrupt, use the same vector that is being used by the PIC while programming the corresponding IO-APIC RTE. Start with all the cpu's in the legacy PIC interrupts cfg->domain. By the time IO-APIC starts taking over the PIC interrupts, apic driver model is finalized. So depend on the assign_irq_vector() to update the cfg->domain and retain the same vector that was used by PIC before. For the logical apic flat mode, cfg->domain is updated (during the first call to assign_irq_vector()) to contain all the possible online cpu's (0xff). Vector used for the legacy PIC interrupt doesn't change when the IO-APIC starts handling the interrupt. Any interrupt migration after that doesn't change the cfg->domain or the vector used. For other apic modes like physical mode, cfg->domain is updated (during the first call to assign_irq_vector()) to the boot cpu (cpu-0), with the same vector that is being used by the PIC. When that interrupt is migrated to a different cpu, cfg->domin and the vector assigned will change accordingly. Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1353970176.21070.51.camel@sbsiddha-desk.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-26 15:43:25 -08:00
Liu, Jinsong	e3aa4e61b5	xen/acpi: revert pad config check in xen_check_mwait With Xen acpi pad logic added into kernel, we can now revert xen mwait related patch `df88b2d96e` ("xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded. "). The reason is, when running under newer Xen platform, Xen pad driver would be early loaded, so native pad driver would fail to be loaded, and hence no mwait/monitor #UD risk again. Another point is, only Xen4.2 or later support Xen acpi pad, so we won't expose mwait cpuid capability when running under older Xen platform. Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-11-26 15:08:29 -05:00
David S. Miller	24bc518a68	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/wireless/iwlwifi/pcie/tx.c Minor iwlwifi conflict in TX queue disabling between 'net', which removed a bogus warning, and 'net-next' which added some status register poking code. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-25 12:49:17 -05:00
Linus Torvalds	2654ad44b5	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 arch fixes from Peter Anvin: "Here is a collection of fixes for 3.7-rc7. This is a superset of tglx' earlier pull request." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64: Fix ordering of CFI directives and recent ASM_CLAC additions x86, microcode, AMD: Add support for family 16h processors x86-32: Export kernel_stack_pointer() for modules x86-32: Fix invalid stack address while in softirq x86, efi: Fix processor-specific memcpy() build error x86: remove dummy long from EFI stub x86, mm: Correct vmflag test for checking VM_HUGETLB x86, amd: Disable way access filter on Piledriver CPUs x86/mce: Do not change worker's running cpu in cmci_rediscover(). x86/ce4100: Fix PCI configuration register access for devices without interrupts x86/ce4100: Fix reboot by forcing the reboot method to be KBD x86/ce4100: Fix pm_poweroff MAINTAINERS: Update email address for Robert Richter x86, microcode_amd: Change email addresses, MAINTAINERS entry MAINTAINERS: Change Boris' email address EDAC: Change Boris' email address x86, AMD: Change Boris' email address	2012-11-23 20:03:14 -10:00
Len Brown	3fc808aaa0	x86 power: define RAPL MSRs The Run Time Average Power Limiting interface is currently model specific, present on Sandy Bridge and Ivy Bridge processors. These #defines correspond to documentation in the latest "Intel® 64 and IA-32 Architectures Software Developer Manual", plus some typos in that document corrected. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2012-11-23 21:40:11 -05:00
Len Brown	9c63a650bb	tools/power/x86/turbostat: share kernel MSR #defines Now that turbostat is built in the kernel tree, it can share MSR #defines with the kernel. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org	2012-11-23 21:40:04 -05:00
Jan Beulich	ee4eb87be2	x86-64: Fix ordering of CFI directives and recent ASM_CLAC additions While these got added in the right place everywhere else, entry_64.S is the odd one where they ended up before the initial CFI directive(s). In order to cover the full code ranges, the CFI directive must be first, though. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/5093BA1F02000078000A600E@nat28.tlf.novell.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-20 22:23:57 -08:00
Boris Ostrovsky	36c46ca4f3	x86, microcode, AMD: Add support for family 16h processors Add valid patch size for family 16h processors. [ hpa: promoting to urgent/stable since it is hw enabling and trivial ] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> Acked-by: Andreas Herrmann <herrmann.der.user@googlemail.com> Link: http://lkml.kernel.org/r/1353004910-2204-1-git-send-email-boris.ostrovsky@amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org>	2012-11-20 22:23:28 -08:00
H. Peter Anvin	cb57a2b4cf	x86-32: Export kernel_stack_pointer() for modules Modules, in particular oprofile (and possibly other similar tools) need kernel_stack_pointer(), so export it using EXPORT_SYMBOL_GPL(). Cc: Yang Wei <wei.yang@windriver.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Jun Zhang <jun.zhang@intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20120912135059.GZ8285@erda.amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-20 22:23:23 -08:00
Robert Richter	1022623842	x86-32: Fix invalid stack address while in softirq In 32 bit the stack address provided by kernel_stack_pointer() may point to an invalid range causing NULL pointer access or page faults while in NMI (see trace below). This happens if called in softirq context and if the stack is empty. The address at &regs->sp is then out of range. Fixing this by checking if regs and &regs->sp are in the same stack context. Otherwise return the previous stack pointer stored in struct thread_info. If that address is invalid too, return address of regs. BUG: unable to handle kernel NULL pointer dereference at 0000000a IP: [<c1004237>] print_context_stack+0x6e/0x8d pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: Pid: 4434, comm: perl Not tainted 3.6.0-rc3-oprofile-i386-standard-g4411a05 #4 Hewlett-Packard HP xw9400 Workstation/0A1Ch EIP: 0060:[<c1004237>] EFLAGS: 00010093 CPU: 0 EIP is at print_context_stack+0x6e/0x8d EAX: ffffe000 EBX: 0000000a ECX: f4435f94 EDX: 0000000a ESI: f4435f94 EDI: f4435f94 EBP: f5409ec0 ESP: f5409ea0 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 CR0: 8005003b CR2: 0000000a CR3: 34ac9000 CR4: 000007d0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process perl (pid: 4434, ti=f5408000 task=f5637850 task.ti=f4434000) Stack: 000003e8 ffffe000 00001ffc f4e39b00 00000000 0000000a f4435f94 c155198c f5409ef0 c1003723 c155198c f5409f04 00000000 f5409edc 00000000 00000000 f5409ee8 f4435f94 f5409fc4 00000001 f5409f1c c12dce1c 00000000 c155198c Call Trace: [<c1003723>] dump_trace+0x7b/0xa1 [<c12dce1c>] x86_backtrace+0x40/0x88 [<c12db712>] ? oprofile_add_sample+0x56/0x84 [<c12db731>] oprofile_add_sample+0x75/0x84 [<c12ddb5b>] op_amd_check_ctrs+0x46/0x260 [<c12dd40d>] profile_exceptions_notify+0x23/0x4c [<c1395034>] nmi_handle+0x31/0x4a [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45 [<c13950ed>] do_nmi+0xa0/0x2ff [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45 [<c13949e5>] nmi_stack_correct+0x28/0x2d [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45 [<c1003603>] ? do_softirq+0x4b/0x7f <IRQ> [<c102a06f>] irq_exit+0x35/0x5b [<c1018f56>] smp_apic_timer_interrupt+0x6c/0x7a [<c1394746>] apic_timer_interrupt+0x2a/0x30 Code: 89 fe eb 08 31 c9 8b 45 0c ff 55 ec 83 c3 04 83 7d 10 00 74 0c 3b 5d 10 73 26 3b 5d e4 73 0c eb 1f 3b 5d f0 76 1a 3b 5d e8 73 15 <8b> 13 89 d0 89 55 e0 e8 ad 42 03 00 85 c0 8b 55 e0 75 a6 eb cc EIP: [<c1004237>] print_context_stack+0x6e/0x8d SS:ESP 0068:f5409ea0 CR2: 000000000000000a ---[ end trace 62afee3481b00012 ]--- Kernel panic - not syncing: Fatal exception in interrupt V2: add comments to kernel_stack_pointer() * always return a valid stack address by falling back to the address of regs Reported-by: Yang Wei <wei.yang@windriver.com> Cc: <stable@vger.kernel.org> Signed-off-by: Robert Richter <robert.richter@amd.com> Link: http://lkml.kernel.org/r/20120912135059.GZ8285@erda.amd.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Jun Zhang <jun.zhang@intel.com>	2012-11-20 22:23:20 -08:00
H. Peter Anvin	c1ddb48204	Merge commit 'efi-for-3.7-v2' into x86/urgent	2012-11-20 16:49:15 -08:00
Matt Fleming	0f905a43ce	x86, efi: Fix processor-specific memcpy() build error Building for Athlon/Duron/K7 results in the following build error, arch/x86/boot/compressed/eboot.o: In function `__constant_memcpy3d': eboot.c:(.text+0x385): undefined reference to `_mmx_memcpy' arch/x86/boot/compressed/eboot.o: In function `efi_main': eboot.c:(.text+0x1a22): undefined reference to `_mmx_memcpy' because the boot stub code doesn't link with the kernel proper, and therefore doesn't have access to the 3DNow version of memcpy. So, follow the example of misc.c and #undef memcpy so that we use the version provided by misc.c. See https://bugzilla.kernel.org/show_bug.cgi?id=50391 Reported-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Ryan Underwood <nemesis@icequake.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: stable@vger.kernel.org Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2012-11-20 20:52:07 +00:00
Cesar Eduardo Barros	caaa8c6339	x86: remove dummy long from EFI stub Commit `2e064b1` (x86, efi: Fix issue of overlapping .reloc section for EFI_STUB) removed a dummy reloc added by commit `291f363` (x86, efi: EFI boot stub support), but forgot to remove the dummy long used by that reloc. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Tested-by: Lee G Rosenbaum <lee.g.rosenbaum@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net> Signed-off-by: Matt Fleming <matt.fleming@intel.com>	2012-11-20 20:17:48 +00:00
David Howells	60606d4248	Merge branch 'x86-pre-uapi' into perf-uapi David Howells (1): x86: Export asm/{svm.h,vmx.h,perf_regs.h}	2012-11-19 21:50:58 +00:00
Paul Bolle	23e5047afc	x86: remove offsets.h from .gitignore and dontdiff Commit `77d1a49995` ("x86, boot: make symbols from the main vmlinux available") removed all traces of offsets.h from the tree. Remove its entries in dontdiff and x86/boot's .gitignore file too. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-11-19 14:10:53 +01:00
Steven Rostedt	6c8d8b3c69	x86_32: Return actual stack when requesting sp from regs As x86_32 traps do not save sp when taken in kernel mode, we need to accommodate the sp when requesting to get the register. This affects kprobes. Before: # echo 'p:ftrace sys_read+4 s=%sp' > /debug/tracing/kprobe_events # echo 1 > /debug/tracing/events/kprobes/enable # cat trace sshd-1345 [000] d... 489.117168: ftrace: (sys_read+0x4/0x70) s=b7e96768 sshd-1345 [000] d... 489.117191: ftrace: (sys_read+0x4/0x70) s=b7e96768 cat-1447 [000] d... 489.117392: ftrace: (sys_read+0x4/0x70) s=5a7 cat-1447 [001] d... 489.118023: ftrace: (sys_read+0x4/0x70) s=b77ad05f less-1448 [000] d... 489.118079: ftrace: (sys_read+0x4/0x70) s=b7762e06 less-1448 [000] d... 489.118117: ftrace: (sys_read+0x4/0x70) s=b7764970 After: sshd-1352 [000] d... 362.348016: ftrace: (sys_read+0x4/0x70) s=f3febfa8 sshd-1352 [000] d... 362.348048: ftrace: (sys_read+0x4/0x70) s=f3febfa8 bash-1355 [001] d... 362.348081: ftrace: (sys_read+0x4/0x70) s=f5075fa8 sshd-1352 [000] d... 362.348082: ftrace: (sys_read+0x4/0x70) s=f3febfa8 sshd-1352 [000] d... 362.690950: ftrace: (sys_read+0x4/0x70) s=f3febfa8 bash-1355 [001] d... 362.691033: ftrace: (sys_read+0x4/0x70) s=f5075fa8 Link: http://lkml.kernel.org/r/1342208654.30075.22.camel@gandalf.stny.rr.com Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-11-19 05:31:37 -05:00
David S. Miller	67f4efdce7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor line offset auto-merges. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-11-17 22:00:43 -05:00
Yinghai Lu	9710f581bb	x86, mm: Let "memmap=" take more entries one time Current "memmap=" only can take one entry every time. when we have more entries, we have to use memmap= for each of them. For pxe booting, we have command line length limitation, those extra "memmap=" would waste too much space. This patch make memmap= could take several entries one time, and those entries will be split with ',' Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-47-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:51 -08:00
Yinghai Lu	c074eaac2a	x86, mm: kill numa_64.h Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-44-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:47 -08:00
Yinghai Lu	94b43c3d86	x86, mm: kill numa_free_all_bootmem() Now NO_BOOTMEM version free_all_bootmem_node() does not really do free_bootmem at all, and it only call register_page_bootmem_info_node instead. That is confusing, try to kill that free_all_bootmem_node(). Before that, this patch will remove numa_free_all_bootmem(). That function could be replaced with register_page_bootmem_info() and free_all_bootmem(); Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-43-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:47 -08:00
Yinghai Lu	b8fd39c036	x86, mm: Use clamp_t() in init_range_memory_mapping save some lines, and make code more readable. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-42-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:46 -08:00
Yinghai Lu	60a8f42832	x86, mm: Move after_bootmem to mm_internel.h it is only used in arch/x86/mm/init*.c Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-41-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:45 -08:00
Yinghai Lu	4e37a89047	x86, mm: Unifying after_bootmem for 32bit and 64bit after_bootmem has different meaning in 32bit and 64bit. 32bit: after bootmem is ready 64bit: after bootmem is distroyed Let's merget them make 32bit the same as 64bit. for 32bit, it is mixing alloc_bootmem_pages, and alloc_low_page under after_bootmem is set or not set. alloc_bootmem is just wrapper for memblock for x86. Now we have alloc_low_page() with memblock too. We can drop bootmem path now, and only alloc_low_page only. At the same time, we make alloc_low_page could handle real after_bootmem for 32bit, because alloc_bootmem_pages could fallback to use slab too. At last move after_bootmem set position for 32bit the same as 64bit. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-40-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:44 -08:00
Yinghai Lu	2e8059edb6	x86, mm: use limit_pfn for end pfn instead of shifting end to get that. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-39-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:43 -08:00
Yinghai Lu	1829ae9ad7	x86, mm: use pfn instead of pos in split_mem_range could save some bit shifting operations. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-38-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:41 -08:00
Yinghai Lu	84d770019b	x86, mm: use PFN_DOWN in split_mem_range() to replace own inline version for shifting. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-37-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:40 -08:00
Yinghai Lu	5a0d3aeeef	x86, mm: use round_up/down in split_mem_range() to replace own inline version for those roundup and rounddown. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-36-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:40 -08:00
Yinghai Lu	11ed9e927d	x86, mm: Add check before clear pte above max_low_pfn on 32bit During test patch that adjust page_size_mask to map small range ram with big page size, found page table is setup wrongly for 32bit. And native_pagetable_init wrong clear pte for pmd with large page support. 1. add more comments about why we are expecting pte. 2. add BUG checking, so next time we could find problem earlier when we mess up page table setup again. 3. max_low_pfn is not included boundary for low memory mapping. We should check from max_low_pfn instead of +1. 4. add print out when some pte really get cleared, or we should use WARN() to find out why above max_low_pfn get mapped? so we could fix it. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-35-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:39 -08:00
Yinghai Lu	c8dcdb9ce4	x86, mm: Move function declaration into mm_internal.h They are only for mm/init*.c. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-34-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:38 -08:00
Yinghai Lu	f836e35a98	x86, mm: change low/hignmem_pfn_init to static on 32bit Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-33-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:37 -08:00
Yinghai Lu	148b20989e	x86, mm: Move init_gbpages() out of setup.c Put it in mm/init.c, and call it from probe_page_mask(). init_mem_mapping is calling probe_page_mask at first. So calling sequence is not changed. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-32-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:37 -08:00
Yinghai Lu	cf47065961	x86, mm: Move back pgt_buf_* to mm/init.c Also change them to static. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-31-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:36 -08:00
Yinghai Lu	719272c45b	x86, mm: only call early_ioremap_page_table_range_init() once On 32bit, before patcheset that only set page table for ram, we only call that one time. Now, we are calling that during every init_memory_mapping if we have holes under max_low_pfn. We should only call it one time after all ranges under max_low_page get mapped just like we did before. Also that could avoid the risk to run out of pgt_buf in BRK. Need to update page_table_range_init() to count the pages for kmap page table at first, and use new added alloc_low_pages() to get pages in sequence. That will conform to the requirement that pages need to be in low to high order. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-30-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:29 -08:00
Stefano Stabellini	ddd3509df8	x86, mm: Add pointer about Xen mmu requirement for alloc_low_pages Add link for more information `279b706` x86,xen: introduce x86_init.mapping.pagetable_reserve -v2: updated to commets from hpa to include commit name. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-29-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:28 -08:00
Yinghai Lu	22c8ca2ac2	x86, mm: Add alloc_low_pages(num) 32bit kmap mapping needs pages to be used for low to high. At this point those pages are still from pgt_buf_* from BRK, so it is ok now. But we want to move early_ioremap_page_table_range_init() out of init_memory_mapping() and only call it one time later, that will make page_table_range_init/page_table_kmap_check/alloc_low_page to use memblock to get page. memblock allocation for pages are from high to low. So will get panic from page_table_kmap_check() that has BUG_ON to do ordering checking. This patch add alloc_low_pages to make it possible to allocate serveral pages at first, and hand out pages one by one from low to high. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-28-git-send-email-yinghai@kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:27 -08:00
Yinghai Lu	6f80b68e9e	x86, mm, Xen: Remove mapping_pagetable_reserve() Page table area are pre-mapped now after x86, mm: setup page table in top-down x86, mm: Remove early_memremap workaround for page table accessing on 64bit mapping_pagetable_reserve is not used anymore, so remove it. Also remove operation in mask_rw_pte(), as modified allow_low_page always return pages that are already mapped, moreover xen_alloc_pte_init, xen_alloc_pmd_init, etc, will mark the page RO before hooking it into the pagetable automatically. -v2: add changelog about mask_rw_pte() from Stefano. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-27-git-send-email-yinghai@kernel.org Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:26 -08:00
Yinghai Lu	9985b4c6fa	x86, mm: Move min_pfn_mapped back to mm/init.c Also change it to static. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-26-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:24 -08:00
Yinghai Lu	5c51bdbe4c	x86, mm: Merge alloc_low_page between 64bit and 32bit They are almost same except 64 bit need to handle after_bootmem case. Add mm_internal.h to make that alloc_low_page() only to be accessible from arch/x86/mm/init*.c Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-25-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:23 -08:00
Yinghai Lu	868bf4d6b9	x86, mm: Remove parameter in alloc_low_page for 64bit Now all page table buf are pre-mapped, and could use virtual address directly. So don't need to remember physical address anymore. Remove that phys pointer in alloc_low_page(), and that will allow us to merge alloc_low_page between 64bit and 32bit. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-24-git-send-email-yinghai@kernel.org Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:22 -08:00
Yinghai Lu	973dc4f3fa	x86, mm: Remove early_memremap workaround for page table accessing on 64bit We try to put page table high to make room for kdump, and at that time those ranges are not mapped yet, and have to use ioremap to access it. Now after patch that pre-map page table top down. x86, mm: setup page table in top-down We do not need that workaround anymore. Just use __va to return directly mapping address. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-23-git-send-email-yinghai@kernel.org Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:20 -08:00
Yinghai Lu	8d57470d8f	x86, mm: setup page table in top-down Get pgt_buf early from BRK, and use it to map PMD_SIZE from top at first. Then use mapped pages to map more ranges below, and keep looping until all pages get mapped. alloc_low_page will use page from BRK at first, after that buffer is used up, will use memblock to find and reserve pages for page table usage. Introduce min_pfn_mapped to make sure find new pages from mapped ranges, that will be updated when lower pages get mapped. Also add step_size to make sure that don't try to map too big range with limited mapped pages initially, and increase the step_size when we have more mapped pages on hand. We don't need to call pagetable_reserve anymore, reserve work is done in alloc_low_page() directly. At last we can get rid of calculation and find early pgt related code. -v2: update to after fix_xen change, also use MACRO for initial pgt_buf size and add comments with it. -v3: skip big reserved range in memblock.reserved near end. -v4: don't need fix_xen change now. -v5: add changelog about moving about reserving pagetable to alloc_low_page. Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-22-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:19 -08:00
Yinghai Lu	f763ad1d38	x86, mm: Break down init_all_memory_mapping Will replace that with top-down page table initialization. New API need to take range: init_range_memory_mapping() Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-21-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:17 -08:00
Yinghai Lu	eceb3632ac	x86, mm: Don't clear page table if range is ram After we add code use buffer in BRK to pre-map buf for page table in following patch: x86, mm: setup page table in top-down it should be safe to remove early_memmap for page table accessing. Instead we get panic with that. It turns out that we clear the initial page table wrongly for next range that is separated by holes. And it only happens when we are trying to map ram range one by one. We need to check if the range is ram before clearing page table. We change the loop structure to remove the extra little loop and use one loop only, and in that loop will caculate next at first, and check if [addr,next) is covered by E820_RAM. -v2: E820_RESERVED_KERN is treated as E820_RAM. EFI one change some E820_RAM to that, so next kernel by kexec will know that range is used already. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-20-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:17 -08:00
Yinghai Lu	aeebe84cc9	x86, mm: Use big page size for small memory range We could map small range in the middle of big range at first, so should use big page size at first to avoid using small page size to break down page table. Only can set big page bit when that range has ram area around it. -v2: fix 32bit boundary checking. We can not count ram above max_low_pfn for 32 bit. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-19-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:16 -08:00
Yinghai Lu	960ddb4fe7	x86, mm: Align start address to correct big page size We are going to use buffer in BRK to map small range just under memory top, and use those new mapped ram to map ram range under it. The ram range that will be mapped at first could be only page aligned, but ranges around it are ram too, we could use bigger page to map it to avoid small page size. We will adjust page_size_mask in following patch: x86, mm: Use big page size for small memory range to use big page size for small ram range. Before that patch, this patch will make sure start address to be aligned down according to bigger page size, otherwise entry in page page will not have correct value. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-18-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:15 -08:00
Yinghai Lu	74f27655dd	x86, mm: relocate initrd under all mem for 64bit instead of under 4g. For 64bit, we can use any mapped mem instead of low mem. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-17-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:15 -08:00
Jacob Shin	66520ebc2d	x86, mm: Only direct map addresses that are marked as E820_RAM Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT ) and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not backed by actual DRAM. This is fine for holes under 4GB which are covered by fixed and variable range MTRRs to be UC. However, we run into trouble on higher memory addresses which cannot be covered by MTRRs. Our system with 1TB of RAM has an e820 that looks like this: BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable and so direct mappings are created for huge memory hole between 0x000000e038000000 to 0x0000010000000000. Even though the kernel never generates memory accesses in that region, since the page tables mark them incorrectly as being WB, our (AMD) processor ends up causing a MCE while doing some memory bookkeeping/optimizations around that area. This patch iterates through e820 and only direct maps ranges that are marked as E820_RAM, and keeps track of those pfn ranges. Depending on the alignment of E820 ranges, this may possibly result in using smaller size (i.e. 4K instead of 2M or 1G) page tables. -v2: move changes from setup.c to mm/init.c, also use for_each_mem_pfn_range instead. - Yinghai Lu -v3: add calculate_all_table_space_size() to get correct needed page table size. - Yinghai Lu -v4: fix add_pfn_range_mapped() to get correct max_low_pfn_mapped when mem map does have hole under 4g that is found by Konard on xen domU with 8g ram. - Yinghai Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1353123563-3103-16-git-send-email-yinghai@kernel.org Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:14 -08:00
Yinghai Lu	e8c57d4051	x86, mm: use pfn_range_is_mapped() with reserve_initrd We are going to map ram only, so under max_low_pfn_mapped, between 4g and max_pfn_mapped does not mean mapped at all. Use pfn_range_is_mapped() to find out if range is mapped for initrd. That could happen bootloader put initrd in range but user could use memmap to carve some of range out. Also during copying need to use early_memmap to map original initrd for accessing. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-15-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:12 -08:00

1 2 3 4 5 ...

16651 Commits