Commit Graph

16651 Commits

Author SHA1 Message Date
Kirill A. Shutemov e180377f1a thp: change split_huge_page_pmd() interface
Pass vma instead of mm and add address parameter.

In most cases we already have vma on the stack. We provides
split_huge_page_pmd_mm() for few cases when we have mm, but not vma.

This change is preparation to huge zero pmd splitting implementation.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-12 17:38:31 -08:00
Linus Torvalds 9977d9b379 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull big execve/kernel_thread/fork unification series from Al Viro:
 "All architectures are converted to new model.  Quite a bit of that
  stuff is actually shared with architecture trees; in such cases it's
  literally shared branch pulled by both, not a cherry-pick.

  A lot of ugliness and black magic is gone (-3KLoC total in this one):

   - kernel_thread()/kernel_execve()/sys_execve() redesign.

     We don't do syscalls from kernel anymore for either kernel_thread()
     or kernel_execve():

     kernel_thread() is essentially clone(2) with callback run before we
     return to userland, the callbacks either never return or do
     successful do_execve() before returning.

     kernel_execve() is a wrapper for do_execve() - it doesn't need to
     do transition to user mode anymore.

     As a result kernel_thread() and kernel_execve() are
     arch-independent now - they live in kernel/fork.c and fs/exec.c
     resp.  sys_execve() is also in fs/exec.c and it's completely
     architecture-independent.

   - daemonize() is gone, along with its parts in fs/*.c

   - struct pt_regs * is no longer passed to do_fork/copy_process/
     copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.

   - sys_fork()/sys_vfork()/sys_clone() unified; some architectures
     still need wrappers (ones with callee-saved registers not saved in
     pt_regs on syscall entry), but the main part of those suckers is in
     kernel/fork.c now."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
  do_coredump(): get rid of pt_regs argument
  print_fatal_signal(): get rid of pt_regs argument
  ptrace_signal(): get rid of unused arguments
  get rid of ptrace_signal_deliver() arguments
  new helper: signal_pt_regs()
  unify default ptrace_signal_deliver
  flagday: kill pt_regs argument of do_fork()
  death to idle_regs()
  don't pass regs to copy_process()
  flagday: don't pass regs to copy_thread()
  bfin: switch to generic vfork, get rid of pointless wrappers
  xtensa: switch to generic clone()
  openrisc: switch to use of generic fork and clone
  unicore32: switch to generic clone(2)
  score: switch to generic fork/vfork/clone
  c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
  take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
  mn10300: switch to generic fork/vfork/clone
  h8300: switch to generic fork/vfork/clone
  tile: switch to generic clone()
  ...

Conflicts:
	arch/microblaze/include/asm/Kbuild
2012-12-12 12:22:13 -08:00
Linus Torvalds 1ebaf4f4e6 Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer update from Ingo Molnar:
 "This tree includes HPET fixes and also implements a calibration-free,
  TSC match driven APIC timer interrupt mode: 'TSC deadline mode'
  supported in SandyBridge and later CPUs."

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: hpet: Fix inverted return value check in arch_setup_hpet_msi()
  x86: hpet: Fix masking of MSI interrupts
  x86: apic: Use tsc deadline for oneshot when available
2012-12-11 20:01:33 -08:00
Linus Torvalds 743aa456c1 Merge branch 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull "Nuke 386-DX/SX support" from Ingo Molnar:
 "This tree removes ancient-386-CPUs support and thus zaps quite a bit
  of complexity:

    24 files changed, 56 insertions(+), 425 deletions(-)

  ... which complexity has plagued us with extra work whenever we wanted
  to change SMP primitives, for years.

  Unfortunately there's a nostalgic cost: your old original 386 DX33
  system from early 1991 won't be able to boot modern Linux kernels
  anymore.  Sniff."

I'm not sentimental.  Good riddance.

* 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, 386 removal: Document Nx586 as a 386 and thus unsupported
  x86, cleanups: Simplify sync_core() in the case of no CPUID
  x86, 386 removal: Remove CONFIG_X86_POPAD_OK
  x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK
  x86, 386 removal: Remove CONFIG_INVLPG
  x86, 386 removal: Remove CONFIG_BSWAP
  x86, 386 removal: Remove CONFIG_XADD
  x86, 386 removal: Remove CONFIG_CMPXCHG
  x86, 386 removal: Remove CONFIG_M386 from Kconfig
2012-12-11 19:59:32 -08:00
Linus Torvalds a05a4e24dc Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 topology discovery improvements from Ingo Molnar:
 "These changes improve topology discovery on AMD CPUs.

  Right now this feeds information displayed in
  /sys/devices/system/cpu/cpuX/cache/indexY/* - but in the future we
  could use this to set up a better scheduling topology."

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cacheinfo: Base cache sharing info on CPUID 0x8000001d on AMD
  x86, cacheinfo: Make use of CPUID 0x8000001d for cache information on AMD
  x86, cacheinfo: Determine number of cache leafs using CPUID 0x8000001d on AMD
  x86: Add cpu_has_topoext
2012-12-11 19:58:29 -08:00
Linus Torvalds e9a5a91971 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Small cleanups."

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix the error of using "const" in gen-insn-attr-x86.awk
  x86, apic: Cleanup cfg->domain setup for legacy interrupts
  x86: Remove dead hlt_use_halt code
2012-12-11 19:57:56 -08:00
Linus Torvalds 74b8423345 Merge branch 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 BSP hotplug changes from Ingo Molnar:
 "This tree enables CPU#0 (the boot processor) to be onlined/offlined on
  x86, just like any other CPU.  Enabled on Intel CPUs for now.

  Allowing this required the identification and fixing of latent CPU#0
  assumptions (such as CPU#0 initializations, etc.) in the x86
  architecture code, plus the identification of barriers to
  BSP-offlining, such as active PIC interrupts which can only be
  serviced on the BSP.

  It's behind a default-off option, and there's a debug option that
  allows the automatic testing of this feature.

  The motivation of this feature is to allow and prepare for true
  CPU-hotplug hardware support: recent changes to MCE support enable us
  to detect a deteriorating but not yet hard-failing L1/L2 cache on a
  CPU that could be soft-unplugged - or a failing L3 cache on a
  multi-socket system.

  Note that true hardware hot-plug is not yet fully enabled by this,
  because that requires a special platform wakeup sequence to be sent to
  the freshly powered up CPU#0.  Future patches for this are planned,
  once such a platform exists.  Chicken and egg"

* 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, topology: Debug CPU0 hotplug
  x86/i387.c: Initialize thread xstate only on CPU0 only once
  x86, hotplug: Handle retrigger irq by the first available CPU
  x86, hotplug: The first online processor saves the MTRR state
  x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
  x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
  x86-32, hotplug: Add start_cpu0() entry point to head_32.S
  x86-64, hotplug: Add start_cpu0() entry point to head_64.S
  kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
  x86, hotplug, suspend: Online CPU0 for suspend or hibernate
  x86, hotplug: Support functions for CPU0 online/offline
  x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
  x86, Kconfig: Add config switch for CPU0 hotplug
  doc: Add x86 CPU0 online/offline feature
2012-12-11 19:56:33 -08:00
Linus Torvalds 5074474737 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot changes from Ingo Molnar:
 "Two small changes: a cleanup and allow CONFIG_X86_MPPARSE to be turned
  off on SFI as well."

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arch/x86/Kconfig: Allow turning off CONFIG_X86_MPPARSE when either ACPI or SFI is present
  x86/boot/doc: Fix grammar and typo in boot.txt
2012-12-11 19:55:58 -08:00
Linus Torvalds 0019fab355 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
 "Two fixlets and a cleanup."

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_32: Return actual stack when requesting sp from regs
  x86: Don't clobber top of pt_regs in nested NMI
  x86/asm: Clean up copy_page_*() comments and code
2012-12-11 19:55:20 -08:00
Linus Torvalds 090f8ccba3 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Lots of activity:

   211 files changed, 8328 insertions(+), 4116 deletions(-)

  most of it on the tooling side.

  Main changes:

   * ftrace enhancements and fixes from Steve Rostedt.

   * uprobes fixes, cleanups and preparation for the ARM port from Oleg
     Nesterov.

   * UAPI fixes, from David Howels - prepares the arch/x86 UAPI
     transition

   * Separate perf tests into multiple objects, one per test, from Jiri
     Olsa.

   * Make hardware event translations available in sysfs, from Jiri
     Olsa.

   * Fixes to /proc/pid/maps parsing, preparatory to supporting data
     maps, from Namhyung Kim

   * Implement ui_progress for GTK, from Namhyung Kim

   * Add framework for automated perf_event_attr tests, where tools with
     different command line options will be run from a 'perf test', via
     python glue, and the perf syscall will be intercepted to verify
     that the perf_event_attr fields set by the tool are those expected,
     from Jiri Olsa

   * Add a 'link' method for hists, so that we can have the leader with
     buckets for all the entries in all the hists.  This new method is
     now used in the default 'diff' output, making the sum of the
     'baseline' column be 100%, eliminating blind spots.

   * libtraceevent fixes for compiler warnings trying to make perf it
     build on some distros, like fedora 14, 32-bit, some of the warnings
     really pointed to real bugs.

   * Add a browser for 'perf script' and make it available from the
     report and annotate browsers.  It does filtering to find the
     scripts that handle events found in the perf.data file used.  From
     Feng Tang

   * perf inject changes to allow showing where a task sleeps, from
     Andrew Vagin.

   * Makefile improvements from Namhyung Kim.

   * Add --pre and --post command hooks in 'stat', from Peter Zijlstra.

   * Don't stop synthesizing threads when one vanishes, this is for the
     existing threads when we start a tool like trace.

   * Use sched:sched_stat_runtime to provide a thread summary, this
     produces the same output as the 'trace summary' subcommand of
     tglx's original "trace" tool.

   * Support interrupted syscalls in 'trace'

   * Add an event duration column and filter in 'trace'.

   * There are references to the man pages in some tools, so try to
     build Documentation when installing, warning the user if that is
     not possible, from Borislav Petkov.

   * Give user better message if precise is not supported, from David
     Ahern.

   * Try to find cross-built objdump path by using the session
     environment information in the perf.data file header, from Irina
     Tirdea, original patch and idea by Namhyung Kim.

   * Diplays more output on features check for make V=1, so that one can
     figure out what is happening by looking at gcc output, etc.  From
     Jiri Olsa.

   * Add on_exit implementation for systems without one, e.g.  Android,
     from Bernhard Rosenkraenzer.

   * Only process events for vcpus of interest, helps handling large
     number of events, from David Ahern.

   * Cross compilation fixes for Android, from Irina Tirdea.

   * Add documentation on compiling for Android, from Irina Tirdea.

   * perf diff improvements from Jiri Olsa.

   * Target (task/user/cpu/syswide) handling improvements, from Namhyung
     Kim.

   * Add support in 'trace' for tracing workload given by command line,
     from Namhyung Kim.

   * ... and much more."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (194 commits)
  uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race
  perf evsel: Introduce is_group_member method
  perf powerpc: Use uapi/unistd.h to fix build error
  tools: Pass the target in descend
  tools: Honour the O= flag when tool build called from a higher Makefile
  tools: Define a Makefile function to do subdir processing
  perf ui: Always compile browser setup code
  perf ui: Add ui_progress__finish()
  perf ui gtk: Implement ui_progress functions
  perf ui: Introduce generic ui_progress helper
  perf ui tui: Move progress.c under ui/tui directory
  perf tools: Add basic event modifier sanity check
  perf tools: Omit group members from perf_evlist__disable/enable
  perf tools: Ensure single disable call per event in record comand
  perf tools: Fix 'disabled' attribute config for record command
  perf tools: Fix attributes for '{}' defined event groups
  perf tools: Use sscanf for parsing /proc/pid/maps
  perf tools: Add gtk.<command> config option for launching GTK browser
  perf tools: Fix compile error on NO_NEWT=1 build
  perf hists: Initialize all of he->stat with zeroes
  ...
2012-12-11 18:14:31 -08:00
Linus Torvalds 37ea95a959 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU update from Ingo Molnar:
 "The major features of this tree are:

     1. A first version of no-callbacks CPUs.  This version prohibits
        offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y.
        Relaxing this constraint is in progress, but not yet ready
        for prime time.  These commits were posted to LKML at
        https://lkml.org/lkml/2012/10/30/724.

     2. Changes to SRCU that allows statically initialized srcu_struct
        structures.  These commits were posted to LKML at
        https://lkml.org/lkml/2012/10/30/296.

     3. Restructuring of RCU's debugfs output.  These commits were posted
        to LKML at https://lkml.org/lkml/2012/10/30/341.

     4. Additional CPU-hotplug/RCU improvements, posted to LKML at
        https://lkml.org/lkml/2012/10/30/327.
        Note that the commit eliminating __stop_machine() was judged to
        be too-high of risk, so is deferred to 3.9.

     5. Changes to RCU's idle interface, most notably a new module
        parameter that redirects normal grace-period operations to
        their expedited equivalents.  These were posted to LKML at
        https://lkml.org/lkml/2012/10/30/739.

     6. Additional diagnostics for RCU's CPU stall warning facility,
        posted to LKML at https://lkml.org/lkml/2012/10/30/315.
        The most notable change reduces the
        default RCU CPU stall-warning time from 60 seconds to 21 seconds,
        so that it once again happens sooner than the softlockup timeout.

     7. Documentation updates, which were posted to LKML at
        https://lkml.org/lkml/2012/10/30/280.
        A couple of late-breaking changes were posted at
        https://lkml.org/lkml/2012/11/16/634 and
        https://lkml.org/lkml/2012/11/16/547.

     8. Miscellaneous fixes, which were posted to LKML at
        https://lkml.org/lkml/2012/10/30/309.

     9. Finally, a fix for an lockdep-RCU splat was posted to LKML
        at https://lkml.org/lkml/2012/11/7/486."

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
  context_tracking: New context tracking susbsystem
  sched: Mark RCU reader in sched_show_task()
  rcu: Separate accounting of callbacks from callback-free CPUs
  rcu: Add callback-free CPUs
  rcu: Add documentation for the new rcuexp debugfs trace file
  rcu: Update documentation for TREE_RCU debugfs tracing
  rcu: Reduce default RCU CPU stall warning timeout
  rcu: Fix TINY_RCU rcu_is_cpu_rrupt_from_idle check
  rcu: Clarify memory-ordering properties of grace-period primitives
  rcu: Add new rcutorture module parameters to start/end test messages
  rcu: Remove list_for_each_continue_rcu()
  rcu: Fix batch-limit size problem
  rcu: Add tracing for synchronize_sched_expedited()
  rcu: Remove old debugfs interfaces and also RCU flavor name
  rcu: split 'rcuhier' to each flavor
  rcu: split 'rcugp' to each flavor
  rcu: split 'rcuboost' to each flavor
  rcu: split 'rcubarrier' to each flavor
  rcu: Fix tracing formatting
  rcu: Remove the interface "rcudata.csv"
  ...
2012-12-11 18:10:49 -08:00
Linus Torvalds 608ff1a210 Merge branch 'akpm' (Andrew's patchbomb)
Merge misc updates from Andrew Morton:
 "About half of most of MM.  Going very early this time due to
  uncertainty over the coreautounifiednumasched things.  I'll send the
  other half of most of MM tomorrow.  The rest of MM awaits a slab merge
  from Pekka."

* emailed patches from Andrew Morton: (71 commits)
  memory_hotplug: ensure every online node has NORMAL memory
  memory_hotplug: handle empty zone when online_movable/online_kernel
  mm, memory-hotplug: dynamic configure movable memory and portion memory
  drivers/base/node.c: cleanup node_state_attr[]
  bootmem: fix wrong call parameter for free_bootmem()
  avr32, kconfig: remove HAVE_ARCH_BOOTMEM
  mm: cma: remove watermark hacks
  mm: cma: skip watermarks check for already isolated blocks in split_free_page()
  mm, oom: fix race when specifying a thread as the oom origin
  mm, oom: change type of oom_score_adj to short
  mm: cleanup register_node()
  mm, mempolicy: remove duplicate code
  mm/vmscan.c: try_to_freeze() returns boolean
  mm: introduce putback_movable_pages()
  virtio_balloon: introduce migration primitives to balloon pages
  mm: introduce compaction and migration for ballooned pages
  mm: introduce a common interface for balloon pages mobility
  mm: redefine address_space.assoc_mapping
  mm: adjust address_space_operations.migratepage() return code
  arch/sparc/kernel/sys_sparc_64.c: s/COLOUR/COLOR/
  ...
2012-12-11 18:05:37 -08:00
Michel Lespinasse cdc1734495 mm: use vm_unmapped_area() in hugetlbfs on i386 architecture
Update the i386 hugetlb_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:25 -08:00
Michel Lespinasse 7d02505965 mm: fix cache coloring on x86_64 architecture
Fix the x86-64 cache alignment code to take pgoff into account.  Use the
x86 and MIPS cache alignment code as the basis for a generic cache
alignment function.

The old x86 code will always align the mmap to aliasing boundaries,
even if the program mmaps the file with a non-zero pgoff.

If program A mmaps the file with pgoff 0, and program B mmaps the file
with pgoff 1.  The old code would align the mmaps, resulting in misaligned
pages:

  A:  0123
  B:  123

After this patch, they are aligned so the pages line up:

  A: 0123
  B:  123

Proposed by Rik van Riel.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:25 -08:00
Michel Lespinasse f99024729e mm: use vm_unmapped_area() on x86_64 architecture
Update the x86_64 arch_get_unmapped_area[_topdown] functions to make use
of vm_unmapped_area() instead of implementing a brute force search.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:25 -08:00
Andi Kleen 42d7395feb mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB
There was some desire in large applications using MAP_HUGETLB or
SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on
others.  This is useful together with NUMA policy: use 2MB interleaving
on some mappings, but 1GB on local mappings.

This patch extends the IPC/SHM syscall interfaces slightly to allow
specifying the page size.

It borrows some upper bits in the existing flag arguments and allows
encoding the log of the desired page size in addition to the *_HUGETLB
flag.  When 0 is specified the default size is used, this makes the
change fully compatible.

Extending the internal hugetlb code to handle this is straight forward.
Instead of a single mount it just keeps an array of them and selects the
right mount based on the specified page size.  When no page size is
specified it uses the mount of the default page size.

The change is not visible in /proc/mounts because internal mounts don't
appear there.  It also has very little overhead: the additional mounts
just consume a super block, but not more memory when not used.

I also exported the new flags to the user headers (they were previously
under __KERNEL__).  Right now only symbols for x86 and some other
architecture for 1GB and 2MB are defined.  The interface should already
work for all other architectures though.  Only architectures that define
multiple hugetlb sizes actually need it (that is currently x86, tile,
powerpc).  However tile and powerpc have user configurable hugetlb
sizes, so it's not easy to add defines.  A program on those
architectures would need to query sysfs and use the appropiate log2.

[akpm@linux-foundation.org: cleanups]
[rientjes@google.com: fix build]
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:25 -08:00
Gleb Natapov 58b7825bc3 KVM: emulator: fix real mode segment checks in address linearization
In real mode CS register is writable, so do not #GP on write.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-11 21:00:28 -02:00
Gleb Natapov 0b26b588d9 VMX: remove unneeded enable_unrestricted_guest check
If enable_unrestricted_guest is true vmx->rmode.vm86_active will
always be false.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-11 21:00:28 -02:00
Gleb Natapov a4d3326c2d KVM: VMX: fix DPL during entry to protected mode
On CPUs without support for unrestricted guests DPL cannot be smaller
than RPL for data segments during guest entry, but this state can occurs
if a data segment selector changes while vcpu is in real mode to a value
with lowest two bits != 00. Fix that by forcing DPL == RPL on transition
to protected mode.

This is a regression introduced by c865c43de6.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-11 21:00:27 -02:00
Linus Torvalds c6bd5bcc49 TTY/Serial merge for 3.8-rc1
Here's the big tty/serial tree set of changes for 3.8-rc1.
 
 Contained in here is a bunch more reworks of the tty port layer from Jiri and
 bugfixes from Alan, along with a number of other tty and serial driver updates
 by the various driver authors.
 
 Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the TTY
 layer, which is much appreciated by me.
 
 All of these have been in the linux-next tree for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlDHhgwACgkQMUfUDdst+ynI6wCcC+YeBwncnoWHvwLAJOwAZpUL
 bysAn28o780/lOsTzp3P1Qcjvo69nldo
 =hN/g
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull TTY/Serial merge from Greg Kroah-Hartman:
 "Here's the big tty/serial tree set of changes for 3.8-rc1.

  Contained in here is a bunch more reworks of the tty port layer from
  Jiri and bugfixes from Alan, along with a number of other tty and
  serial driver updates by the various driver authors.

  Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the
  TTY layer, which is much appreciated by me.

  All of these have been in the linux-next tree for a while.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

Fixed up some trivial conflicts in the staging tree, due to the fwserial
driver having come in both ways (but fixed up a bit in the serial tree),
and the ioctl handling in the dgrp driver having been done slightly
differently (staging tree got that one right, and removed both
TIOCGSOFTCAR and TIOCSSOFTCAR).

* tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (146 commits)
  staging: sb105x: fix potential NULL pointer dereference in mp_chars_in_buffer()
  staging/fwserial: Remove superfluous free
  staging/fwserial: Use WARN_ONCE when port table is corrupted
  staging/fwserial: Destruct embedded tty_port on teardown
  staging/fwserial: Fix build breakage when !CONFIG_BUG
  staging: fwserial: Add TTY-over-Firewire serial driver
  drivers/tty/serial/serial_core.c: clean up HIGH_BITS_OFFSET usage
  staging: dgrp: dgrp_tty.c: Audit the return values of get/put_user()
  staging: dgrp: dgrp_tty.c: Remove the TIOCSSOFTCAR ioctl handler from dgrp driver
  serial: ifx6x60: Add modem power off function in the platform reboot process
  serial: mxs-auart: unmap the scatter list before we copy the data
  serial: mxs-auart: disable the Receive Timeout Interrupt when DMA is enabled
  serial: max310x: Setup missing "can_sleep" field for GPIO
  tty/serial: fix ifx6x60.c declaration warning
  serial: samsung: add devicetree properties for non-Exynos SoCs
  serial: samsung: fix potential soft lockup during uart write
  tty: vt: Remove redundant null check before kfree.
  tty/8250 Add check for pci_ioremap_bar failure
  tty/8250 Add support for Commtech's Fastcom Async-335 and Fastcom Async-PCIe cards
  tty/8250 Add XR17D15x devices to the exar_handle_irq override
  ...
2012-12-11 14:08:47 -08:00
Zhang Yanfei 0ca0d818cb x86/kexec: crash_vmclear_local_vmcss needs __rcu
This removes the sparse warning:
arch/x86/kernel/crash.c:49:32: sparse: incompatible types in comparison expression (different address spaces)

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-12-11 19:55:23 -02:00
Linus Torvalds bad73c5aa0 ACPI and power management updates for 3.8-rc1
* Introduction of device PM QoS flags.
 
 * ACPI device power management update allowing subsystems other than
   PCI to use it more easily.
 
 * ACPI device enumeration rework allowing additional kinds of devices
   to be enumerated via ACPI.  From Mika Westerberg, Adrian Hunter,
   Mathias Nyman, Andy Shevchenko, and Rafael J. Wysocki.
 
 * ACPICA update to version 20121018 from Bob Moore and Lv Zheng.
 
 * ACPI memory hotplug update from Wen Congyang and Yasuaki Ishimatsu.
 
 * Introduction of acpi_handle_<level>() messaging macros and ACPI-based CPU
   hot-remove support from Toshi Kani.
 
 * ACPI EC updates from Feng Tang.
 
 * cpufreq updates from Viresh Kumar, Fabio Baltieri and others.
 
 * cpuidle changes to quickly notice governor prediction failure from
   Youquan Song.
 
 * Support for using multiple cpuidle drivers at the same time and cpuidle
   cleanups from Daniel Lezcano.
 
 * devfreq updates from Nishanth Menon and others.
 
 * cpupower update from Thomas Renninger.
 
 * Fixes and small cleanups all over the place.
 
 --
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJQxxevAAoJEKhOf7ml8uNsHacQAK2xoQozDddPBAaTCf1OWW/G
 J4E2qMn+gy4AtFmQ0/xeZVvvylOCn9eu/kv3QJ/bFlVoUsqTgfPwQBjT6HjF1Acn
 7yVFdr9H/tn2wi9nW2Gv6Tl2Hr/kFLpo0f5Kd40Z+P8SV5sKX5ktJcVpJ/I/P4Vh
 Qw2nWtj7hQktZDERzgG4ZpMbQToNhbLhbKWB9ad3/XQSSA9JkfgvBFgrTEGHcZD5
 bwsggjNdOAWNGeDdzRsQSDn0Alld5ILVdSJ5xKimO1O70WvKc7fqA1IdYRIeKL90
 yz4bcoYKzl9iktlw8+x5o1U9mrc8TFV5p4+zV+t5Z6pzS/J3kWvnsW4zu9sCrxFv
 Wic3SKyiem7s2dxrYyj4ZXAci3GK4ouRTrCLqk7/00tEGdwAQD1ZNfsUJp6jKayz
 FvtZUgItcOyrlQ6B4nh951OY6dI3AUYJ2NuWWNr5NZkgVAvQGV8zTGOImbeVeL2+
 pMiw14zScO3ylYilVcjTKDDMj2sDZ68mw5PIcbmksvWsCLo26jDBVDtLVmtYWyd4
 ek3WnOrQZr0R3agvOLLssMKXompvpP+N4Klf4rV+GtqGsWtHryYKys2Laju9FwFj
 yYLchxYlxhGTzqq8LjF90HDL0TWpPe6cPi+B5ow9g/SXLexbMKNQGhv3Jovm2yR3
 j54tKBWy7e9AAYEDPirX
 =6OEP
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael Wysocki:

 - Introduction of device PM QoS flags.

 - ACPI device power management update allowing subsystems other than
   PCI to use it more easily.

 - ACPI device enumeration rework allowing additional kinds of devices
   to be enumerated via ACPI.  From Mika Westerberg, Adrian Hunter,
   Mathias Nyman, Andy Shevchenko, and Rafael J. Wysocki.

 - ACPICA update to version 20121018 from Bob Moore and Lv Zheng.

 - ACPI memory hotplug update from Wen Congyang and Yasuaki Ishimatsu.

 - Introduction of acpi_handle_<level>() messaging macros and ACPI-based
   CPU hot-remove support from Toshi Kani.

 - ACPI EC updates from Feng Tang.

 - cpufreq updates from Viresh Kumar, Fabio Baltieri and others.

 - cpuidle changes to quickly notice governor prediction failure from
   Youquan Song.

 - Support for using multiple cpuidle drivers at the same time and
   cpuidle cleanups from Daniel Lezcano.

 - devfreq updates from Nishanth Menon and others.

 - cpupower update from Thomas Renninger.

 - Fixes and small cleanups all over the place.

* tag 'pm+acpi-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (196 commits)
  mmc: sdhci-acpi: enable runtime-pm for device HID INT33C6
  ACPI: add Haswell LPSS devices to acpi_platform_device_ids list
  ACPI: add documentation about ACPI 5 enumeration
  pnpacpi: fix incorrect TEST_ALPHA() test
  ACPI / PM: Fix header of acpi_dev_pm_detach() in acpi.h
  ACPI / video: ignore BIOS initial backlight value for HP Folio 13-2000
  ACPI : do not use Lid and Sleep button for S5 wakeup
  ACPI / PNP: Do not crash due to stale pointer use during system resume
  ACPI / video: Add "Asus UL30VT" to ACPI video detect blacklist
  ACPI: do acpisleep dmi check when CONFIG_ACPI_SLEEP is set
  spi / ACPI: add ACPI enumeration support
  gpio / ACPI: add ACPI support
  PM / devfreq: remove compiler error with module governors (2)
  cpupower: IvyBridge (0x3a and 0x3e models) support
  cpupower: Provide -c param for cpupower monitor to schedule process on all cores
  cpupower tools: Fix warning and a bug with the cpu package count
  cpupower tools: Fix malloc of cpu_info structure
  cpupower tools: Fix issues with sysfs_topology_read_file
  cpupower tools: Fix minor warnings
  cpupower tools: Update .gitignore for files created in the debug directories
  ...
2012-12-11 12:45:35 -08:00
Peter Zijlstra cbee9f88ec mm: numa: Add fault driven placement and migration
NOTE: This patch is based on "sched, numa, mm: Add fault driven
	placement and migration policy" but as it throws away all the policy
	to just leave a basic foundation I had to drop the signed-offs-by.

This patch creates a bare-bones method for setting PTEs pte_numa in the
context of the scheduler that when faulted later will be faulted onto the
node the CPU is running on.  In itself this does nothing useful but any
placement policy will fundamentally depend on receiving hints on placement
from fault context and doing something intelligent about it.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
2012-12-11 14:42:45 +00:00
Andrea Arcangeli be3a728427 mm: numa: pte_numa() and pmd_numa()
Implement pte_numa and pmd_numa.

We must atomically set the numa bit and clear the present bit to
define a pte_numa or pmd_numa.

Once a pte or pmd has been set as pte_numa or pmd_numa, the next time
a thread touches a virtual address in the corresponding virtual range,
a NUMA hinting page fault will trigger. The NUMA hinting page fault
will clear the NUMA bit and set the present bit again to resolve the
page fault.

The expectation is that a NUMA hinting page fault is used as part
of a placement policy that decides if a page should remain on the
current node or migrated to a different node.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
2012-12-11 14:42:36 +00:00
Andrea Arcangeli dbe4d2035a mm: numa: define _PAGE_NUMA
The objective of _PAGE_NUMA is to be able to trigger NUMA hinting page
faults to identify the per NUMA node working set of the thread at
runtime.

Arming the NUMA hinting page fault mechanism works similarly to
setting up a mprotect(PROT_NONE) virtual range: the present bit is
cleared at the same time that _PAGE_NUMA is set, so when the fault
triggers we can identify it as a NUMA hinting page fault.

_PAGE_NUMA on x86 shares the same bit number of _PAGE_PROTNONE (but it
could also use a different bitflag, it's up to the architecture to
decide).

It would be confusing to call the "NUMA hinting page faults" as
"do_prot_none faults". They're different events and _PAGE_NUMA doesn't
alter the semantics of mprotect(PROT_NONE) in any way.

Sharing the same bitflag with _PAGE_PROTNONE in fact complicates
things: it requires us to ensure the code paths executed by
_PAGE_PROTNONE remains mutually exclusive to the code paths executed
by _PAGE_NUMA at all times, to avoid _PAGE_NUMA and _PAGE_PROTNONE to
step into each other toes.

Because we want to be able to set this bitflag in any established pte
or pmd (while clearing the present bit at the same time) without
losing information, this bitflag must never be set when the pte and
pmd are present, so the bitflag picked for _PAGE_NUMA usage, must not
be used by the swap entry format.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
2012-12-11 14:28:35 +00:00
Rik van Riel 2c3cf556b2 x86/mm: Introduce pte_accessible()
We need pte_present to return true for _PAGE_PROTNONE pages, to indicate that
the pte is associated with a page.

However, for TLB flushing purposes, we would like to know whether the pte
points to an actually accessible page.  This allows us to skip remote TLB
flushes for pages that are not actually accessible.

Fill in this method for x86 and provide a safe (but slower) method
on other architectures.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Fixed-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-66p11te4uj23gevgh4j987ip@git.kernel.org
[ Added Linus's review fixes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-11 14:28:34 +00:00
Rik van Riel e4a1cc56e4 x86: mm: drop TLB flush from ptep_set_access_flags
Intel has an architectural guarantee that the TLB entry causing
a page fault gets invalidated automatically. This means
we should be able to drop the local TLB invalidation.

Because of the way other areas of the page fault code work,
chances are good that all x86 CPUs do this.  However, if
someone somewhere has an x86 CPU that does not invalidate
the TLB entry causing a page fault, this one-liner should
be easy to revert.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
2012-12-11 14:28:33 +00:00
Rik van Riel 0f9a921cf9 x86: mm: only do a local tlb flush in ptep_set_access_flags()
The function ptep_set_access_flags() is only ever invoked to set access
flags or add write permission on a PTE.  The write bit is only ever set
together with the dirty bit.

Because we only ever upgrade a PTE, it is safe to skip flushing entries on
remote TLBs. The worst that can happen is a spurious page fault on other
CPUs, which would flush that TLB entry.

Lazily letting another CPU incur a spurious page fault occasionally is
(much!) cheaper than aggressively flushing everybody else's TLB.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
2012-12-11 14:28:33 +00:00
Bjorn Helgaas 1cb73f8c47 Merge branch 'pci/mjg-pci-roms-from-efi' into next
* pci/mjg-pci-roms-from-efi:
  PCI: Use phys_addr_t for physical ROM address
2012-12-10 16:20:12 -07:00
Cong Ding 28a7938922 x86: Fix the error of using "const" in gen-insn-attr-x86.awk
The original version code causes following sparse warnings:
arch/x86/lib/inat-tables.c:1080:25: warning: duplicate const
arch/x86/lib/inat-tables.c:1095:25: warning: duplicate const
arch/x86/lib/inat-tables.c:1118:25: warning: duplicate const

for the variables inat_escape_tables, inat_group_tables, and inat_avx_tables
in the code generated by gen-insn-attr-x86.awk.

The author Masami Hiramutsu says here is to make both the value pointed by the
pointers and the pointers itself read-only, so we move the "const" to be after
the "*".

Signed-off-by: Cong Ding <dinggnu@gmail.com>
Link: http://lkml.kernel.org/r/20121209082103.GA9181@gmail.com
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-12-10 10:31:24 -08:00
Bjorn Helgaas dbd3fc3345 PCI: Use phys_addr_t for physical ROM address
Use phys_addr_t rather than "void *" for physical memory address.
This removes casts and fixes a "cast from pointer to integer of different
size" warning on ppc44x_defconfig.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-12-10 11:24:42 -07:00
Ingo Molnar cc1b39dbf9 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
Pull ftrace updates from Steve Rostedt.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08 15:54:35 +01:00
Ingo Molnar 7e0dd574cd Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
Pull uprobes fixes, cleanups and preparation for the ARM port from Oleg Nesterov.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08 15:51:10 +01:00
Ingo Molnar f0b9abfb04 Merge branch 'linus' into perf/core
Conflicts:
	tools/perf/Makefile
	tools/perf/builtin-test.c
	tools/perf/perf.h
	tools/perf/tests/parse-events.c
	tools/perf/util/evsel.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-08 15:25:06 +01:00
Bjorn Helgaas 3ced69f8bb Merge branch 'pci/daniel-numachip' into next
* pci/daniel-numachip:
  x86/PCI: Add NumaChip remote PCI support
2012-12-07 14:27:33 -07:00
Daniel J Blueman f9726bfd4b x86/PCI: Add NumaChip remote PCI support
Add NumaChip-specific PCI access mechanism via MMCONFIG cycles, but
preventing access to AMD Northbridges which shouldn't respond.

Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-12-07 14:24:32 -07:00
Olaf Hering 4319eb4fc7 x86 mtrr: fix comment typo in mtrr_bp_init
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-07 11:19:19 +01:00
Bjorn Helgaas 72e1e868ca Merge branch 'pci/mjg-pci-roms-from-efi' into next
* pci/mjg-pci-roms-from-efi:
  x86: Use PCI setup data
  PCI: Add support for non-BAR ROMs
  PCI: Add pcibios_add_device
  EFI: Stash ROMs if they're not in the PCI BAR
2012-12-06 14:37:32 -07:00
Zhang Yanfei 8f536b7697 KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump
The vmclear function will be assigned to the callback function pointer
when loading kvm-intel module. And the bitmap indicates whether we
should do VMCLEAR operation in kdump. The bits in the bitmap are
set/unset according to different conditions.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-06 18:26:57 +02:00
Zhang Yanfei f23d1f4a11 x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary
This patch provides a way to VMCLEAR VMCSs related to guests
on all cpus before executing the VMXOFF when doing kdump. This
is used to ensure the VMCSs in the vmcore updated and
non-corrupted.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-06 18:25:36 +02:00
Nadia Yvette Chambers 6d49e352ae propagate name change to comments in kernel source
I've legally changed my name with New York State, the US Social Security
Administration, et al. This patch propagates the name change and change
in initials and login to comments in the kernel source as well.

Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-12-06 10:39:54 +01:00
Jussi Kivilinna 044ab52578 crypto: cast5/cast6 - move lookup tables to shared module
CAST5 and CAST6 both use same lookup tables, which can be moved shared module
'cast_common'.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2012-12-06 17:16:26 +08:00
Xiao Guangrong c219346325 KVM: MMU: optimize for set_spte
There are two cases we need to adjust page size in set_spte:
1): the one is other vcpu creates new sp in the window between mapping_level()
    and acquiring mmu-lock.
2): the another case is the new sp is created by itself (page-fault path) when
    guest uses the target gfn as its page table.

In current code, set_spte drop the spte and emulate the access for these case,
it works not good:
- for the case 1, it may destroy the mapping established by other vcpu, and
  do expensive instruction emulation.
- for the case 2, it may emulate the access even if the guest is accessing
  the page which not used as page table. There is a example, 0~2M is used as
  huge page in guest, in this huge page, only page 3 used as page table, then
  guest read/writes on other pages can cause instruction emulation.

Both of these cases can be fixed by allowing guest to retry the access, it
will refault, then we can establish the mapping by using small page

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-06 09:11:25 +02:00
Matthew Garrett f9a37be0f0 x86: Use PCI setup data
EFI can provide PCI ROMs out of band via boot services, which may not be
available after boot. Add support for using the data handed off to us by
the boot stub or bootloader.

[bhelgaas: added Seth's boot_params section mismatch fix]
[bhelgaas: drop "boot_params.hdr.version < 0x0209" test]
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
2012-12-05 14:38:26 -07:00
Matthew Garrett dd5fc854de EFI: Stash ROMs if they're not in the PCI BAR
EFI provides support for providing PCI ROMs via means other than the ROM
BAR. This support vanishes after we've exited boot services, so add support
for stashing copies of the ROMs in setup_data if they're not otherwise
available.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
2012-12-05 14:33:26 -07:00
Julian Stecklina 66f7b72e11 KVM: x86: Make register state after reset conform to specification
VMX behaves now as SVM wrt to FPU initialization. Code has been moved to
generic code path. General-purpose registers are now cleared on reset and
INIT.  SVM code properly initializes EDX.

Signed-off-by: Julian Stecklina <jsteckli@os.inf.tu-dresden.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-05 18:00:07 +02:00
Zhang Xiantao 2b3c5cbc0d kvm: don't use bit24 for detecting address-specific invalidation capability
Bit24 in VMX_EPT_VPID_CAP_MASI is  not used for address-specific invalidation capability
reporting, so remove it from KVM to avoid conflicts in future.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-05 16:35:48 +02:00
Zhang Xiantao 0307b7b8c2 kvm: remove unnecessary bit checking for ept violation
Bit 6 in EPT vmexit's exit qualification is not defined in SDM, so remove it.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-05 16:35:21 +02:00
Ingo Molnar 630e1e0bcd Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Conflicts:
	arch/x86/kernel/ptrace.c

Pull the latest RCU tree from Paul E. McKenney:

"       The major features of this series are:

  1.	A first version of no-callbacks CPUs.  This version prohibits
  	offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y.
  	Relaxing this constraint is in progress, but not yet ready
  	for prime time.  These commits were posted to LKML at
  	https://lkml.org/lkml/2012/10/30/724, and are at branch rcu/nocb.

  2.	Changes to SRCU that allows statically initialized srcu_struct
  	structures.  These commits were posted to LKML at
  	https://lkml.org/lkml/2012/10/30/296, and are at branch rcu/srcu.

  3.	Restructuring of RCU's debugfs output.  These commits were posted
  	to LKML at https://lkml.org/lkml/2012/10/30/341, and are at
  	branch rcu/tracing.

  4.	Additional CPU-hotplug/RCU improvements, posted to LKML at
  	https://lkml.org/lkml/2012/10/30/327, and are at branch rcu/hotplug.
  	Note that the commit eliminating __stop_machine() was judged to
  	be too-high of risk, so is deferred to 3.9.

  5.	Changes to RCU's idle interface, most notably a new module
  	parameter that redirects normal grace-period operations to
  	their expedited equivalents.  These were posted to LKML at
  	https://lkml.org/lkml/2012/10/30/739, and are at branch rcu/idle.

  6.	Additional diagnostics for RCU's CPU stall warning facility,
  	posted to LKML at https://lkml.org/lkml/2012/10/30/315, and
  	are at branch rcu/stall.  The most notable change reduces the
  	default RCU CPU stall-warning time from 60 seconds to 21 seconds,
  	so that it once again happens sooner than the softlockup timeout.

  7.	Documentation updates, which were posted to LKML at
  	https://lkml.org/lkml/2012/10/30/280, and are at branch rcu/doc.
  	A couple of late-breaking changes were posted at
  	https://lkml.org/lkml/2012/11/16/634 and
  	https://lkml.org/lkml/2012/11/16/547.

  8.	Miscellaneous fixes, which were posted to LKML at
  	https://lkml.org/lkml/2012/10/30/309, along with a late-breaking
  	change posted at Fri, 16 Nov 2012 11:26:25 -0800 with message-ID
  	<20121116192625.GA447@linux.vnet.ibm.com>, but which lkml.org
  	seems to have missed.  These are at branch rcu/fixes.

  9.	Finally, a fix for an lockdep-RCU splat was posted to LKML
  	at https://lkml.org/lkml/2012/11/7/486.  This is at rcu/next. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-12-03 06:27:05 +01:00
Jan Kiszka 45e3cc7d9f KVM: x86: Fix uninitialized return code
This is a regression caused by 18595411a7.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2012-12-02 17:37:04 +02:00
Linus Torvalds b3c3a9cf2a Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU fix from Ingo Molnar:
 "Fix leaking RCU extended quiescent state, which might trigger warnings
  and mess up the extended quiescent state tracking logic into thinking
  that we are in "RCU user mode" while we aren't."

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Fix unrecovered RCU user mode in syscall_trace_leave()
2012-12-01 13:08:36 -08:00
Linus Torvalds 455e987c0c Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This is mostly about unbreaking architectures that took the UAPI
  changes in the v3.7 cycle, plus misc fixes."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf kvm: Fix building perf kvm on non x86 arches
  perf kvm: Rename perf_kvm to perf_kvm_stat
  perf: Make perf build for x86 with UAPI disintegration applied
  perf powerpc: Use uapi/unistd.h to fix build error
  tools: Pass the target in descend
  tools: Honour the O= flag when tool build called from a higher Makefile
  tools: Define a Makefile function to do subdir processing
  x86: Export asm/{svm.h,vmx.h,perf_regs.h}
  perf tools: Fix strbuf_addf() when the buffer needs to grow
  perf header: Fix numa topology printing
  perf, powerpc: Fix hw breakpoints returning -ENOSPC
2012-12-01 13:07:48 -08:00
Konrad Rzeszutek Wilk 6a7ed40511 Merge branch 'arm-privcmd-for-3.8' of git://xenbits.xen.org/people/ianc/linux into stable/for-linus-3.8
* 'arm-privcmd-for-3.8' of git://xenbits.xen.org/people/ianc/linux:
  xen: arm: implement remap interfaces needed for privcmd mappings.
  xen: correctly use xen_pfn_t in remap_domain_mfn_range.
  xen: arm: enable balloon driver
  xen: balloon: allow PVMMU interfaces to be compiled out
  xen: privcmd: support autotranslated physmap guests.
  xen: add pages parameter to xen_remap_domain_mfn_range

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-11-30 17:07:59 -05:00
Olaf Hering a7be94ac8d xen/PVonHVM: fix compile warning in init_hvm_pv_info
After merging the xen-two tree, today's linux-next build (x86_64
allmodconfig) produced this warning:

arch/x86/xen/enlighten.c: In function 'init_hvm_pv_info':
arch/x86/xen/enlighten.c:1617:16: warning: unused variable 'ebx' [-Wunused-variable]
arch/x86/xen/enlighten.c:1617:11: warning: unused variable 'eax' [-Wunused-variable]

Introduced by commit 9d02b43dee ("xen PVonHVM: use E820_Reserved area
for shared_info").

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-11-30 17:02:55 -05:00
Vincent Palatin 644c154186 x86, fpu: Avoid FPU lazy restore after suspend
When a cpu enters S3 state, the FPU state is lost.
After resuming for S3, if we try to lazy restore the FPU for a process running
on the same CPU, this will result in a corrupted FPU context.

Ensure that "fpu_owner_task" is properly invalided when (re-)initializing a CPU,
so nobody will try to lazy restore a state which doesn't exist in the hardware.

Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a
few hundreds cycles by a SIGFPE.

Cc: Duncan Laurie <dlaurie@chromium.org>
Cc: Olof Johansson <olofj@chromium.org>
Cc: <stable@kernel.org> v3.4+ # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Link: http://lkml.kernel.org/r/1354306532-1014-1-git-send-email-vpalatin@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-30 13:48:05 -08:00
Will Auld ba904635d4 KVM: x86: Emulate IA32_TSC_ADJUST MSR
CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported

Basic design is to emulate the MSR by allowing reads and writes to a guest
vcpu specific location to store the value of the emulated MSR while adding
the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
is of course as long as the "use TSC counter offsetting" VM-execution control
is enabled as well as the IA32_TSC_ADJUST control.

However, because hardware will only return the TSC + IA32_TSC_ADJUST +
vmsc tsc_offset for a guest process when it does and rdtsc (with the correct
settings) the value of our virtualized IA32_TSC_ADJUST must be stored in one
of these three locations. The argument against storing it in the actual MSR
is performance. This is likely to be seldom used while the save/restore is
required on every transition. IA32_TSC_ADJUST was created as a way to solve
some issues with writing TSC itself so that is not an option either.

The remaining option, defined above as our solution has the problem of
returning incorrect vmcs tsc_offset values (unless we intercept and fix, not
done here) as mentioned above. However, more problematic is that storing the
data in vmcs tsc_offset will have a different semantic effect on the system
than does using the actual MSR. This is illustrated in the following example:

The hypervisor set the IA32_TSC_ADJUST, then the guest sets it and a guest
process performs a rdtsc. In this case the guest process will get
TSC + IA32_TSC_ADJUST_hyperviser + vmsc tsc_offset including
IA32_TSC_ADJUST_guest. While the total system semantics changed the semantics
as seen by the guest do not and hence this will not cause a problem.

Signed-off-by: Will Auld <will.auld@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-30 18:29:30 -02:00
Will Auld 8fe8ab46be KVM: x86: Add code to track call origin for msr assignment
In order to track who initiated the call (host or guest) to modify an msr
value I have changed function call parameters along the call path. The
specific change is to add a struct pointer parameter that points to (index,
data, caller) information rather than having this information passed as
individual parameters.

The initial use for this capability is for updating the IA32_TSC_ADJUST msr
while setting the tsc value. It is anticipated that this capability is
useful for other tasks.

Signed-off-by: Will Auld <will.auld@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-30 18:26:12 -02:00
Frederic Weisbecker 91d1aa43d3 context_tracking: New context tracking susbsystem
Create a new subsystem that probes on kernel boundaries
to keep track of the transitions between level contexts
with two basic initial contexts: user or kernel.

This is an abstraction of some RCU code that use such tracking
to implement its userspace extended quiescent state.

We need to pull this up from RCU into this new level of indirection
because this tracking is also going to be used to implement an "on
demand" generic virtual cputime accounting. A necessary step to
shutdown the tick while still accounting the cputime.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
[ paulmck: fix whitespace error and email address. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2012-11-30 11:40:07 -08:00
Xiao Guangrong 5a560f8b5e KVM: VMX: fix memory order between loading vmcs and clearing vmcs
vmcs->cpu indicates whether it exists on the target cpu, -1 means the vmcs
does not exist on any vcpu

If vcpu load vmcs with vmcs.cpu = -1, it can be directly added to cpu's percpu
list. The list can be corrupted if the cpu prefetch the vmcs's list before
reading vmcs->cpu. Meanwhile, we should remove vmcs from the list before
making vmcs->vcpu == -1 be visible

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-29 21:14:46 -02:00
H. Peter Anvin 11af32b69e x86, 386 removal: Document Nx586 as a 386 and thus unsupported
Per Alan Cox, Nx586 did not support WP in supervisor mode, making it a
386 by Linux kernel standards.  As such, it is too unsupported now.

Reported-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Link: http://lkml.kernel.org/r/20121128205203.05868eab@pyramind.ukuu.org.uk
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-29 13:28:39 -08:00
H. Peter Anvin 45c39fb0cc x86, cleanups: Simplify sync_core() in the case of no CPUID
Simplify the implementation of sync_core() for the case where we may
not have the CPUID instruction available.

[ v2: stylistic cleanup of the #else clause per suggestion by Borislav
  Petkov. ]

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1354132230-21854-9-git-send-email-hpa@linux.intel.com
Cc: Borislav Petkov <bp@alien8.de>
2012-11-29 13:25:39 -08:00
H. Peter Anvin e3228cf454 x86, 386 removal: Remove CONFIG_X86_POPAD_OK
The check_popad() routine tested for a 386-specific bug, and never
actually did anything useful with it anyway other than print a
message.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1354132230-21854-8-git-send-email-hpa@linux.intel.com
2012-11-29 13:23:03 -08:00
H. Peter Anvin a5c2a893db x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK
All 486+ CPUs support WP in supervisor mode, so remove the fallback
386 support code.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1354132230-21854-7-git-send-email-hpa@linux.intel.com
2012-11-29 13:23:03 -08:00
H. Peter Anvin 094ab1db7c x86, 386 removal: Remove CONFIG_INVLPG
All 486+ CPUs support INVLPG, so remove the fallback 386 support
code.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1354132230-21854-6-git-send-email-hpa@linux.intel.com
2012-11-29 13:23:02 -08:00
H. Peter Anvin e5bb8ad862 x86, 386 removal: Remove CONFIG_BSWAP
All 486+ CPUs support BSWAP, so remove the fallback 386 support
code.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1354132230-21854-5-git-send-email-hpa@linux.intel.com
2012-11-29 13:23:02 -08:00
H. Peter Anvin 7ac468b130 x86, 386 removal: Remove CONFIG_XADD
All 486+ CPUs support XADD, so remove the fallback 386 support
code.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1354132230-21854-4-git-send-email-hpa@linux.intel.com
2012-11-29 13:23:02 -08:00
H. Peter Anvin d55c5a93db x86, 386 removal: Remove CONFIG_CMPXCHG
All 486+ CPUs support CMPXCHG, so remove the fallback 386 support
code.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1354132230-21854-3-git-send-email-hpa@linux.intel.com
2012-11-29 13:23:01 -08:00
H. Peter Anvin eb068e7810 x86, 386 removal: Remove CONFIG_M386 from Kconfig
Remove the CONFIG_M386 symbol from Kconfig so that it cannot be
selected.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1354132230-21854-2-git-send-email-hpa@linux.intel.com
2012-11-29 13:23:01 -08:00
Rafael J. Wysocki d4c091f13d Merge branch 'acpi-general'
* acpi-general: (38 commits)
  ACPI / thermal: _TMP and _CRT/_HOT/_PSV/_ACx dependency fix
  ACPI: drop unnecessary local variable from acpi_system_write_wakeup_device()
  ACPI: Fix logging when no pci_irq is allocated
  ACPI: Update Dock hotplug error messages
  ACPI: Update Container hotplug error messages
  ACPI: Update Memory hotplug error messages
  ACPI: Update CPU hotplug error messages
  ACPI: Add acpi_handle_<level>() interfaces
  ACPI: remove use of __devexit
  ACPI / PM: Add Sony Vaio VPCEB1S1E to nonvs blacklist.
  ACPI / battery: Correct battery capacity values on Thinkpads
  Revert "ACPI / x86: Add quirk for "CheckPoint P-20-00" to not use bridge _CRS_ info"
  ACPI: create _SUN sysfs file
  ACPI / memhotplug: bind the memory device when the driver is being loaded
  ACPI / memhotplug: don't allow to eject the memory device if it is being used
  ACPI / memhotplug: free memory device if acpi_memory_enable_device() failed
  ACPI / memhotplug: fix memory leak when memory device is unbound from acpi_memhotplug
  ACPI / memhotplug: deal with eject request in hotplug queue
  ACPI / memory-hotplug: add memory offline code to acpi_memory_device_remove()
  ACPI / memory-hotplug: call acpi_bus_trim() to remove memory device
  ...

Conflicts:
	include/linux/acpi.h (two additions at the end of the same file)
2012-11-29 21:43:06 +01:00
Rafael J. Wysocki bcacbdbdc8 Merge branch 'acpi-enumeration'
* acpi-enumeration:
  ACPI: remove unnecessary INIT_LIST_HEAD
  ACPI / platform: include missed header into acpi_platform.c
  platform / ACPI: Attach/detach ACPI PM during probe/remove/shutdown
  mmc: sdhci-acpi: add SDHCI ACPI driver
  ACPI: add SDHCI to ACPI platform devices
  ACPI / PNP: skip ACPI device nodes associated with physical nodes already
  i2c / ACPI: add ACPI enumeration support
  ACPI / platform: Initialize ACPI handles of platform devices in advance
  ACPI / driver core: Introduce struct acpi_dev_node and related macros
  ACPI: Allow ACPI handles of devices to be initialized in advance
  ACPI / resources: Use AE_CTRL_TERMINATE to terminate resources walks
  ACPI: Centralized processing of ACPI device resources
  ACPI / platform: Use common ACPI device resource parsing routines
  ACPI: Move device resources interpretation code from PNP to ACPI core
  ACPI / platform: use ACPI device name instead of _HID._UID
  ACPI: Add support for platform bus type
  ACPI / ia64: Export acpi_[un]register_gsi()
  ACPI / x86: Export acpi_[un]register_gsi()
  ACPI: Provide generic functions for matching ACPI device nodes
  driver core / ACPI: Move ACPI support to core device and driver types
2012-11-29 21:41:25 +01:00
Ian Campbell f832da068b xen: arm: implement remap interfaces needed for privcmd mappings.
We use XENMEM_add_to_physmap_range which is the preferred interface
for foreign mappings.

Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-11-29 14:00:19 +00:00
Ian Campbell 7892f6928d xen: correctly use xen_pfn_t in remap_domain_mfn_range.
For Xen on ARM a PFN is 64 bits so we need to use the appropriate
type here.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: include the necessary header,
     Reported-by: Fengguang Wu <fengguang.wu@intel.com> ]
2012-11-29 12:59:19 +00:00
Ian Campbell c2374bf57e xen: balloon: allow PVMMU interfaces to be compiled out
The ARM platform has no concept of PVMMU and therefor no
HYPERVISOR_update_va_mapping et al. Allow this code to be compiled out
when not required.

In some similar situations (e.g. P2M) we have defined dummy functions
to avoid this, however I think we can/should draw the line at dummying
out actual hypercalls.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-11-29 12:59:13 +00:00
Ian Campbell 9a032e393a xen: add pages parameter to xen_remap_domain_mfn_range
Also introduce xen_unmap_domain_mfn_range. These are the parts of
Mukesh's "xen/pvh: Implement MMU changes for PVH" which are also
needed as a baseline for ARM privcmd support.

The original patch was:

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

This derivative is also:

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
2012-11-29 12:57:36 +00:00
Al Viro 4f4202fe5a unify default ptrace_signal_deliver
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 00:01:23 -05:00
Al Viro 18c26c27ae death to idle_regs()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 23:43:42 -05:00
Al Viro afa86fc426 flagday: don't pass regs to copy_thread()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 23:43:42 -05:00
Al Viro 24465a40ba take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
now it can be done...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 23:43:27 -05:00
Al Viro 1d4b4b2994 x86, um: switch to generic fork/vfork/clone
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 22:13:44 -05:00
Al Viro 71613c3b87 get rid of pt_regs argument of ->load_binary()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 21:53:38 -05:00
Al Viro 6b94631f9e consolidate sys_execve() prototype
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 21:53:35 -05:00
Xiao Guangrong e6c7d32172 KVM: VMX: fix invalid cpu passed to smp_call_function_single
In loaded_vmcs_clear, loaded_vmcs->cpu is the fist parameter passed to
smp_call_function_single, if the target cpu is downing (doing cpu hot remove),
loaded_vmcs->cpu can become -1 then -1 is passed to smp_call_function_single

It can be triggered when vcpu is being destroyed, loaded_vmcs_clear is called
in the preemptionable context

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-28 22:04:58 -02:00
Gleb Natapov 859f8450d8 KVM: use is_idle_task() instead of idle_cpu() to decide when to halt in async_pf
As Frederic pointed idle_cpu() may return false even if async fault
happened in the idle task if wake up is pending. In this case the code
will try to put idle task to sleep. Fix this by using is_idle_task() to
check for idle task.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-28 21:30:13 -02:00
Konrad Rzeszutek Wilk 394b40f62d xen/acpi: Move the xen_running_on_version_or_later function.
As on ia64 builds we get:
include/xen/interface/version.h: In function 'xen_running_on_version_or_later':
include/xen/interface/version.h:76: error: implicit declaration of function 'HYPERVISOR_xen_version'

We can later on make this function exportable if there are
modules using part of it. For right now the only two users are
built-in.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-11-28 14:39:31 -05:00
Marcelo Tosatti d98d07ca7e KVM: x86: update pvclock area conditionally, on cpu migration
As requested by Glauber, do not update kvmclock area on vcpu->pcpu
migration, in case the host has stable TSC.

This is to reduce cacheline bouncing.

Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:15 -02:00
Marcelo Tosatti b48aa97e38 KVM: x86: require matched TSC offsets for master clock
With master clock, a pvclock clock read calculates:

ret = system_timestamp + [ (rdtsc + tsc_offset) - tsc_timestamp ]

Where 'rdtsc' is the host TSC.

system_timestamp and tsc_timestamp are unique, one tuple
per VM: the "master clock".

Given a host with synchronized TSCs, its obvious that
guest TSC must be matched for the above to guarantee monotonicity.

Allow master clock usage only if guest TSCs are synchronized.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:15 -02:00
Marcelo Tosatti 42897d866b KVM: x86: add kvm_arch_vcpu_postcreate callback, move TSC initialization
TSC initialization will soon make use of online_vcpus.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:14 -02:00
Marcelo Tosatti d828199e84 KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag
KVM added a global variable to guarantee monotonicity in the guest.
One of the reasons for that is that the time between

	1. ktime_get_ts(&timespec);
	2. rdtscll(tsc);

Is variable. That is, given a host with stable TSC, suppose that
two VCPUs read the same time via ktime_get_ts() above.

The time required to execute 2. is not the same on those two instances
executing in different VCPUS (cache misses, interrupts...).

If the TSC value that is used by the host to interpolate when
calculating the monotonic time is the same value used to calculate
the tsc_timestamp value stored in the pvclock data structure, and
a single <system_timestamp, tsc_timestamp> tuple is visible to all
vcpus simultaneously, this problem disappears. See comment on top
of pvclock_update_vm_gtod_copy for details.

Monotonicity is then guaranteed by synchronicity of the host TSCs
and guest TSCs.

Set TSC stable pvclock flag in that case, allowing the guest to read
clock from userspace.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:13 -02:00
Marcelo Tosatti 16e8d74d2d KVM: x86: notifier for clocksource changes
Register a notifier for clocksource change event. In case
the host switches to clock other than TSC, disable master
clock usage.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:12 -02:00
Marcelo Tosatti 886b470cb1 KVM: x86: pass host_tsc to read_l1_tsc
Allow the caller to pass host tsc value to kvm_x86_ops->read_l1_tsc().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:11 -02:00
Marcelo Tosatti 51c19b4f59 x86: vdso: pvclock gettime support
Improve performance of time system calls when using Linux pvclock,
by reading time info from fixmap visible copy of pvclock data.

Originally from Jeremy Fitzhardinge.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:11 -02:00
Marcelo Tosatti 3dc4f7cfb7 x86: kvm guest: pvclock vsyscall support
Hook into generic pvclock vsyscall code, with the aim to
allow userspace to have visibility into pvclock data.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:10 -02:00
Marcelo Tosatti 71056ae22d x86: pvclock: generic pvclock vsyscall initialization
Originally from Jeremy Fitzhardinge.

Introduce generic, non hypervisor specific, pvclock initialization
routines.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:09 -02:00
Marcelo Tosatti 189e11731a x86: pvclock: add note about rdtsc barriers
As noted by Gleb, not advertising SSE2 support implies
no RDTSC barriers.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:08 -02:00
Marcelo Tosatti 2697902be8 x86: pvclock: introduce helper to read flags
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:08 -02:00
Marcelo Tosatti dce2db0a35 x86: pvclock: create helper for pvclock data retrieval
Originally from Jeremy Fitzhardinge.

So code can be reused.

Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:07 -02:00
Marcelo Tosatti 42b5637d69 x86: pvclock: remove pvclock_shadow_time
Originally from Jeremy Fitzhardinge.

We can copy the information directly from "struct pvclock_vcpu_time_info",
remove pvclock_shadow_time.

Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:06 -02:00
Marcelo Tosatti b01578de45 x86: pvclock: make sure rdtsc doesnt speculate out of region
Originally from Jeremy Fitzhardinge.

pvclock_get_time_values, which contains the memory barriers
will be removed by next patch.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:06 -02:00
Marcelo Tosatti 7069ed6763 x86: kvmclock: allocate pvclock shared memory area
We want to expose the pvclock shared memory areas, which
the hypervisor periodically updates, to userspace.

For a linear mapping from userspace, it is necessary that
entire page sized regions are used for array of pvclock
structures.

There is no such guarantee with per cpu areas, therefore move
to memblock_alloc based allocation.

Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:05 -02:00
Marcelo Tosatti 78c0337a38 KVM: x86: retain pvclock guest stopped bit in guest memory
Otherwise its possible for an unrelated KVM_REQ_UPDATE_CLOCK (such as due to CPU
migration) to clear the bit.

Noticed by Paolo Bonzini.

Reviewed-by: Gleb Natapov <gleb@redhat.com>
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-27 23:29:05 -02:00
H. Peter Anvin 6662c34fa9 x86-32: Unbreak booting on some 486 clones
There appear to have been some 486 clones, including the "enhanced"
version of Am486, which have CPUID but not CR4.  These 486 clones had
only the FPU flag, if any, unlike the Intel 486s with CPUID, which
also had VME and therefore needed CR4.

Therefore, look at the basic CPUID flags and require at least one bit
other than bit 0 before we modify CR4.

Thanks to Christian Ludloff of sandpile.org for confirming this as a
problem.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-27 09:26:33 -08:00
H. Peter Anvin cb7cb2864e x86, kvm: Remove incorrect redundant assembly constraint
In __emulate_1op_rax_rdx, we use "+a" and "+d" which are input/output
constraints, and *then* use "a" and "d" as input constraints.  This is
incorrect, but happens to work on some versions of gcc.

However, it breaks gcc with -O0 and icc, and may break on future
versions of gcc.

Reported-and-tested-by: Melanie Blower <melanie.blower@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/B3584E72CFEBED439A3ECA9BCE67A4EF1B17AF90@FMSMSX107.amr.corp.intel.com
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-11-26 15:52:48 -08:00
Suresh Siddha 29c574c0ab x86, apic: Cleanup cfg->domain setup for legacy interrupts
Issues that need to be handled:
* Handle PIC interrupts on any CPU irrespective of the apic mode
* In the apic lowest priority logical flat delivery mode, be prepared to
  handle the interrupt on any CPU irrespective of what the IO-APIC RTE says.
* Because of above, when the IO-APIC starts handling the legacy PIC interrupt,
  use the same vector that is being used by the PIC while programming the
  corresponding IO-APIC RTE.

Start with all the cpu's in the legacy PIC interrupts cfg->domain.

By the time IO-APIC starts taking over the PIC interrupts, apic driver
model is finalized. So depend on the assign_irq_vector() to update the
cfg->domain and retain the same vector that was used by PIC before.

For the logical apic flat mode, cfg->domain is updated (during the first
call to assign_irq_vector()) to contain all the possible online cpu's (0xff).
Vector used for the legacy PIC interrupt doesn't change when the IO-APIC
starts handling the interrupt. Any interrupt migration after that
doesn't change the cfg->domain or the vector used.

For other apic modes like physical mode, cfg->domain is updated
(during the first call to assign_irq_vector()) to the boot cpu (cpu-0),
with the same vector that is being used by the PIC. When that interrupt is
migrated to a different cpu, cfg->domin and the vector assigned will change
accordingly.

Tested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1353970176.21070.51.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-26 15:43:25 -08:00
Liu, Jinsong e3aa4e61b5 xen/acpi: revert pad config check in xen_check_mwait
With Xen acpi pad logic added into kernel, we can now revert xen mwait related
patch df88b2d96e ("xen/enlighten: Disable
MWAIT_LEAF so that acpi-pad won't be loaded. "). The reason is, when running under
newer Xen platform, Xen pad driver would be early loaded, so native pad driver
would fail to be loaded, and hence no mwait/monitor #UD risk again.

Another point is, only Xen4.2 or later support Xen acpi pad, so we won't expose
mwait cpuid capability when running under older Xen platform.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-11-26 15:08:29 -05:00
David S. Miller 24bc518a68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/iwlwifi/pcie/tx.c

Minor iwlwifi conflict in TX queue disabling between 'net', which
removed a bogus warning, and 'net-next' which added some status
register poking code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-25 12:49:17 -05:00
Linus Torvalds 2654ad44b5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 arch fixes from Peter Anvin:
 "Here is a collection of fixes for 3.7-rc7.  This is a superset of
  tglx' earlier pull request."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-64: Fix ordering of CFI directives and recent ASM_CLAC additions
  x86, microcode, AMD: Add support for family 16h processors
  x86-32: Export kernel_stack_pointer() for modules
  x86-32: Fix invalid stack address while in softirq
  x86, efi: Fix processor-specific memcpy() build error
  x86: remove dummy long from EFI stub
  x86, mm: Correct vmflag test for checking VM_HUGETLB
  x86, amd: Disable way access filter on Piledriver CPUs
  x86/mce: Do not change worker's running cpu in cmci_rediscover().
  x86/ce4100: Fix PCI configuration register access for devices without interrupts
  x86/ce4100: Fix reboot by forcing the reboot method to be KBD
  x86/ce4100: Fix pm_poweroff
  MAINTAINERS: Update email address for Robert Richter
  x86, microcode_amd: Change email addresses, MAINTAINERS entry
  MAINTAINERS: Change Boris' email address
  EDAC: Change Boris' email address
  x86, AMD: Change Boris' email address
2012-11-23 20:03:14 -10:00
Len Brown 3fc808aaa0 x86 power: define RAPL MSRs
The Run Time Average Power Limiting interface
is currently model specific, present on Sandy Bridge
and Ivy Bridge processors.

These #defines correspond to documentation in the latest
"Intel® 64 and IA-32 Architectures Software Developer Manual",
plus some typos in that document corrected.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2012-11-23 21:40:11 -05:00
Len Brown 9c63a650bb tools/power/x86/turbostat: share kernel MSR #defines
Now that turbostat is built in the kernel tree,
it can share MSR #defines with the kernel.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2012-11-23 21:40:04 -05:00
Jan Beulich ee4eb87be2 x86-64: Fix ordering of CFI directives and recent ASM_CLAC additions
While these got added in the right place everywhere else, entry_64.S
is the odd one where they ended up before the initial CFI directive(s).
In order to cover the full code ranges, the CFI directive must be
first, though.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5093BA1F02000078000A600E@nat28.tlf.novell.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-20 22:23:57 -08:00
Boris Ostrovsky 36c46ca4f3 x86, microcode, AMD: Add support for family 16h processors
Add valid patch size for family 16h processors.

[ hpa: promoting to urgent/stable since it is hw enabling and trivial ]

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Acked-by: Andreas Herrmann <herrmann.der.user@googlemail.com>
Link: http://lkml.kernel.org/r/1353004910-2204-1-git-send-email-boris.ostrovsky@amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2012-11-20 22:23:28 -08:00
H. Peter Anvin cb57a2b4cf x86-32: Export kernel_stack_pointer() for modules
Modules, in particular oprofile (and possibly other similar tools)
need kernel_stack_pointer(), so export it using EXPORT_SYMBOL_GPL().

Cc: Yang Wei <wei.yang@windriver.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Jun Zhang <jun.zhang@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20120912135059.GZ8285@erda.amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-20 22:23:23 -08:00
Robert Richter 1022623842 x86-32: Fix invalid stack address while in softirq
In 32 bit the stack address provided by kernel_stack_pointer() may
point to an invalid range causing NULL pointer access or page faults
while in NMI (see trace below). This happens if called in softirq
context and if the stack is empty. The address at &regs->sp is then
out of range.

Fixing this by checking if regs and &regs->sp are in the same stack
context. Otherwise return the previous stack pointer stored in struct
thread_info. If that address is invalid too, return address of regs.

 BUG: unable to handle kernel NULL pointer dereference at 0000000a
 IP: [<c1004237>] print_context_stack+0x6e/0x8d
 *pde = 00000000
 Oops: 0000 [#1] SMP
 Modules linked in:
 Pid: 4434, comm: perl Not tainted 3.6.0-rc3-oprofile-i386-standard-g4411a05 #4 Hewlett-Packard HP xw9400 Workstation/0A1Ch
 EIP: 0060:[<c1004237>] EFLAGS: 00010093 CPU: 0
 EIP is at print_context_stack+0x6e/0x8d
 EAX: ffffe000 EBX: 0000000a ECX: f4435f94 EDX: 0000000a
 ESI: f4435f94 EDI: f4435f94 EBP: f5409ec0 ESP: f5409ea0
  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
 CR0: 8005003b CR2: 0000000a CR3: 34ac9000 CR4: 000007d0
 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
 DR6: ffff0ff0 DR7: 00000400
 Process perl (pid: 4434, ti=f5408000 task=f5637850 task.ti=f4434000)
 Stack:
  000003e8 ffffe000 00001ffc f4e39b00 00000000 0000000a f4435f94 c155198c
  f5409ef0 c1003723 c155198c f5409f04 00000000 f5409edc 00000000 00000000
  f5409ee8 f4435f94 f5409fc4 00000001 f5409f1c c12dce1c 00000000 c155198c
 Call Trace:
  [<c1003723>] dump_trace+0x7b/0xa1
  [<c12dce1c>] x86_backtrace+0x40/0x88
  [<c12db712>] ? oprofile_add_sample+0x56/0x84
  [<c12db731>] oprofile_add_sample+0x75/0x84
  [<c12ddb5b>] op_amd_check_ctrs+0x46/0x260
  [<c12dd40d>] profile_exceptions_notify+0x23/0x4c
  [<c1395034>] nmi_handle+0x31/0x4a
  [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [<c13950ed>] do_nmi+0xa0/0x2ff
  [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [<c13949e5>] nmi_stack_correct+0x28/0x2d
  [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
  [<c1003603>] ? do_softirq+0x4b/0x7f
  <IRQ>
  [<c102a06f>] irq_exit+0x35/0x5b
  [<c1018f56>] smp_apic_timer_interrupt+0x6c/0x7a
  [<c1394746>] apic_timer_interrupt+0x2a/0x30
 Code: 89 fe eb 08 31 c9 8b 45 0c ff 55 ec 83 c3 04 83 7d 10 00 74 0c 3b 5d 10 73 26 3b 5d e4 73 0c eb 1f 3b 5d f0 76 1a 3b 5d e8 73 15 <8b> 13 89 d0 89 55 e0 e8 ad 42 03 00 85 c0 8b 55 e0 75 a6 eb cc
 EIP: [<c1004237>] print_context_stack+0x6e/0x8d SS:ESP 0068:f5409ea0
 CR2: 000000000000000a
 ---[ end trace 62afee3481b00012 ]---
 Kernel panic - not syncing: Fatal exception in interrupt

V2:
* add comments to kernel_stack_pointer()
* always return a valid stack address by falling back to the address
  of regs

Reported-by: Yang Wei <wei.yang@windriver.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Link: http://lkml.kernel.org/r/20120912135059.GZ8285@erda.amd.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Jun Zhang <jun.zhang@intel.com>
2012-11-20 22:23:20 -08:00
H. Peter Anvin c1ddb48204 Merge commit 'efi-for-3.7-v2' into x86/urgent 2012-11-20 16:49:15 -08:00
Matt Fleming 0f905a43ce x86, efi: Fix processor-specific memcpy() build error
Building for Athlon/Duron/K7 results in the following build error,

arch/x86/boot/compressed/eboot.o: In function `__constant_memcpy3d':
eboot.c:(.text+0x385): undefined reference to `_mmx_memcpy'
arch/x86/boot/compressed/eboot.o: In function `efi_main':
eboot.c:(.text+0x1a22): undefined reference to `_mmx_memcpy'

because the boot stub code doesn't link with the kernel proper, and
therefore doesn't have access to the 3DNow version of memcpy. So,
follow the example of misc.c and #undef memcpy so that we use the
version provided by misc.c.

See https://bugzilla.kernel.org/show_bug.cgi?id=50391

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Ryan Underwood <nemesis@icequake.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-11-20 20:52:07 +00:00
Cesar Eduardo Barros caaa8c6339 x86: remove dummy long from EFI stub
Commit 2e064b1 (x86, efi: Fix issue of overlapping .reloc section for
EFI_STUB) removed a dummy reloc added by commit 291f363 (x86, efi: EFI
boot stub support), but forgot to remove the dummy long used by that
reloc.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Lee G Rosenbaum <lee.g.rosenbaum@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2012-11-20 20:17:48 +00:00
David Howells 60606d4248 Merge branch 'x86-pre-uapi' into perf-uapi
David Howells (1):
      x86: Export asm/{svm.h,vmx.h,perf_regs.h}
2012-11-19 21:50:58 +00:00
Paul Bolle 23e5047afc x86: remove offsets.h from .gitignore and dontdiff
Commit 77d1a49995 ("x86, boot: make
symbols from the main vmlinux available") removed all traces of
offsets.h from the tree. Remove its entries in dontdiff and x86/boot's
.gitignore file too.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-19 14:10:53 +01:00
Steven Rostedt 6c8d8b3c69 x86_32: Return actual stack when requesting sp from regs
As x86_32 traps do not save sp when taken in kernel mode, we need to
accommodate the sp when requesting to get the register.

This affects kprobes.

Before:

 # echo 'p:ftrace sys_read+4 s=%sp' > /debug/tracing/kprobe_events
 # echo 1 > /debug/tracing/events/kprobes/enable
 # cat trace
            sshd-1345  [000] d...   489.117168: ftrace: (sys_read+0x4/0x70) s=b7e96768
            sshd-1345  [000] d...   489.117191: ftrace: (sys_read+0x4/0x70) s=b7e96768
             cat-1447  [000] d...   489.117392: ftrace: (sys_read+0x4/0x70) s=5a7
             cat-1447  [001] d...   489.118023: ftrace: (sys_read+0x4/0x70) s=b77ad05f
            less-1448  [000] d...   489.118079: ftrace: (sys_read+0x4/0x70) s=b7762e06
            less-1448  [000] d...   489.118117: ftrace: (sys_read+0x4/0x70) s=b7764970

After:
            sshd-1352  [000] d...   362.348016: ftrace: (sys_read+0x4/0x70) s=f3febfa8
            sshd-1352  [000] d...   362.348048: ftrace: (sys_read+0x4/0x70) s=f3febfa8
            bash-1355  [001] d...   362.348081: ftrace: (sys_read+0x4/0x70) s=f5075fa8
            sshd-1352  [000] d...   362.348082: ftrace: (sys_read+0x4/0x70) s=f3febfa8
            sshd-1352  [000] d...   362.690950: ftrace: (sys_read+0x4/0x70) s=f3febfa8
            bash-1355  [001] d...   362.691033: ftrace: (sys_read+0x4/0x70) s=f5075fa8

Link: http://lkml.kernel.org/r/1342208654.30075.22.camel@gandalf.stny.rr.com

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-11-19 05:31:37 -05:00
David S. Miller 67f4efdce7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor line offset auto-merges.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-17 22:00:43 -05:00
Yinghai Lu 9710f581bb x86, mm: Let "memmap=" take more entries one time
Current "memmap=" only can take one entry every time.
when we have more entries, we have to use memmap= for each of them.

For pxe booting, we have command line length limitation, those extra
"memmap=" would waste too much space.

This patch make memmap= could take several entries one time,
and those entries will be split with ','

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-47-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:51 -08:00
Yinghai Lu c074eaac2a x86, mm: kill numa_64.h
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-44-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:47 -08:00
Yinghai Lu 94b43c3d86 x86, mm: kill numa_free_all_bootmem()
Now NO_BOOTMEM version free_all_bootmem_node() does not really
do free_bootmem at all, and it only call register_page_bootmem_info_node
instead.

That is confusing, try to kill that free_all_bootmem_node().

Before that, this patch will remove numa_free_all_bootmem().

That function could be replaced with register_page_bootmem_info() and
free_all_bootmem();

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-43-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:47 -08:00
Yinghai Lu b8fd39c036 x86, mm: Use clamp_t() in init_range_memory_mapping
save some lines, and make code more readable.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-42-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:46 -08:00
Yinghai Lu 60a8f42832 x86, mm: Move after_bootmem to mm_internel.h
it is only used in arch/x86/mm/init*.c

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-41-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:45 -08:00
Yinghai Lu 4e37a89047 x86, mm: Unifying after_bootmem for 32bit and 64bit
after_bootmem has different meaning in 32bit and 64bit.
        32bit: after bootmem is ready
        64bit: after bootmem is distroyed
Let's merget them make 32bit the same as 64bit.

for 32bit, it is mixing alloc_bootmem_pages, and alloc_low_page under
after_bootmem is set or not set.

alloc_bootmem is just wrapper for memblock for x86.

Now we have alloc_low_page() with memblock too. We can drop bootmem path
now, and only alloc_low_page only.

At the same time, we make alloc_low_page could handle real after_bootmem
for 32bit, because alloc_bootmem_pages could fallback to use slab too.

At last move after_bootmem set position for 32bit the same as 64bit.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-40-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:44 -08:00
Yinghai Lu 2e8059edb6 x86, mm: use limit_pfn for end pfn
instead of shifting end to get that.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-39-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:43 -08:00
Yinghai Lu 1829ae9ad7 x86, mm: use pfn instead of pos in split_mem_range
could save some bit shifting operations.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-38-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:41 -08:00
Yinghai Lu 84d770019b x86, mm: use PFN_DOWN in split_mem_range()
to replace own inline version for shifting.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-37-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:40 -08:00
Yinghai Lu 5a0d3aeeef x86, mm: use round_up/down in split_mem_range()
to replace own inline version for those roundup and rounddown.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-36-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:40 -08:00
Yinghai Lu 11ed9e927d x86, mm: Add check before clear pte above max_low_pfn on 32bit
During test patch that adjust page_size_mask to map small range ram with
big page size, found page table is setup wrongly for 32bit. And
native_pagetable_init wrong clear pte for pmd with large page support.

1. add more comments about why we are expecting pte.

2. add BUG checking, so next time we could find problem earlier
   when we mess up page table setup again.

3. max_low_pfn is not included boundary for low memory mapping.
   We should check from max_low_pfn instead of +1.

4. add print out when some pte really get cleared, or we should use
   WARN() to find out why above max_low_pfn get mapped? so we could
   fix it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-35-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:39 -08:00
Yinghai Lu c8dcdb9ce4 x86, mm: Move function declaration into mm_internal.h
They are only for mm/init*.c.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-34-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:38 -08:00
Yinghai Lu f836e35a98 x86, mm: change low/hignmem_pfn_init to static on 32bit
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-33-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:37 -08:00
Yinghai Lu 148b20989e x86, mm: Move init_gbpages() out of setup.c
Put it in mm/init.c, and call it from probe_page_mask().
init_mem_mapping is calling probe_page_mask at first.
So calling sequence is not changed.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-32-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:37 -08:00
Yinghai Lu cf47065961 x86, mm: Move back pgt_buf_* to mm/init.c
Also change them to static.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-31-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:36 -08:00
Yinghai Lu 719272c45b x86, mm: only call early_ioremap_page_table_range_init() once
On 32bit, before patcheset that only set page table for ram, we only
call that one time.

Now, we are calling that during every init_memory_mapping if we have holes
under max_low_pfn.

We should only call it one time after all ranges under max_low_page get
mapped just like we did before.

Also that could avoid the risk to run out of pgt_buf in BRK.

Need to update page_table_range_init() to count the pages for kmap page table
at first, and use new added alloc_low_pages() to get pages in sequence.
That will conform to the requirement that pages need to be in low to high order.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-30-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:29 -08:00
Stefano Stabellini ddd3509df8 x86, mm: Add pointer about Xen mmu requirement for alloc_low_pages
Add link for more information
	279b706 x86,xen: introduce x86_init.mapping.pagetable_reserve

-v2: updated to commets from hpa to include commit name.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-29-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:28 -08:00
Yinghai Lu 22c8ca2ac2 x86, mm: Add alloc_low_pages(num)
32bit kmap mapping needs pages to be used for low to high.
At this point those pages are still from pgt_buf_* from BRK, so it is
ok now.
But we want to move early_ioremap_page_table_range_init() out of
init_memory_mapping() and only call it one time later, that will
make page_table_range_init/page_table_kmap_check/alloc_low_page to
use memblock to get page.

memblock allocation for pages are from high to low.
So will get panic from page_table_kmap_check() that has BUG_ON to do
ordering checking.

This patch add alloc_low_pages to make it possible to allocate serveral
pages at first, and hand out pages one by one from low to high.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-28-git-send-email-yinghai@kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:27 -08:00
Yinghai Lu 6f80b68e9e x86, mm, Xen: Remove mapping_pagetable_reserve()
Page table area are pre-mapped now after
	x86, mm: setup page table in top-down
	x86, mm: Remove early_memremap workaround for page table accessing on 64bit

mapping_pagetable_reserve is not used anymore, so remove it.

Also remove operation in mask_rw_pte(), as modified allow_low_page
always return pages that are already mapped, moreover
xen_alloc_pte_init, xen_alloc_pmd_init, etc, will mark the page RO
before hooking it into the pagetable automatically.

-v2: add changelog about mask_rw_pte() from Stefano.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-27-git-send-email-yinghai@kernel.org
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:26 -08:00
Yinghai Lu 9985b4c6fa x86, mm: Move min_pfn_mapped back to mm/init.c
Also change it to static.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-26-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:24 -08:00
Yinghai Lu 5c51bdbe4c x86, mm: Merge alloc_low_page between 64bit and 32bit
They are almost same except 64 bit need to handle after_bootmem case.

Add mm_internal.h to make that alloc_low_page() only to be accessible
from arch/x86/mm/init*.c

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-25-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:23 -08:00
Yinghai Lu 868bf4d6b9 x86, mm: Remove parameter in alloc_low_page for 64bit
Now all page table buf are pre-mapped, and could use virtual address directly.
So don't need to remember physical address anymore.

Remove that phys pointer in alloc_low_page(), and that will allow us to merge
alloc_low_page between 64bit and 32bit.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-24-git-send-email-yinghai@kernel.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:22 -08:00
Yinghai Lu 973dc4f3fa x86, mm: Remove early_memremap workaround for page table accessing on 64bit
We try to put page table high to make room for kdump, and at that time
those ranges are not mapped yet, and have to use ioremap to access it.

Now after patch that pre-map page table top down.
	x86, mm: setup page table in top-down
We do not need that workaround anymore.

Just use __va to return directly mapping address.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-23-git-send-email-yinghai@kernel.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:20 -08:00
Yinghai Lu 8d57470d8f x86, mm: setup page table in top-down
Get pgt_buf early from BRK, and use it to map PMD_SIZE from top at first.
Then use mapped pages to map more ranges below, and keep looping until
all pages get mapped.

alloc_low_page will use page from BRK at first, after that buffer is used
up, will use memblock to find and reserve pages for page table usage.

Introduce min_pfn_mapped to make sure find new pages from mapped ranges,
that will be updated when lower pages get mapped.

Also add step_size to make sure that don't try to map too big range with
limited mapped pages initially, and increase the step_size when we have
more mapped pages on hand.

We don't need to call pagetable_reserve anymore, reserve work is done
in alloc_low_page() directly.

At last we can get rid of calculation and find early pgt related code.

-v2: update to after fix_xen change,
     also use MACRO for initial pgt_buf size and add comments with it.
-v3: skip big reserved range in memblock.reserved near end.
-v4: don't need fix_xen change now.
-v5: add changelog about moving about reserving pagetable to alloc_low_page.

Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-22-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:19 -08:00
Yinghai Lu f763ad1d38 x86, mm: Break down init_all_memory_mapping
Will replace that with top-down page table initialization.
New API need to take range: init_range_memory_mapping()

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-21-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:17 -08:00
Yinghai Lu eceb3632ac x86, mm: Don't clear page table if range is ram
After we add code use buffer in BRK to pre-map buf for page table in
following patch:
	x86, mm: setup page table in top-down
it should be safe to remove early_memmap for page table accessing.
Instead we get panic with that.

It turns out that we clear the initial page table wrongly for next range
that is separated by holes.
And it only happens when we are trying to map ram range one by one.

We need to check if the range is ram before clearing page table.

We change the loop structure to remove the extra little loop and use
one loop only, and in that loop will caculate next at first, and check if
[addr,next) is covered by E820_RAM.

-v2: E820_RESERVED_KERN is treated as E820_RAM. EFI one change some E820_RAM
     to that, so next kernel by kexec will know that range is used already.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-20-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:17 -08:00
Yinghai Lu aeebe84cc9 x86, mm: Use big page size for small memory range
We could map small range in the middle of big range at first, so should use
big page size at first to avoid using small page size to break down page table.

Only can set big page bit when that range has ram area around it.

-v2: fix 32bit boundary checking. We can not count ram above max_low_pfn
	for 32 bit.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-19-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:16 -08:00
Yinghai Lu 960ddb4fe7 x86, mm: Align start address to correct big page size
We are going to use buffer in BRK to map small range just under memory top,
and use those new mapped ram to map ram range under it.

The ram range that will be mapped at first could be only page aligned,
but ranges around it are ram too, we could use bigger page to map it to
avoid small page size.

We will adjust page_size_mask in following patch:
	x86, mm: Use big page size for small memory range
to use big page size for small ram range.

Before that patch, this patch will make sure start address to be
aligned down according to bigger page size, otherwise entry in page
page will not have correct value.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-18-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:15 -08:00
Yinghai Lu 74f27655dd x86, mm: relocate initrd under all mem for 64bit
instead of under 4g.

For 64bit, we can use any mapped mem instead of low mem.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-17-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:15 -08:00
Jacob Shin 66520ebc2d x86, mm: Only direct map addresses that are marked as E820_RAM
Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
backed by actual DRAM. This is fine for holes under 4GB which are covered
by fixed and variable range MTRRs to be UC. However, we run into trouble
on higher memory addresses which cannot be covered by MTRRs.

Our system with 1TB of RAM has an e820 that looks like this:

 BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
 BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
 BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
 BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
 BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
 BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
 BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
 BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
 BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
 BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
 BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
 BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
 BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable

and so direct mappings are created for huge memory hole between
0x000000e038000000 to 0x0000010000000000. Even though the kernel never
generates memory accesses in that region, since the page tables mark
them incorrectly as being WB, our (AMD) processor ends up causing a MCE
while doing some memory bookkeeping/optimizations around that area.

This patch iterates through e820 and only direct maps ranges that are
marked as E820_RAM, and keeps track of those pfn ranges. Depending on
the alignment of E820 ranges, this may possibly result in using smaller
size (i.e. 4K instead of 2M or 1G) page tables.

-v2: move changes from setup.c to mm/init.c, also use for_each_mem_pfn_range
	instead.  - Yinghai Lu
-v3: add calculate_all_table_space_size() to get correct needed page table
	size. - Yinghai Lu
-v4: fix add_pfn_range_mapped() to get correct max_low_pfn_mapped when
     mem map does have hole under 4g that is found by Konard on xen
     domU with 8g ram. - Yinghai

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1353123563-3103-16-git-send-email-yinghai@kernel.org
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:14 -08:00
Yinghai Lu e8c57d4051 x86, mm: use pfn_range_is_mapped() with reserve_initrd
We are going to map ram only, so under max_low_pfn_mapped,
between 4g and max_pfn_mapped does not mean mapped at all.

Use pfn_range_is_mapped() to find out if range is mapped for initrd.

That could happen bootloader put initrd in range but user could
use memmap to carve some of range out.

Also during copying need to use early_memmap to map original initrd
for accessing.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-15-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:12 -08:00