Commit Graph

8706 Commits

Author SHA1 Message Date
Uros Bizjak e4ea3f6b1b ftrace/x86_32: Simplify parameter setup for ftrace_regs_caller
The final position of the stack after saving regs and setting up
the parameters for ftrace_regs_call, is the position of the pt_regs
needed for the 4th parameter. Instead of saving it into a temporary
reg and pushing the reg, simply push the stack pointer.

Link: http://lkml.kernel.org/r/1342702344.12353.16.camel@gandalf.stny.rr.com

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-07-31 10:29:34 -04:00
Len Brown 3b6961ba8c ACPI/x86: revert 'x86, acpi: Call acpi_enter_sleep_state via an asmlinkage C function from assembler'
cd74257b97
patched up GTS/BFS -- a feature we want to remove.
So revert it (by hand, due to conflict in sleep.h)
to prepare for GTS/BFS removal.

Signed-off-by: Len Brown <len.brown@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-30 21:10:16 -04:00
Yasuaki Ishimatsu 4ed940d4c3 firmware_map: make firmware_map_add_early() argument consistent with firmware_map_add_hotplug()
There are two ways to create /sys/firmware/memmap/X sysfs:

  - firmware_map_add_early
    When the system starts, it is calledd from e820_reserve_resources()
  - firmware_map_add_hotplug
    When the memory is hot plugged, it is called from add_memory()

But these functions are called without unifying value of end argument as
below:

  - end argument of firmware_map_add_early()   : start + size - 1
  - end argument of firmware_map_add_hogplug() : start + size

The patch unifies them to "start + size".  Even if applying the patch,
/sys/firmware/memmap/X/end file content does not change.

[akpm@linux-foundation.org: clarify comments]
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Reviewed-by: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30 17:25:17 -07:00
Linus Torvalds 4cb38750d4 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/mm changes from Peter Anvin:
 "The big change here is the patchset by Alex Shi to use INVLPG to flush
  only the affected pages when we only need to flush a small page range.

  It also removes the special INVALIDATE_TLB_VECTOR interrupts (32
  vectors!) and replace it with an ordinary IPI function call."

Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next
to changed line)

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tlb: Fix build warning and crash when building for !SMP
  x86/tlb: do flush_tlb_kernel_range by 'invlpg'
  x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
  x86/tlb: enable tlb flush range support for x86
  mm/mmu_gather: enable tlb flush range in generic mmu_gather
  x86/tlb: add tlb_flushall_shift knob into debugfs
  x86/tlb: add tlb_flushall_shift for specific CPU
  x86/tlb: fall back to flush all when meet a THP large page
  x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
  x86/tlb_info: get last level TLB entry number of CPU
  x86: Add read_mostly declaration/definition to variables from smp.h
  x86: Define early read-mostly per-cpu macros
2012-07-26 13:17:17 -07:00
Linus Torvalds 79071638ce Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The biggest change is a performance improvement on SMP systems:

  | 4 socket 40 core + SMT Westmere box, single 30 sec tbench
  | runs, higher is better:
  |
  | clients     1       2       4        8       16       32       64      128
  |..........................................................................
  | pre        30      41     118      645     3769     6214    12233    14312
  | post      299     603    1211     2418     4697     6847    11606    14557
  |
  | A nice increase in performance.

  which speedup is particularly noticeable on heavily interacting
  few-tasks workloads, so the changes should help desktop-style Xorg
  workloads and interactivity as well, on multi-core CPUs.

  There are also cpuset suspend behavior fixes/restructuring and various
  smaller tweaks."

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix race in task_group()
  sched: Improve balance_cpu() to consider other cpus in its group as target of (pinned) task
  sched: Reset loop counters if all tasks are pinned and we need to redo load balance
  sched: Reorder 'struct lb_env' members to reduce its size
  sched: Improve scalability via 'CPU buddies', which withstand random perturbations
  cpusets: Remove/update outdated comments
  cpusets, hotplug: Restructure functions that are invoked during hotplug
  cpusets, hotplug: Implement cpuset tree traversal in a helper function
  CPU hotplug, cpusets, suspend: Don't modify cpusets during suspend/resume
  sched/x86: Remove broken power estimation
2012-07-26 13:08:01 -07:00
Julia Lawall 41fb433b63 arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error case
Typically, the return value desired for the failure of a
function with an integer return value is a negative integer.  In
these cases, the return value is sometimes a negative integer
and sometimes 0, due to a subsequent initialization of the
return variable within the loop.

A simplified version of the semantic match that finds this
problem is: (http://coccinelle.lip6.fr/)

//<smpl>
@r exists@
identifier ret;
position p;
constant C;
expression e1,e3,e4;
statement S;
@@

ret = -C
... when != ret = e3
    when any
if@p (...) S
... when any
if (\(ret != 0\|ret < 0\|ret > 0\) || ...) { ... return ...; }
... when != ret = e3
    when any
*if@p (...)
{
  ... when != ret = e4
  return ret;
}
//</smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Link: http://lkml.kernel.org/r/1342284188-19176-7-git-send-email-Julia.Lawall@lip6.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-26 15:07:20 +02:00
Tony Luck 61b0fccd7f x86/mce: Add quirk for instruction recovery on Sandy Bridge processors
Sandy Bridge processors follow the SDM (Vol 3B, Table 15-20) and
set both the RIPV and EIPV bits in the MCG_STATUS register to
zero for machine checks during instruction fetch. This is more
than a little counter-intuitive and means that Linux cannot
recover from these errors. Rather than insert special case code
at several places in mce.c and mce-severity.c, we pretend the
EIPV bit was set for just this case early in processing the
machine check.

Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/180a06f3f357cf9f78259ae443a082b14a29535b.1343078495.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-26 15:05:47 +02:00
Tony Luck 736edce5f3 x86/mce: Move MCACOD defines from mce-severity.c to <asm/mce.h>
We will need some of these values in mce.c. Move them to the
appropriate header file so they are available.

Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/0ccfb1af5fe35e537b7cd8e4d448bf7d851dbfb9.1343078495.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-26 15:05:47 +02:00
Tomoki Sekiyama 1d44b30f35 x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs
In the current kernel, percpu variable `vector_irq' is not always
cleared when a CPU is offlined. If the CPU that has the disabled
irqs in vector_irq is hotplugged again, __setup_vector_irq()
hits invalid irq vector and may crash.

This bug can be reproduced as following;

 # echo 0 > /sys/devices/system/cpu/cpu7/online
 # modprobe -r some_driver_using_interrupts     # vector_irq@cpu7 uncleared
 # echo 1 > /sys/devices/system/cpu/cpu7/online # kernel may crash

To fix this problem, this patch clears vector_irq in
__fixup_irqs() when the CPU is offlined.

This also reverts commit f6175f5bfb, which partially fixes
this bug by clearing vector in __clear_irq_vector(). But in
environments with IOMMU IRQ remapper, it could fail because
cfg->domain doesn't contain offlined CPUs. With this patch, the
fix in __clear_irq_vector() can be reverted because every
vector_irq is already cleared in __fixup_irqs() on offlined CPUs.

Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120726104732.2889.19144.stgit@kvmdev
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-26 15:01:17 +02:00
Yan, Zheng c1ece48cf7 perf/x86: Fix format definition of SNB-EP uncore QPI box
The event control register of SNB-EP uncore QPI box has a one bit
extension at bit position 21.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1343097850-4348-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-26 12:23:14 +02:00
Peter Zijlstra 597ed953d7 perf/x86: Make bitfield unsigned
Fix:

 arch/x86/kernel/cpu/perf_event.h:377:43: sparse: dubious one-bit signed bitfield

Cc: Borislav Petkov <bp@amd64.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-2jxkmktkppkclj1qe6qxd7ah@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-26 12:23:13 +02:00
Yan, Zheng 74e6543fdc perf/x86: Fix LLC-* and node-* events on Intel SandyBridge
LLC-* and node-* events require using the OFFCORE_RESPONSE events
on SandyBridge, but the hw_cache_extra_regs is left uninitialized.
This patch adds the missing extra register configure table for
SandyBridge.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342517275-2875-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-26 12:23:12 +02:00
Yan, Zheng 254298c726 perf/x86: Add Intel Nehalem-EX uncore support
The uncore subsystem in Nehalem-EX consists of 7 components
(U-Box, C-Box, B-Box, S-Box, R-Box, M-Box and W-Box). This
patch is large because the way to program these boxes is
diverse.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FF534F1.3030307@intel.com
[ Improved the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-26 12:23:11 +02:00
Yan, Zheng 4f3f713fc7 perf/x86: Fix typo in format definition of uncore PCU filter
The format definition of uncore PCU filter should be filter_band*
instead of filter_brand*.

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1343024611-4692-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-26 12:23:11 +02:00
Ingo Molnar d431adfbc9 Merge branch 'linus' into x86/urgent
Merge in Linus's tree to avoid a conflict.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-25 21:40:40 +02:00
Alan Cox d6250a3f12 x86, nops: Missing break resulting in incorrect selection on Intel
The Intel case falls through into the generic case which then changes
the values.  For cases like the P6 it doesn't do the right thing so
this seems to be a screwup.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-lww2uirad4skzjlmrm0vru8o@git.kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
2012-07-25 08:35:38 -07:00
Linus Torvalds 97027da6ad IOMMU Updates for Linux v3.6-rc1
The most important part of these updates is the IOMMU groups code
 enhancement written by Alex Williamson. It abstracts the problem that a
 given hardware IOMMU can't isolate any given device from any other
 device (e.g. 32 bit PCI devices can't usually be isolated). Devices that
 can't be isolated are grouped together. This code is required for the
 upcoming VFIO framework.
 
 Another IOMMU-API change written by be is the introduction of domain
 attributes. This makes it easier to handle GART-like IOMMUs with the
 IOMMU-API because now the start-address and the size of the domain
 address space can be queried.
 
 Besides that there are a few cleanups and fixes for the NVidia Tegra
 IOMMU drivers and the reworked init-code for the AMD IOMMU. The later is
 from my patch-set to support interrupt remapping. The rest of this
 patch-set requires x86 changes which are not mergabe yet. So full
 support for interrupt remapping with AMD IOMMUs will come in a future
 merge window.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQDV/MAAoJECvwRC2XARrjSDcP+gJbtSHDMyZ71zyfQfAZcxJt
 rTqLbdZRtIjrjgtKSEDp8u5Bo5TK9dAYoZVuJMOZewFzwI/fSfbRsWp1PU0I88Fr
 ZzM+/o1N9MLvf1e3kRVOzNzUfku+jTQgUBD4txsbtQzc/IeGHe9qP1Bqzs/xg4Pk
 SjWu7pLNYxaER10z76nRodNn6zGjsc7GFdOW8cJu2HOAHhisIAR291jSQgd6Rz9r
 zWqSTsXIEzYt2CtU3G2/tFJ554Mp8v5F80gHo+0Ldw8aNxlD6nGtbqGNt+KI8qTv
 MUL8KJ0TNms9CZdti1CSlSNp51VgJi2GaWKCaDAkYuuER2IbC/8Yp/p2DIIA0GNp
 HpziIs+dauZPWfZHc6oU7lJAClGAG4MUx7CysVIOzl7ML/Bf4mjYv0faGf5YQfyE
 weOR+OPPIWDUwgjzHKMAboA4ijkE/v+EKjOaN/S9rEqFEMKC99fwGkf9wUcpZTne
 8lzdI2JrgYNDWMVNYlomeLD4lBAbxb/QsnRUa33igjr0MclvMDkp5HaO631Z1+Zx
 be2z8Rl1CtMwS4qeaOXoeaoNWHU26+oJRZNtCGi/Fw4aKqYXP1dnE/m0GtqEP9Yi
 +CU2rKbZn3j0+ZcQjCQop8FREPrZ2/Uaji70b6G7WZ2ApcqBxzBffpbMKOmd6T1D
 HIzGh0fpdYNDuwn6Txit
 =MbAC
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "The most important part of these updates is the IOMMU groups code
  enhancement written by Alex Williamson.  It abstracts the problem that
  a given hardware IOMMU can't isolate any given device from any other
  device (e.g.  32 bit PCI devices can't usually be isolated).  Devices
  that can't be isolated are grouped together.  This code is required
  for the upcoming VFIO framework.

  Another IOMMU-API change written by me is the introduction of domain
  attributes.  This makes it easier to handle GART-like IOMMUs with the
  IOMMU-API because now the start-address and the size of the domain
  address space can be queried.

  Besides that there are a few cleanups and fixes for the NVidia Tegra
  IOMMU drivers and the reworked init-code for the AMD IOMMU.  The
  latter is from my patch-set to support interrupt remapping.  The rest
  of this patch-set requires x86 changes which are not mergabe yet.  So
  full support for interrupt remapping with AMD IOMMUs will come in a
  future merge window."

* tag 'iommu-updates-v3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (33 commits)
  iommu/amd: Fix hotplug with iommu=pt
  iommu/amd: Add missing spin_lock initialization
  iommu/amd: Convert iommu initialization to state machine
  iommu/amd: Introduce amd_iommu_init_dma routine
  iommu/amd: Move unmap_flush message to amd_iommu_init_dma_ops()
  iommu/amd: Split enable_iommus() routine
  iommu/amd: Introduce early_amd_iommu_init routine
  iommu/amd: Move informational prinks out of iommu_enable
  iommu/amd: Split out PCI related parts of IOMMU initialization
  iommu/amd: Use acpi_get_table instead of acpi_table_parse
  iommu/amd: Fix sparse warnings
  iommu/tegra: Don't call alloc_pdir with as->lock
  iommu/tegra: smmu: Fix unsleepable memory allocation at alloc_pdir()
  iommu/tegra: smmu: Remove unnecessary sanity check at alloc_pdir()
  iommu/exynos: Implement DOMAIN_ATTR_GEOMETRY attribute
  iommu/tegra: Implement DOMAIN_ATTR_GEOMETRY attribute
  iommu/msm: Implement DOMAIN_ATTR_GEOMETRY attribute
  iommu/omap: Implement DOMAIN_ATTR_GEOMETRY attribute
  iommu/vt-d: Implement DOMAIN_ATTR_GEOMETRY attribute
  iommu/amd: Implement DOMAIN_ATTR_GEOMETRY attribute
  ...
2012-07-24 16:24:11 -07:00
Linus Torvalds 6dd53aa456 PCI changes for the 3.6 merge window:
Host bridge hotplug
     - Add MMCONFIG support for hot-added host bridges (Jiang Liu)
   Device hotplug
     - Move fixups from __init to __devinit (Sebastian Andrzej Siewior)
     - Call FINAL fixups for hot-added devices, too (Myron Stowe)
     - Factor out generic code for P2P bridge hot-add (Yinghai Lu)
     - Remove all functions in a slot, not just those with _EJx (Amos Kong)
   Dynamic resource management
     - Track bus number allocation (struct resource tree per domain) (Yinghai Lu)
     - Make P2P bridge 1K I/O windows work with resource reassignment (Bjorn Helgaas, Yinghai Lu)
     - Disable decoding while updating 64-bit BARs (Bjorn Helgaas)
   Power management
     - Add PCIe runtime D3cold support (Huang Ying)
   Virtualization
     - Add VFIO infrastructure (ACS, DMA source ID quirks) (Alex Williamson)
     - Add quirks for devices with broken INTx masking (Jan Kiszka)
   Miscellaneous
     - Fix some PCI Express capability version issues (Myron Stowe)
     - Factor out some arch code with a weak, generic, pcibios_setup() (Myron Stowe)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQBy+9AAoJEPGMOI97Hn6zOpQP+wVFvA7pcteFj6HPs5nTq2Hc
 55oeRqCO0wBHoFMCKB0AjeTATjqxi9OhcjaiVrZejxNyWKC9MnrXuunpQ0l/hCbR
 M/TK+BCelfX2FU4eXNf+TBCCcOhOVWqQft9Gm6nYKwX8Y0msRVCceI4WwhZgSwtI
 vdtmnqlwolscdnq+8ThsnvUMtwkN0gExmn2FJRl6EoEgG0DTqhMkZ83uA+NPBhvv
 I+g0XbA6haaZph2nnSYR0hIW4Q7JkT/LgA6uVAQxamctwxLol7xxsjCRnfqrulkf
 kaRr2fAgBXfmaOIltro4UkXrCM52ZSyggCDfExHp6mWGPKMjE5ZcyK1YbGfmmumk
 DS3t1S0eBdDJXrnf9l/Yb8e95dQxRCYKelKzr1rTD9QAXsInE8rC40hvhfFaTa4s
 nZYRTz0SKv6coQihqaOR7shx1DNomLFk7jndaWEElfl9/cT/nQnZ8XLfVMzkJNNB
 Y4SM6zkiIaCL0aiSEE16MqVjmODYRjbURLYzQIrqr2KJQg8X6XjIRojQLjL6xEgA
 22ry2ZRPhqO68g7aLqvixiSDaTp0Z0Vw+JmgjtBqvkokwZcGQtm4umkpAdOi+Es8
 3bJaMY7ZUpDX53FE8iyP6AnmR/1k19rC1gNnNq/syWyjtYOYJ9i3QCTafFgvE1VC
 5coQ1L5tByHvpzK5PHwf
 =oo/A
 -----END PGP SIGNATURE-----

Merge tag 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI changes from Bjorn Helgaas:
 "Host bridge hotplug:
    - Add MMCONFIG support for hot-added host bridges (Jiang Liu)
  Device hotplug:
    - Move fixups from __init to __devinit (Sebastian Andrzej Siewior)
    - Call FINAL fixups for hot-added devices, too (Myron Stowe)
    - Factor out generic code for P2P bridge hot-add (Yinghai Lu)
    - Remove all functions in a slot, not just those with _EJx (Amos
      Kong)
  Dynamic resource management:
    - Track bus number allocation (struct resource tree per domain)
      (Yinghai Lu)
    - Make P2P bridge 1K I/O windows work with resource reassignment
      (Bjorn Helgaas, Yinghai Lu)
    - Disable decoding while updating 64-bit BARs (Bjorn Helgaas)
  Power management:
    - Add PCIe runtime D3cold support (Huang Ying)
  Virtualization:
    - Add VFIO infrastructure (ACS, DMA source ID quirks) (Alex
      Williamson)
    - Add quirks for devices with broken INTx masking (Jan Kiszka)
  Miscellaneous:
    - Fix some PCI Express capability version issues (Myron Stowe)
    - Factor out some arch code with a weak, generic, pcibios_setup()
      (Myron Stowe)"

* tag 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (122 commits)
  PCI: hotplug: ensure a consistent return value in error case
  PCI: fix undefined reference to 'pci_fixup_final_inited'
  PCI: build resource code for M68K architecture
  PCI: pciehp: remove unused pciehp_get_max_lnk_width(), pciehp_get_cur_lnk_width()
  PCI: reorder __pci_assign_resource() (no change)
  PCI: fix truncation of resource size to 32 bits
  PCI: acpiphp: merge acpiphp_debug and debug
  PCI: acpiphp: remove unused res_lock
  sparc/PCI: replace pci_cfg_fake_ranges() with pci_read_bridge_bases()
  PCI: call final fixups hot-added devices
  PCI: move final fixups from __init to __devinit
  x86/PCI: move final fixups from __init to __devinit
  MIPS/PCI: move final fixups from __init to __devinit
  PCI: support sizing P2P bridge I/O windows with 1K granularity
  PCI: reimplement P2P bridge 1K I/O windows (Intel P64H2)
  PCI: disable MEM decoding while updating 64-bit MEM BARs
  PCI: leave MEM and IO decoding disabled during 64-bit BAR sizing, too
  PCI: never discard enable/suspend/resume_early/resume fixups
  PCI: release temporary reference in __nv_msi_ht_cap_quirk()
  PCI: restructure 'pci_do_fixups()'
  ...
2012-07-24 16:17:07 -07:00
Linus Torvalds d14b7a419a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "Trivial updates all over the place as usual."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (29 commits)
  Fix typo in include/linux/clk.h .
  pci: hotplug: Fix typo in pci
  iommu: Fix typo in iommu
  video: Fix typo in drivers/video
  Documentation: Add newline at end-of-file to files lacking one
  arm,unicore32: Remove obsolete "select MISC_DEVICES"
  module.c: spelling s/postition/position/g
  cpufreq: Fix typo in cpufreq driver
  trivial: typo in comment in mksysmap
  mach-omap2: Fix typo in debug message and comment
  scsi: aha152x: Fix sparse warning and make printing pointer address more portable.
  Change email address for Steve Glendinning
  Btrfs: fix typo in convert_extent_bit
  via: Remove bogus if check
  netprio_cgroup.c: fix comment typo
  backlight: fix memory leak on obscure error path
  Documentation: asus-laptop.txt references an obsolete Kconfig item
  Documentation: ManagementStyle: fixed typo
  mm/vmscan: cleanup comment error in balance_pgdat
  mm: cleanup on the comments of zone_reclaim_stat
  ...
2012-07-24 13:34:56 -07:00
Linus Torvalds 62c4d9afa4 Features:
* Performance improvement to lower the amount of traps the hypervisor
    has to do 32-bit guests. Mainly for setting PTE entries and updating
    TLS descriptors.
  * MCE polling driver to collect hypervisor MCE buffer and present them to
    /dev/mcelog.
  * Physical CPU online/offline support. When an privileged guest is booted
    it is present with virtual CPUs, which might have an 1:1 to physical
    CPUs but usually don't. This provides mechanism to offline/online physical
    CPUs.
 Bug-fixes for:
  * Coverity found fixes in the console and ACPI processor driver.
  * PVonHVM kexec fixes along with some cleanups.
  * Pages that fall within E820 gaps and non-RAM regions (and had been
    released to hypervisor) would be populated back, but potentially in
    non-RAM regions.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJQDWcvAAoJEFjIrFwIi8fJ6GAH/iFIkOC5wseD8qZ9nV4VI46t
 0GYvBFC4F91NvC7CNfoAySr84v+ZORIZzMcdyDF8H/tLO9MaOY/Mwn0S5ZSqmYMi
 rhskvK3InBaVkYtceOHugNGM7mB0c3STIm7OsjW6gbVzohmTN25rbQR+X5iWAtVA
 cTUtDyH3AU15mwuVT3U+VC4IulHpnNJz4pHoq3Sn61/UK1LYmhLXYd5fveA0D0B8
 lRZTAvNMsYDJDDmkWNrs8RczKkQ86DTSjfGawm0YG+Gf94GgD5yMHWbiHh2Gy93e
 u7sHK0RrKbP5BY/MV6vVJxkoV5NoWgCc0tcjBcYwdyvwzxDS75UhV6uoVHC3Ao8=
 =drt2
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.6-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen update from Konrad Rzeszutek Wilk:
 "Features:
   * Performance improvement to lower the amount of traps the hypervisor
     has to do 32-bit guests.  Mainly for setting PTE entries and
     updating TLS descriptors.
   * MCE polling driver to collect hypervisor MCE buffer and present
     them to /dev/mcelog.
   * Physical CPU online/offline support.  When an privileged guest is
     booted it is present with virtual CPUs, which might have an 1:1 to
     physical CPUs but usually don't.  This provides mechanism to
     offline/online physical CPUs.
  Bug-fixes for:
   * Coverity found fixes in the console and ACPI processor driver.
   * PVonHVM kexec fixes along with some cleanups.
   * Pages that fall within E820 gaps and non-RAM regions (and had been
     released to hypervisor) would be populated back, but potentially in
     non-RAM regions."

* tag 'stable/for-linus-3.6-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: populate correct number of pages when across mem boundary (v2)
  xen PVonHVM: move shared_info to MMIO before kexec
  xen: simplify init_hvm_pv_info
  xen: remove cast from HYPERVISOR_shared_info assignment
  xen: enable platform-pci only in a Xen guest
  xen/pv-on-hvm kexec: shutdown watches from old kernel
  xen/x86: avoid updating TLS descriptors if they haven't changed
  xen/x86: add desc_equal() to compare GDT descriptors
  xen/mm: zero PTEs for non-present MFNs in the initial page table
  xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable
  xen/hvc: Fix up checks when the info is allocated.
  xen/acpi: Fix potential memory leak.
  xen/mce: add .poll method for mcelog device driver
  xen/mce: schedule a workqueue to avoid sleep in atomic context
  xen/pcpu: Xen physical cpus online/offline sys interface
  xen/mce: Register native mce handler as vMCE bounce back point
  x86, MCE, AMD: Adjust initcall sequence for xen
  xen/mce: Add mcelog support for Xen platform
2012-07-24 13:14:03 -07:00
Linus Torvalds 5fecc9d8f5 KVM updates for the 3.6 merge window
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQDRDNAAoJEI7yEDeUysxlkl8P/3C2AHx2webOU8sVzhfU6ONZ
 ZoGevwBjyZIeJEmiWVpFTTEew1l0PXtpyOocXGNUXIddVnhXTQOKr/Scj4uFbmx8
 ROqgK8NSX9+xOGrBPCoN7SlJkmp+m6uYtwYkl2SGnsEVLWMKkc7J7oqmszCcTQvN
 UXMf7G47/Ul2NUSBdv4Yvizhl4kpvWxluiweDw3E/hIQKN0uyP7CY58qcAztw8nG
 csZBAnnuPFwIAWxHXW3eBBv4UP138HbNDqJ/dujjocM6GnOxmXJmcZ6b57gh+Y64
 3+w9IR4qrRWnsErb/I8inKLJ1Jdcf7yV2FmxYqR4pIXay2Yzo1BsvFd6EB+JavUv
 pJpixrFiDDFoQyXlh4tGpsjpqdXNMLqyG4YpqzSZ46C8naVv9gKE7SXqlXnjyDlb
 Llx3hb9Fop8O5ykYEGHi+gIISAK5eETiQl4yw9RUBDpxydH4qJtqGIbLiDy8y9wi
 Xyi8PBlNl+biJFsK805lxURqTp/SJTC3+Zb7A7CzYEQm5xZw3W/CKZx1ZYBfpaa/
 pWaP6tB7JwgLIVXi4HQayLWqMVwH0soZIn9yazpOEFv6qO8d5QH5RAxAW2VXE3n5
 JDlrajar/lGIdiBVWfwTJLb86gv3QDZtIWoR9mZuLKeKWE/6PRLe7HQpG1pJovsm
 2AsN5bS0BWq+aqPpZHa5
 =pECD
 -----END PGP SIGNATURE-----

Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Avi Kivity:
 "Highlights include
  - full big real mode emulation on pre-Westmere Intel hosts (can be
    disabled with emulate_invalid_guest_state=0)
  - relatively small ppc and s390 updates
  - PCID/INVPCID support in guests
  - EOI avoidance; 3.6 guests should perform better on 3.6 hosts on
    interrupt intensive workloads)
  - Lockless write faults during live migration
  - EPT accessed/dirty bits support for new Intel processors"

Fix up conflicts in:
 - Documentation/virtual/kvm/api.txt:

   Stupid subchapter numbering, added next to each other.

 - arch/powerpc/kvm/booke_interrupts.S:

   PPC asm changes clashing with the KVM fixes

 - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c:

   Duplicated commits through the kvm tree and the s390 tree, with
   subsequent edits in the KVM tree.

* tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
  KVM: fix race with level interrupts
  x86, hyper: fix build with !CONFIG_KVM_GUEST
  Revert "apic: fix kvm build on UP without IOAPIC"
  KVM guest: switch to apic_set_eoi_write, apic_write
  apic: add apic_set_eoi_write for PV use
  KVM: VMX: Implement PCID/INVPCID for guests with EPT
  KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
  KVM: PPC: Critical interrupt emulation support
  KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
  KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
  KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
  KVM: PPC: bookehv64: Add support for std/ld emulation.
  booke: Added crit/mc exception handler for e500v2
  booke/bookehv: Add host crit-watchdog exception support
  KVM: MMU: document mmu-lock and fast page fault
  KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
  KVM: MMU: trace fast page fault
  KVM: MMU: fast path of handling guest page fault
  KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
  KVM: MMU: fold tlb flush judgement into mmu_spte_update
  ...
2012-07-24 12:01:20 -07:00
Peter Zijlstra ee08d1284e sched/x86: Remove broken power estimation
The x86 sched power implementation has been broken forever and gets in
the way of other stuff, remove it.

[ For archaeological interest, fixing this code would require dealing
  with the cross-cpu calling of these functions and more importantly, we
  need to filter idle time out of the a/m-perf stuff because the ratio
  will go down to 0 when idle, giving a 0 capacity which is not what
  we'd want. ]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/1339594110.8980.38.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24 13:53:00 +02:00
Joerg Roedel 395e51f18d Merge branches 'iommu/fixes', 'x86/amd', 'groups', 'arm/tegra' and 'api/domain-attr' into next
Conflicts:
	drivers/iommu/iommu.c
	include/linux/iommu.h
2012-07-23 12:17:00 +02:00
Linus Torvalds 5b160bd426 Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/mce changes from Ingo Molnar:
 "This tree improves the AMD thresholding bank code and includes a
  memory fault signal handling fixlet."

* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faults
  x86, MCE, AMD: Update copyrights and boilerplate
  x86, MCE, AMD: Give proper names to the thresholding banks
  x86, MCE, AMD: Make error_count read only
  x86, MCE, AMD: Cleanup reading of error_count
  x86, MCE, AMD: Print decimal thresholding values
  x86, MCE, AMD: Move shared bank to node descriptor
  x86, MCE, AMD: Remove local_allocate_... wrapper
  x86, MCE, AMD: Remove shared banks sysfs linking
  x86, amd_nb: Export model 0x10 and later PCI id
2012-07-22 16:07:45 -07:00
Linus Torvalds d5d96ed2d8 Merge branch 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/reboot changes from Ingo Molnar:
 "Now that the revampted x86 real-mode trampoline code is upstream and
  seems to be working well, we can extend the 64-bit reboot code to be
  as capable as the 32-bit one."

* 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-64, reboot: Be more paranoid in 64-bit reboot=bios
  x86, reboot: Drop redundant write of reboot_mode
  x86-64, reboot: Allow reboot=bios and reboot-cpu override on x86-64
2012-07-22 12:25:47 -07:00
Linus Torvalds bd3e57f913 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform changes from Ingo Molnar:
 "This tree mostly involves various APIC driver cleanups/robustization,
  and vSMP motivated platform callback improvements/cleanups"

Fix up trivial conflict due to printk cleanup right next to return value
change.

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
  Revert "x86/early_printk: Replace obsolete simple_strtoul() usage with kstrtoint()"
  x86/apic/x2apic: Use multiple cluster members for the irq destination only with the explicit affinity
  x86/apic/x2apic: Limit the vector reservation to the user specified mask
  x86/apic: Optimize cpu traversal in __assign_irq_vector() using domain membership
  x86/vsmp: Fix vector_allocation_domain's return value
  irq/apic: Use config_enabled(CONFIG_SMP) checks to clean up irq_set_affinity() for UP
  x86/vsmp: Fix linker error when CONFIG_PROC_FS is not set
  x86/apic/es7000: Make apicid of a cluster (not CPU) from a cpumask
  x86/apic/es7000+summit: Always make valid apicid from a cpumask
  x86/apic/es7000+summit: Fix compile warning in cpu_mask_to_apicid()
  x86/apic: Fix ugly casting and branching in cpu_mask_to_apicid_and()
  x86/apic: Eliminate cpu_mask_to_apicid() operation
  x86/x2apic/cluster: Vector_allocation_domain() should return a value
  x86/apic/irq_remap: Silence a bogus pr_err()
  x86/vsmp: Ignore IOAPIC IRQ affinity if possible
  x86/apic: Make cpu_mask_to_apicid() operations check cpu_online_mask
  x86/apic: Make cpu_mask_to_apicid() operations return error code
  x86/apic: Avoid useless scanning thru a cpumask in assign_irq_vector()
  x86/apic: Try to spread IRQ vectors to different priority levels
  x86/apic: Factor out default vector_allocation_domain() operation
  ...
2012-07-22 12:19:36 -07:00
Linus Torvalds 3fad0953a1 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debug-for-linus git tree from Ingo Molnar.

Fix up trivial conflict in arch/x86/kernel/cpu/perf_event_intel.c due to
a printk() having changed to a pr_info() differently in the two branches.

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Move call to print_modules() out of show_regs()
  x86/mm: Mark free_initrd_mem() as __init
  x86/microcode: Mark microcode_id[] as __initconst
  x86/nmi: Clean up register_nmi_handler() usage
  x86: Save cr2 in NMI in case NMIs take a page fault (for i386)
  x86: Remove cmpxchg from i386 NMI nesting code
  x86: Save cr2 in NMI in case NMIs take a page fault
  x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
2012-07-22 12:04:44 -07:00
Linus Torvalds a065de0d25 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/asm changes from Ingo Molnar:
 "Assorted single-commit improvements, as usual"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/mtrr: Slightly simplify print_mtrr_state()
  x86/mm/mtrr: Fix alignment determination in range_to_mtrr()
  x86/copy_user_generic: Optimize copy_user_generic with CPU erms feature
  x86/alternatives: Use atomic_xchg() instead atomic_dec_and_test() for stop_machine_text_poke()
2012-07-22 11:42:28 -07:00
Linus Torvalds 55acdddbac Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp/hotplug changes from Ingo Molnar:
 "Various cleanups to the SMP hotplug code - a continuing effort of
  Thomas et al"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  smpboot: Remove leftover declaration
  smp: Remove num_booting_cpus()
  smp: Remove ipi_call_lock[_irq]()/ipi_call_unlock[_irq]()
  POWERPC: Smp: remove call to ipi_call_lock()/ipi_call_unlock()
  SPARC: SMP: Remove call to ipi_call_lock_irq()/ipi_call_unlock_irq()
  ia64: SMP: Remove call to ipi_call_lock_irq()/ipi_call_unlock_irq()
  x86-smp-remove-call-to-ipi_call_lock-ipi_call_unlock
  tile: SMP: Remove call to ipi_call_lock()/ipi_call_unlock()
  S390: Smp: remove call to ipi_call_lock()/ipi_call_unlock()
  parisc: Smp: remove call to ipi_call_lock()/ipi_call_unlock()
  mn10300: SMP: Remove call to ipi_call_lock()/ipi_call_unlock()
  hexagon: SMP: Remove call to ipi_call_lock()/ipi_call_unlock()
2012-07-22 11:22:15 -07:00
Ingo Molnar 36d93d88a5 Revert "x86/early_printk: Replace obsolete simple_strtoul() usage with kstrtoint()"
This reverts commit fbd24153c4.

This commit is subtly buggy: kstrto*int() can return an error but
it's not checked in every path. simple_strtoul() on the other hand
could not fail, so this patch subtly intruduces new failure modes.

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Link: http://lkml.kernel.org/r/1338424803.3569.5.camel@lorien2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-22 15:47:52 +02:00
Geert Uytterhoeven 2e76c2838a module.c: spelling s/postition/position/g
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-07-20 10:38:35 +02:00
Liu, Jinsong a8fccdb061 x86, MCE, AMD: Adjust initcall sequence for xen
there are 3 funcs which need to be _initcalled in a logic sequence:
1. xen_late_init_mcelog
2. mcheck_init_device
3. threshold_init_device

xen_late_init_mcelog must register xen_mce_chrdev_device before
native mce_chrdev_device registration if running under xen platform;

mcheck_init_device should be inited before threshold_init_device to
initialize mce_device, otherwise a a NULL ptr dereference will cause panic.

so we use following _initcalls
1. device_initcall(xen_late_init_mcelog);
2. device_initcall_sync(mcheck_init_device);
3. late_initcall(threshold_init_device);

when running under xen, the initcall order is 1,2,3;
on baremetal, we skip 1 and we do only 2 and 3.

Acked-and-tested-by: Borislav Petkov <bp@amd64.org>
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:51:37 -04:00
Liu, Jinsong cef12ee52b xen/mce: Add mcelog support for Xen platform
When MCA error occurs, it would be handled by Xen hypervisor first,
and then the error information would be sent to initial domain for logging.

This patch gets error information from Xen hypervisor and convert
Xen format error into Linux format mcelog. This logic is basically
self-contained, not touching other kernel components.

By using tools like mcelog tool users could read specific error information,
like what they did under native Linux.

To test follow directions outlined in Documentation/acpi/apei/einj.txt

Acked-and-tested-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Ke, Liping <liping.ke@intel.com>
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19 15:51:36 -04:00
Steven Rostedt 4de72395ff ftrace/x86: Add save_regs for i386 function calls
Add saving full regs for function tracing on i386.
The saving of regs was influenced by patches sent out by
Masami Hiramatsu.

Link: Link: http://lkml.kernel.org/r/20120711195745.379060003@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-07-19 13:20:37 -04:00
Steven Rostedt 08f6fba503 ftrace/x86: Add separate function to save regs
Add a way to have different functions calling different trampolines.
If a ftrace_ops wants regs saved on the return, then have only the
functions with ops registered to save regs. Functions registered by
other ops would not be affected, unless the functions overlap.

If one ftrace_ops registered functions A, B and C and another ops
registered fucntions to save regs on A, and D, then only functions
A and D would be saving regs. Function B and C would work as normal.
Although A is registered by both ops: normal and saves regs; this is fine
as saving the regs is needed to satisfy one of the ops that calls it
but the regs are ignored by the other ops function.

x86_64 implements the full regs saving, and i386 just passes a NULL
for regs to satisfy the ftrace_ops passing. Where an arch must supply
both regs and ftrace_ops parameters, even if regs is just NULL.

It is OK for an arch to pass NULL regs. All function trace users that
require regs passing must add the flag FTRACE_OPS_FL_SAVE_REGS when
registering the ftrace_ops. If the arch does not support saving regs
then the ftrace_ops will fail to register. The flag
FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED may be set that will prevent the
ftrace_ops from failing to register. In this case, the handler may
either check if regs is not NULL or check if ARCH_SUPPORTS_FTRACE_SAVE_REGS.
If the arch supports passing regs it will set this macro and pass regs
for ops that request them. All other archs will just pass NULL.

Link: Link: http://lkml.kernel.org/r/20120711195745.107705970@goodmis.org

Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-07-19 13:20:03 -04:00
Steven Rostedt 28fb5dfa78 ftrace/x86_32: Push ftrace_ops in as 3rd parameter to function tracer
Add support of passing the current ftrace_ops into the 3rd parameter
of the callback to the function tracer.

Link: http://lkml.kernel.org/r/20120612225424.942411318@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-07-19 13:19:27 -04:00
Steven Rostedt 2f5f6ad939 ftrace: Pass ftrace_ops as third parameter to function trace callback
Currently the function trace callback receives only the ip and parent_ip
of the function that it traced. It would be more powerful to also return
the ops that registered the function as well. This allows the same function
to act differently depending on what ftrace_ops registered it.

Link: http://lkml.kernel.org/r/20120612225424.267254552@goodmis.org

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-07-19 13:17:35 -04:00
Avi Kivity d63d3e6217 x86, hyper: fix build with !CONFIG_KVM_GUEST
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-07-18 17:01:48 -03:00
Ingo Molnar a2fe194723 Merge branch 'linus' into perf/core
Pick up the latest ring-buffer fixes, before applying a new fix.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-18 11:17:17 +02:00
Michael S. Tsirkin 9053666406 KVM guest: switch to apic_set_eoi_write, apic_write
Use apic_set_eoi_write, apic_write to avoid meedling in core apic
driver data structures directly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-16 12:51:44 +03:00
Michael S. Tsirkin 1551df646d apic: add apic_set_eoi_write for PV use
KVM PV EOI optimization overrides eoi_write apic op with its own
version. Add an API for this to avoid meddling with core x86 apic driver
data structures directly.

For KVM use, we don't need any guarantees about when the switch to the
new op will take place, so it could in theory use this API after SMP init,
but it currently doesn't, and restricting callers to early init makes it
clear that it's safe as it won't race with actual APIC driver use.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-16 12:51:23 +03:00
Will Drewry 09d314425f vsyscall_64: add missing ifdef CONFIG_SECCOMP
vsyscall_seccomp introduced a dependency on __secure_computing.  On
configurations with CONFIG_SECCOMP disabled, compilation will fail.

Reported-by: feng xiangjun <fengxj325@gmail.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-14 12:01:36 -07:00
Will Drewry 5651721ede x86/vsyscall: allow seccomp filter in vsyscall=emulate
If a seccomp filter program is installed, older static binaries and
distributions with older libc implementations (glibc 2.13 and earlier)
that rely on vsyscall use will be terminated regardless of the filter
program policy when executing time, gettimeofday, or getcpu.  This is
only the case when vsyscall emulation is in use (vsyscall=emulate is the
default).

This patch emulates system call entry inside a vsyscall=emulate by
populating regs->ax and regs->orig_ax with the system call number prior
to calling into seccomp such that all seccomp-dependencies function
normally.  Additionally, system call return behavior is emulated in line
with other vsyscall entrypoints for the trace/trap cases.

[ v2: fixed ip and sp on SECCOMP_RET_TRAP/TRACE (thanks to luto@mit.edu) ]
Reported-and-tested-by: Owen Kibel <qmewlo@gmail.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-13 14:25:55 -07:00
Ingo Molnar bb65a764de Merge branch 'mce-ripvfix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce
Merge memory fault handling fix from Tony Luck.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-11 22:37:48 +02:00
Tony Luck 6751ed65dc x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faults
In commit dad1743e59 ("x86/mce: Only restart instruction after machine
check recovery if it is safe") we fixed mce_notify_process() to force a
signal to the current process if it was not restartable (RIPV bit not
set in MCG_STATUS). But doing it here means that the process doesn't
get told the virtual address of the fault via siginfo_t->si_addr. This
would prevent application level recovery from the fault.

Make a new MF_MUST_KILL flag bit for memory_failure() et al. to use so
that we will provide the right information with the signal.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: stable@kernel.org    # 3.4+
2012-07-11 10:20:47 -07:00
Prarit Bhargava fc73373b33 KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
While debugging I noticed that unlike all the other hypervisor code in the
kernel, kvm does not have an entry for x86_hyper which is used in
detect_hypervisor_platform() which results in a nice printk in the
syslog.  This is only really a stub function but it
does make kvm more consistent with the other hypervisors.


Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marcelo Tostatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-07-11 19:33:32 +03:00
Ingo Molnar 92254d3144 Linux 3.5-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJP+NMmAAoJEHm+PkMAQRiGPxEH/18YQN8FAzEIjcC10ytA3RC3
 KzPv31jXgJGZDy1UqmpKtJ7GDwb92AhqZxVnJimMa+6d1uA8NsZQq5EMOPPiX8Qi
 8P4AEaw5kSMmR/6zxsxguCGdbDLU3xZ1nJZkHyMgjo2UJbMU0jBPneb/79heWPhe
 0HOkLzN5VA6Yx3Nt70sWQ1zsuj0Ji5jCGO0iNTCBmTiv4J9ZlOx3xJQn4aK6JscO
 /3QRTM43GG0j6zToEOCTHrn8ajOq6rHQQkG0bPVR723nFrSGLoaCT6QVBXYug+AZ
 9Xay7zVNvrq2oH5x5jADG2t2vyaG+nEJpSrVjXznzxgDnK7tWjYqiuG5zqKhAq8=
 =IMfr
 -----END PGP SIGNATURE-----

Merge tag 'v3.5-rc6' into x86/mce

Merge Linux 3.5-rc6 before merging more code.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-11 09:41:37 +02:00
Jan Beulich a7101d1526 x86/mm/mtrr: Slightly simplify print_mtrr_state()
high_width can be easily calculated in a single expression when
making use of __ffs64().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4FF71053020000780008E1B5@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-10 10:38:15 +02:00
Jan Beulich 1ba9a29414 x86/mm/mtrr: Fix alignment determination in range_to_mtrr()
With the variable operated on being of "unsigned long" type,
neither ffs() nor fls() are suitable to use on them, as those
truncate their arguments to 32 bits. Using __ffs() and __fls()
respectively at once eliminates the need to subtract 1 from their
results.

Additionally, with the alignment value subsequently used as a
shift count, it must be enforced to be less than BITS_PER_LONG
(and on 64-bit there's no need for it to be any smaller).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/4FF70D54020000780008E179@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-10 10:38:14 +02:00
Pekka Enberg c3b7cdf180 perf/x86: Fix intel_perfmon_event_mapformatting
Use tabs for "intel_perfmon_event_map" formatting in
perf_event_intel.c.

Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1341568786-7045-1-git-send-email-penberg@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-06 13:16:15 +02:00
Suresh Siddha d872818dbb x86/apic/x2apic: Use multiple cluster members for the irq destination only with the explicit affinity
During boot or driver load etc, interrupt destination is setup
using default target cpu's. Later the user (irqbalance etc) or
the driver (irq_set_affinity/ irq_set_affinity_hint) can request
the interrupt to be migrated to some specific set of cpu's.

In the x2apic cluster routing, for the default scenario use
single cpu as the interrupt destination and when there is an
explicit interrupt affinity request, route the interrupt to
multiple members of a x2apic cluster specified in the cpumask of
the migration request.

This will minmize the vector pressure when there are lot of
interrupt sources and relatively few x2apic clusters (for
example a single socket server). This will allow the performance
critical interrupts to be routed to multiple cpu's in the x2apic
cluster (irqbalance for example uses the cache siblings etc
while specifying the interrupt destination) and allow
non-critical interrupts to be serviced by a single logical cpu.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/r/1340656709-11423-4-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-06 11:00:23 +02:00
Suresh Siddha 1ac322d0b1 x86/apic/x2apic: Limit the vector reservation to the user specified mask
For the x2apic cluster mode, vector for an interrupt is
currently reserved on all the cpu's that are part of the x2apic
cluster. But the interrupts will be routed only to the cluster
(derived from the first cpu in the mask) members specified in
the mask. So there is no need to reserve the vector in the
unused cluster members.

Modify __assign_irq_vector() to reserve the vectors based on the
user specified irq destination mask. If the new mask is a proper
subset of the currently used mask, cleanup the vector allocation
on the unused cpu members.

Also, allow the apic driver to tune the vector domain based on
the affinity mask (which in most cases is the user-specified
mask).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/r/1340656709-11423-3-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-06 11:00:22 +02:00
Suresh Siddha b39f25a849 x86/apic: Optimize cpu traversal in __assign_irq_vector() using domain membership
Currently __assign_irq_vector() goes through each cpu in the
specified mask until it finds a free vector in all the cpu's
that are part of the same interrupt domain. We visit all the
interrupt domain sibling cpus to reserve the free vector. So,
when we fail to find a free vector in an interrupt domain, it is
safe to continue our search with a cpu belonging to a new
interrupt domain. No need to go through each cpu, if the domain
containing that cpu is already visited.

Use the irq_cfg's old_domain to track the visited domains and
optimize the cpu traversal while finding a free vector in the
given cpumask.

NOTE: We can also optimize the search by using for_each_cpu() and
skip the current cpu, if it is not the first cpu in the mask
returned by the vector_allocation_domain(). But re-using the
cfg->old_domain to track the visited domains will be slightly
faster.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/r/1340656709-11423-2-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-06 11:00:21 +02:00
Yan, Zheng 6a67943a18 perf/x86: Uncore filter support for SandyBridge-EP
This patch adds C-Box and PCU filter support for SandyBridge-EP
uncore. We can filter C-Box events by thread/core ID and filter
PCU events by frequency/voltage.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1341381616-12229-5-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:56:01 +02:00
Yan, Zheng 4208969724 perf/x86: Detect number of instances of uncore CBox
The CBox manages the interface between the core and the LLC, so
the instances of uncore CBox is equal to number of cores.

Reported-by: Andrew Cooks <acooks@gmail.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1341381616-12229-4-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:56:00 +02:00
Yan, Zheng 3b19e4c98c perf/x86: Fix event constraint for SandyBridge-EP C-Box
The constraint for C-Box event 0x1f should have overlap flag set.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340866596-22502-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:55:59 +02:00
Yan, Zheng eca26c9950 perf/x86: Use 0xff as pseudo code for fixed uncore event
Stephane Eranian suggestted using 0xff as pseudo code for fixed
uncore event and using the umask value to determine which of the
fixed events we want to map to. So far there is at most one fixed
counter in a uncore PMU. So just change the definition of
UNCORE_FIXED_EVENT to 0xff.

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340780953-21130-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:55:58 +02:00
Peter Zijlstra 3e0091e2b6 perf/x86: Save a few bytes in 'struct x86_pmu'
All these are basically boolean flags, use a bitfield to save a few
bytes.

Suggested-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-vsevd5g8lhcn129n3s7trl7r@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:55:58 +02:00
Peter Zijlstra c93dc84cbe perf/x86: Add a microcode revision check for SNB-PEBS
Recent Intel microcode resolved the SNB-PEBS issues, so conditionally
enable PEBS on SNB hardware depending on the microcode revision.

Thanks to Stephane for figuring out the various microcode revisions.

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-v3672ziwh9damwqwh1uz3krm@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:55:57 +02:00
Robert Richter f285f92f7e perf/x86: Improve debug output in check_hw_exists()
It might be of interest which perfctr msr failed.

Signed-off-by: Robert Richter <robert.richter@amd.com>
[ added hunk to avoid GCC warn ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340217996-2254-5-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:19:42 +02:00
Robert Richter b1dc3c4820 perf/x86/amd: Unify AMD's generic and family 15h pmus
There is no need for keeping separate pmu structs. We can enable
amd_{get,put}_event_constraints() functions also for family 15h event.

The advantage is that there is only a single pmu struct for all AMD
cpus. This patch introduces functions to setup the pmu to enabe core
performance counters or counter constraints.

Also, cpuid checks are used instead of family checks where
possible. Thus, it enables the code independently of cpu families if
the feature flag is set.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340217996-2254-4-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:19:41 +02:00
Robert Richter a1eac7ac90 perf/x86: Move Intel specific code to intel_pmu_init()
There is some Intel specific code in the generic x86 path. Move it to
intel_pmu_init().

Since p4 and p6 pmus don't have fixed counters we may skip the check
in case such a pmu is detected.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340217996-2254-3-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:19:40 +02:00
Robert Richter 15c7ad51ad perf/x86: Rename Intel specific macros
There are macros that are Intel specific and not x86 generic. Rename
them into INTEL_*.

This patch removes X86_PMC_IDX_GENERIC and does:

 $ sed -i -e 's/X86_PMC_MAX_/INTEL_PMC_MAX_/g'           \
         arch/x86/include/asm/kvm_host.h                 \
         arch/x86/include/asm/perf_event.h               \
         arch/x86/kernel/cpu/perf_event.c                \
         arch/x86/kernel/cpu/perf_event_p4.c             \
         arch/x86/kvm/pmu.c
 $ sed -i -e 's/X86_PMC_IDX_FIXED/INTEL_PMC_IDX_FIXED/g' \
         arch/x86/include/asm/perf_event.h               \
         arch/x86/kernel/cpu/perf_event.c                \
         arch/x86/kernel/cpu/perf_event_intel.c          \
         arch/x86/kernel/cpu/perf_event_intel_ds.c       \
         arch/x86/kvm/pmu.c
 $ sed -i -e 's/X86_PMC_MSK_/INTEL_PMC_MSK_/g'           \
         arch/x86/include/asm/perf_event.h               \
         arch/x86/kernel/cpu/perf_event.c

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340217996-2254-2-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:19:39 +02:00
Ingo Molnar 1070505d18 Merge branch 'x86/microcode' into perf/core
Merge this branch because we want to rely on the newer (and saner)
microcode loading and checking facilities.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:13:57 +02:00
Ingo Molnar b0338e99b2 Merge branch 'x86/cpu' into perf/core
Merge this branch because we changed the wrmsr*_safe() API and there's
a conflict.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:12:11 +02:00
Ingo Molnar 90574ebb7e Merge branch 'perf/urgent' into perf/core
Merge this branch to pick up a fixlet and to update to a more recent base.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 21:10:23 +02:00
Peter Zijlstra ce5c1fe9a9 perf/x86: Fix USER/KERNEL tagging of samples
Several perf interrupt handlers (PEBS,IBS,BTS) re-write regs->ip but
do not update the segment registers. So use an regs->ip based test
instead of an regs->cs/regs->flags based test.

Reported-and-tested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-xxrt0a1zronm1sm36obwc2vy@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-05 20:59:07 +02:00
Borislav Petkov 3d8986bc7f x86, microcode: Make reload interface per system
The reload interface should be per-system so that a full system ucode
reload happens (on each core) when doing

echo 1 > /sys/devices/system/cpu/microcode/reload

Move it to the cpu subsys directory instead of it being per-cpu.

Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1340280437-7718-3-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-07-01 10:24:09 -07:00
Borislav Petkov c9fc3f778a x86, microcode: Sanitize per-cpu microcode reloading interface
Microcode reloading in a per-core manner is a very bad idea for both
major x86 vendors. And the thing is, we have such interface with which
we can end up with different microcode versions applied on different
cores of an otherwise homogeneous wrt (family,model,stepping) system.

So turn off the possibility of doing that per core and allow it only
system-wide.

This is a minimal fix which we'd like to see in stable too thus the
more-or-less arbitrary decision to allow system-wide reloading only on
the BSP:

$ echo 1 > /sys/devices/system/cpu/cpu0/microcode/reload
...

and disable the interface on the other cores:

$ echo 1 > /sys/devices/system/cpu/cpu23/microcode/reload
-bash: echo: write error: Invalid argument

Also, allowing the reload only from one CPU (the BSP in
that case) doesn't allow the reload procedure to degenerate
into an O(n^2) deal when triggering reloads from all
/sys/devices/system/cpu/cpuX/microcode/reload sysfs nodes
simultaneously.

A more generic fix will follow.

Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1340280437-7718-2-git-send-email-bp@amd64.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
2012-07-01 10:24:05 -07:00
Linus Torvalds c76760926a Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull ACPI & Power Management patches from Len Brown.

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  acpi_pad: fix power_saving thread deadlock
  ACPI video: Still use ACPI backlight control if _DOS doesn't exist
  ACPI, APEI, Avoid too much error reporting in runtime
  ACPI: Add a quirk for "AMILO PRO V2030" to ignore the timer overriding
  ACPI: Remove one board specific WARN when ignoring timer overriding
  ACPI: Make acpi_skip_timer_override cover all source_irq==0 cases
  ACPI, x86: fix Dell M6600 ACPI reboot regression via DMI
  ACPI sysfs.c strlen fix
2012-06-30 11:11:58 -07:00
Len Brown 6eca954e25 Merge branches 'acpi_pad-bugzilla-42981', 'apei-bugzilla-43282', 'video-bugzilla-43168', 'bugzilla-40002' and 'bugfix-misc' into release
bug fixes
2012-06-30 00:53:50 -04:00
Fenghua Yu 954e482bde x86/copy_user_generic: Optimize copy_user_generic with CPU erms feature
According to Intel 64 and IA-32 SDM and Optimization Reference Manual, beginning
with Ivybridge, REG string operation using MOVSB and STOSB can provide both
flexible and high-performance REG string operations in cases like memory copy.
Enhancement availability is indicated by CPUID.7.0.EBX[9] (Enhanced REP MOVSB/
STOSB).

If CPU erms feature is detected, patch copy_user_generic with enhanced fast
string version of copy_user_generic.

A few new macros are defined to reduce duplicate code in ALTERNATIVE and
ALTERNATIVE_2.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1337908785-14015-1-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-06-29 15:33:34 -07:00
Linus Torvalds 15b77435ed Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cpufeature: Remove stray %s, add -w to mkcapflags.pl
  x86, cpufeature: Catch duplicate CPU feature strings
  x86, cpufeature: Rename X86_FEATURE_DTS to X86_FEATURE_DTHERM
  x86: Fix kernel-doc warnings
  x86, compat: Use test_thread_flag(TIF_IA32) in compat signal delivery
2012-06-29 10:29:54 -07:00
Alex Shi 52aec3308d x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
There are 32 INVALIDATE_TLB_VECTOR now in kernel. That is quite big
amount of vector in IDT. But it is still not enough, since modern x86
sever has more cpu number. That still causes heavy lock contention
in TLB flushing.

The patch using generic smp call function to replace it. That saved 32
vector number in IDT, and resolved the lock contention in TLB
flushing on large system.

In the NHM EX machine 4P * 8cores * HT = 64 CPUs, hackbench pthread
has 3% performance increase.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-9-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-27 19:29:13 -07:00
Alex Shi c4211f42d3 x86/tlb: add tlb_flushall_shift for specific CPU
Testing show different CPU type(micro architectures and NUMA mode) has
different balance points between the TLB flush all and multiple invlpg.
And there also has cases the tlb flush change has no any help.

This patch give a interface to let x86 vendor developers have a chance
to set different shift for different CPU type.

like some machine in my hands, balance points is 16 entries on
Romely-EP; while it is at 8 entries on Bloomfield NHM-EP; and is 256 on
IVB mobile CPU. but on model 15 core2 Xeon using invlpg has nothing
help.

For untested machine, do a conservative optimization, same as NHM CPU.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-5-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-27 19:29:10 -07:00
Alex Shi e0ba94f14f x86/tlb_info: get last level TLB entry number of CPU
For 4KB pages, x86 CPU has 2 or 1 level TLB, first level is data TLB and
instruction TLB, second level is shared TLB for both data and instructions.

For hupe page TLB, usually there is just one level and seperated by 2MB/4MB
and 1GB.

Although each levels TLB size is important for performance tuning, but for
genernal and rude optimizing, last level TLB entry number is suitable. And
in fact, last level TLB always has the biggest entry number.

This patch will get the biggest TLB entry number and use it in furture TLB
optimizing.

Accroding Borislav's suggestion, except tlb_ll[i/d]_* array, other
function and data will be released after system boot up.

For all kinds of x86 vendor friendly, vendor specific code was moved to its
specific files.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-2-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-27 19:28:24 -07:00
H. Peter Anvin 1b6b7c9ff3 x86, cpufeature: Remove stray %s, add -w to mkcapflags.pl
There was a stray %s left from testing, remove it.

Add -w to the #! line (which is parsed by Perl even if the Perl
interpreter is invoked explicitly on the command line) to catch these
kinds of errors in the future.

Reported-by: Jean Delvare <khali@linux-fr.org>
Link: http://lkml.kernel.org/r/20120626143246.0c9bf301@endymion.delvare
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-06-26 08:02:48 -07:00
H. Peter Anvin 55f6cb9d0b x86, cpufeature: Catch duplicate CPU feature strings
We had a case of duplicate CPU feature strings, a user space ABI
violation, for almost two years.  Make it a build error so that
doesn't happen again.

Link: http://lkml.kernel.org/r/4FE34BCB.5050305@linux.intel.com
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jean Delvare <khali@linux-fr.org>
2012-06-25 09:02:13 -07:00
H. Peter Anvin 4ad3341130 x86, cpufeature: Rename X86_FEATURE_DTS to X86_FEATURE_DTHERM
It makes sense to label "Digital Thermal Sensor" as "DTS", but
unfortunately the string "dts" was already used for "Debug Store", and
/proc/cpuinfo is a user space ABI.

Therefore, rename this to "dtherm".

This conflict went into mainline via the hwmon tree without any x86
maintainer ack, and without any kind of hint in the subject.

    a4659053 x86/hwmon: fix initialization of coretemp

Reported-by: Jean Delvare <khali@linux-fr.org>
Link: http://lkml.kernel.org/r/4FE34BCB.5050305@linux.intel.com
Cc: Jan Beulich <JBeulich@suse.com>
Cc: <stable@vger.kernel.org> v2.6.36..v3.4
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-06-25 09:01:15 -07:00
Alex Williamson 7d43c2e42c iommu: Remove group_mf
The iommu=group_mf is really no longer needed with the addition of ACS
support in IOMMU drivers creating groups.  Most multifunction devices
will now be grouped already.  If a device has gone to the trouble of
exposing ACS, trust that it works.  We can use the device specific ACS
function for fixing devices we trust individually.  This largely
reverts bcb71abe.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2012-06-25 13:48:30 +02:00
Michael S. Tsirkin ab9cf4996b KVM guest: guest side for eoi avoidance
The idea is simple: there's a bit, per APIC, in guest memory,
that tells the guest that it does not need EOI.
Guest tests it using a single est and clear operation - this is
necessary so that host can detect interrupt nesting - and if set, it can
skip the EOI MSR.

I run a simple microbenchmark to show exit reduction
(note: for testing, need to apply follow-up patch
'kvm: host side for eoi optimization' + a qemu patch
 I posted separately, on host):

Before:

Performance counter stats for 'sleep 1s':

            47,357 kvm:kvm_entry                                                [99.98%]
                 0 kvm:kvm_hypercall                                            [99.98%]
                 0 kvm:kvm_hv_hypercall                                         [99.98%]
             5,001 kvm:kvm_pio                                                  [99.98%]
                 0 kvm:kvm_cpuid                                                [99.98%]
            22,124 kvm:kvm_apic                                                 [99.98%]
            49,849 kvm:kvm_exit                                                 [99.98%]
            21,115 kvm:kvm_inj_virq                                             [99.98%]
                 0 kvm:kvm_inj_exception                                        [99.98%]
                 0 kvm:kvm_page_fault                                           [99.98%]
            22,937 kvm:kvm_msr                                                  [99.98%]
                 0 kvm:kvm_cr                                                   [99.98%]
                 0 kvm:kvm_pic_set_irq                                          [99.98%]
                 0 kvm:kvm_apic_ipi                                             [99.98%]
            22,207 kvm:kvm_apic_accept_irq                                      [99.98%]
            22,421 kvm:kvm_eoi                                                  [99.98%]
                 0 kvm:kvm_pv_eoi                                               [99.99%]
                 0 kvm:kvm_nested_vmrun                                         [99.99%]
                 0 kvm:kvm_nested_intercepts                                    [99.99%]
                 0 kvm:kvm_nested_vmexit                                        [99.99%]
                 0 kvm:kvm_nested_vmexit_inject                                    [99.99%]
                 0 kvm:kvm_nested_intr_vmexit                                    [99.99%]
                 0 kvm:kvm_invlpga                                              [99.99%]
                 0 kvm:kvm_skinit                                               [99.99%]
                57 kvm:kvm_emulate_insn                                         [99.99%]
                 0 kvm:vcpu_match_mmio                                          [99.99%]
                 0 kvm:kvm_userspace_exit                                       [99.99%]
                 2 kvm:kvm_set_irq                                              [99.99%]
                 2 kvm:kvm_ioapic_set_irq                                       [99.99%]
            23,609 kvm:kvm_msi_set_irq                                          [99.99%]
                 1 kvm:kvm_ack_irq                                              [99.99%]
               131 kvm:kvm_mmio                                                 [99.99%]
               226 kvm:kvm_fpu                                                  [100.00%]
                 0 kvm:kvm_age_page                                             [100.00%]
                 0 kvm:kvm_try_async_get_page                                    [100.00%]
                 0 kvm:kvm_async_pf_doublefault                                    [100.00%]
                 0 kvm:kvm_async_pf_not_present                                    [100.00%]
                 0 kvm:kvm_async_pf_ready                                       [100.00%]
                 0 kvm:kvm_async_pf_completed

       1.002100578 seconds time elapsed

After:

 Performance counter stats for 'sleep 1s':

            28,354 kvm:kvm_entry                                                [99.98%]
                 0 kvm:kvm_hypercall                                            [99.98%]
                 0 kvm:kvm_hv_hypercall                                         [99.98%]
             1,347 kvm:kvm_pio                                                  [99.98%]
                 0 kvm:kvm_cpuid                                                [99.98%]
             1,931 kvm:kvm_apic                                                 [99.98%]
            29,595 kvm:kvm_exit                                                 [99.98%]
            24,884 kvm:kvm_inj_virq                                             [99.98%]
                 0 kvm:kvm_inj_exception                                        [99.98%]
                 0 kvm:kvm_page_fault                                           [99.98%]
             1,986 kvm:kvm_msr                                                  [99.98%]
                 0 kvm:kvm_cr                                                   [99.98%]
                 0 kvm:kvm_pic_set_irq                                          [99.98%]
                 0 kvm:kvm_apic_ipi                                             [99.99%]
            25,953 kvm:kvm_apic_accept_irq                                      [99.99%]
            26,132 kvm:kvm_eoi                                                  [99.99%]
            26,593 kvm:kvm_pv_eoi                                               [99.99%]
                 0 kvm:kvm_nested_vmrun                                         [99.99%]
                 0 kvm:kvm_nested_intercepts                                    [99.99%]
                 0 kvm:kvm_nested_vmexit                                        [99.99%]
                 0 kvm:kvm_nested_vmexit_inject                                    [99.99%]
                 0 kvm:kvm_nested_intr_vmexit                                    [99.99%]
                 0 kvm:kvm_invlpga                                              [99.99%]
                 0 kvm:kvm_skinit                                               [99.99%]
               284 kvm:kvm_emulate_insn                                         [99.99%]
                68 kvm:vcpu_match_mmio                                          [99.99%]
                68 kvm:kvm_userspace_exit                                       [99.99%]
                 2 kvm:kvm_set_irq                                              [99.99%]
                 2 kvm:kvm_ioapic_set_irq                                       [99.99%]
            28,288 kvm:kvm_msi_set_irq                                          [99.99%]
                 1 kvm:kvm_ack_irq                                              [99.99%]
               131 kvm:kvm_mmio                                                 [100.00%]
               588 kvm:kvm_fpu                                                  [100.00%]
                 0 kvm:kvm_age_page                                             [100.00%]
                 0 kvm:kvm_try_async_get_page                                    [100.00%]
                 0 kvm:kvm_async_pf_doublefault                                    [100.00%]
                 0 kvm:kvm_async_pf_not_present                                    [100.00%]
                 0 kvm:kvm_async_pf_ready                                       [100.00%]
                 0 kvm:kvm_async_pf_completed

       1.002039622 seconds time elapsed

We see that # of exits is almost halved.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-06-25 12:38:06 +03:00
Robert Richter 357398e96d perf/x86: Fix section mismatch in uncore_pci_init()
Fix section mismatch in uncore_pci_init():

 WARNING: vmlinux.o(.init.text+0x9246): Section mismatch in reference from the function uncore_pci_init() to the function .devexit.text:uncore_pci_remove()
 The function __init uncore_pci_init() references
 a function __devexit uncore_pci_remove().
 [...]

Signed-off-by: Robert Richter <robert.richter@amd.com>
Cc: <a.p.zijlstra@chello.nl>
Cc: <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/20120620163927.GI5046@erda.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-25 10:32:21 +02:00
H. Peter Anvin 2b1b712f05 x86, reboot: Drop redundant write of reboot_mode
We write reboot_mode to BIOS location 0x472 in
native_machine_emergency_restart() (reboot.c:542) already, there is no
need to then write it again in machine_real_restart().

This means nothing gets written there for MRR_APM, but the APM call is
a poweroff call and doesn't use this memory location.

Link: http://lkml.kernel.org/n/tip-3i0pfh44c1e3jv5lab0cf7sc@git.kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-20 21:18:14 -07:00
Jan Beulich 0fa0e2f02e x86: Move call to print_modules() out of show_regs()
Printing the list of loaded modules is really unrelated to what
this function is about, and is particularly unnecessary in the
context of the SysRQ key handling (gets printed so far over and
over).

It should really be the caller of the function to decide whether
this piece of information is useful (and to avoid redundantly
printing it).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/4FDF21A4020000780008A67F@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-20 14:33:48 +02:00
Jan Beulich e1b6fc55da x86/microcode: Mark microcode_id[] as __initconst
It's not being used for other than creating module aliases (i.e.
no loadable section has any reference to it).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/4FDF1EFD020000780008A65D@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-20 14:33:47 +02:00
Li Zhong 0718467c85 x86/nmi: Clean up register_nmi_handler() usage
Implement a cleaner and easier to maintain version for the section
warning fixes implemented in commit eeaaa96a3a
("x86/nmi: Fix section mismatch warnings on 32-bit").

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Link: http://lkml.kernel.org/r/1340049393-17771-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-20 14:23:17 +02:00
Ingo Molnar 6a991accee Merge commit 'v3.5-rc3' into x86/debug
Merge it in to pick up a fix that we are going to clean up in this
branch.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-20 14:22:34 +02:00
Peter Zijlstra 2992c542fc perf/x86: Lowercase uncore PMU event names
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-ucnds8gkve4x3s4biuukyph3@git.kernel.org
[ Trivial build fix ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 15:55:52 +02:00
Yan, Zheng 7c94ee2e09 perf/x86: Add Intel Nehalem and Sandy Bridge-EP uncore support
The uncore subsystem in Sandy Bridge-EP consists of 8 components:

 Ubox, Cacheing Agent, Home Agent, Memory controller, Power Control,
 QPI Link Layer, R2PCIe, R3QPI.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1339741902-8449-9-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 12:13:23 +02:00
Yan, Zheng 14371cce03 perf: Add generic PCI uncore PMU device support
This patch adds generic support for uncore PMUs presented as
PCI devices. (These come in addition to the CPU/MSR based
uncores.)

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1339741902-8449-8-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 12:13:23 +02:00
Yan, Zheng fcde10e916 perf/x86: Add Intel Nehalem and Sandy Bridge uncore PMU support
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1339741902-8449-7-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 12:13:22 +02:00
Yan, Zheng 087bfbb032 perf/x86: Add generic Intel uncore PMU support
This patch adds the generic Intel uncore PMU support, including helper
functions that add/delete uncore events, a hrtimer that periodically
polls the counters to avoid overflow and code that places all events
for a particular socket onto a single cpu.

The code design is based on the structure of Sandy Bridge-EP's uncore
subsystem, which consists of a variety of components, each component
contains one or more "boxes".

(Tooling support follows in the next patches.)

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1339741902-8449-6-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 12:13:22 +02:00
Yan, Zheng 4b4969b144 perf: Export perf_assign_events()
Export perf_assign_events() so the uncore code can use it to
schedule events.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1339741902-8449-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 12:13:20 +02:00
Robert Richter 76958a61e4 perf/x86/amd: Fix RDPMC index calculation for AMD family 15h
The RDPMC index calculation is wrong for AMD family 15h
(X86_FEATURE_ PERFCTR_CORE set). This leads to a #GP when
accessing the counter:

 Pid: 2237, comm: syslog-ng Not tainted 3.5.0-rc1-perf-x86_64-standard-g130ff90 #135 AMD Pike/Pike
 RIP: 0010:[<ffffffff8100dc33>]  [<ffffffff8100dc33>] x86_perf_event_update+0x27/0x66

While the msr address offset is (index << 1) we must use index to
select the correct rdpmc.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vince Weaver <vweaver1@eecs.utk.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 11:14:35 +02:00
Ido Yariv abf71f3066 x86/vsmp: Fix vector_allocation_domain's return value
Commit 8637e38a ("x86/apic: Avoid useless scanning thru a
cpumask in assign_irq_vector()") modified
vector_allocation_domain() to return a boolean indicating if
cpumask is dynamic or static. Adjust vSMP's callback
implementation accordingly.

Signed-off-by: Ido Yariv <ido@wizery.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/1339773055-27397-1-git-send-email-ido@wizery.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 11:10:23 +02:00
Ingo Molnar 8461689c67 Merge branch 'x86/apic' into x86/platform
Merge in x86/apic to solve a vector_allocation_domain() API change semantic merge conflict.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 11:09:49 +02:00
Wanpeng Li c15acff337 x86: Fix kernel-doc warnings
Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-18 10:53:18 +02:00
H. Peter Anvin 650513979a x86-64, reboot: Allow reboot=bios and reboot-cpu override on x86-64
With the revamped realmode trampoline code, it is trivial to extend
support for reboot=bios to x86-64.  Furthermore, while we are at it,
remove the restriction that only we can only override the reboot CPU
on 32 bits.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/n/tip-jopx7y6g6dbcx4tpal8q0jlr@git.kernel.org
2012-06-17 10:51:01 -07:00
Linus Torvalds 56b880e2e3 Merge branch 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull DMA-mapping fixes from Marek Szyprowski:
 "A set of minor fixes for dma-mapping code (ARM and x86) required for
  Contiguous Memory Allocator (CMA) patches merged in v3.5-rc1."

* 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  x86: dma-mapping: fix broken allocation when dma_mask has been provided
  ARM: dma-mapping: fix debug messages in dmabounce code
  ARM: mm: fix type of the arm_dma_limit global variable
  ARM: dma-mapping: Add missing static storage class specifier
2012-06-15 17:35:01 -07:00
Linus Torvalds c83119a980 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/smp: Fix topology checks on AMD MCM CPUs
  x86/mm: Fix some kernel-doc warnings
  x86, um: Correct syscall table type attributes breaking gcc 4.8
2012-06-15 16:59:19 -07:00
Suresh Siddha 7eb9ae0799 irq/apic: Use config_enabled(CONFIG_SMP) checks to clean up irq_set_affinity() for UP
Move the ->irq_set_affinity() routines out of the #ifdef CONFIG_SMP
sections and use config_enabled(CONFIG_SMP) checks inside those
routines. Thus making those routines simple null stubs for
!CONFIG_SMP and retaining those routines with no additional
runtime overhead for CONFIG_SMP kernels.

Cleans up the ifdef CONFIG_SMP in and around routines related to
irq_set_affinity in io_apic and irq_remapping subsystems.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: torvalds@linux-foundation.org
Cc: joerg.roedel@amd.com
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: http://lkml.kernel.org/r/1339723729.3475.63.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-15 14:17:29 +02:00
Ingo Molnar 879060d574 Merge branch 'x86/cleanups' into x86/apic
Merge in the cleanups because a followup x86/apic change relies on them.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-15 14:17:01 +02:00
Ido Yariv d48daf37a3 x86/vsmp: Fix linker error when CONFIG_PROC_FS is not set
set_vsmp_pv_ops() references no_irq_affinity which is undeclared
if CONFIG_PROC_FS isn't set. Fix this by adding an #ifdef around
this variable's access.

Reported-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: Ido Yariv <ido@wizery.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Link: http://lkml.kernel.org/r/1339688588-12674-1-git-send-email-ido@wizery.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-15 13:54:11 +02:00
Marek Szyprowski c080e26edc x86: dma-mapping: fix broken allocation when dma_mask has been provided
Commit 0a2b9a6ea9 ("X86: integrate CMA with DMA-mapping subsystem")
broke memory allocation with dma_mask. This patch fixes possible kernel
ops caused by lack of resetting page variable when jumping to 'again' label.

Reported-by: Konrad Rzeszutek Wilk <konrad@darnok.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
2012-06-14 14:01:30 +02:00
Alexander Gordeev 5a0a2a3081 x86/apic/es7000: Make apicid of a cluster (not CPU) from a cpumask
cpu_mask_to_apicid_and() always returns apicid of a single CPU,
even in case multiple CPUs were requested. This update fixes a
typo and forces apicid of a cluster to be returned.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120614075043.GI3383@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14 12:53:16 +02:00
Alexander Gordeev 214e270b5f x86/apic/es7000+summit: Always make valid apicid from a cpumask
In case of invalid parameters cpu_mask_to_apicid_and() might
return apicid value of 0 (on Summit) or a uninitialized value
(on ES7000), although it is supposed to return apicid of cpu-0
at least. Fix the operation to always return a valid apicid.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120614075026.GH3383@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14 12:53:15 +02:00
Alexander Gordeev 49ad3fd483 x86/apic/es7000+summit: Fix compile warning in cpu_mask_to_apicid()
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120614075010.GG3383@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14 12:53:15 +02:00
Alexander Gordeev ea3807ea52 x86/apic: Fix ugly casting and branching in cpu_mask_to_apicid_and()
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120614074954.GF3383@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14 12:53:14 +02:00
Alexander Gordeev a5a391561b x86/apic: Eliminate cpu_mask_to_apicid() operation
Since there are only two locations where cpu_mask_to_apicid() is
called from, remove the operation and use only
cpu_mask_to_apicid_and() instead.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Suggested-and-acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120614074935.GE3383@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14 12:53:13 +02:00
Alexander Gordeev cac4afbc3d x86/x2apic/cluster: Vector_allocation_domain() should return a value
Since commit 8637e38 ("x86/apic: Avoid useless scanning thru a
cpumask in assign_irq_vector()") vector_allocation_domain()
operation indicates if a cpumask is dynamic or static. This
update fixes the oversight and makes the operation to return a
value.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120614103933.GJ3383@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14 12:53:12 +02:00
Vlad Zolotarov 0816b0f036 x86: Add read_mostly declaration/definition to variables from smp.h
Add "read-mostly" qualifier to the following variables in
smp.h:

 - cpu_sibling_map
 - cpu_core_map
 - cpu_llc_shared_map
 - cpu_llc_id
 - cpu_number
 - x86_cpu_to_apicid
 - x86_bios_cpu_apicid
 - x86_cpu_to_logical_apicid

As long as all the variables above are only written during the
initialization, this change is meant to prevent the false
sharing. More specifically, on vSMP Foundation platform
x86_cpu_to_apicid shared the same internode_cache_line with
frequently written lapic_events.

From the analysis of the first 33 per_cpu variables out of 219
(memories they describe, to be more specific) the 8 have read_mostly
nature (tlb_vector_offset, cpu_loops_per_jiffy, xen_debug_irq, etc.)
and 25 are frequently written (irq_stack_union, gdt_page,
exception_stacks, idt_desc, etc.).

Assuming that the spread of the rest of the per_cpu variables is
similar, identifying the read mostly memories will make more sense
in terms of long-term code maintenance comparing to identifying
frequently written memories.

Signed-off-by: Vlad Zolotarov <vlad@scalemp.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Cc: Shai Fultheim (Shai@ScaleMP.com) <Shai@scalemp.com>
Cc: ido@wizery.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1719258.EYKzE4Zbq5@vlad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-14 12:42:11 +02:00
OGAWA Hirofumi 2f74759056 x86/alternatives: Use atomic_xchg() instead atomic_dec_and_test() for stop_machine_text_poke()
stop_machine_text_poke() uses atomic_dec_and_test() to select one of
the CPUs executing that function to actually modify the code.

Since the variable is initialized to 1, subsequent CPUs will make the
variable go negative. Since going negative is uncommon/unexpected in
typical dec_and_test usage change this user to atomic_xchg().

This was found using a patch that warns on dec_and_test going
negative.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[ Rewrote changelog ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/87zk8fgsx9.fsf@devron.myhome.or.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-13 15:08:37 +02:00
Borislav Petkov 161270fc1f x86/smp: Fix topology checks on AMD MCM CPUs
The warning below triggers on AMD MCM packages because physical package
IDs on the cores of a _physical_ socket are the same. I.e., this field
says which CPUs belong to the same physical package.

However, the same two CPUs belong to two different internal, i.e.
"logical" nodes in the same physical socket which is reflected in the
CPU-to-node map on x86 with NUMA.

Which makes this check wrong on the above topologies so circumvent it.

[    0.444413] Booting Node   0, Processors  #1 #2 #3 #4 #5 Ok.
[    0.461388] ------------[ cut here ]------------
[    0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[    0.473960] Hardware name: Dinar
[    0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.486860] Booting Node   1, Processors  #6
[    0.491104] Modules linked in:
[    0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[    0.499510] Call Trace:
[    0.501946]  [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[    0.508185]  [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[    0.514163]  [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[    0.519881]  [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[    0.525943]  [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[    0.532004]  [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[    0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.628197]  #7 #8 #9 #10 #11 Ok.
[    0.807108] Booting Node   3, Processors  #12 #13 #14 #15 #16 #17 Ok.
[    0.897587] Booting Node   2, Processors  #18 #19 #20 #21 #22 #23 Ok.
[    0.917443] Brought up 24 CPUs

We ran a topology sanity check test we have here on it and
it all looks ok... hopefully :).

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-13 14:56:12 +02:00
Sebastian Andrzej Siewior 83452c6a43 x86/PCI: move fixup hooks from __init to __devinit
The fixups are executed once the pci-device is found which is during
boot process so __init seems fine as long as the platform does not
support hotplug.

However it is possible to remove the PCI bus at run time and have it
rediscovered again via "echo 1 > /sys/bus/pci/rescan" and this will call
the fixups again.

Cc: x86@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-06-12 09:10:54 -06:00
Marcelo Tosatti e32025a564 x86: kvmclock: remove check_and_clear_guest_paused warning
CPU offline path calls the hrtimer interrupt handler with interrupts
disabled, without touching preempt_count, triggering this warning.

Remove the warning since it is supposed to be used from hrtimer
interrupt context only.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2012-06-11 23:18:33 -03:00
Feng Tang f6b54f083c ACPI: Add a quirk for "AMILO PRO V2030" to ignore the timer overriding
This is the 2nd part of fix for kernel bugzilla 40002:
    "IRQ 0 assigned to VGA"
https://bugzilla.kernel.org/show_bug.cgi?id=40002

The root cause is the buggy FW, whose ACPI tables assign the GSI 16
to 2 irqs 0 and 16(VGA), and the VGA is the right owner of GSI 16.
So add a quirk to ignore the irq0 overriding GSI 16 for the
FUJITSU SIEMENS AMILO PRO V2030 platform will solve this issue.

Reported-and-tested-by: Szymon Kowalczyk <fazerxlo@o2.pl>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-11 17:29:44 -04:00
Feng Tang 7f68b4c2e1 ACPI: Remove one board specific WARN when ignoring timer overriding
Current WARN msg is only for the ati_ixp4x0 board, while this function
is used by mulitple platforms. So this one board specific warning
is not appropriate any more.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-11 17:29:38 -04:00
Feng Tang ae10ccdc30 ACPI: Make acpi_skip_timer_override cover all source_irq==0 cases
Currently when acpi_skip_timer_override is set, it only cover the
(source_irq == 0 && global_irq == 2) cases. While there is also
platform which need use this option and its global_irq is not 2.
This patch will extend acpi_skip_timer_override to cover all
timer overriding cases as long as the source irq is 0.

This is the first part of a fix to kernel bug bugzilla 40002:
	"IRQ 0 assigned to VGA"
https://bugzilla.kernel.org/show_bug.cgi?id=40002

Reported-and-tested-by: Szymon Kowalczyk <fazerxlo@o2.pl>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-11 17:29:30 -04:00
Ravikiran Thirumalai 110c1e1f1b x86/vsmp: Ignore IOAPIC IRQ affinity if possible
vSMP can route interrupts more optimally based on internal
knowledge the OS does not have. In order to support this
optimization, all CPUs must be able to handle all possible
IOAPIC interrupts.

Fix this by setting the vector allocation domain for all CPUs
and by enabling this feature in vSMP.

Signed-off-by: Ravikiran Thirumalai <kiran.thirumalai@gmail.com>
Signed-off-by: Shai Fultheim <shai@scalemp.com>
[ Rebased, simplified, and reworded the commit message. ]
Signed-off-by: Ido Yariv <ido@wizery.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-11 10:59:13 +02:00
Shuah Khan e2b297fcf1 perf/x86: Convert obsolete simple_strtoul() usage to kstrtoul()
Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1339384421.3025.8.camel@lorien2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-11 10:52:12 +02:00
Ingo Molnar c3e228d59b Linux 3.5-rc2
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJP0qm4AAoJEHm+PkMAQRiG62QIAJRNJFyVB0ZrsMPgdwLnlX4O
 5I86H7GaYXoOK/KMb2s5h4KiFggIODnyEkZi+/39tJOgGo0KrMcDlsh0owB1Iggw
 LE6iyze9I1z9wQze0+SXe7VAcvUYvsx2vgpOKvoNi97Qgn3B6onL+SAi5U+NAqJl
 0NdKmveEd42UIm7JfChHlxl8bm8YB+WcU38OkMGpRpJ/Moz9EbSjYVQg3oHrzJjy
 duiX6SD/OV4m5yCcXXmu+f41pN+SG7xENJ5r4enyi2ZF8mAyVz2goIyL2bA0AJX2
 +GbpD1sxUHkZ6yPg4tf2bmJOj0PkfZNAi8YpFxZDlP4y1pKuCTEDTBp8O2id43w=
 =Jyn8
 -----END PGP SIGNATURE-----

Merge tag 'v3.5-rc2' into perf/core

Merge in Linux 3.5-rc2 - to pick up fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-11 10:51:35 +02:00
Steven Rostedt 70fb74a542 x86: Save cr2 in NMI in case NMIs take a page fault (for i386)
Avi Kivity reported that page faults in NMIs could cause havic if
the NMI preempted another page fault handler:

   The recent changes to NMI allow exceptions to take place in NMI
   handlers, but I think that a #PF (say, due to access to vmalloc space)
   is still problematic.  Consider the sequence

    #PF  (cr2 set by processor)
      NMI
        ...
        #PF (cr2 clobbered)
          do_page_fault()
          IRET
        ...
        IRET
      do_page_fault()
        address = read_cr2()

   The last line reads the overwritten cr2 value.

This is the i386 version, which has the luxury of doing the work
in C code.

Link: http://lkml.kernel.org/r/4FBB8C40.6080304@redhat.com

Reported-by: Avi Kivity <avi@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-06-08 18:51:12 -04:00
Steven Rostedt c7d65a78fc x86: Remove cmpxchg from i386 NMI nesting code
I've been informed by someone on LWN called 'slashdot' that
some i386 machines do not support a true cmpxchg. The cmpxchg
used by the i386 NMI nesting code must be a true cmpxchg as
disabling interrupts will not work for NMIs (which is the work
around for i386s that do not have a true cmpxchg).

This 'slashdot' character also suggested a fix to the issue.
As the state of the nesting NMIs goes as follows:

  NOT_RUNNING -> EXECUTING
  EXECUTING   -> NOT_RUNNING
  EXECUTING   -> LATCHED
  LATCHED     -> EXECUTING

Having these states as enum values of:

  NOT_RUNNING = 0
  EXECUTING   = 1
  LATCHED     = 2

Instead of a cmpxchg to make EXECUTING -> NOT_RUNNING a
dec_and_test() would work as well. If the dec_and_test brings
the state to NOT_RUNNING, that is the same as a cmpxchg
succeeding to change EXECUTING to NOT_RUNNING. If a nested NMI
were to come in and change it to LATCHED, the dec_and_test() would
convert the state to EXECUTING (what we want it to be in such a
case anyway).

I asked 'slashdot' to post this as a patch, but it never came to
be. I decided to do the work instead.

Thanks to H. Peter Anvin for suggesting to use this_cpu_dec_and_return()
instead of local_dec_and_test(&__get_cpu_var()).

Link: http://lwn.net/Articles/484932/

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-06-08 18:48:05 -04:00
Linus Torvalds 7249450449 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar.

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix the relax_domain_level boot parameter
  sched: Validate assumptions in sched_init_numa()
  sched: Always initialize cpu-power
  sched: Fix domain iteration
  sched/rt: Fix lockdep annotation within find_lock_lowest_rq()
  sched/numa: Load balance between remote nodes
  sched/x86: Calculate booted cores after construction of sibling_mask
2012-06-08 14:59:29 -07:00
Linus Torvalds 0b35d326f8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi: Fix section mismatch warnings on 32-bit
  x86/uv: Fix UV2 BAU legacy mode
  x86/mm: Only add extra pages count for the first memory range during pre-allocation early page table space
  x86, efi stub: Add .reloc section back into image
  x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs
  x86/reboot: Fix a warning message triggered by stop_other_cpus()
  x86/intel/moorestown: Change intel_scu_devices_create() to __devinit
  x86/numa: Set numa_nodes_parsed at acpi_numa_memory_affinity_init()
  x86/gart: Fix kmemleak warning
  x86: mce: Add the dropped timer interval init back
  x86/mce: Fix the MCE poll timer logic
2012-06-08 09:26:55 -07:00
Linus Torvalds 106544d81d Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "A bit larger than what I'd wish for - half of it is due to hw driver
  updates to Intel Ivy-Bridge which info got recently released,
  cycles:pp should work there now too, amongst other things.  (but we
  are generally making exceptions for hardware enablement of this type.)

  There are also callchain fixes in it - responding to mostly
  theoretical (but valid) concerns.  The tooling side sports perf.data
  endianness/portability fixes which did not make it for the merge
  window - and various other fixes as well."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  perf/x86: Check user address explicitly in copy_from_user_nmi()
  perf/x86: Check if user fp is valid
  perf: Limit callchains to 127
  perf/x86: Allow multiple stacks
  perf/x86: Update SNB PEBS constraints
  perf/x86: Enable/Add IvyBridge hardware support
  perf/x86: Implement cycles:p for SNB/IVB
  perf/x86: Fix Intel shared extra MSR allocation
  x86/decoder: Fix bsr/bsf/jmpe decoding with operand-size prefix
  perf: Remove duplicate invocation on perf_event_for_each
  perf uprobes: Remove unnecessary check before strlist__delete
  perf symbols: Check for valid dso before creating map
  perf evsel: Fix 32 bit values endianity swap for sample_id_all header
  perf session: Handle endianity swap on sample_id_all header data
  perf symbols: Handle different endians properly during symbol load
  perf evlist: Pass third argument to ioctl explicitly
  perf tools: Update ioctl documentation for PERF_IOC_FLAG_GROUP
  perf tools: Make --version show kernel version instead of pull req tag
  perf tools: Check if callchain is corrupted
  perf callchain: Make callchain cursors TLS
  ...
2012-06-08 09:14:46 -07:00
Ingo Molnar 707ecec1dc AMD thresholding fixes for 3.6
Those are a bunch of patches which give the MCE thresholding code a
 hard look and a scrubbing to remove a couple of annoyances like sysfs
 warnings when running CPU off-/online tests and the threshold_bank4 node
 under /sys/devices/system/machinecheck/ is a symlink.
 
 It also gives proper names to the thresholding banks instead of simply
 enumerating them, like this:
 
      /sys/devices/system/machinecheck/machinecheck0/
      |-- bank0
      |-- bank1
      |-- bank2
      |-- bank3
      |-- bank4
      |-- bank5
      |-- bank6
      |-- check_interval
      |-- cmci_disabled
      |-- combined_unit
      |   |-- combined_unit
      |       |-- error_count
      |       |-- threshold_limit
      |-- dont_log_ce
      |-- execution_unit
      |   |-- execution_unit
      |       |-- error_count
      |       |-- threshold_limit
      |-- ignore_ce
      |-- insn_fetch
      |   |-- insn_fetch
      |       |-- error_count
      |       |-- threshold_limit
      |-- load_store
      |   |-- load_store
      |       |-- error_count
      |       |-- threshold_limit
      |-- monarch_timeout
      |-- northbridge
      |   |-- dram
      |   |   |-- error_count
      |   |   |-- interrupt_enable
      |   |   |-- threshold_limit
      |   |-- ht_links
      |   |   |-- error_count
      |   |   |-- interrupt_enable
      |   |   |-- threshold_limit
      |   |-- l3_cache
      |       |-- error_count
      |       |-- interrupt_enable
      |       |-- threshold_limit
     ...
 
 It is tested on all our families >= K8.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJP0Jw9AAoJEBLB8Bhh3lVKMa8P/1ZPWkFZVFIdilyViQdSR/1/
 6MPy6BcZAACBl4rgrvjtFhmNv8C2dCGoPYRksHiO9sjgsilhQe/L92rmORifrNB4
 kvqR1QfKH2Hw2X1B/0fWXthh7UV37h1TdrVNJNlzhmi3wO+MHlX54iZcwpsaceFx
 QdzSqdHbaKfkfttojxIdgSfl7M2aCRnkmMOUG4X9HCsIK0C3ChdHLhJDnLT0xYb8
 fdA8dkXMktli0GC+KfevOXILZGLhUQuigu4iYKRm689N98N1Ejfa7BvMCVqLr0kF
 4fNmC+BtZmdw8MYd7EiuYXhA0Unu+CAg23ADQpn0AEyGQcM5h7/9/4GKvgjjsV1h
 /2r1WU+UVGZSUQ3FRDbzD37QVAa9FoOv967Gks6Fa31K7kEPC8yIRhWl72wXQXpa
 hFk+Hf3RlKtaO06iH/2RD2JA+W6xntiFo8CZ+AUMoLWfIQaYSAFP039lpjJp/Hzd
 CDdNWKCchAaMYI1MBmbRZ65mSgsVLLioNrf55+kdWT/CbuXJua95YxRRmllNFv5k
 MHjPoTajL0WKZhYxUSjCH87rqZHyNBH5s8iZlIt7wR//kqBGYfmRvGSDe31yMrL8
 PH/MgEIBVmrLQSWcojF+pU6ep+sQELVNsbu1+doZd/ruD/hzsZeu+MANWtJgrrVs
 +rsPRDWTcC3ca/V5Y1UO
 =XN3W
 -----END PGP SIGNATURE-----

Merge tag 'amd-thresholding-fixes-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce

Pull in AMD MCE thresholding fixes for v3.6, from Borislav Petkov:

" Those are a bunch of patches which give the MCE thresholding code a
  hard look and a scrubbing to remove a couple of annoyances like sysfs
  warnings when running CPU off-/online tests and the threshold_bank4 node
  under /sys/devices/system/machinecheck/ is a symlink.

  It also gives proper names to the thresholding banks instead of simply
  enumerating them, like this:

     /sys/devices/system/machinecheck/machinecheck0/
     |-- bank0
     |-- bank1
     |-- bank2
     |-- bank3
     |-- bank4
     |-- bank5
     |-- bank6
     |-- check_interval
     |-- cmci_disabled
     |-- combined_unit
     |   |-- combined_unit
     |       |-- error_count
     |       |-- threshold_limit
     |-- dont_log_ce
     |-- execution_unit
     |   |-- execution_unit
     |       |-- error_count
     |       |-- threshold_limit
     |-- ignore_ce
     |-- insn_fetch
     |   |-- insn_fetch
     |       |-- error_count
     |       |-- threshold_limit
     |-- load_store
     |   |-- load_store
     |       |-- error_count
     |       |-- threshold_limit
     |-- monarch_timeout
     |-- northbridge
     |   |-- dram
     |   |   |-- error_count
     |   |   |-- interrupt_enable
     |   |   |-- threshold_limit
     |   |-- ht_links
     |   |   |-- error_count
     |   |   |-- interrupt_enable
     |   |   |-- threshold_limit
     |   |-- l3_cache
     |       |-- error_count
     |       |-- interrupt_enable
     |       |-- threshold_limit
    ...

  It is tested on all our families >= K8."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08 12:29:47 +02:00
Ananth N Mavinakayanahalli 7eb9ba5ed3 uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()
On RISC architectures like powerpc, instructions are fixed size.
Instruction analysis on such platforms is just a matter of
(insn % 4). Pass the vaddr at which the uprobe is to be inserted so
that arch_uprobe_analyze_insn() can flag misaligned registration
requests.

Signed-off-by: Ananth N Mavinakaynahalli <ananth@in.ibm.com>
Cc: michael@ellerman.id.au
Cc: antonb@thinktux.localdomain
Cc: Paul Mackerras <paulus@samba.org>
Cc: benh@kernel.crashing.org
Cc: peterz@infradead.org
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: oleg@redhat.com
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20120608093257.GG13409@in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08 12:22:27 +02:00
Don Zickus eeaaa96a3a x86/nmi: Fix section mismatch warnings on 32-bit
It was reported that compiling for 32-bit caused a bunch of
section mismatch warnings:

 VDSOSYM arch/x86/vdso/vdso32-syms.lds
  LD      arch/x86/vdso/built-in.o
  LD      arch/x86/built-in.o

 WARNING: arch/x86/built-in.o(.data+0x5af0): Section mismatch in
 reference from the variable test_nmi_ipi_callback_na.10451 to
 the function .init.text:test_nmi_ipi_callback() [...]

 WARNING: arch/x86/built-in.o(.data+0x5b04): Section mismatch in
 reference from the variable nmi_unk_cb_na.10399 to the function
 .init.text:nmi_unk_cb() The variable nmi_unk_cb_na.10399
 references the function __init nmi_unk_cb() [...]

Both of these are attributed to the internal representation of
the nmiaction struct created during register_nmi_handler.  The
reason for this is that those structs are not defined in the
init section whereas the rest of the code in nmi_selftest.c is.

To resolve this, I created a new #define,
register_nmi_handler_initonly, that tags the struct as
__initdata to resolve the mismatch.  This #define should only be
used in rare situations where the register/unregister is called
during init of the kernel.

Big thanks to Jan Beulich for decoding this for me as I didn't
have a clue what was going on.

Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Tested-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1338991542-23000-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08 12:19:27 +02:00
Alexander Gordeev 4988a40c39 x86/apic: Make cpu_mask_to_apicid() operations check cpu_online_mask
Currently cpu_mask_to_apicid() should not get a offline CPU with
the cpumask. Otherwise some apic drivers might try to access
non-existent per-cpu variables (i.e. x2apic). In that regard
cpu_mask_to_apicid() and cpu_mask_to_apicid_and() operations are
inconsistent.

This fix makes the two operations do not rely on calling
functions and always return the apicid for only online CPUs. As
result, the meaning and implementations of cpu_mask_to_apicid()
and cpu_mask_to_apicid_and() operations become straight.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120607131624.GG4759@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08 11:44:30 +02:00
Alexander Gordeev ff16432412 x86/apic: Make cpu_mask_to_apicid() operations return error code
Current cpu_mask_to_apicid() and cpu_mask_to_apicid_and()
implementations have few shortcomings:

1. A value returned by cpu_mask_to_apicid() is written to
hardware registers unconditionally. Should BAD_APICID get ever
returned it will be written to a hardware too. But the value of
BAD_APICID is not universal across all hardware in all modes and
might cause unexpected results, i.e. interrupts might get routed
to CPUs that are not configured to receive it.

2. Because the value of BAD_APICID is not universal it is
counter- intuitive to return it for a hardware where it does not
make sense (i.e. x2apic).

3. cpu_mask_to_apicid_and() operation is thought as an
complement to cpu_mask_to_apicid() that only applies a AND mask
on top of a cpumask being passed. Yet, as consequence of 18374d8
commit the two operations are inconsistent in that of:
  cpu_mask_to_apicid() should not get a offline CPU with the cpumask
  cpu_mask_to_apicid_and() should not fail and return BAD_APICID
These limitations are impossible to realize just from looking at
the operations prototypes.

Most of these shortcomings are resolved by returning a error
code instead of BAD_APICID. As the result, faults are reported
back early rather than possibilities to cause a unexpected
behaviour exist (in case of [1]).

The only exception is setup_timer_IRQ0_pin() routine. Although
obviously controversial to this fix, its existing behaviour is
preserved to not break the fragile check_timer() and would
better addressed in a separate fix.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120607131559.GF4759@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08 11:44:29 +02:00
Alexander Gordeev 8637e38aff x86/apic: Avoid useless scanning thru a cpumask in assign_irq_vector()
In case of static vector allocation domains (i.e. flat) if all
vector numbers are exhausted, an attempt to assign a new vector
will lead to useless scans through all CPUs in the cpumask, even
though it is known that each new pass would fail. Make this
corner case less painful by letting report whether the vector
allocation domain depends on passed arguments or not and stop
scanning early.

The same could have been achived by introducing a static flag to
the apic operations. But let's allow vector_allocation_domain()
have more intelligence here and decide dynamically, in case we
would need it in the future.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120607131542.GE4759@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08 11:44:29 +02:00
Alexander Gordeev 1bccd58bff x86/apic: Try to spread IRQ vectors to different priority levels
When assigning a new vector it is primarially done by adding 8
to the previously given out vector number. Hence, two
consequently allocated vector numbers would likely fall into the
same priority level. Try to spread vector numbers to different
priority levels better by changing the step from 8 to 16.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120607131514.GD4759@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08 11:44:28 +02:00
Alexander Gordeev 9d8e106676 x86/apic: Factor out default vector_allocation_domain() operation
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120607131449.GC4759@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-08 11:44:27 +02:00
H. Peter Anvin 715c85b1fc x86, cpu: Rename checking_wrmsrl() to wrmsrl_safe()
Rename checking_wrmsrl() to wrmsrl_safe(), to match the naming
convention used by all the other MSR access functions/macros.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-07 13:32:04 -07:00
Borislav Petkov 2c929ce6f1 x86, cpu, amd: Deprecate AMD-specific MSR variants
Now that all users of {rd,wr}msr_amd_safe have been fixed, deprecate its
use by making them private to amd.c and adding warnings when used on
anything else beside K8.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1338562358-28182-5-git-send-email-bp@amd64.org
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-07 11:43:30 -07:00
Andre Przywara 169e9cbd77 x86, cpu, amd: Fix crash as Xen Dom0 on AMD Trinity systems
f7f286a910 ("x86/amd: Re-enable CPU topology extensions in case BIOS
has disabled it") wrongfully added code which used the AMD-specific
{rd,wr}msr variants for no real reason.

This caused boot panics on xen which wasn't initializing the
{rd,wr}msr_safe_regs pv_ops members properly.

This, in turn, caused a heated discussion leading to us reviewing all
uses of the AMD-specific variants and removing them where unneeded
(almost everywhere except an obscure K8 BIOS fix, see 6b0f43ddfa).

Finally, this patch switches to the standard {rd,wr}msr*_safe* variants
which should've been used in the first place anyway and avoided unneeded
excitation with xen.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Link: http://lkml.kernel.org/r/1338562358-28182-4-git-send-email-bp@amd64.org
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: <http://lkml.kernel.org/r/1338383402-3838-1-git-send-email-andre.przywara@amd.com>
[Boris: correct and expand commit message]
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-07 11:43:30 -07:00
Borislav Petkov ecd431d95a x86, cpu: Fix show_msr MSR accessing function
There's no real reason why, when showing the MSRs on a CPU at boottime,
we should be using the AMD-specific variant. Simply use the generic safe
one which handles #GPs just fine.

Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Link: http://lkml.kernel.org/r/1338562358-28182-3-git-send-email-bp@amd64.org
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-07 11:41:28 -07:00
Andre Przywara 1f975f78c8 x86, pvops: Remove hooks for {rd,wr}msr_safe_regs
There were paravirt_ops hooks for the full register set variant of
{rd,wr}msr_safe which are actually not used by anyone anymore. Remove
them to make the code cleaner and avoid silent breakages when the pvops
members were uninitialized. This has been boot-tested natively and under
Xen with PVOPS enabled and disabled on one machine.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Link: http://lkml.kernel.org/r/1338562358-28182-2-git-send-email-bp@amd64.org
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-07 11:41:08 -07:00
Steven Rostedt 7fbb98c5cb x86: Save cr2 in NMI in case NMIs take a page fault
Avi Kivity reported that page faults in NMIs could cause havic if
the NMI preempted another page fault handler:

   The recent changes to NMI allow exceptions to take place in NMI
   handlers, but I think that a #PF (say, due to access to vmalloc space)
   is still problematic.  Consider the sequence

    #PF  (cr2 set by processor)
      NMI
        ...
        #PF (cr2 clobbered)
          do_page_fault()
          IRET
        ...
        IRET
      do_page_fault()
        address = read_cr2()

   The last line reads the overwritten cr2 value.

Originally I wrote a patch to solve this by saving the cr2 on the stack.
Brian Gerst suggested to save it in the r12 register as both r12 and rbx
are saved by the do_nmi handler as required by the C standard. But rbx
is already used for saving if swapgs needs to be run on exit of the NMI
handler.

Link: http://lkml.kernel.org/r/4FBB8C40.6080304@redhat.com
Link: http://lkml.kernel.org/r/1337763411.13348.140.camel@gandalf.stny.rr.com

Reported-by: Avi Kivity <avi@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-06-07 10:21:21 -04:00
Borislav Petkov 1112257019 x86, MCE, AMD: Update copyrights and boilerplate
Jacob is doing something else now so add myself as the loser who
provides support.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-06-07 12:43:50 +02:00
Borislav Petkov 336d335a96 x86, MCE, AMD: Give proper names to the thresholding banks
Having the banks numbered is ok but having real names which mean
something to the user makes a lot more sense:

 /sys/devices/system/machinecheck/machinecheck0/
 |-- bank0
 |-- bank1
 |-- bank2
 |-- bank3
 |-- bank4
 |-- bank5
 |-- bank6
 |-- check_interval
 |-- cmci_disabled
 |-- combined_unit
 |   |-- combined_unit
 |       |-- error_count
 |       |-- threshold_limit
 |-- dont_log_ce
 |-- execution_unit
 |   |-- execution_unit
 |       |-- error_count
 |       |-- threshold_limit
 |-- ignore_ce
 |-- insn_fetch
 |   |-- insn_fetch
 |       |-- error_count
 |       |-- threshold_limit
 |-- load_store
 |   |-- load_store
 |       |-- error_count
 |       |-- threshold_limit
 |-- monarch_timeout
 |-- northbridge
 |   |-- dram
 |   |   |-- error_count
 |   |   |-- interrupt_enable
 |   |   |-- threshold_limit
 |   |-- ht_links
 |   |   |-- error_count
 |   |   |-- interrupt_enable
 |   |   |-- threshold_limit
 |   |-- l3_cache
 |       |-- error_count
 |       |-- interrupt_enable
 |       |-- threshold_limit
...

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-06-07 12:43:48 +02:00
Borislav Petkov 6e927361bd x86, MCE, AMD: Make error_count read only
Until now, writing to error count caused the code to reset the
thresholding bank to the current thresholding limit and start counting
errors from the beginning.

This is misleading and unclear, and can be accomplished by writing the
old thresholding limit into ->threshold_limit.

Make error_count read-only with the functionality to show the current
error count.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-06-07 12:43:47 +02:00
Borislav Petkov 2c9c42fa98 x86, MCE, AMD: Cleanup reading of error_count
We have rdmsr_on_cpu() now so remove locally defined solution in favor
of the generic one.

No functionality change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-06-07 12:43:46 +02:00
Borislav Petkov 18c20f373b x86, MCE, AMD: Print decimal thresholding values
If one sets the threshold limit, say to 25:

$ echo 25 > machinecheck0/threshold_bank4/misc0/threshold_limit

and then reads it back again, it gives

$ cat machinecheck0/threshold_bank4/misc0/threshold_limit
19

which is actually 0x19 but we don't know that.

Make all output decimal.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-06-07 12:43:45 +02:00
Borislav Petkov 019f34fccf x86, MCE, AMD: Move shared bank to node descriptor
Well, instead of having a real bank 4 on the BSP of each node and
symlinks on the remaining cores, we push it up into the amd_northbridge
descriptor which now contains a pointer to the northbridge bank 4
because the bank is one per northbridge and, as such, belongs in the NB
descriptor anyway.

Each time we hotplug CPUs, we use the northbridge pointer to copy the
shared bank into the per-CPU array of threshold_banks pointers, or
destroy it when the last CPU on the node goes offline, or create it when
the first comes online.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-06-07 12:43:44 +02:00
Borislav Petkov 26ab256eaa x86, MCE, AMD: Remove local_allocate_... wrapper
It is unneeded now so drop it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-06-07 12:43:43 +02:00
Borislav Petkov 92e26e2a1a x86, MCE, AMD: Remove shared banks sysfs linking
The code used to create a symlink on all non-BSP cores of a node when
the MCi_MISCj bank is present once per node. (This is generally the
case with bank 4 on AMD). However, these sysfs links cause a bunch
of problems with cpu off-/onlining testing and are, as such, a bit
overengineered. IOW, there's nothing wrong with having normal sysfs
files for the shared banks since the corresponding MSRs are replicated
across each core anyway.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-06-07 12:43:42 +02:00
Borislav Petkov 24214449b0 x86, amd_nb: Export model 0x10 and later PCI id
Add the F3 PCI id of F15h, model 0x10 to pci_ids.h and to the amd_nb
code which generates the list of northbridges on an AMD box. Shorten
define name while at it so that it fits into pci_ids.h.

Acked-by: Clemens Ladisch <clemens@ladisch.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-06-07 12:43:41 +02:00
Andi Kleen 70ab7003de perf/x86: Don't assume there can be only 4 PEBS events
On Sandy Bridge in non HT mode there are 8 counters available.
Since every counter can write a PEBS record assuming there are
4 max is incorrect. Use the reported counter number -- with an
upper limit for a static array -- instead.

Also I made the warning messages a bit more informational.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338944211-28275-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:23:40 +02:00
Vince Weaver c48b60538c perf/x86: Use rdpmc() rather than rdmsr() when possible in the kernel
The rdpmc instruction is faster than the equivelant rdmsr call,
so use it when possible in the kernel.

The perfctr kernel patches did this, after extensive testing showed
rdpmc to always be faster (One can look in etc/costs in the perfctr-2.6
package to see a historical list of the overhead).

I have done some tests on a 3.2 kernel, the kernel module I used
was included in the first posting of this patch:

                   rdmsr           rdpmc
 Core2 T9900:      203.9 cycles     30.9 cycles
 AMD fam0fh:        56.2 cycles      9.8 cycles
 Atom 6/28/2:      129.7 cycles     50.6 cycles

The speedup of using rdpmc is large.

[ It's probably possible (and desirable) to do this without
  requiring a new field in the hw_perf_event structure, but
  the fixed events make this tricky. ]

Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1203011724030.26934@cl320.eecs.utk.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:23:35 +02:00
Peter Zijlstra 1c2ac3fde3 perf/x86: Fix wrmsrl() debug wrapper
Move the wrmslr() debug wrapper to the common header now that all the
include games are gone. Also clean it up a bit to avoid multiple
evaluation of the argument.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-l4gkfnivwv4yi5mqxjlovymx@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:23:22 +02:00
Arun Sharma bc6ca7b342 perf/x86: Check if user fp is valid
Signed-off-by: Arun Sharma <asharma@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-4-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:08:01 +02:00
Arun Sharma 302fa4b58a perf/x86: Allow multiple stacks
Without this patch, applications with two different stack
regions (eg: native stack vs JIT stack) get truncated
callchains even when RBP chaining is present. GDB shows proper
stack traces and the frame pointer chaining is intact.

This patch disables the (fp < RSP) check, hoping that other checks
in the code save the day for us. In our limited testing, this
didn't seem to break anything.

In the long term, we could potentially have userspace advise
the kernel on the range of valid stack addresses, so we don't
spend a lot of time unwinding from bogus addresses.

Signed-off-by: Arun Sharma <asharma@fb.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-2-git-send-email-asharma@fb.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 17:07:58 +02:00
Peter Zijlstra 8440ccb43f perf/x86: Update SNB PEBS constraints
Afaict there's no need to (incompletely) iterate the
MEM_UOPS_RETIRED.* umask state.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:59:52 +02:00
Peter Zijlstra b6db437ba8 perf/x86: Enable/Add IvyBridge hardware support
Implement rudimentary IVB perf support. The SDM states its identical
to SNB with exception of the exact event tables, but a quick look
suggests they're similar enough.

Also mark SNB-EP as broken for now.

Requested-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:59:49 +02:00
Peter Zijlstra cccb9ba9e4 perf/x86: Implement cycles:p for SNB/IVB
Now that there's finally a chip with working PEBS (IvyBridge), we can
enable the hardware and implement cycles:p for SNB/IVB.

Cc: Stephane Eranian <eranian@google.com>
Requested-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:59:47 +02:00
Peter Zijlstra b430f7c470 perf/x86: Fix Intel shared extra MSR allocation
Zheng Yan reported that event group validation can wreck event state
when Intel extra_reg allocation changes event state.

Validation shouldn't change any persistent state. Cloning events in
validate_{event,group}() isn't really pretty either, so add a few
special cases to avoid modifying the event state.

The code is restructured to minimize the special case impact.

Reported-by: Zheng Yan <zheng.z.yan@linux.intel.com>
Acked-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338903031.28282.175.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:59:44 +02:00
Kamalesh Babulal ceb1cbac8e sched/x86: Calculate booted cores after construction of sibling_mask
Commit 316ad24830 ("sched/x86: Rewrite set_cpu_sibling_map()")
broke the booted_cores accounting.

The problem is that the booted_cores accounting needs all the
sibling links set up. So restore the second loop and add a comment as
to why its needed.

On qemu booted with -smp sockets=1,cores=2,threads=2;
Before:
 $ grep cores /proc/cpuinfo
 cpu cores       : 2
 cpu cores       : 1
 cpu cores       : 4
 cpu cores       : 3

With the patch:
 $ grep cores /proc/cpuinfo
 cpu cores       : 2
 cpu cores       : 2
 cpu cores       : 2
 cpu cores       : 2

Reported-by: Prarit Bhargava <prarit@redhat.com>
Reported-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120531073738.GH7511@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 16:37:59 +02:00
Tomoki Sekiyama f6175f5bfb x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs
In current Linux, percpu variable `vector_irq' is not cleared on
offlined cpus while disabling devices' irqs. If the cpu that has
the disabled irqs in vector_irq is hotplugged,
__setup_vector_irq() hits invalid irq vector and may crash.

This bug can be reproduced as following;

  # echo 0 > /sys/devices/system/cpu/cpu7/online
  # modprobe -r some_driver_using_interrupts      # vector_irq@cpu7 uncleared
  # echo 1 > /sys/devices/system/cpu/cpu7/online  # kernel may crash

This patch fixes this bug by clearing vector_irq in
__clear_irq_vector() even if the cpu is offlined.

Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: ltc-kernel@ml.yrl.intra.hitachi.co.jp
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/4FC340BE.7080101@hitachi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 12:03:25 +02:00
Feng Tang 55c844a4dd x86/reboot: Fix a warning message triggered by stop_other_cpus()
When rebooting our 24 CPU Westmere servers with 3.4-rc6, we
always see this warning msg:

Restarting system.
machine restart
------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp.c:125
native_smp_send_reschedule+0x74/0xa7() Hardware name: X8DTN
Modules linked in: igb [last unloaded: scsi_wait_scan]
Pid: 1, comm: systemd-shutdow Not tainted 3.4.0-rc6+ #22
Call Trace:
 <IRQ>  [<ffffffff8102a41f>] warn_slowpath_common+0x7e/0x96
 [<ffffffff8102a44c>] warn_slowpath_null+0x15/0x17
 [<ffffffff81018cf7>] native_smp_send_reschedule+0x74/0xa7
 [<ffffffff810561c1>] trigger_load_balance+0x279/0x2a6
 [<ffffffff81050112>] scheduler_tick+0xe0/0xe9
 [<ffffffff81036768>] update_process_times+0x60/0x70
 [<ffffffff81062f2f>] tick_sched_timer+0x68/0x92
 [<ffffffff81046e33>] __run_hrtimer+0xb3/0x13c
 [<ffffffff81062ec7>] ? tick_nohz_handler+0xd0/0xd0
 [<ffffffff810474f2>] hrtimer_interrupt+0xdb/0x198
 [<ffffffff81019a35>] smp_apic_timer_interrupt+0x81/0x94
 [<ffffffff81655187>] apic_timer_interrupt+0x67/0x70
 <EOI>  [<ffffffff8101a3c4>] ? default_send_IPI_mask_allbutself_phys+0xb4/0xc4
 [<ffffffff8101c680>] physflat_send_IPI_allbutself+0x12/0x14
 [<ffffffff81018db4>] native_nmi_stop_other_cpus+0x8a/0xd6
 [<ffffffff810188ba>] native_machine_shutdown+0x50/0x67
 [<ffffffff81018926>] machine_shutdown+0xa/0xc
 [<ffffffff8101897e>] native_machine_restart+0x20/0x32
 [<ffffffff810189b0>] machine_restart+0xa/0xc
 [<ffffffff8103b196>] kernel_restart+0x47/0x4c
 [<ffffffff8103b2e6>] sys_reboot+0x13e/0x17c
 [<ffffffff8164e436>] ? _raw_spin_unlock_bh+0x10/0x12
 [<ffffffff810fcac9>] ? bdi_queue_work+0xcf/0xd8
 [<ffffffff810fe82f>] ? __bdi_start_writeback+0xae/0xb7
 [<ffffffff810e0d64>] ? iterate_supers+0xa3/0xb7
 [<ffffffff816547a2>] system_call_fastpath+0x16/0x1b
---[ end trace 320af5cb1cb60c5b ]---

The root cause seems to be the
default_send_IPI_mask_allbutself_phys() takes quite some time (I
measured it could be several ms) to complete sending NMIs to all
the other 23 CPUs, and for HZ=250/1000 system, the time is long
enough for a timer interrupt to happen, which will in turn
trigger to kick load balance to a stopped CPU and cause this
warning in native_smp_send_reschedule().

So disabling the local irq before stop_other_cpu() can fix this
problem (tested 25 times reboot ok), and it is fine as there
should be nobody caring the timer interrupt in such reboot
stage.

The latest 3.4 kernel slightly changes this behavior by sending
REBOOT_VECTOR first and only send NMI_VECTOR if the REBOOT_VCTOR
fails, and this patch is still needed to prevent the problem.

Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120530231541.4c13433a@feng-i7
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 12:03:23 +02:00
Xiaotian Feng aff5a62d52 x86/gart: Fix kmemleak warning
aperture_64.c now is using memblock, the previous
kmemleak_ignore() for alloc_bootmem() should be removed then.

Otherwise, with kmemleak enabled, kernel will throw warnings
like:

[    0.000000] kmemleak: Trying to color unknown object at 0xffff8800c4000000 as Black
[    0.000000] Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc1-next-20120605+ #130
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff811b27e6>] paint_ptr+0x66/0xc0
[    0.000000]  [<ffffffff816b90fb>] kmemleak_ignore+0x2b/0x60
[    0.000000]  [<ffffffff81ef7bc0>] kmemleak_init+0x217/0x2c1
[    0.000000]  [<ffffffff81ed2b97>] start_kernel+0x32d/0x3eb
[    0.000000]  [<ffffffff81ed25e4>] ? repair_env_string+0x5a/0x5a
[    0.000000]  [<ffffffff81ed2356>] x86_64_start_reservations+0x131/0x135
[    0.000000]  [<ffffffff81ed2120>] ? early_idt_handlers+0x120/0x120
[    0.000000]  [<ffffffff81ed245c>] x86_64_start_kernel+0x102/0x111
[    0.000000] kmemleak: Early log backtrace:
[    0.000000]    [<ffffffff816b911b>] kmemleak_ignore+0x4b/0x60
[    0.000000]    [<ffffffff81ee6a38>] gart_iommu_hole_init+0x3e7/0x547
[    0.000000]    [<ffffffff81edb20b>] pci_iommu_alloc+0x44/0x6f
[    0.000000]    [<ffffffff81ee81ad>] mem_init+0x19/0xec
[    0.000000]    [<ffffffff81ed2a54>] start_kernel+0x1ea/0x3eb
[    0.000000]    [<ffffffff81ed2356>] x86_64_start_reservations+0x131/0x135
[    0.000000]    [<ffffffff81ed245c>] x86_64_start_kernel+0x102/0x111
[    0.000000]    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Cc: Xiaotian Feng <xtfeng@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1338922831-2847-1-git-send-email-xtfeng@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 11:58:38 +02:00
Shuah Khan fbd24153c4 x86/early_printk: Replace obsolete simple_strtoul() usage with kstrtoint()
Change early_serial_init() to call kstrtoul() instead of calling
obsoleted simple_strtoul().

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Cc: Joe Perches <joe@perches.com>
Link: http://lkml.kernel.org/r/1338424803.3569.5.camel@lorien2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 11:44:22 +02:00
Thomas Gleixner 1a87fc1ec7 x86: mce: Add the dropped timer interval init back
commit 82f7af09 ("x86/mce: Cleanup timer mess) dropped the
initialization of the per cpu timer interval. Duh :(

Restore the previous behaviour.

Reported-by: Chen Gong <gong.chen@linux.intel.com>
Cc: bp@amd64.org
Cc: tony.luck@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-06-06 11:33:21 +02:00
Alexander Gordeev 6398268d2b x86/apic: Factor out default cpu_mask_to_apicid() operations
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120605112340.GA11454@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 10:22:18 +02:00
Alexander Gordeev bf721d3a3b x86/apic: Factor out default target_cpus() operation
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120605112324.GA11449@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 10:22:17 +02:00
Alexander Gordeev 49d0c7a0a4 x86/apic: Trivial whitespace fixes
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20120605112310.GA11443@dhcp-26-207.brq.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 10:22:16 +02:00
Suresh Siddha 0b8255e660 x86/x2apic/cluster: Use all the members of one cluster specified in the smp_affinity mask for the interrupt destination
If the HW implements round-robin interrupt delivery, this
enables multiple cpu's (which are part of the user specified
interrupt smp_affinity mask and belong to the same x2apic
cluster) to service the interrupt.

Also if the platform supports Power Aware Interrupt Routing,
then this enables the interrupt to be routed to an idle cpu or a
busy cpu depending on the perf/power bias tunable.

We are now grouping all the cpu's in a cluster to one vector
domain. So that will limit the total number of interrupt sources
handled by Linux. Previously we support "cpu-count *
available-vectors-per-cpu" interrupt sources but this will now
reduce to "cpu-count/16 * available-vectors-per-cpu".

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: gorcunov@openvz.org
Cc: agordeev@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337644682-19854-2-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 09:51:22 +02:00
Suresh Siddha 332afa656e x86/irq: Update irq_cfg domain unless the new affinity is a subset of the current domain
Until now, irq_cfg domain is mostly static. Either all CPU's
(used by flat mode) or one CPU (first CPU in the irq afffinity
mask) to which irq is being migrated (this is used by the rest
of apic modes).

Upcoming x2apic cluster mode optimization patch allows the irq
to be sent to any CPU in the x2apic cluster (if supported by the
HW). So irq_cfg domain changes on the fly (depending on which
CPU in the x2apic cluster is online).

Instead of checking for any intersection between the new irq
affinity mask and the current irq_cfg domain, check if the new
irq affinity mask is a subset of the current irq_cfg domain.
Otherwise proceed with updating the irq_cfg domain aswell as
assigning vector's on all the CPUs specified in the new mask.

This also cleans up a workaround in updating irq_cfg domain for
legacy irq's that are handled by the IO-APIC.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: yinghai@kernel.org
Cc: gorcunov@openvz.org
Cc: agordeev@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337644682-19854-1-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 09:51:22 +02:00
Joe Perches c767a54ba0 x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level>
Use a more current logging style:

 - Bare printks should have a KERN_<LEVEL> for consistency's sake
 - Add pr_fmt where appropriate
 - Neaten some macro definitions
 - Convert some Ok output to OK
 - Use "%s: ", __func__ in pr_fmt for summit
 - Convert some printks to pr_<level>

Message output is not identical in all cases.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: levinsasha928@gmail.com
Link: http://lkml.kernel.org/r/1337655007.24226.10.camel@joe2Laptop
[ merged two similar patches, tidied up the changelog ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 09:17:22 +02:00
Ido Yariv 7db971b235 x86/platform: Introduce APIC post-initialization callback
Some subarchitectures (such as vSMP) need to slightly adjust the
underlying APIC structure. Add an APIC post-initialization callback
to 'struct x86_platform_ops' for this purpose and use it for
adjusting the APIC structure on vSMP systems.

Signed-off-by: Ido Yariv <ido@wizery.com>
Acked-by: Shai Fultheim <shai@scalemp.com>
Link: http://lkml.kernel.org/r/1338675095-27260-1-git-send-email-ido@wizery.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 09:06:19 +02:00
Chen Gong 958fb3c512 x86/mce: Fix the MCE poll timer logic
In commit 82f7af09 ("x86/mce: Cleanup timer mess), Thomas just
forgot the "/ 2" there while cleaning up.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@amd64.org
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1338863702-9245-1-git-send-email-gong.chen@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-06-06 08:28:21 +02:00
Linus Torvalds eea5b5510f Typo/thinko in a cleanup caused a semantic change. Fix it.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPzkYHAAoJEKurIx+X31iBeZ8QAK44Watxy0Ib0IUGrx2wcds8
 dEgVNyv9qrT28/PCBQiFtoRB83vsWWOzdNDWH6s4AAB6cFRbS8dOHvjUTsFfwgJ1
 lgD9URJmegBFInmkwSBVhY/MpwNm/pw0CNIs2ymqAvSlAgj8I8zF5xWsxTZfJcYn
 gxDNTxbuaEd3sPQO8wjBrw8NhCNpNwzEzZUXh31tM92bgpscIsXsJl/cRna5B1NU
 Z5LiSZev1W0/lf+Ys94ZsOSRT9zfjTI+mjXwv/lu8DlgeRQYyXixRjROgWvCbywx
 hlepvAQHtss9z5YTiGhRlPnR/CZ0fEUMQRtyRsp4qxG8BrgkWAdB++3ZMVbQYdom
 i98TQh1HZU3zzxueIwKwfjPKhG9q2Ee1XzE0ow7sXinBJQgiGrXiEv1tcX7001P7
 vpkyqVon2KKSYknxdHtbc6XnwjbzGDEoS0fqZf0boKoHee7KWBmFyX9JXLvZjtY1
 ef4FqTZNccYWL/5Hi0slZWAucC5iPleeV6sm9y4xG/gJFbTIw+joq1dc3pBZJ2uR
 rHhxD5tWTwbovsq1igcjAbrh9davwiFWiufW3Y5GdTAZJJ1tF6YjCmg18QbvaHJj
 9uplEUBUA4N6UUqPCjdKRdPaxPwkNOmjYH3YQIajSmLcn0YoI9NYw4Xn0sp2jCBo
 zGnaJvG2IIC9LNgaVohz
 =LQWb
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull MCE regression fix from Tony Luck:
 "Typo/thinko in a cleanup caused a semantic change. Fix it."

* tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  x86/mce: Fix the MCE poll timer logic
2012-06-05 15:15:04 -07:00
Chen Gong c2238f10e0 x86/mce: Fix the MCE poll timer logic
In commit 82f7af09 (x86/mce: Cleanup timer mess), Thomas just forgot
the "/ 2" there while cleaning up.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-06-05 10:15:07 -07:00
Linus Torvalds 0b3e9f3f21 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar.

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Remove NULL assignment of dattr_cur
  sched: Remove the last NULL entry from sched_feat_names
  sched: Make sched_feat_names const
  sched/rt: Fix SCHED_RR across cgroups
  sched: Move nr_cpus_allowed out of 'struct sched_rt_entity'
  sched: Make sure to not re-read variables after validation
  sched: Fix SD_OVERLAP
  sched: Don't try allocating memory from offline nodes
  sched/nohz: Fix rq->cpu_load calculations some more
  sched/x86: Use cpu_llc_shared_mask(cpu) for coregroup_mask
2012-06-05 09:47:15 -07:00
Yong Zhang 3b6f70fd7d x86-smp-remove-call-to-ipi_call_lock-ipi_call_unlock
ipi_call_lock/unlock() lock resp. unlock call_function.lock. This lock
protects only the call_function data structure itself, but it's
completely unrelated to cpu_online_mask. The mask to which the IPIs
are sent is calculated before call_function.lock is taken in
smp_call_function_many(), so the locking around set_cpu_online() is
pointless and can be removed.

[ tglx: Massaged changelog ]

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: ralf@linux-mips.org
Cc: sshtylyov@mvista.com
Cc: david.daney@cavium.com
Cc: nikunj@linux.vnet.ibm.com
Cc: paulmck@linux.vnet.ibm.com
Cc: axboe@kernel.dk
Cc: peterz@infradead.org
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1338275765-3217-7-git-send-email-yong.zhang0@gmail.com
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-06-05 17:27:12 +02:00
Zhang Rui 76eb9a30db ACPI, x86: fix Dell M6600 ACPI reboot regression via DMI
Dell Precision M6600 is known to require PCI reboot, so add it to
the reboot blacklist in pci_reboot_dmi_table[].

https://bugzilla.kernel.org/show_bug.cgi?id=42749

cc: x86@kernel.org
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-05 00:16:12 -04:00
Linus Torvalds 63004afa71 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull straggler x86 fixes from Peter Anvin:
 "Three groups of patches:

  - EFI boot stub documentation and the ability to print error messages;
  - Removal for PTRACE_ARCH_PRCTL for x32 (obsolete interface which
    should never have been ported, and the port is broken and
    potentially dangerous.)
  - ftrace stack corruption fixes.  I'm not super-happy about the
    technical implementation, but it is probably the least invasive in
    the short term.  In the future I would like a single method for
    nesting the debug stack, however."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, x32, ptrace: Remove PTRACE_ARCH_PRCTL for x32
  x86, efi: Add EFI boot stub documentation
  x86, efi; Add EFI boot stub console support
  x86, efi: Only close open files in error path
  ftrace/x86: Do not change stacks in DEBUG when calling lockdep
  x86: Allow nesting of the debug stack IDT setting
  x86: Reset the debug_stack update counter
  ftrace: Use breakpoint method to update ftrace caller
  ftrace: Synchronize variable setting with breakpoints
2012-06-02 16:17:03 -07:00
H. Peter Anvin 40b46a7d29 Merge remote-tracking branch 'rostedt/tip/perf/urgent-2' into x86-urgent-for-linus 2012-06-01 15:55:31 -07:00
H.J. Lu bad1a753d4 x86, x32, ptrace: Remove PTRACE_ARCH_PRCTL for x32
When I added x32 ptrace to 3.4 kernel, I also include PTRACE_ARCH_PRCTL
support for x32 GDB  For ARCH_GET_FS/GS, it takes a pointer to int64.  But
at user level, ARCH_GET_FS/GS takes a pointer to int32.  So I have to add
x32 ptrace to glibc to handle it with a temporary int64 passed to kernel and
copy it back to GDB as int32.  Roland suggested that PTRACE_ARCH_PRCTL
is obsolete and x32 GDB should use fs_base and gs_base fields of
user_regs_struct instead.

Accordingly, remove PTRACE_ARCH_PRCTL completely from the x32 code to
avoid possible memory overrun when pointer to int32 is passed to
kernel.

Link: http://lkml.kernel.org/r/CAMe9rOpDzHfS7NH7m1vmD9QRw8SSj4Sc%2BaNOgcWm_WJME2eRsQ@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org> v3.4
2012-06-01 13:54:21 -07:00
Al Viro 44fbbb3dc6 x86: get rid of calling do_notify_resume() when returning to kernel mode
If we end up calling do_notify_resume() with !user_mode(refs), it
does nothing (do_signal() explicitly bails out and we can't get there
with TIF_NOTIFY_RESUME in such situations).  Then we jump to
resume_userspace_sig, which rechecks the same thing and bails out
to resume_kernel, thus breaking the loop.

It's easier and cheaper to check *before* calling do_notify_resume()
and bail out to resume_kernel immediately.  And kill the check in
do_signal()...

Note that on amd64 we can't get there with !user_mode() at all - asm
glue takes care of that.

Acked-and-reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 13:01:51 -04:00
Al Viro efee984c27 new helper: signal_delivered()
Does block_sigmask() + tracehook_signal_handler();  called when
sigframe has been successfully built.  All architectures converted
to it; block_sigmask() itself is gone now (merged into this one).

I'm still not too happy with the signature, but that's a separate
story (IMO we need a structure that would contain signal number +
siginfo + k_sigaction, so that get_signal_to_deliver() would fill one,
signal_delivered(), handle_signal() and probably setup...frame() -
take one).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:58:52 -04:00
Al Viro 77097ae503 most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set
Only 3 out of 63 do not.  Renamed the current variant to __set_current_blocked(),
added set_current_blocked() that will exclude unblockable signals, switched
open-coded instances to it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:58:51 -04:00
Al Viro a610d6e672 pull clearing RESTORE_SIGMASK into block_sigmask()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:58:49 -04:00
Al Viro b7f9a11a6c new helper: sigmask_to_save()
replace boilerplate "should we use ->saved_sigmask or ->blocked?"
with calls of obvious inlined helper...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:58:48 -04:00
Al Viro 51a7b448d4 new helper: restore_saved_sigmask()
first fruits of ..._restore_sigmask() helpers: now we can take
boilerplate "signal didn't have a handler, clear RESTORE_SIGMASK
and restore the blocked mask from ->saved_mask" into a common
helper.  Open-coded instances switched...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-06-01 12:58:47 -04:00
Steven Rostedt 5963e317b1 ftrace/x86: Do not change stacks in DEBUG when calling lockdep
When both DYNAMIC_FTRACE and LOCKDEP are set, the TRACE_IRQS_ON/OFF
will call into the lockdep code. The lockdep code can call lots of
functions that may be traced by ftrace. When ftrace is updating its
code and hits a breakpoint, the breakpoint handler will call into
lockdep. If lockdep happens to call a function that also has a breakpoint
attached, it will jump back into the breakpoint handler resetting
the stack to the debug stack and corrupt the contents currently on
that stack.

The 'do_sym' call that calls do_int3() is protected by modifying the
IST table to point to a different location if another breakpoint is
hit. But the TRACE_IRQS_OFF/ON are outside that protection, and if
a breakpoint is hit from those, the stack will get corrupted, and
the kernel will crash:

[ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
[ 1013.272665] IP: [<ffff880145cc0000>] 0xffff880145cbffff
[ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0
[ 1013.298832] Oops: 0010 [#1] PREEMPT SMP
[ 1013.310600] CPU 2
[ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
[ 1013.401848]
[ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30
[ 1013.437943] RIP: 8eb8:[<ffff88014630a000>]  [<ffff88014630a000>] 0xffff880146309fff
[ 1013.459871] RSP: ffffffff8165e919:ffff88014780f408  EFLAGS: 00010046
[ 1013.477909] RAX: 0000000000000001 RBX: ffffffff81104020 RCX: 0000000000000000
[ 1013.499458] RDX: ffff880148008ea8 RSI: ffffffff8131ef40 RDI: ffffffff82203b20
[ 1013.521612] RBP: ffffffff81005751 R08: 0000000000000000 R09: 0000000000000000
[ 1013.543121] R10: ffffffff82cdc318 R11: 0000000000000000 R12: ffff880145cc0000
[ 1013.564614] R13: ffff880148008eb8 R14: 0000000000000002 R15: ffff88014780cb40
[ 1013.586108] FS:  0000000000000000(0000) GS:ffff880148000000(0000) knlGS:0000000000000000
[ 1013.609458] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1013.627420] CR2: 0000000000000002 CR3: 0000000141f10000 CR4: 00000000001407e0
[ 1013.649051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1013.670724] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1013.692376] Process kworker/2:1 (pid: 112, threadinfo ffff88013fe0e000, task ffff88014020a6a0)
[ 1013.717028] Stack:
[ 1013.724131]  ffff88014780f570 ffff880145cc0000 0000400000004000 0000000000000000
[ 1013.745918]  cccccccccccccccc ffff88014780cca8 ffffffff811072bb ffffffff81651627
[ 1013.767870]  ffffffff8118f8a7 ffffffff811072bb ffffffff81f2b6c5 ffffffff81f11bdb
[ 1013.790021] Call Trace:
[ 1013.800701] Code: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a <e7> d7 64 81 ff ff ff ff 01 00 00 00 00 00 00 00 65 d9 64 81 ff
[ 1013.861443] RIP  [<ffff88014630a000>] 0xffff880146309fff
[ 1013.884466]  RSP <ffff88014780f408>
[ 1013.901507] CR2: 0000000000000002

The solution was to reuse the NMI functions that change the IDT table to make the debug
stack keep its current stack (in kernel mode) when hitting a breakpoint:

  call debug_stack_set_zero
  TRACE_IRQS_ON
  call debug_stack_reset

If the TRACE_IRQS_ON happens to hit a breakpoint then it will keep the current stack
and not crash the box.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-05-31 23:12:22 -04:00
Steven Rostedt f8988175fd x86: Allow nesting of the debug stack IDT setting
When the NMI handler runs, it checks if it preempted a debug handler
and if that handler is using the debug stack. If it is, it changes the
IDT table not to update the stack, otherwise it will reset the debug
stack and corrupt the debug handler it preempted.

Now that ftrace uses breakpoints to change functions from nops to
callers, many more places may hit a breakpoint. Unfortunately this
includes some of the calls that lockdep performs. Which causes issues
with the debug stack. It too needs to change the debug stack before
tracing (if called from the debug handler).

Allow the debug_stack_set_zero() and debug_stack_reset() to be nested
so that the debug handlers can take advantage of them too.

[ Used this_cpu_*() over __get_cpu_var() as suggested by H. Peter Anvin ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-05-31 23:12:21 -04:00
Steven Rostedt c0525a6972 x86: Reset the debug_stack update counter
When an NMI goes off and it sees that it preempted the debug stack,
to keep the debug stack safe, it changes the IDT to point to one that
does not modify the stack on breakpoint (to allow breakpoints in NMIs).

But the variable that gets set to know to undo it on exit never gets
cleared on exit. Thus every NMI will reset it on exit the first time
it is done even if it does not need to be reset.

[ Added H. Peter Anvin's suggestion to use this_cpu_read/write ]

Cc: <stable@vger.kernel.org> # v3.3
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-05-31 23:12:20 -04:00
Steven Rostedt 8a4d0a687a ftrace: Use breakpoint method to update ftrace caller
On boot up and module load, it is fine to modify the code directly,
without the use of breakpoints. This is because boot up modification
is done before SMP is initialized, thus the modification is serial,
and module load is done before the module executes.

But after that we must use a SMP safe method to modify running code.
Otherwise, if we are running the function tracer and update its
function (by starting off the stack tracer, or perf tracing)
the change of the function called by the ftrace trampoline is done
directly. If this is being executed on another CPU, that CPU may
take a GPF and crash the kernel.

The breakpoint method is used to change the nops at all the functions, but
the change of the ftrace callback handler itself was still using a
direct modification. If tracing was enabled and the function callback
was changed then another CPU could fault if it was currently calling
the original callback. This modification must use the breakpoint method
too.

Note, the direct method is still used for boot up and module load.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-05-31 23:12:19 -04:00
Steven Rostedt a192cd0413 ftrace: Synchronize variable setting with breakpoints
When the function tracer starts modifying the code via breakpoints
it sets a variable (modifying_ftrace_code) to inform the breakpoint
handler to call the ftrace int3 code.

But there's no synchronization between setting this code and the
handler, thus it is possible for the handler to be called on another
CPU before it sees the variable. This will cause a kernel crash as
the int3 handler will not know what to do with it.

I originally added smp_mb()'s to force the visibility of the variable
but H. Peter Anvin suggested that I just make it atomic.

[ Added comments as suggested by Peter Zijlstra ]

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-05-31 23:12:17 -04:00
Linus Torvalds fb21affa49 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull second pile of signal handling patches from Al Viro:
 "This one is just task_work_add() series + remaining prereqs for it.

  There probably will be another pull request from that tree this
  cycle - at least for helpers, to get them out of the way for per-arch
  fixes remaining in the tree."

Fix trivial conflict in kernel/irq/manage.c: the merge of Andrew's pile
had brought in commit 97fd75b7b8 ("kernel/irq/manage.c: use the
pr_foo() infrastructure to prefix printks") which changed one of the
pr_err() calls that this merge moves around.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  keys: kill task_struct->replacement_session_keyring
  keys: kill the dummy key_replace_session_keyring()
  keys: change keyctl_session_to_parent() to use task_work_add()
  genirq: reimplement exit_irq_thread() hook via task_work_add()
  task_work_add: generic process-context callbacks
  avr32: missed _TIF_NOTIFY_RESUME on one of do_notify_resume callers
  parisc: need to check NOTIFY_RESUME when exiting from syscall
  move key_repace_session_keyring() into tracehook_notify_resume()
  TIF_NOTIFY_RESUME is defined on all targets now
2012-05-31 18:47:30 -07:00
Linus Torvalds 2d117403b3 One more mce cleanup before the 3.5 merge window closes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPxpZ1AAoJEKurIx+X31iBACgP/jZiXAdu9JjYqvPA8/ACsntt
 S0fpbn40kKftn58x6Ddqemu/rg8Iy/ezsB7Fm93TYfBzDb8RGdkSiLj/KoO39Jzy
 Tmod/vJ0XMsBFgDob5GSKSOYMsnkO7//6jtnM09A8wDlxUwE3LV/3/84EqOrQH0n
 LR3Ltd6wnCc/HVasdDZX/814QcHe5KSoFMY2jUHWf0suOKcI22X57PQt9831bKky
 tvvsqFKSaOaerW9F1atwB2Qx37f0pUCu/Qo4KmBB0EVwSapRpHSDu657byft7VWQ
 FJ1eLfZF7lnVaYHxyCqz+wgTVBTsBPDt1xGkIJqSMMHtziG5iP3m0NSYLCJ5XJYx
 AcmF55hIx4yMCrIdiKc7wWs2K4U59+FiGJ2LgFCBZ3XLJoAuZnhf+WOh4ws/wi/j
 qDk9vK8KG3VBYIwZnTWb6N1bRtzznp1ElXjZR1byh/Cu2Ne/EWL64VydNT7ts0Ga
 3mGF88keAlw04qHYtU/7y5WrHRKGV5LBjGujVV2e2a5dC6p7LO11UCLptEv11DWS
 LcbIegbrimWmMxVScJQkL12GEzZKHpZZvrFRIKiWkTA15R6R1OTC7VrywA2GjDbd
 h5mWKd7zV6Ankjmq9SuvnA1UazhC/r+uQ58INVpFVM+aFor32HVd6L7VRns7sv2Z
 WbQNYxv5O+D3J2K+dQNy
 =4tLS
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull mce cleanup from Tony Luck:
 "One more mce cleanup before the 3.5 merge window closes"

* tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  x86/mce: Cleanup timer mess
2012-05-31 10:53:37 -07:00
Thomas Gleixner 82f7af09e6 x86/mce: Cleanup timer mess
Use unsigned long for dealing with jiffies not int. Rename the
callback to something sensible. Use __this_cpu_read/write for
accessing per cpu data.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-05-30 14:40:01 -07:00
zhenzhong.duan 2da06af810 x86, mtrr: Fix a type overflow in range_to_mtrr func
When boot on sun G5+ with 4T mem, see an overflow in mtrr cleanup as below.

*BAD*gran_size: 2G      chunk_size: 2G  num_reg: 10     lose cover RAM:
-18014398505283592M

This is because 1<<31 sign extended. Use an unsigned long constant to
fix it.  Useful for mem larger than or equal to 4T.

-v2: Use 64bit constant instead of explicit type conversion as suggested
by Yinghai. Description updated too.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: http://lkml.kernel.org/r/4FC5A77F.6060505@oracle.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-05-30 14:37:00 -07:00
H. Peter Anvin bbd771474e Merge branch 'x86/trampoline' into x86/urgent
x86/trampoline contains an urgent commit which is necessarily on a
newer baseline.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-05-30 12:11:32 -07:00
Ingo Molnar 403e1c5b74 Merge branch 'x86/mce' into x86/urgent
Merge in these fixlets.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30 14:12:06 +02:00
Peter Zijlstra 9f646389aa sched/x86: Use cpu_llc_shared_mask(cpu) for coregroup_mask
Commit commit 8e7fbcbc2 ("sched: Remove stale power aware scheduling
remnants and dysfunctional knobs") made a boo-boo with removing the
power aware scheduling muck from the x86 topology bits.

We should unconditionally use the llc_shared mask for multi-core.

Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Link: http://lkml.kernel.org/n/tip-lsksc2kfyeveb13avh327p0d@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30 11:05:44 +02:00
Linus Torvalds 731a7378b8 Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 trampoline rework from H. Peter Anvin:
 "This code reworks all the "trampoline"/"realmode" code (various bits
  that need to live in the first megabyte of memory, most but not all of
  which runs in real mode at some point) in the kernel into a single
  object.  The main reason for doing this is that it eliminates the last
  place in the kernel where we needed pages to be mapped RWX.  This code
  separates all that code into proper R/RW/RX pages."

Fix up conflicts in arch/x86/kernel/Makefile (mca removed next to reboot
code), and arch/x86/kernel/reboot.c (reboot code moved around in one
branch, modified in this one), and arch/x86/tools/relocs.c (mostly same
code came in earlier due to working around the ld bugs just before the
3.4 release).

Also remove stale x86-relocs entry from scripts/.gitignore as per Peter
Anvin.

* commit '61f5446169046c217a5479517edac3a890c3bee7': (36 commits)
  x86, realmode: Move end signature into header.S
  x86, relocs: When printing an error, say relative or absolute
  x86, relocs: More relocations which may end up as absolute
  x86, relocs: Workaround for binutils 2.22.52.0.1 section bug
  xen-acpi-processor: Add missing #include <xen/xen.h>
  acpi, bgrd: Add missing <linux/io.h> to drivers/acpi/bgrt.c
  x86, realmode: Change EFER to a single u64 field
  x86, realmode: Move kernel/realmode.c to realmode/init.c
  x86, realmode: Move not-common bits out of trampoline_common.S
  x86, realmode: Mask out EFER.LMA when saving trampoline EFER
  x86, realmode: Fix no cache bits test in reboot_32.S
  x86, realmode: Make sure all generated files are listed in targets
  x86, realmode: build fix: remove duplicate build
  x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline
  x86, realmode: fixes compilation issue in tboot.c
  x86, realmode: move relocs from scripts/ to arch/x86/tools
  x86, realmode: header for trampoline code
  x86, realmode: flattened rm hierachy
  x86, realmode: don't copy real_mode_header
  x86, realmode: fix 64-bit wakeup sequence
  ...
2012-05-29 20:14:53 -07:00
Bjorn Helgaas 365811d6f9 x86: print physical addresses consistently with other parts of kernel
Print physical address info in a style consistent with the %pR style used
elsewhere in the kernel.  For example:

    -found SMP MP-table at [ffff8800000fce90] fce90
    +found SMP MP-table at [mem 0x000fce90-0x000fce9f] mapped at [ffff8800000fce90]
    -initial memory mapped : 0 - 20000000
    +initial memory mapped: [mem 0x00000000-0x1fffffff]
    -Base memory trampoline at [ffff88000009c000] 9c000 size 8192
    +Base memory trampoline [mem 0x0009c000-0x0009dfff] mapped at [ffff88000009c000]
    -SRAT: Node 0 PXM 0 0-80000000
    +SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:21 -07:00