Commit Graph

10245 Commits

Author SHA1 Message Date
Paolo Bonzini ba7b39203a x86: export get_xsave_addr
get_xsave_addr is the API to access XSAVE states, and KVM would
like to use it.  Export it.

Cc: stable@vger.kernel.org
Cc: x86@kernel.org
Cc: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-05 13:55:44 +01:00
Tiejun Chen c6338ce494 kvm: kvmclock: use get_cpu() and put_cpu()
We can use get_cpu() and put_cpu() to replace
preempt_disable()/cpu = smp_processor_id() and
preempt_enable() for slightly better code.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-03 12:07:33 +01:00
Linus Torvalds 19e0d5f16a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Fixes from all around the place:

   - hyper-V 32-bit PAE guest kernel fix
   - two IRQ allocation fixes on certain x86 boards
   - intel-mid boot crash fix
   - intel-quark quirk
   - /proc/interrupts duplicate irq chip name fix
   - cma boot crash fix
   - syscall audit fix
   - boot crash fix with certain TSC configurations (seen on Qemu)
   - smpboot.c build warning fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
  ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi()
  x86, intel-mid: Create IRQs for APB timers and RTC timers
  x86: Don't enable F00F workaround on Intel Quark processors
  x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts
  x86, cma: Reserve DMA contiguous area after initmem_init()
  i386/audit: stop scribbling on the stack frame
  x86, apic: Handle a bad TSC more gracefully
  x86: ACPI: Do not translate GSI number if IOAPIC is disabled
  x86/smpboot: Move data structure to its primary usage scope
2014-10-31 14:30:16 -07:00
Ingo Molnar 1776b10627 perf/x86/intel: Revert incomplete and undocumented Broadwell client support
These patches:

  86a349a28b ("perf/x86/intel: Add Broadwell core support")
  c46e665f03 ("perf/x86: Add INST_RETIRED.ALL workarounds")
  fdda3c4aac ("perf/x86/intel: Use Broadwell cache event list for Haswell")

introduced magic constants and unexplained changes:

  https://lkml.org/lkml/2014/10/28/1128
  https://lkml.org/lkml/2014/10/27/325
  https://lkml.org/lkml/2014/8/27/546
  https://lkml.org/lkml/2014/10/28/546

Peter Zijlstra has attempted to help out, to clean up the mess:

  https://lkml.org/lkml/2014/10/28/543

But has not received helpful and constructive replies which makes
me doubt wether it can all be finished in time until v3.18 is
released.

Despite various review feedback the author (Andi Kleen) has answered
only few of the review questions and has generally been uncooperative,
only giving replies when prompted repeatedly, and only giving minimal
answers instead of constructively explaining and helping along the effort.

That kind of behavior is not acceptable.

There's also a boot crash on Intel E5-1630 v3 CPUs reported for another
commit from Andi Kleen:

  e735b9db12 ("perf/x86/intel/uncore: Add Haswell-EP uncore support")

  https://lkml.org/lkml/2014/10/22/730

Which is not yet resolved. The uncore driver is independent in theory,
but the crash makes me worry about how well all these patches were
tested and makes me uneasy about the level of interminging that the
Broadwell and Haswell code has received by the commits above.

As a first step to resolve the mess revert the Broadwell client commits
back to the v3.17 version, before we run out of time and problematic
code hits a stable upstream kernel.

( If the Haswell-EP crash is not resolved via a simple fix then we'll have
  to revert the Haswell-EP uncore driver as well. )

The Broadwell client series has to be submitted in a clean fashion, with
single, well documented changes per patch. If they are submitted in time
and are accepted during review then they can possibly go into v3.19 but
will need additional scrutiny due to the rocky history of this patch set.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-29 11:07:58 +01:00
Jiang Liu b77e8f4353 ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi()
Function mp_register_gsi() returns blindly the GSI number for the ACPI
SCI interrupt. That causes a regression when the GSI for ACPI SCI is
shared with other devices.

The regression was caused by commit 84245af729 "x86, irq, ACPI:
Change __acpi_register_gsi to return IRQ number instead of GSI" and
exposed on a SuperMicro system, which shares one GSI between ACPI SCI
and PCI device, with following failure:

http://sourceforge.net/p/linux1394/mailman/linux1394-user/?viewmonth=201410
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 20 low
level)
[    2.699224] firewire_ohci 0000:06:00.0: failed to allocate interrupt
20

Return mp_map_gsi_to_irq(gsi, 0) instead of the GSI number.

Reported-and-Tested-by: Daniel Robbins <drobbins@funtoo.org>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1414387308-27148-4-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-29 08:52:30 +01:00
Jiang Liu f18298595a x86, intel-mid: Create IRQs for APB timers and RTC timers
Intel MID platforms has no legacy interrupts, so no IRQ descriptors
preallocated. We need to call mp_map_gsi_to_irq() to create IRQ
descriptors for APB timers and RTC timers, otherwise it may cause
invalid memory access as:
[    0.116839] BUG: unable to handle kernel NULL pointer dereference at
0000003a
[    0.123803] IP: [<c1071c0e>] setup_irq+0xf/0x4d

Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Cohen <david.a.cohen@linux.intel.com>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1414387308-27148-3-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-29 08:52:23 +01:00
Dave Jones d4e1a0af1d x86: Don't enable F00F workaround on Intel Quark processors
The Intel Quark processor is a part of family 5, but does not have the
F00F bug present in Pentiums of the same family.

Pentiums were models 0 through 8, Quark is model 9.

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Link: http://lkml.kernel.org/r/20141028175753.GA12743@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-29 08:52:09 +01:00
Maciej W. Rozycki 60e684f0d6 x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts
Fix duplicate XT-PIC seen in /proc/interrupts on x86 systems
that make  use of 8259A Programmable Interrupt Controllers.
Specifically convert  output like this:

           CPU0
  0:      76573    XT-PIC-XT-PIC    timer
  1:         11    XT-PIC-XT-PIC    i8042
  2:          0    XT-PIC-XT-PIC    cascade
  4:          8    XT-PIC-XT-PIC    serial
  6:          3    XT-PIC-XT-PIC    floppy
  7:          0    XT-PIC-XT-PIC    parport0
  8:          1    XT-PIC-XT-PIC    rtc0
 10:        448    XT-PIC-XT-PIC    fddi0
 12:         23    XT-PIC-XT-PIC    eth0
 14:       2464    XT-PIC-XT-PIC    ide0
NMI:          0   Non-maskable interrupts
ERR:          0

to one like this:

           CPU0
  0:     122033    XT-PIC  timer
  1:         11    XT-PIC  i8042
  2:          0    XT-PIC  cascade
  4:          8    XT-PIC  serial
  6:          3    XT-PIC  floppy
  7:          0    XT-PIC  parport0
  8:          1    XT-PIC  rtc0
 10:        145    XT-PIC  fddi0
 12:         31    XT-PIC  eth0
 14:       2245    XT-PIC  ide0
NMI:          0   Non-maskable interrupts
ERR:          0

that is one like we used to have from ~2.2 till it was changed
sometime.

The rationale is there is no value in this duplicate
information, it  merely clutters output and looks ugly.  We only
have one handler for  8259A interrupts so there is no need to
give it a name separate from the  name already given to
irq_chip.

We could define meaningful names for handlers based on bits in
the ELCR  register on systems that have it or the value of the
LTIM bit we use in  ICW1 otherwise (hardcoded to 0 though with
MCA support gone), to tell  edge-triggered and level-triggered
inputs apart.  While that information  does not affect 8259A
interrupt handlers it could help people determine  which lines
are shareable and which are not.  That is material for a
separate change though.

Any tools that parse /proc/interrupts are supposed not to be
affected  since it was many years we used the format this change
converts back to.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.11.1410260147190.21390@eddie.linux-mips.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 12:01:08 +01:00
Peter Zijlstra 7fb0f1de49 perf/x86: Fix compile warnings for intel_uncore
The uncore drivers require PCI and generate compile time warnings when
!CONFIG_PCI.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:51:03 +01:00
Peter Zijlstra (Intel) 65d71fe137 perf: Fix bogus kernel printk
Andy spotted the fail in what was intended as a conditional printk level.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Fixes: cc6cd47e73 ("perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141007124757.GH19379@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:51:01 +01:00
Weijie Yang 3c325f8233 x86, cma: Reserve DMA contiguous area after initmem_init()
Fengguang Wu reported a boot crash on the x86 platform
via the 0-day Linux Kernel Performance Test:

  cma: dma_contiguous_reserve: reserving 31 MiB for global area
  BUG: Int 6: CR2   (null)
  [<41850786>] dump_stack+0x16/0x18
  [<41d2b1db>] early_idt_handler+0x6b/0x6b
  [<41072227>] ? __phys_addr+0x2e/0xca
  [<41d4ee4d>] cma_declare_contiguous+0x3c/0x2d7
  [<41d6d359>] dma_contiguous_reserve_area+0x27/0x47
  [<41d6d4d1>] dma_contiguous_reserve+0x158/0x163
  [<41d33e0f>] setup_arch+0x79b/0xc68
  [<41d2b7cf>] start_kernel+0x9c/0x456
  [<41d2b2ca>] i386_start_kernel+0x79/0x7d

(See details at: https://lkml.org/lkml/2014/10/8/708)

It is because dma_contiguous_reserve() is called before
initmem_init() in x86, the variable high_memory is not
initialized but accessed by __pa(high_memory) in
dma_contiguous_reserve().

This patch moves dma_contiguous_reserve() after initmem_init()
so that high_memory is initialized before accessed.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Cc: iamjoonsoo.kim@lge.com
Cc: 'Linux-MM' <linux-mm@kvack.org>
Cc: 'Weijie Yang' <weijie.yang.kh@gmail.com>
Link: http://lkml.kernel.org/r/000101cfef69%2431e528a0%2495af79e0%24%25yang@samsung.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 07:36:50 +01:00
Eric Paris 26c2d2b391 i386/audit: stop scribbling on the stack frame
git commit b4f0d3755c was very very dumb.
It was writing over %esp/pt_regs semi-randomly on i686  with the expected
"system can't boot" results.  As noted in:

https://bugs.freedesktop.org/show_bug.cgi?id=85277

This patch stops fscking with pt_regs.  Instead it sets up the registers
for the call to __audit_syscall_entry in the most obvious conceivable
way.  It then does just a tiny tiny touch of magic.  We need to get what
started in PT_EDX into 0(%esp) and PT_ESI into 4(%esp).  This is as easy
as a pair of pushes.

After the call to __audit_syscall_entry all we need to do is get that
now useless junk off the stack (pair of pops) and reload %eax with the
original syscall so other stuff can keep going about it's business.

Reported-by: Paulo Zanoni <przanoni@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Link: http://lkml.kernel.org/r/1414037043-30647-1-git-send-email-eparis@redhat.com
Cc: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-10-24 13:27:56 -07:00
H. Peter Anvin db65bcfd95 Linux 3.18-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJURGCfAAoJEHm+PkMAQRiG6toH/RUazjqZxqMvLlm1y+O6+7s9
 OpFdcDl4ZQtrvymBRYipu46pbDUoAAsVbxQJllaLNtHE0UrvaQE76WihBQYM8qW/
 WoESLsZRbNQqQYQixf55pOozX7uIuG+9LKHagC8JNfD1Bw/nQ+RleSXqFsBCdpMW
 i7SzcZBu2Iv+LnVmjvoGMOQa+loKzO6Pj1MpoHxxJQmeypH3dZR7mLVeBJNZQtLE
 BGY47gYraVzb9EjKnSbjrIKjpM9o0MIihoanrrjnq0JMrfm4pi6W5GgaGDUiaBVH
 w7Vmr5S2pjzrS41gKSVK9/XO1CrDG8tsp3QwA2+iIbjdR3wBDynyeG3UfnLABec=
 =hwbG
 -----END PGP SIGNATURE-----

Merge tag 'v3.18-rc1' into x86/urgent

Reason:
Need to apply audit patch on top of v3.18-rc1.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-10-24 13:26:37 -07:00
Andy Lutomirski b47dcbdc51 x86, apic: Handle a bad TSC more gracefully
If the TSC is unusable or disabled, then this patch fixes:

 - Confusion while trying to clear old APIC interrupts.
 - Division by zero and incorrect programming of the TSC deadline
   timer.

This fixes boot if the CPU has a TSC deadline timer but a missing or
broken TSC.  The failure to boot can be observed with qemu using
-cpu qemu64,-tsc,+tsc-deadline

This also happens to me in nested KVM for unknown reasons.
With this patch, I can boot cleanly (although without a TSC).

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Bandan Das <bsd@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/e2fa274e498c33988efac0ba8b7e3120f7f92d78.1413393027.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-22 21:31:46 +02:00
Jiang Liu 961b6a7003 x86: ACPI: Do not translate GSI number if IOAPIC is disabled
When IOAPIC is disabled, acpi_gsi_to_irq() should return gsi directly
instead of calling mp_map_gsi_to_irq() to translate gsi to IRQ by IOAPIC.
It fixes https://bugzilla.kernel.org/show_bug.cgi?id=84381.

This regression was introduced with commit 6b9fb70824 "x86, ACPI,
irq: Consolidate algorithm of mapping (ioapic, pin) to IRQ number"

Reported-and-Tested-by: Thomas Richter <thor@math.tu-berlin.de>
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Richter <thor@math.tu-berlin.de>
Cc: rui.zhang@intel.com
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org> # 3.17
Link: http://lkml.kernel.org/r/1413816327-12850-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-20 17:23:00 +02:00
Linus Torvalds ab074ade9c Merge git://git.infradead.org/users/eparis/audit
Pull audit updates from Eric Paris:
 "So this change across a whole bunch of arches really solves one basic
  problem.  We want to audit when seccomp is killing a process.  seccomp
  hooks in before the audit syscall entry code.  audit_syscall_entry
  took as an argument the arch of the given syscall.  Since the arch is
  part of what makes a syscall number meaningful it's an important part
  of the record, but it isn't available when seccomp shoots the
  syscall...

  For most arch's we have a better way to get the arch (syscall_get_arch)
  So the solution was two fold: Implement syscall_get_arch() everywhere
  there is audit which didn't have it.  Use syscall_get_arch() in the
  seccomp audit code.  Having syscall_get_arch() everywhere meant it was
  a useless flag on the stack and we could get rid of it for the typical
  syscall entry.

  The other changes inside the audit system aren't grand, fixed some
  records that had invalid spaces.  Better locking around the task comm
  field.  Removing some dead functions and structs.  Make some things
  static.  Really minor stuff"

* git://git.infradead.org/users/eparis/audit: (31 commits)
  audit: rename audit_log_remove_rule to disambiguate for trees
  audit: cull redundancy in audit_rule_change
  audit: WARN if audit_rule_change called illegally
  audit: put rule existence check in canonical order
  next: openrisc: Fix build
  audit: get comm using lock to avoid race in string printing
  audit: remove open_arg() function that is never used
  audit: correct AUDIT_GET_FEATURE return message type
  audit: set nlmsg_len for multicast messages.
  audit: use union for audit_field values since they are mutually exclusive
  audit: invalid op= values for rules
  audit: use atomic_t to simplify audit_serial()
  kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
  audit: reduce scope of audit_log_fcaps
  audit: reduce scope of audit_net_id
  audit: arm64: Remove the audit arch argument to audit_syscall_entry
  arm64: audit: Add audit hook in syscall_trace_enter/exit()
  audit: x86: drop arch from __audit_syscall_entry() interface
  sparc: implement is_32bit_task
  sparc: properly conditionalize use of TIF_32BIT
  ...
2014-10-19 16:25:56 -07:00
Ingo Molnar db6a00b4be x86/smpboot: Move data structure to its primary usage scope
Makes the code more readable by moving variable and usage closer
to each other, which also avoids this build warning in the
!CONFIG_HOTPLUG_CPU case:

  arch/x86/kernel/smpboot.c:105:42: warning: ‘die_complete’ defined but not used [-Wunused-variable]

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Lan Tianyu <tianyu.lan@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: srostedt@redhat.com
Cc: toshi.kani@hp.com
Cc: imammedo@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409039025-32310-1-git-send-email-tianyu.lan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-19 11:44:49 +02:00
Linus Torvalds 0429fbc0bd Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
 "Way back, before the current percpu allocator was implemented, static
  and dynamic percpu memory areas were allocated and handled separately
  and had their own accessors.  The distinction has been gone for many
  years now; however, the now duplicate two sets of accessors remained
  with the pointer based ones - this_cpu_*() - evolving various other
  operations over time.  During the process, we also accumulated other
  inconsistent operations.

  This pull request contains Christoph's patches to clean up the
  duplicate accessor situation.  __get_cpu_var() uses are replaced with
  with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

  Unfortunately, the former sometimes is tricky thanks to C being a bit
  messy with the distinction between lvalues and pointers, which led to
  a rather ugly solution for cpumask_var_t involving the introduction of
  this_cpu_cpumask_var_ptr().

  This converts most of the uses but not all.  Christoph will follow up
  with the remaining conversions in this merge window and hopefully
  remove the obsolete accessors"

* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
  irqchip: Properly fetch the per cpu offset
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
  ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
  Revert "powerpc: Replace __get_cpu_var uses"
  percpu: Remove __this_cpu_ptr
  clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
  sparc: Replace __get_cpu_var uses
  avr32: Replace __get_cpu_var with __this_cpu_write
  blackfin: Replace __get_cpu_var uses
  tile: Use this_cpu_ptr() for hardware counters
  tile: Replace __get_cpu_var uses
  powerpc: Replace __get_cpu_var uses
  alpha: Replace __get_cpu_var
  ia64: Replace __get_cpu_var uses
  s390: cio driver &__get_cpu_var replacements
  s390: Replace __get_cpu_var uses
  mips: Replace __get_cpu_var uses
  MIPS: Replace __get_cpu_var uses in FPU emulator.
  arm: Replace __this_cpu_ptr with raw_cpu_ptr
  ...
2014-10-15 07:48:18 +02:00
Linus Torvalds dfe2c6dcc8 Merge branch 'akpm' (patches from Andrew Morton)
Merge second patch-bomb from Andrew Morton:
 - a few hotfixes
 - drivers/dma updates
 - MAINTAINERS updates
 - Quite a lot of lib/ updates
 - checkpatch updates
 - binfmt updates
 - autofs4
 - drivers/rtc/
 - various small tweaks to less used filesystems
 - ipc/ updates
 - kernel/watchdog.c changes

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits)
  mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
  kernel/param: consolidate __{start,stop}___param[] in <linux/moduleparam.h>
  ia64: remove duplicate declarations of __per_cpu_start[] and __per_cpu_end[]
  frv: remove unused declarations of __start___ex_table and __stop___ex_table
  kvm: ensure hard lockup detection is disabled by default
  kernel/watchdog.c: control hard lockup detection default
  staging: rtl8192u: use %*pEn to escape buffer
  staging: rtl8192e: use %*pEn to escape buffer
  staging: wlan-ng: use %*pEhp to print SN
  lib80211: remove unused print_ssid()
  wireless: hostap: proc: print properly escaped SSID
  wireless: ipw2x00: print SSID via %*pE
  wireless: libertas: print esaped string via %*pE
  lib/vsprintf: add %*pE[achnops] format specifier
  lib / string_helpers: introduce string_escape_mem()
  lib / string_helpers: refactoring the test suite
  lib / string_helpers: move documentation to c-file
  include/linux: remove strict_strto* definitions
  arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable
  fs: check bh blocknr earlier when searching lru
  ...
2014-10-14 03:54:50 +02:00
Linus Torvalds 77654908ff Merge branches 'x86-ras-for-linus', 'x86-uv-for-linus' and 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 ras, uv and vdso fixlets from Ingo Molnar:
 "ras: tone down a kernel message to only occur during initial bootup,
    not during suspend/resume cycles.

  uv: a cleanup commit

  vdso: a fix to error checking"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Avoid showing repetitive message from intel_init_thermal()

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/uv: Remove unnecessary #ifdef

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Fix vdso2c's special_pages[] error checking
2014-10-14 02:31:22 +02:00
Linus Torvalds 2fd7476de9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc smaller fixes that missed the v3.17 cycle"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Add arch/x86/purgatory/ make generated files to gitignore
  x86: Fix section conflict for numachip
  x86: Reject x32 executables if x32 ABI not supported
  x86_64, entry: Filter RFLAGS.NT on entry from userspace
  x86, boot, kaslr: Fix nuisance warning on 32-bit builds
2014-10-14 02:28:16 +02:00
Linus Torvalds ba1a96fc7d Merge branch 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 seccomp changes from Ingo Molnar:
 "This tree includes x86 seccomp filter speedups and related preparatory
  work, which touches core seccomp facilities as well.

  The main idea is to split seccomp into two phases, to be able to enter
  a simple fast path for syscalls with ptrace side effects.

  There's no substantial user-visible (and ABI) effects expected from
  this, except a change in how we emit a better audit record for
  SECCOMP_RET_TRACE events"

* 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls
  x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls
  x86: Split syscall_trace_enter into two phases
  x86, entry: Only call user_exit if TIF_NOHZ
  x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
  seccomp: Document two-phase seccomp and arch-provided seccomp_data
  seccomp: Allow arch code to provide seccomp_data
  seccomp: Refactor the filter callback and the API
  seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
2014-10-14 02:27:06 +02:00
Linus Torvalds f1bfbd984b Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "The main changes in this tree are:

   - fix and update Intel Quark [Galileo] SoC platform support

   - update IOSF chipset side band interface and make it available via
     debugfs

   - enable HPETs on Soekris net6501 and other e6xx based systems"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Add cpu_detect_cache_sizes to init_intel() add Quark legacy_cache()
  x86: Quark: Comment setup_arch() to document TLB/PGE bug
  x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
  x86/platform/intel/iosf: Add debugfs config option for IOSF
  x86/platform/intel/iosf: Add better description of IOSF driver in config
  x86/platform/intel/iosf: Add Braswell PCI ID
  x86/platform/pmc_atom: Fix warning when CONFIG_DEBUG_FS=n
  x86: HPET force enable for e6xx based systems
  x86/iosf: Add debugfs support
  x86/iosf: Add Kconfig prompt for IOSF_MBI selection
2014-10-14 02:23:55 +02:00
Linus Torvalds df133e8fa8 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "This tree includes the following changes:

   - fix memory hotplug
   - fix hibernation bootup memory layout assumptions
   - fix hyperv numa guest kernel messages
   - remove dead code
   - update documentation"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Update memory map description to list hypervisor-reserved area
  x86/mm, hibernate: Do not assume the first e820 area to be RAM
  x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data()
  x86/mm/hotplug: Modify PGD entry when removing memory
  x86/mm/hotplug: Pass sync_global_pgds() a correct argument in remove_pagetable()
  x86: Remove set_pmd_pfn
2014-10-14 02:22:41 +02:00
Linus Torvalds e3438330f5 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Ingo Molnar:
 "Misc smaller cleanups"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, intel: Fix total_size computation
  x86, microcode, intel: Rename apply_microcode and declare it static
  x86, microcode, intel: Fix typos
  x86, microcode, intel: Add missing static declarations
  x86, microcode, amd: Fix missing static declaration
2014-10-14 02:21:51 +02:00
Linus Torvalds c7b228adca Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU updates from Ingo Molnar:
 "x86 FPU handling fixes, cleanups and enhancements from Oleg.

  The signal handling race fix and the __restore_xstate_sig() preemption
  fix for eager-mode is marked for -stable as well"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: copy_thread: Don't nullify ->ptrace_bps twice
  x86, fpu: Shift "fpu_counter = 0" from copy_thread() to arch_dup_task_struct()
  x86, fpu: copy_process: Sanitize fpu->last_cpu initialization
  x86, fpu: copy_process: Avoid fpu_alloc/copy if !used_math()
  x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()
  x86, fpu: __restore_xstate_sig()->math_state_restore() needs preempt_disable()
  x86, fpu: shift drop_init_fpu() from save_xstate_sig() to handle_signal()
2014-10-14 02:20:50 +02:00
Linus Torvalds 708d0b41a2 Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature updates from Ingo Molnar:
 "This tree includes the following changes:

   - Introduce DISABLED_MASK to list disabled CPU features, to simplify
     CPU feature handling and avoid excessive #ifdefs

   - Remove the lightly used cpu_has_pae() primitive"

* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Add more disabled features
  x86: Introduce disabled-features
  x86: Axe the lightly-used cpu_has_pae
2014-10-14 02:19:47 +02:00
Ulrich Obergfell 9919e39a17 kvm: ensure hard lockup detection is disabled by default
Use watchdog_enable_hardlockup_detector() to set hard lockup detection's
default value to false.  It's risky to run this detection in a guest, as
false positives are easy to trigger, especially if the host is
overcommitted.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:27 +02:00
Andrew Morton e48510f451 arch/x86/kernel/cpu/common.c: fix unused symbol warning
x86_64 allnoconfig:

arch/x86/kernel/cpu/common.c:968: warning: 'syscall32_cpu_init' defined but not used

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:23 +02:00
Vivek Goyal f8da964dfb kexec-bzimage64: fix sparse warnings
David Howells brought to my attention the mails generated by kbuild test
bot and following sparse warnings were present.  This patch fixes these
warnings.

  arch/x86/kernel/kexec-bzimage64.c:270:5: warning: symbol 'bzImage64_probe' was not declared. Should it be static?
  arch/x86/kernel/kexec-bzimage64.c:328:6: warning: symbol 'bzImage64_load' was not declared. Should it be static?
  arch/x86/kernel/kexec-bzimage64.c:517:5: warning: symbol 'bzImage64_cleanup' was not declared. Should it be static?
  arch/x86/kernel/kexec-bzimage64.c:531:5: warning: symbol 'bzImage64_verify_sig' was not declared. Should it be static?
  arch/x86/kernel/kexec-bzimage64.c:546:23: warning: symbol 'kexec_bzImage64_ops' was not declared. Should it be static?

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:21 +02:00
Baoquan He a2d6aa8fa0 kexec: check if crashk_res_low exists when exclude it from crash mem ranges
Add a check if crashk_res_low exists just like GART region does.  If
crashk_res_low doesn't exist, calling exclude_mem_range is unnecessary.

Meanwhile, since crashk_res_low has been initialized at definition, it's
safe just use "if (crashk_low_res.end)" to check if it's exist.  And this
can make it consistent with other places of check.

Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:21 +02:00
Linus Torvalds f1d0d14120 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu offlining patch from Ingo Molnar:
 "This tree includes a single commit that speeds up x86 suspend/resume
  by replacing a naive 100msec sleep based polling loop with proper
  completion notification.

  This gives some real suspend/resume benefit on servers with larger
  core counts"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/smpboot: Speed up suspend/resume by avoiding 100ms sleep for CPU offline during S3
2014-10-13 18:20:39 +02:00
Linus Torvalds 19e00d593e Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 bootup updates from Ingo Molnar:
 "The changes in this cycle were:

   - Fix rare SMP-boot hang (mostly in virtual environments)

   - Fix build warning with certain (rare) toolchains"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/relocs: Make per_cpu_load_addr static
  x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
2014-10-13 18:16:32 +02:00
Linus Torvalds 197fe6b0e6 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "The changes in this cycle were:

   - Speed up the x86 __preempt_schedule() implementation
   - Fix/improve low level asm code debug info annotations"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Unwind-annotate thunk_32.S
  x86: Improve cmpxchg8b_emu.S
  x86: Improve cmpxchg16b_emu.S
  x86/lib/Makefile: Remove the unnecessary "+= thunk_64.o"
  x86: Speed up ___preempt_schedule*() by using THUNK helpers
2014-10-13 18:14:50 +02:00
Linus Torvalds faafcba3b5 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
     Hansen)

   - Various sched/idle refinements for better idle handling (Nicolas
     Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

   - sched/numa updates and optimizations (Rik van Riel)

   - sysbench speedup (Vincent Guittot)

   - capacity calculation cleanups/refactoring (Vincent Guittot)

   - Various cleanups to thread group iteration (Oleg Nesterov)

   - Double-rq-lock removal optimization and various refactorings
     (Kirill Tkhai)

   - various sched/deadline fixes

  ... and lots of other changes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
  sched/fair: Delete resched_cpu() from idle_balance()
  sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
  sched: Improve sysbench performance by fixing spurious active migration
  sched/x86: Fix up typo in topology detection
  x86, sched: Add new topology for multi-NUMA-node CPUs
  sched/rt: Use resched_curr() in task_tick_rt()
  sched: Use rq->rd in sched_setaffinity() under RCU read lock
  sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
  sched: Use dl_bw_of() under RCU read lock
  sched/fair: Remove duplicate code from can_migrate_task()
  sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
  sched: print_rq(): Don't use tasklist_lock
  sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
  sched: Fix the task-group check in tg_has_rt_tasks()
  sched/fair: Leverage the idle state info when choosing the "idlest" cpu
  sched: Let the scheduler see CPU idle states
  sched/deadline: Fix inter- exclusive cpusets migrations
  sched/deadline: Clear dl_entity params when setscheduling to different class
  sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
  ...
2014-10-13 16:23:15 +02:00
Linus Torvalds 9d9420f120 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side updates:

   - Fix and enhance poll support (Jiri Olsa)

   - Re-enable inheritance optimization (Jiri Olsa)

   - Enhance Intel memory events support (Stephane Eranian)

   - Refactor the Intel uncore driver to be more maintainable (Zheng
     Yan)

   - Enhance and fix Intel CPU and uncore PMU drivers (Peter Zijlstra,
     Andi Kleen)

   - [ plus various smaller fixes/cleanups ]

  User visible tooling updates:

   - Add +field argument support for --field option, so that one can add
     fields to the default list of fields to show, ie now one can just
     do:

         perf report --fields +pid

     And the pid will appear in addition to the default fields (Jiri
     Olsa)

   - Add +field argument support for --sort option (Jiri Olsa)

   - Honour -w in the report tools (report, top), allowing to specify
     the widths for the histogram entries columns (Namhyung Kim)

   - Properly show submicrosecond times in 'perf kvm stat' (Christian
     Borntraeger)

   - Add beautifier for mremap flags param in 'trace' (Alex Snast)

   - perf script: Allow callchains if any event samples them

   - Don't truncate Intel style addresses in 'annotate' (Alex Converse)

   - Allow profiling when kptr_restrict == 1 for non root users, kernel
     samples will just remain unresolved (Andi Kleen)

   - Allow configuring default options for callchains in config file
     (Namhyung Kim)

   - Support operations for shared futexes.  (Davidlohr Bueso)

   - "perf kvm stat report" improvements by Alexander Yarygin:
       -  Save pid string in opts.target.pid
       -  Enable the target.system_wide flag
       -  Unify the title bar output

   - [ plus lots of other fixes and small improvements.  ]

  Tooling infrastructure changes:

   - Refactor unit and scale function parameters for PMU parsing
     routines (Matt Fleming)

   - Improve DSO long names lookup with rbtree, resulting in great
     speedup for workloads with lots of DSOs (Waiman Long)

   - We were not handling POLLHUP notifications for event file
     descriptors

     Fix it by filtering entries in the events file descriptor array
     after poll() returns, refcounting mmaps so that when the last fd
     pointing to a perf mmap goes away we do the unmap (Arnaldo Carvalho
     de Melo)

   - Intel PT prep work, from Adrian Hunter, including:
       - Let a user specify a PMU event without any config terms
       - Add perf-with-kcore script
       - Let default config be defined for a PMU
       - Add perf_pmu__scan_file()
       - Add a 'perf test' for tracking with sched_switch
       - Add 'flush' callback to scripting API

   - Use ring buffer consume method to look like other tools (Arnaldo
     Carvalho de Melo)

   - hists browser (used in top and report) refactorings, getting rid of
     unused variables and reducing source code size by handling similar
     cases in a fewer functions (Namhyung Kim).

   - Replace thread unsafe strerror() with strerror_r() accross the
     whole tools/perf/ tree (Masami Hiramatsu)

   - Rename ordered_samples to ordered_events and allow setting a queue
     size for ordering events (Jiri Olsa)

   - [ plus lots of fixes, cleanups and other improvements ]"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (198 commits)
  perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment
  perf/x86/intel/uncore: Fix minor race in box set up
  perf record: Fix error message for --filter option not coming after tracepoint
  perf tools: Fix build breakage on arm64 targets
  perf symbols: Improve DSO long names lookup speed with rbtree
  perf symbols: Encapsulate dsos list head into struct dsos
  perf bench futex: Sanitize -q option in requeue
  perf bench futex: Support operations for shared futexes
  perf trace: Fix mmap return address truncation to 32-bit
  perf tools: Refactor unit and scale function parameters
  perf tools: Fix line number in the config file error message
  perf tools: Convert {record,top}.call-graph option to call-graph.record-mode
  perf tools: Introduce perf_callchain_config()
  perf callchain: Move some parser functions to callchain.c
  perf tools: Move callchain config from record_opts to callchain_param
  perf hists browser: Fix callchain print bug on TUI
  perf tools: Use ACCESS_ONCE() instead of volatile cast
  perf tools: Modify error code for when perf_session__new() fails
  perf tools: Fix perf record as non root with kptr_restrict == 1
  perf stat: Fix --per-core on multi socket systems
  ...
2014-10-13 15:58:15 +02:00
Linus Torvalds b528392669 ACPI and power management updates for 3.18-rc1
- Rework the handling of wakeup IRQs by the IRQ core such that
    all of them will be switched over to "wakeup" mode in
    suspend_device_irqs() and in that mode the first interrupt
    will abort system suspend in progress or wake up the system
    if already in suspend-to-idle (or equivalent) without executing
    any interrupt handlers.  Among other things that eliminates the
    wakeup-related motivation to use the IRQF_NO_SUSPEND interrupt
    flag with interrupts which don't really need it and should not
    use it (Thomas Gleixner and Rafael J Wysocki).
 
  - Switch over ACPI to handling wakeup interrupts with the help
    of the new mechanism introduced by the above IRQ core rework
    (Rafael J Wysocki).
 
  - Rework the core generic PM domains code to eliminate code that's
    not used, add DT support and add a generic mechanism by which
    devices can be added to PM domains automatically during
    enumeration (Ulf Hansson, Geert Uytterhoeven and Tomasz Figa).
 
  - Add debugfs-based mechanics for debugging generic PM domains
    (Maciej Matraszek).
 
  - ACPICA update to upstream version 20140828.  Included are updates
    related to the SRAT and GTDT tables and the _PSx methods are in
    the METHOD_NAME list now (Bob Moore and Hanjun Guo).
 
  - Add _OSI("Darwin") support to the ACPI core (unfortunately, that
    can't really be done in a straightforward way) to prevent
    Thunderbolt from being turned off on Apple systems after boot
    (or after resume from system suspend) and rework the ACPI Smart
    Battery Subsystem (SBS) driver to work correctly with Apple
    platforms (Matthew Garrett and Andreas Noever).
 
  - ACPI LPSS (Low-Power Subsystem) driver update cleaning up the
    code, adding support for 133MHz I2C source clock on Intel Baytrail
    to it and making it avoid using UART RTS override with Auto Flow
    Control (Heikki Krogerus).
 
  - ACPI backlight updates removing the video_set_use_native_backlight
    quirk which is not necessary any more, making the code check the
    list of output devices returned by the _DOD method to avoid
    creating acpi_video interfaces that won't work and adding a quirk
    for Lenovo Ideapad Z570 (Hans de Goede, Aaron Lu and Stepan Bujnak).
 
  - New Win8 ACPI OSI quirks for some Dell laptops (Edward Lin).
 
  - Assorted ACPI code cleanups (Fabian Frederick, Rasmus Villemoes,
    Sudip Mukherjee, Yijing Wang, and Zhang Rui).
 
  - cpufreq core updates and cleanups (Viresh Kumar, Preeti U Murthy,
    Rasmus Villemoes).
 
  - cpufreq driver updates: cpufreq-cpu0/cpufreq-dt (driver name
    change among other things), ppc-corenet, powernv (Viresh Kumar,
    Preeti U Murthy, Shilpasri G Bhat, Lucas Stach).
 
  - cpuidle support for DT-based idle states infrastructure, new
    ARM64 cpuidle driver, cpuidle core cleanups (Lorenzo Pieralisi,
    Rasmus Villemoes).
 
  - ARM big.LITTLE cpuidle driver updates: support for DT-based
    initialization and Exynos5800 compatible string (Lorenzo Pieralisi,
    Kevin Hilman).
 
  - Rework of the test_suspend kernel command line argument and
    a new trace event for console resume (Srinivas Pandruvada,
    Todd E Brandt).
 
  - Second attempt to optimize swsusp_free() (hibernation core) to
    make it avoid going through all PFNs which may be way too slow on
    some systems (Joerg Roedel).
 
  - devfreq updates (Paul Bolle, Punit Agrawal, Ãrjan Eide).
 
  - rockchip-io Adaptive Voltage Scaling (AVS) driver and AVS
    entry update in MAINTAINERS (Heiko Stübner, Kevin Hilman).
 
  - PM core fix related to clock management (Geert Uytterhoeven).
 
  - PM core's sysfs code cleanup (Johannes Berg).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJUNbJoAAoJEILEb/54YlRxRp8QAJyGIPdx+f03oBir+7vvEwhY
 svxd+V9xXK0UgWNGkCvlMk/1RIVy0qqtXliUrDaE+9tcHACA9+iAxMmNmDsjLOiO
 gpazuz5kgeznrmp1eNwQnYTt+OCReQIcyCsj4q4fNo9bbETTyr2bRz226LEuZekC
 TAiKdphYoOszFBgTVg5gfu+lqjHyXjgXPnwMTlRYn1y4YL2adDIgxj9cFedykTTW
 Eu593TY2dH6ovERJ6q3qxZbRuWuxtww95J07b3t2/2Eb3e/R/zlX0/XJ/C88f/m2
 DkqngbOYqCdw+zJeN6k8631foyfUwAcTd0sJ1+5nsm5H4NE5NqObjbxOk5/yNht6
 HgvgISGHWLerEw+A/Dk6o0oZOtR1G/TAQ5qQk5nUfKT/sSoU+9/USsXtWhXwZCia
 XccnJgW6ZtPrJJP3zDnkrxe3gndmLic11QXArw2IhWTsq0sZlAyMgtauBXLdDiQa
 H/AMiYrUNmIABef1cirBLTtgXN4Zbsai9vIrxMmV7OgBrclrh52NTjzr05P5Hnl2
 fRK56mb6mP59LymI7n8fyXL8tHnbNwFvTaxuvrZmzcYbzL0l9DuPocJrrTHRSfhm
 GFfzfvLj0R66ZM4PthRSwz4H2v1FnlRcCkj5k/QjtBPlyzxtOnJveqve5umbrnb9
 T5mRmlAs4iYwLuKCVVNT
 =sIv/
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael Wysocki:
 "Features-wise, to me the most important this time is a rework of
  wakeup interrupts handling in the core that makes them work
  consistently across all of the available sleep states, including
  suspend-to-idle.  Many thanks to Thomas Gleixner for his help with
  this work.

  Second is an update of the generic PM domains code that has been in
  need of some care for quite a while.  Unused code is being removed, DT
  support is being added and domains are now going to be attached to
  devices in bus type code in analogy with the ACPI PM domain.  The
  majority of work here was done by Ulf Hansson who also has been the
  most active developer this time.

  Apart from this we have a traditional ACPICA update, this time to
  upstream version 20140828 and a few ACPI wakeup interrupts handling
  patches on top of the general rework mentioned above.  There also are
  several cpufreq commits including renaming the cpufreq-cpu0 driver to
  cpufreq-dt, as this is what implements generic DT-based cpufreq
  support, and a new DT-based idle states infrastructure for cpuidle.

  In addition to that, the ACPI LPSS driver is updated, ACPI support for
  Apple machines is improved, a few bugs are fixed and a few cleanups
  are made all over.

  Finally, the Adaptive Voltage Scaling (AVS) subsystem now has a tree
  maintained by Kevin Hilman that will be merged through the PM tree.

  Numbers-wise, the generic PM domains update takes the lead this time
  with 32 non-merge commits, second is cpufreq (15 commits) and the 3rd
  place goes to the wakeup interrupts handling rework (13 commits).

  Specifics:

   - Rework the handling of wakeup IRQs by the IRQ core such that all of
     them will be switched over to "wakeup" mode in suspend_device_irqs()
     and in that mode the first interrupt will abort system suspend in
     progress or wake up the system if already in suspend-to-idle (or
     equivalent) without executing any interrupt handlers.  Among other
     things that eliminates the wakeup-related motivation to use the
     IRQF_NO_SUSPEND interrupt flag with interrupts which don't really
     need it and should not use it (Thomas Gleixner and Rafael Wysocki)

   - Switch over ACPI to handling wakeup interrupts with the help of the
     new mechanism introduced by the above IRQ core rework (Rafael Wysocki)

   - Rework the core generic PM domains code to eliminate code that's
     not used, add DT support and add a generic mechanism by which
     devices can be added to PM domains automatically during enumeration
     (Ulf Hansson, Geert Uytterhoeven and Tomasz Figa).

   - Add debugfs-based mechanics for debugging generic PM domains
     (Maciej Matraszek).

   - ACPICA update to upstream version 20140828.  Included are updates
     related to the SRAT and GTDT tables and the _PSx methods are in the
     METHOD_NAME list now (Bob Moore and Hanjun Guo).

   - Add _OSI("Darwin") support to the ACPI core (unfortunately, that
     can't really be done in a straightforward way) to prevent
     Thunderbolt from being turned off on Apple systems after boot (or
     after resume from system suspend) and rework the ACPI Smart Battery
     Subsystem (SBS) driver to work correctly with Apple platforms
     (Matthew Garrett and Andreas Noever).

   - ACPI LPSS (Low-Power Subsystem) driver update cleaning up the code,
     adding support for 133MHz I2C source clock on Intel Baytrail to it
     and making it avoid using UART RTS override with Auto Flow Control
     (Heikki Krogerus).

   - ACPI backlight updates removing the video_set_use_native_backlight
     quirk which is not necessary any more, making the code check the
     list of output devices returned by the _DOD method to avoid
     creating acpi_video interfaces that won't work and adding a quirk
     for Lenovo Ideapad Z570 (Hans de Goede, Aaron Lu and Stepan Bujnak)

   - New Win8 ACPI OSI quirks for some Dell laptops (Edward Lin)

   - Assorted ACPI code cleanups (Fabian Frederick, Rasmus Villemoes,
     Sudip Mukherjee, Yijing Wang, and Zhang Rui)

   - cpufreq core updates and cleanups (Viresh Kumar, Preeti U Murthy,
     Rasmus Villemoes)

   - cpufreq driver updates: cpufreq-cpu0/cpufreq-dt (driver name change
     among other things), ppc-corenet, powernv (Viresh Kumar, Preeti U
     Murthy, Shilpasri G Bhat, Lucas Stach)

   - cpuidle support for DT-based idle states infrastructure, new ARM64
     cpuidle driver, cpuidle core cleanups (Lorenzo Pieralisi, Rasmus
     Villemoes)

   - ARM big.LITTLE cpuidle driver updates: support for DT-based
     initialization and Exynos5800 compatible string (Lorenzo Pieralisi,
     Kevin Hilman)

   - Rework of the test_suspend kernel command line argument and a new
     trace event for console resume (Srinivas Pandruvada, Todd E Brandt)

   - Second attempt to optimize swsusp_free() (hibernation core) to make
     it avoid going through all PFNs which may be way too slow on some
     systems (Joerg Roedel)

   - devfreq updates (Paul Bolle, Punit Agrawal, Ãrjan Eide).

   - rockchip-io Adaptive Voltage Scaling (AVS) driver and AVS entry
     update in MAINTAINERS (Heiko Stübner, Kevin Hilman)

   - PM core fix related to clock management (Geert Uytterhoeven)

   - PM core's sysfs code cleanup (Johannes Berg)"

* tag 'pm+acpi-3.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (105 commits)
  ACPI / fan: printk replacement
  PM / clk: Fix crash in clocks management code if !CONFIG_PM_RUNTIME
  PM / Domains: Rename cpu_data to cpuidle_data
  cpufreq: cpufreq-dt: fix potential double put of cpu OF node
  cpufreq: cpu0: rename driver and internals to 'cpufreq_dt'
  PM / hibernate: Iterate over set bits instead of PFNs in swsusp_free()
  cpufreq: ppc-corenet: remove duplicate update of cpu_data
  ACPI / sleep: Rework the handling of ACPI GPE wakeup from suspend-to-idle
  PM / sleep: Rename platform suspend/resume functions in suspend.c
  PM / sleep: Export dpm_suspend_late/noirq() and dpm_resume_early/noirq()
  ACPICA: Introduce acpi_enable_all_wakeup_gpes()
  ACPICA: Clear all non-wakeup GPEs in acpi_hw_enable_wakeup_gpe_block()
  ACPI / video: check _DOD list when creating backlight devices
  PM / Domains: Move dev_pm_domain_attach|detach() to pm_domain.h
  cpufreq: Replace strnicmp with strncasecmp
  cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec
  cpufreq: powernv: Set the pstate of the last hotplugged out cpu in policy->cpus to minimum
  cpufreq: Allow stop CPU callback to be used by all cpufreq drivers
  PM / devfreq: exynos: Enable building exynos PPMU as module
  PM / devfreq: Export helper functions for drivers
  ...
2014-10-09 16:07:43 -04:00
Linus Torvalds afa3536be8 Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Main changes:

  - Fix the deadlock reported by Dave Jones et al
  - Clean up and fix nohz_full interaction with arch abilities
  - nohz init code consolidation/cleanup"

* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  nohz: nohz full depends on irq work self IPI support
  nohz: Consolidate nohz full init code
  arm64: Tell irq work about self IPI support
  arm: Tell irq work about self IPI support
  x86: Tell irq work about self IPI support
  irq_work: Force raised irq work to run on irq work interrupt
  irq_work: Introduce arch_irq_work_has_interrupt()
  nohz: Move nohz full init call to tick init
2014-10-09 06:30:57 -04:00
Linus Torvalds e4e65676f2 Fixes and features for 3.18.
Apart from the usual cleanups, here is the summary of new features:
 
 - s390 moves closer towards host large page support
 
 - PowerPC has improved support for debugging (both inside the guest and
   via gdbstub) and support for e6500 processors
 
 - ARM/ARM64 support read-only memory (which is necessary to put firmware
   in emulated NOR flash)
 
 - x86 has the usual emulator fixes and nested virtualization improvements
   (including improved Windows support on Intel and Jailhouse hypervisor
   support on AMD), adaptive PLE which helps overcommitting of huge guests.
   Also included are some patches that make KVM more friendly to memory
   hot-unplug, and fixes for rare caching bugs.
 
 Two patches have trivial mm/ parts that were acked by Rik and Andrew.
 
 Note: I will soon switch to a subkey for signing purposes.  To verify
 future signed pull requests from me, please update my key with
 "gpg --recv-keys 9B4D86F2".  You should see 3 new subkeys---the
 one for signing will be a 2048-bit RSA key, 4E6B09D7.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJUL5sPAAoJEBvWZb6bTYbyfkEP/3MNhSyn6HCjPjtjLNPAl9KL
 WpExZSUFL2+4CztpdGIsek1BeJYHmqv3+c5S+WvaWVA1aqh2R7FT1D1ErBLjgLQq
 lq23IOr+XxmC3dXQUEEk+TlD+283UzypzEG4l4UD3JYg79fE3UrXAz82SeyewJDY
 x7aPYhkZG3RHu+wAyMPasG6E3zS5LySdUtGWbiPwz5BejrhBJoJdeb2WIL/RwnUK
 7ppSLB5EoFj/uMkuyeAAdAbdfSrhHA6faDZxNdxS9k9wGutrhhfUoQ49ONrKG4dV
 sFo1tSPTVgRs8QFYUZ2fJUPBAmUVddsgqh2K9d0NftGTq7b8YszaCsfFrs2/Y4MU
 YxssWEhxsfszerCu12bbAJrv6JBZYQ7TwGvI9L7P0iFU6IVw/djmukU4AkM9/e91
 YS/cue/PN+9Pn2ccXzL9J7xRtZb8FsOuRsCXTCmbOwDkLmrKPDBN2t3RUbeF+Eam
 ABrpWnLKX13kZSo4LKU+/niarzmPMp7odQfHVdr8ea0fiYLp4iN8puA20WaSPIgd
 CLvm+RAvXe5Lm91L4mpFotJ2uFyK6QlIYJV4FsgeWv/0D0qppWQi0Utb/aCNHCgy
 z8MyUMD48y7EpoQrFYr/7cddXIu0/NegnM8I1coVjIPEk4NfeebGUlCJ/V3D8wMG
 BgEfS2x6jRc5zB3hjwDr
 =iEVi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Fixes and features for 3.18.

  Apart from the usual cleanups, here is the summary of new features:

   - s390 moves closer towards host large page support

   - PowerPC has improved support for debugging (both inside the guest
     and via gdbstub) and support for e6500 processors

   - ARM/ARM64 support read-only memory (which is necessary to put
     firmware in emulated NOR flash)

   - x86 has the usual emulator fixes and nested virtualization
     improvements (including improved Windows support on Intel and
     Jailhouse hypervisor support on AMD), adaptive PLE which helps
     overcommitting of huge guests.  Also included are some patches that
     make KVM more friendly to memory hot-unplug, and fixes for rare
     caching bugs.

  Two patches have trivial mm/ parts that were acked by Rik and Andrew.

  Note: I will soon switch to a subkey for signing purposes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits)
  kvm: do not handle APIC access page if in-kernel irqchip is not in use
  KVM: s390: count vcpu wakeups in stat.halt_wakeup
  KVM: s390/facilities: allow TOD-CLOCK steering facility bit
  KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
  arm/arm64: KVM: Report correct FSC for unsupported fault types
  arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc
  kvm: Fix kvm_get_page_retry_io __gup retval check
  arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset
  kvm: x86: Unpin and remove kvm_arch->apic_access_page
  kvm: vmx: Implement set_apic_access_page_addr
  kvm: x86: Add request bit to reload APIC access page address
  kvm: Add arch specific mmu notifier for page invalidation
  kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static
  kvm: Fix page ageing bugs
  kvm/x86/mmu: Pass gfn and level to rmapp callback.
  x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
  kvm: x86: use macros to compute bank MSRs
  KVM: x86: Remove debug assertion of non-PAE reserved bits
  kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
  kvm: Faults which trigger IO release the mmap_sem
  ...
2014-10-08 05:27:39 -04:00
Andi Kleen 2dee5c43da x86: Fix section conflict for numachip
A variable cannot be both __read_mostly and const. This
is a meaningless combination.

Just make it only const.

This fixes the LTO build with numachip enabled.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1411533139-25708-1-git-send-email-andi@firstfloor.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-08 11:18:49 +02:00
Bryan O'Donoghue aece118e48 x86: Add cpu_detect_cache_sizes to init_intel() add Quark legacy_cache()
Intel processors which don't report cache information via cpuid(2)
or cpuid(4) need quirk code in the legacy_cache_size callback to
report this data. For Intel that callback is is intel_size_cache().

This patch enables calling of cpu_detect_cache_sizes() inside of
init_intel() and hence the calling of the legacy_cache callback in
intel_size_cache(). Adding this call will ensure that PIII Tualatin
currently in intel_size_cache() and Quark SoC X1000 being added to
intel_size_cache() in this patch will report their respective cache
sizes.

This model of calling cpu_detect_cache_sizes() is consistent with
AMD/Via/Cirix/Transmeta and Centaur.

Also added is a string to idenitfy the Quark as Quark SoC X1000
giving better and more descriptive output via /proc/cpuinfo

Adding cpu_detect_cache_sizes to init_intel() will enable calling
of intel_size_cache() on Intel processors which currently no code
can reach. Therefore this patch will also re-enable reporting
of PIII Tualatin cache size information as well as add
Quark SoC X1000 support.

Comment text and cache flow logic suggested by Thomas Gleixner

Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: davej@redhat.com
Cc: hmh@hmh.eng.br
Link: http://lkml.kernel.org/r/1412641189-12415-3-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-08 10:07:46 +02:00
Bryan O'Donoghue 2075244f9b x86: Quark: Comment setup_arch() to document TLB/PGE bug
Quark SoC X1000 advertises Page Global Enable for it's
Translation Lookaside Buffer via cpuid. The silicon does not
in fact support PGE and hence will not flush the TLB when CR4.PGE
is rewritten. The Quark documentation makes clear the necessity to
instead rewrite CR3 in order to flush any TLB entries, irrespective
of the state of CR4.PGE or an individual PTE.PGE

See Intel Quark Core DevMan_001.pdf section 6.4.11

In setup.c setup_arch() the code will load_cr3() and then do a
__flush_tlb_all().

On Quark the entire TLB will be flushed at the load_cr3().
The __flush_tlb_all() have no effect and can be safely ignored.

Later on in the boot process we switch off the flag for cpu_has_pge()
which means that subsequent calls to __flush_tlb_all() will
call __flush_tlb() not __flush_tlb_global() flushing the TLB in the
correct way via load_cr3() not CR4.PGE rewrite

This patch documents the behaviour of flushing the TLB for Quark in
setup_arch()

Comment text suggested by Thomas Gleixner

Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: davej@redhat.com
Cc: hmh@hmh.eng.br
Link: http://lkml.kernel.org/r/1412641189-12415-2-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-08 10:07:46 +02:00
Linus Torvalds 74da38631a Tinification for 3.18
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJUL0J0AAoJEA7Zo9+K/4c9w40P/iMFPfCethdBtPz5rI88CVr2
 7yU99TdbEPoRJm+rU4ohvHdB73p2KWINIKvpSThvegvjXbEcKxQkdpVWHsFJZeHS
 bZiYmhjxdCBvJGLrYo5IwqH0PrSjokTPzMUekUCk7BkUKNJRaDjfUBHvUmKsinUR
 dQL+3KE3edy6W3DL+FOd0QZwSOgmOfEibTWpfmg+n16kFNa75Kg/QLwjYRvtQplP
 eElywDZN07IhAeBFqKhKvlKmDSAeqMd8RfoPPo9Ts+reeIrWYjVNbl9ISOqXqy2x
 JoLeZQmwSXj/C9Ehr5e+aId2eO8In5xueQfXP8SS8dCC7VLwRbnNgyAQQZEslEBk
 QH0GhT6GqTamBdiNI3I+usfs65cEaialXh2afcoLwGS/iGD8MhZ8Dt+m4iyXNxEZ
 kT9VA4974mPjJ1g0mDDnYIxNjxF43m+SD5K1sR/XGpMcA8NdqMUmvKNcbePCobVa
 WTutIemQqGipNeWE94XwZEbc0B+aWwH7eiZOBMVGhWsHInd7QeTBTbfZlctyBkzf
 AswgsFjC5FW05CWK6J1Lf/UI1FD9PmHMKpmQUPED1+7okDTfqGjKjdREWgZSixUt
 LIRfWqWEaNpRRBFbDyt0C+F4pBRPLiRDaOyNhwEdtXuVGKRXb1G3qX7nFOJAZo6G
 GDTZo9iIRNSfm/M4tJ+n
 =2VyW
 -----END PGP SIGNATURE-----

Merge tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux

Pull "tinification" patches from Josh Triplett.

Work on making smaller kernels.

* tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux:
  bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_
  mm: Support compiling out madvise and fadvise
  x86: Support compiling out human-friendly processor feature names
  x86: Drop support for /proc files when !CONFIG_PROC_FS
  x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK
  x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE
  x86, boot: Use the usual -y -n mechanism for objects in vmlinux
  x86: Add "make tinyconfig" to configure the tiniest possible kernel
  x86, platform, kconfig: move kvmconfig functionality to a helper
2014-10-07 08:51:59 -04:00
Rafael J. Wysocki 88b42a4883 Merge branch 'pm-genirq'
* pm-genirq:
  PM / genirq: Document rules related to system suspend and interrupts
  PCI / PM: Make PCIe PME interrupts wake up from suspend-to-idle
  x86 / PM: Set IRQCHIP_SKIP_SET_WAKE for IOAPIC IRQ chip objects
  genirq: Simplify wakeup mechanism
  genirq: Mark wakeup sources as armed on suspend
  genirq: Create helper for flow handler entry check
  genirq: Distangle edge handler entry
  genirq: Avoid double loop on suspend
  genirq: Move MASK_ON_SUSPEND handling into suspend_device_irqs()
  genirq: Make use of pm misfeature accounting
  genirq: Add sanity checks for PM options on shared interrupt lines
  genirq: Move suspend/resume logic into irq/pm code
  PM / sleep: Mechanism for aborting system suspends unconditionally
2014-10-07 01:17:21 +02:00
Andy Lutomirski 8c7aa698ba x86_64, entry: Filter RFLAGS.NT on entry from userspace
The NT flag doesn't do anything in long mode other than causing IRET
to #GP.  Oddly, CPL3 code can still set NT using popf.

Entry via hardware or software interrupt clears NT automatically, so
the only relevant entries are fast syscalls.

If user code causes kernel code to run with NT set, then there's at
least some (small) chance that it could cause trouble.  For example,
user code could cause a call to EFI code with NT set, and who knows
what would happen?  Apparently some games on Wine sometimes do
this (!), and, if an IRET return happens, they will segfault.  That
segfault cannot be handled, because signal delivery fails, too.

This patch programs the CPU to clear NT on entry via SYSCALL (both
32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
in software on entry via SYSENTER.

To save a few cycles, this borrows a trick from Jan Beulich in Xen:
it checks whether NT is set before trying to clear it.  As a result,
it seems to have very little effect on SYSENTER performance on my
machine.

There's another minor bug fix in here: it looks like the CFI
annotations were wrong if CONFIG_AUDITSYSCALL=n.

Testers beware: on Xen, SYSENTER with NT set turns into a GPF.

I haven't touched anything on 32-bit kernels.

The syscall mask change comes from a variant of this patch by Anish
Bhatt.

Note to stable maintainers: there is no known security issue here.
A misguided program can set NT and cause the kernel to try and fail
to deliver SIGSEGV, crashing the program.  This patch fixes Far Cry
on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275

Cc: <stable@vger.kernel.org>
Reported-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-10-06 10:53:26 -07:00
Wei Huang cc6cd47e73 perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment
PMU checking can fail due to various reasons. On native machine, this
is mostly caused by faulty hardware and it is reasonable to use
KERN_ERR in reporting. However, when kernel is running on virtualized
environment, this checking can fail if virtual PMU is not supported
(e.g. KVM on AMD host). It is annoying to see an error message on
splash screen, even though we know such failure is benign on
virtualized environment.

This patch checks if the kernel is running in a virtualized environment.
If so, it will use KERN_INFO in reporting, which reduces the syslog
priority of them. This patch was tested successfully on KVM.

Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411617314-24659-1-git-send-email-wei@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03 06:04:41 +02:00
Andi Kleen 4f971248bc perf/x86/intel/uncore: Fix minor race in box set up
I was looking for the trinity oops cause in the uncore driver.
(so far didn't found it)

However I found this tiny race: when a box is set up two threads on the
same CPU, they may be setting up the box in parallel (e.g. with kernel
preemption). This could lead to the reference count being increasing
too much. Always recheck there is no existing cpu reference inside the lock.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411424826-15629-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03 06:02:49 +02:00
Dave Hansen 728e5653e6 sched/x86: Fix up typo in topology detection
Commit:

  cebf15eb09 ("x86, sched: Add new topology for multi-NUMA-node CPUs")

some code to try to detect the situation where we have a NUMA node
inside of the "DIE" sched domain.

It detected this by looking for cpus which match_die() but do not match
NUMA nodes via topology_same_node().

I wrote it up as:

	if (match_die(c, o) == !topology_same_node(c, o))

which actually seemed to work some of the time, albiet
accidentally.

It should have been doing an &&, not an ==.

This code essentially chopped off the "DIE" domain on one of
Andrew Morton's systems.  He reported that this patch fixed his
issue.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Lan Tianyu <tianyu.lan@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/20140930214546.FD481CFF@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03 05:46:52 +02:00
Oleg Nesterov 0ad6e3c519 x86: Speed up ___preempt_schedule*() by using THUNK helpers
___preempt_schedule() does SAVE_ALL/RESTORE_ALL but this is
suboptimal, we do not need to save/restore the callee-saved
register. And we already have arch/x86/lib/thunk_*.S which
implements the similar asm wrappers, so it makes sense to
redefine ___preempt_schedule() as "THUNK ..." and remove
preempt.S altogether.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140921184153.GA23727@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 15:15:38 +02:00
Wanpeng Li 03bd4e1f72 sched: Fix unreleased llc_shared_mask bit during CPU hotplug
The following bug can be triggered by hot adding and removing a large number of
xen domain0's vcpus repeatedly:

	BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
	PGD 5a9d5067 PUD 13067 PMD 0
	Oops: 0000 [#3] SMP
	[...]
	Call Trace:
	load_balance
	? _raw_spin_unlock_irqrestore
	idle_balance
	__schedule
	schedule
	schedule_timeout
	? lock_timer_base
	schedule_timeout_uninterruptible
	msleep
	lock_device_hotplug_sysfs
	online_store
	dev_attr_store
	sysfs_write_file
	vfs_write
	SyS_write
	system_call_fastpath

Last level cache shared mask is built during CPU up and the
build_sched_domain() routine takes advantage of it to setup
the sched domain CPU topology.

However, llc_shared_mask is not released during CPU disable,
which leads to an invalid sched domainCPU topology.

This patch fix it by releasing the llc_shared_mask correctly
during CPU disable.

Yasuaki also reported that this can happen on real hardware:

  https://lkml.org/lkml/2014/7/22/1018

His case is here:

	==
	Here is an example on my system.
	My system has 4 sockets and each socket has 15 cores and HT is
	enabled. In this case, each core of sockes is numbered as
	follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89
	Socket#2 | 30-44, 90-104
	Socket#3 | 45-59, 105-119

	Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.

	It means that last level cache of Socket#2 is shared with
	CPU#30-44 and 90-104.

	When hot-removing socket#2 and #3, each core of sockets is
	numbered as follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89

	But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
	remains having 0x3fff80000001fffc0000000.

	After that, when hot-adding socket#2 and #3, each core of
	sockets is numbered as follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89
	Socket#2 | 30-59
	Socket#3 | 90-119

	Then llc_shared_mask of CPU#30 becomes
	0x3fff8000fffffffc0000000. It means that last level cache of
	Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
	the wrong value.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Linn Crosetto <linn@hp.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 15:13:20 +02:00