* 'x86-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, rwsem: Minor cleanups
x86, rwsem: Stay on fast path when count > 0 in __up_write()
* 'x86-gcc46-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, gcc-4.6: Fix set but not read variables
x86, gcc-4.6: Avoid unused by set variables in rdmsr
* 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mrst: make mrst_timer_options an enum
x86, mrst: make mrst_identify_cpu() an inline returning enum
x86, mrst: add more timer config options
x86, mrst: add cpu type detection
x86: detect scattered cpuid features earlier
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
tracing/kprobes: unregister_trace_probe needs to be called under mutex
perf: expose event__process function
perf events: Fix mmap offset determination
perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
perf, powerpc: Convert the FSL driver to use local64_t
perf tools: Don't keep unreferenced maps when unmaps are detected
perf session: Invalidate last_match when removing threads from rb_tree
perf session: Free the ref_reloc_sym memory at the right place
x86,mmiotrace: Add support for tracing STOS instruction
perf, sched migration: Librarize task states and event headers helpers
perf, sched migration: Librarize the GUI class
perf, sched migration: Make the GUI class client agnostic
perf, sched migration: Make it vertically scrollable
perf, sched migration: Parameterize cpu height and spacing
perf, sched migration: Fix key bindings
perf, sched migration: Ignore unhandled task states
perf, sched migration: Handle ignored migrate out events
perf: New migration tool overview
tracing: Drop cpparg() macro
perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
...
Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
Revert "net: Make accesses to ->br_port safe for sparse RCU"
mce: convert to rcu_dereference_index_check()
net: Make accesses to ->br_port safe for sparse RCU
vfs: add fs.h to define struct file
lockdep: Add an in_workqueue_context() lockdep-based test function
rcu: add __rcu API for later sparse checking
rcu: add an rcu_dereference_index_check()
tree/tiny rcu: Add debug RCU head objects
mm: remove all rcu head initializations
fs: remove all rcu head initializations, except on_stack initializations
powerpc: remove all rcu head initializations
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/amd-iommu: Export cache-coherency capability
iommu-api: Extension to check for interrupt remapping
x86/amd-iommu: Use for_each_pci_dev()
This fixes a regression in 2.6.35 from 2.6.34, that is
present for select models of Intel cpus when people are
using an MP table.
The commit cf7500c0ea
"x86, ioapic: In mpparse use mp_register_ioapic" started
calling mp_register_ioapic from MP_ioapic_info. An extremely
simple change that was obviously correct. Unfortunately
mp_register_ioapic did just a little more than the previous
hand crafted code and so we gained this call path.
The problem call path is:
MP_ioapic_info()
mp_register_ioapic()
io_apic_unique_id()
io_apic_get_unique_id()
get_physical_broadcast()
modern_apic()
lapic_get_version()
apic_read(APIC_LVR)
Which turned out to be a problem because the local apic
was not mapped, at that point, unlike the similar point
in the ACPI parsing code.
This problem is fixed by mapping the local apic when
parsing the mptable as soon as we reasonably can.
Looking at the number of places we setup the fixmap for
the local apic, I see some serious simplification opportunities.
For the moment except for not duplicating the setting up of the
fixmap in init_apic_mappings, I have not acted on them.
The regression from 2.6.34 is tracked in bug
https://bugzilla.kernel.org/show_bug.cgi?id=16173
Cc: <stable@kernel.org> 2.6.35
Reported-by: David Hill <hilld@binarystorm.net>
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Tested-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <m1eiee86jg.fsf_-_@fess.ebiederm.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
debug_core,kdb: fix crash when arch does not have single step
kgdb,x86: use macro HBP_NUM to replace magic number 4
kgdb,mips: remove unused kgdb_cpu_doing_single_step operations
mm,kdb,kgdb: Add a debug reference for the kdb kmap usage
KGDB: Remove set but unused newPC
ftrace,kdb: Allow dumping a specific cpu's buffer with ftdump
ftrace,kdb: Extend kdb to be able to dump the ftrace buffer
kgdb,powerpc: Replace hardcoded offset by BREAK_INSTR_SIZE
arm,kgdb: Add ability to trap into debugger on notify_die
gdbstub: do not directly use dbg_reg_def[] in gdb_cmd_reg_set()
gdbstub: Implement gdbserial 'p' and 'P' packets
kgdb,arm: Individual register get/set for arm
kgdb,mips: Individual register get/set for mips
kgdb,x86: Individual register get/set for x86
kgdb,kdb: individual register set and and get API
gdbstub: Optimize kgdb's "thread:" response for the gdb serial protocol
kgdb: remove custom hex_to_bin()implementation
* 'upstream/xen' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (23 commits)
xen/panic: use xen_reboot and fix smp_send_stop
Xen: register panic notifier to take crashes of xen guests on panic
xen: support large numbers of CPUs with vcpu info placement
xen: drop xen_sched_clock in favour of using plain wallclock time
pvops: do not notify callers from register_xenstore_notifier
Introduce CONFIG_XEN_PVHVM compile option
blkfront: do not create a PV cdrom device if xen_hvm_guest
support multiple .discard.* sections to avoid section type conflicts
xen/pvhvm: fix build problem when !CONFIG_XEN
xenfs: enable for HVM domains too
x86: Call HVMOP_pagetable_dying on exit_mmap.
x86: Unplug emulated disks and nics.
x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.
implement O_NONBLOCK for /proc/xen/xenbus
xen: Fix find_unbound_irq in presence of ioapic irqs.
xen: Add suspend/resume support for PV on HVM guests.
xen: Xen PCI platform device driver.
x86/xen: event channels delivery on HVM.
x86: early PV on HVM features initialization.
xen: Add support for HVM hypercalls.
...
Use the macros provided by the HW breakpoint API.
Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Implement the ability to individually get and set registers for kdb
and kgdb for x86.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: x86@kernel.org
* upstream/pvhvm:
Introduce CONFIG_XEN_PVHVM compile option
blkfront: do not create a PV cdrom device if xen_hvm_guest
support multiple .discard.* sections to avoid section type conflicts
xen/pvhvm: fix build problem when !CONFIG_XEN
xenfs: enable for HVM domains too
x86: Call HVMOP_pagetable_dying on exit_mmap.
x86: Unplug emulated disks and nics.
x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.
xen: Fix find_unbound_irq in presence of ioapic irqs.
xen: Add suspend/resume support for PV on HVM guests.
xen: Xen PCI platform device driver.
x86/xen: event channels delivery on HVM.
x86: early PV on HVM features initialization.
xen: Add support for HVM hypercalls.
Conflicts:
arch/x86/xen/enlighten.c
arch/x86/xen/time.c
Power limit notification feature is published in Intel 64 and IA-32
Architectures SDMV Vol 3A 14.5.6 Power Limit Notification.
It is implemented first on Intel Sandy Bridge platform.
The patch handles notification interrupt. Interrupt handler dumps power limit
information in log_buf, logs the event in mce log, and increases the event
counters (core_power_limit and package_power_limit). Upper level applications
could use the data to detect system health or diagnose functionality/performance
issues.
In the future, the event could be handled in a more fancy way.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1280448826-12004-5-git-send-email-fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Add package level thermal throttle interrupt support. The interrupt handler
increases package level thermal throttle count. It also logs the event in MCE
log.
The package level thermal throttle interrupt happens across threads in a
package. Each thread handles the interrupt individually. User level application
is supposed to retrieve correct event count and log based on package/thread
topology. This is the same situation for core level interrupt handler. In the
future, interrupt may be reported only per package or per core.
core_throttle_count and package_throttle_count are used for user interface.
Previously only throttle_count is used for core throttle count. If you think
new core_throttle_count name breaks user interface, I can change this part.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1280448826-12004-4-git-send-email-fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The only machines this is triggering on should be supported by
acpi-cpufreq or acpi's internal throttling.
Signed-off-by: Dave Jones <davej@redhat.com>
Use __cpuinit instead of __init for the cpufreq_driver
init function like it is done in powernow-k8.c.
This is removing the warning generated when compiling with
the CONFIG_DEBUG_SECTION_MISMATCH=y option.
Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Use __cpuinit instead of __init for the cpufreq_driver
init function like it is done in powernow-k8.c. Use the
__cpuinitdata for data used by the routines marked as __cpuinit.
This is removing the warning generated when compiling with
the CONFIG_DEBUG_SECTION_MISMATCH=y option.
Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Use __cpuinit instead of __init for the cpufreq_driver
init function like it is done in powernow-k8.c.
This is removing the warning generated when compiling with
the CONFIG_DEBUG_SECTION_MISMATCH=y option.
Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Signed-off-by: Dave Jones <davej@redhat.com>
rdmsr() takes the lower 32 bits as a second argument and the high 32 as
a third. Fix the names accordingly since they were swapped.
There should be no functionality change resulting from this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch converts pci_table entries, where .subvendor=PCI_ANY_ID and
.subdevice=PCI_ANY_ID, .class=0 and .class_mask=0, to use the
PCI_VDEVICE macro, and thus improves readability.
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Dave Jones <davej@redhat.com>
and fix the broken case if a core's frequency depends on others.
trace_power_frequency was only implemented in a rather ungeneric way
in acpi-cpufreq driver's target() function only.
-> Move the call to trace_power_frequency to
cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
notifier is triggered.
This will support power frequency tracing by all cpufreq drivers
trace_power_frequency did not trace frequency changes correctly when
the userspace governor was used or when CPU cores' frequency depend
on each other.
-> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
which gets switched automatically fixes this.
Robert Schoene provided some important fixes on top of my initial
quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: davej@redhat.com
CC: arjan@infradead.org
CC: linux-kernel@vger.kernel.org
CC: robert.schoene@tu-dresden.de
Tested-by: robert.schoene@tu-dresden.de
Signed-off-by: Dave Jones <davej@redhat.com>
On Wed, 2010-01-20 at 16:56 +0100, Thomas Renninger wrote:
> But most often this happens if people upgrade their CPU and do not
> update their BIOS.
> Or the vendor does not recognise the new CPU even if the BIOS got
> updated.
Maybe some of those people just didn't realize it was disabled in BIOS?
If you tell users that it's a firmware bug then they'll probably just
give up.
> The itself message might be an enhancment, IMO it's not worth a patch.
Why do you think so? I spent an hour on hunting down the BIOS upgrade,
only to find that it didn't improve anything. It was a day later that I
realized that it might be a BIOS option; and the option was literally
the _last_ option in the whole BIOS setup. :)
This message would have saved the day.
> But do not revert the FW_BUG part!
Sure, you have a point here.
How about this patch?
The Pstate transition latency check was added for broken F10h BIOSen
which wrongly contain a value of 0 for transition and bus master
latency. Fam11h and later, however, (will) have similar transition
latency so extend that behavior for them too.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The PCC cpufreq driver unmaps the mailbox address range if any CPUs fail to
initialise, but doesn't do anything to remove the registered CPUs from the
cpufreq core resulting in failures further down the line. We're better off
simply returning a failure - the cpufreq core will unregister us cleanly if
we end up with no successfully registered CPUs. Tidy up the failure path
and also add a sanity check to ensure that the firmware gives us a realistic
frequency - the core deals badly with that being set to 0.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The pcc specification documents an _OSC method that's incompatible with the
one defined as part of the ACPI spec. This shouldn't be a problem as both
are supposed to be guarded with a UUID. Unfortunately approximately nobody
(including HP, who wrote this spec) properly check the UUID on entry to the
_OSC call. Right now this could result in surprising behaviour if the pcc
driver performs an _OSC call on a machine that doesn't implement the pcc
specification. Check whether the PCCH method exists first in order to reduce
this probability.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
When running on VMware's platform, we have seen situations where
the AP's try to calibrate the lpj values and fail to get good calibration
runs becasue of timing issues. As a result delays don't work correctly
on all cpus.
The solutions is to set preset_lpj value based on the current tsc frequency
value. This is similar to what KVM does as well.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1280790637.14933.29.camel@ank32.eng.vmware.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
It is paramount that we call pci_xen_swiotlb_detect before
pci_swiotlb_detect as both implementations use the 'swiotlb'
and 'swiotlb_force' flags. The pci-xen_swiotlb_detect inhibits
the swiotlb_force and swiotlb flag so that the native SWIOTLB
implementation is not enabled when running under Xen.
[since v1 changed two Cc's to Acked-by]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
[http://lkml.org/lkml/2010/7/27/374]
Cc: Albert Herranz <albert_herranz@yahoo.es>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
[conditional http://lkml.org/lkml/2010/8/2/324]
Cc: x86@kernel.org
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Accomodate the original C1E-aware idle routine to the different times
during boot when the BIOS enables C1E. While at it, remove the synthetic
CPUID flag in favor of a single global setting which denotes C1E status
on the system.
[ hpa: changed c1e_enabled to be a bool; clarified cpu bit 3:21 comment ]
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
LKML-Reference: <20100727165335.GA11630@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Use a direct per-cpu reference for the GDT instead of using a scratch
register.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1280594903-6341-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Use a direct per-cpu reference for the IST instead of using a scratch
register.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1280594903-6341-1-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Fix uninitialized uvhub_mask:
- An unitialized bit map variable was causing initialization of
non-existant hubs (this one causes boot panics).
- And the bit map was too small for large machines. This patch
makes it dynamic in size.
- Fix the case where socket 0 has no enabled cpu's. Don't assume
every hub has a socket 0.
- uv_init_per_cpu() should be __init.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org> # for .35.x
LKML-Reference: <E1Oeuyt-0004XS-0y@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The arguments passed to OFW shouldn't be modified; update the 'args'
argument of olpc_ofw to reflect this. This saves us some later
casting away of consts.
Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20100628220029.1555ac24@debian>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Unconditionally printing EC debug messages was helpful when we were actually
debugging the EC, but during normal operation it can get pretty annoying.
Using pr_debug allows us finer-grained control.
Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20100616231928.16b539f0@dev.queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Add package level thermal and power limit feature support.
The two MSRs and features are new starting with Intel's Sandy Bridge processor.
Please check Intel 64 and IA-32 Architectures SDMV Vol 3A 14.5.6 Power Limit
Notification and 14.6 Package Level Thermal Management.
This patch also fixes a bug which defines reverse THERM_INT_LOW_ENABLE bit and
THERM_INT_HIGH_ENABLE bit.
[ hpa: fixed up against current tip:x86/cpu ]
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1280448826-12004-2-git-send-email-fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Use the stop machine context rather than IPI's to rendezvous all the cpus for
MTRR initialization that happens during cpu bringup or for MTRR modifications
during runtime.
This avoids deadlock scenario (reported by Prarit) like:
cpu A holds a read_lock (tasklist_lock for example) with irqs enabled
cpu B waits for the same lock with irqs disabled using write_lock_irq
cpu C doing set_mtrr() (during AP bringup for example), which will try to
rendezvous all the cpus using IPI's
This will result in C and A come to the rendezvous point and waiting
for B. B is stuck forever waiting for the lock and thus not
reaching the rendezvous point.
Using stop cpu (run in the process context of per cpu based keventd) to do
this rendezvous, avoids this deadlock scenario.
Also make sure all the cpu's are in the rendezvous handler before we proceed
with the local_irq_save() on each cpu. This lock step disabling irqs on all
the cpus will avoid other deadlock scenarios (for example involving
with the blocking smp_call_function's etc).
[ This problem is very old. Marking -stable only for 2.6.35 as the
stop_one_cpu_nowait() API is present only in 2.6.35. Any older
kernel interested in this fix need to do some more work in backporting
this patch. ]
Reported-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1280515602.2682.10.camel@sbsiddha-MOBL3.sc.intel.com>
Acked-by: Prarit Bhargava <prarit@redhat.com>
Cc: stable@kernel.org [2.6.35]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
commit 2ca1af9aa3285c6a5f103ed31ad09f7399fc65d7 "PCI: MSI: Remove
unsafe and unnecessary hardware access" changed read_msi_msg_desc() to
return the last MSI message written instead of reading it from the
device, since it may be called while the device is in a reduced
power state.
However, the pSeries platform code really does need to read messages
from the device, since they are initially written by firmware.
Therefore:
- Restore the previous behaviour of read_msi_msg_desc()
- Add new functions get_cached_msi_msg{,_desc}() which return the
last MSI message written
- Use the new functions where appropriate
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch introduce a CONFIG_XEN_PVHVM compile time option to
enable/disable Xen PV on HVM support.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
HW breakpoints events stopped working correctly with kgdb
as a result of commit: 018cbffe68
(Merge commit 'v2.6.33' into perf/core).
The regression occurred because the behavior changed for setting
NOTIFY_STOP as the return value to the die notifier if the breakpoint
was known to the HW breakpoint API. Because kgdb is using the HW
breakpoint API to register HW breakpoints slots, it must also now
implement the overflow_handler call back else kgdb does not get to see
the events from the die notifier.
The kgdb_ll_trap function will be changed to be general purpose code
which can allow an easy way to implement the hw_breakpoint API
overflow call back.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Dongdong Deng <dongdong.deng@windriver.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Move cmpxchg emulation code from arch/x86/kernel/cpu (which is
otherwise CPU identification) to arch/x86/lib, where other emulation
code lives already.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <AANLkTikAmaDPji-TVDarmG1yD=fwbffcsmEU=YEuP+8r@mail.gmail.com>
Exprot the AMD errata definitions, since they are needed by kvm_amd.ko
if that is built as a module. Doing "make allmodconfig" during
testing would have caught this.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
LKML-Reference: <1280336972-865982-1-git-send-email-hans.rosenfeld@amd.com>
Use the AMD errata checking framework instead of open-coding the test.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
LKML-Reference: <1280336972-865982-3-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Remove check_c1e_idle() and use the new AMD errata checking framework
instead.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
LKML-Reference: <1280336972-865982-2-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Errata are defined using the AMD_LEGACY_ERRATUM() or AMD_OSVW_ERRATUM()
macros. The latter is intended for newer errata that have an OSVW id
assigned, which it takes as first argument. Both take a variable number
of family-specific model-stepping ranges created by AMD_MODEL_RANGE().
Iff an erratum has an OSVW id, OSVW is available on the CPU, and the
OSVW id is known to the hardware, it is used to determine whether an
erratum is present. Otherwise, the model-stepping ranges are matched
against the current CPU to find out whether the erratum applies.
For certain special errata, the code using this framework might have to
conduct further checks to make sure an erratum is really (not) present.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
LKML-Reference: <1280336972-865982-1-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This patch simply declares the new sys_fanotify_mark syscall
int fanotify_mark(int fanotify_fd, unsigned int flags, u64_mask,
int dfd const char *pathname)
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch defines a new syscall fanotify_init() of the form:
int sys_fanotify_init(unsigned int flags, unsigned int event_f_flags,
unsigned int priority)
This syscall is used to create and fanotify group. This is very similar to
the inotify_init() syscall.
Signed-off-by: Eric Paris <eparis@redhat.com>
This patch exports the capability of the AMD IOMMU to force
cache coherency of DMA transactions through the IOMMU-API.
This is required to disable some nasty hacks in KVM when
this capability is not available.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
This converts the most common of the x86 clocksources over to use
clocksource_register_hz/khz.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
LKML-Reference: <1279068988-21864-11-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
update_vsyscall() did not provide the wall_to_monotoinc offset,
so arch specific implementations tend to reference wall_to_monotonic
directly. This limits future cleanups in the timekeeping core, so
this patch fixes the update_vsyscall interface to provide
wall_to_monotonic, allowing wall_to_monotonic to be made static
as planned in Documentation/feature-removal-schedule.txt
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Due to vtime calling vgettimeofday(), its possible that an application
could call time();create("stuff",O_RDRW); only to see the file's
creation timestamp to be before the value returned by time.
A similar way to reproduce the issue is to compare the vsyscall time()
with the syscall time(), and observe ordering issues.
The modified test case from Oleg Nesterov below can illustrate this:
int main(void)
{
time_t sec1,sec2;
do {
sec1 = time(&sec2);
sec2 = syscall(__NR_time, NULL);
} while (sec1 <= sec2);
printf("vtime: %d.000000\n", sec1);
printf("time: %d.000000\n", sec2);
return 0;
}
The proper fix is to make vtime use the same time value as
current_kernel_time() (which is exported via update_vsyscall) instead of
vgettime().
Thanks to Jiri Olsa for bringing up the issue and catching bugs in
earlier verisons of this fix.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
LKML-Reference: <1279068988-21864-2-git-send-email-johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Do not try to disable hpet if it hasn't been initialized before
x86, i8259: Only register sysdev if we have a real 8259 PIC
The Pstate transition latency check was added for broken F10h BIOSen
which wrongly contain a value of 0 for transition and bus master
latency. Fam11h and later, however, (will) have similar transition
latency so extend that behavior for them too.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The PCC cpufreq driver unmaps the mailbox address range if any CPUs fail to
initialise, but doesn't do anything to remove the registered CPUs from the
cpufreq core resulting in failures further down the line. We're better off
simply returning a failure - the cpufreq core will unregister us cleanly if
we end up with no successfully registered CPUs. Tidy up the failure path
and also add a sanity check to ensure that the firmware gives us a realistic
frequency - the core deals badly with that being set to 0.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
The pcc specification documents an _OSC method that's incompatible with the
one defined as part of the ACPI spec. This shouldn't be a problem as both
are supposed to be guarded with a UUID. Unfortunately approximately nobody
(including HP, who wrote this spec) properly check the UUID on entry to the
_OSC call. Right now this could result in surprising behaviour if the pcc
driver performs an _OSC call on a machine that doesn't implement the pcc
specification. Check whether the PCCH method exists first in order to reduce
this probability.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Commit 2a6b69765a
(ACPI: Store NVS state even when entering suspend to RAM) caused the
ACPI suspend code save the NVS area during suspend and restore it
during resume unconditionally, although it is known that some systems
need to use acpi_sleep=s4_nonvs for hibernation to work. To allow
the affected systems to avoid saving and restoring the NVS area
during suspend to RAM and resume, introduce kernel command line
option acpi_sleep=nonvs and make acpi_sleep=s4_nonvs work as its
alias temporarily (add acpi_sleep=s4_nonvs to the feature removal
file).
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16396 .
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: tomas m <tmezzadra@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
hpet_disable is called unconditionally on machine reboot if hpet support
is compiled in the kernel.
hpet_disable only checks if the machine is hpet capable but doesn't make
sure that hpet has been initialized.
[ tglx: Made it a one liner and removed the redundant hpet_address check ]
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <alpine.DEB.2.00.1007211726240.22235@kaball-desktop>
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Set the callback to receive evtchns from Xen, using the
callback vector delivery mechanism.
The traditional way for receiving event channel notifications from Xen
is via the interrupts from the platform PCI device.
The callback vector is a newer alternative that allow us to receive
notifications on any vcpu and doesn't need any PCI support: we allocate
a vector exclusively to receive events, in the vector handler we don't
need to interact with the vlapic, therefore we avoid a VMEXIT.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Initialize basic pv on hvm features adding a new Xen HVM specific
hypervisor_x86 structure.
Don't try to initialize xen-kbdfront and xen-fbfront when running on HVM
because the backends are not available.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
It turns out that there is a bit in the _CST for Intel FFH C3
that tells the OS if we should be checking BM_STS or not.
Linux has been unconditionally checking BM_STS.
If the chip-set is configured to enable BM_STS,
it can retard or completely prevent entry into
deep C-states -- as illustrated by turbostat:
http://userweb.kernel.org/~lenb/acpi/utils/pmtools/turbostat/
ref: Intel Processor Vendor-Specific ACPI Interface Specification
table 4 "_CST FFH GAS Field Encoding"
Bit 1: Set to 1 if OSPM should use Bus Master avoidance for this C-state
https://bugzilla.kernel.org/show_bug.cgi?id=15886
Signed-off-by: Len Brown <len.brown@intel.com>
and fix the broken case if a core's frequency depends on others.
trace_power_frequency was only implemented in a rather ungeneric
way in acpi-cpufreq driver's target() function only.
-> Move the call to trace_power_frequency to
cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
notifier is triggered.
This will support power frequency tracing by all cpufreq
drivers.
trace_power_frequency did not trace frequency changes correctly
when the userspace governor was used or when CPU cores'
frequency depend on each other.
-> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
which gets switched automatically fixes this.
Robert Schoene provided some important fixes on top of my
initial quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id
[akpm@linux-foundation.org: build fix]
Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: davej@codemonkey.org.uk
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Dave Jones <davej@codemonkey.org.uk>
Acked-by: Arjan van de Ven <arjan@infradead.org>
Cc: Robert Schoene <robert.schoene@tu-dresden.de>
Tested-by: Robert Schoene <robert.schoene@tu-dresden.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Load initial_gs as two 32-bit values instead of splitting a 64-bit value.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1279371808-24804-3-git-send-email-brgerst@gmail.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Use symbolic MSR names instead of hardcoding the MSR index.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1279371808-24804-2-git-send-email-brgerst@gmail.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
In the CONFIG_AUDITSYSCALL fast-path for x86 64-bit system calls,
we can pass a bad return value and/or error indication for the
system call to audit_syscall_exit(). This happens when
TIF_NEED_RESCHED was set as the system call returned, so we went
out to schedule() and came back to the exit-audit fast-path. The
fix is to reload the user return value register from the pt_regs
before using it for audit_syscall_exit().
Both the 32-bit kernel's fast path and the 64-bit kernel's 32-bit
system call fast paths work slightly differently, so that they
always leave the fast path entirely to reschedule and don't return
there, so they don't have the analogous bugs.
Reported-by: Alexander Viro <aviro@redhat.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
xstate_enable_boot_cpu() is, as the name implies, only used on the
boot CPU; furthermore, it invokes alloc_bootmem(), which is __init;
hence it needs to be tagged __init rather than __cpuinit.
Furthermore, it is *not* safe in the long run to rely on CPU 0 only
coming online during the early boot -- at some point we're going to
support offlining (and re-onlining) the boot CPU, and at that point we
must not call xstate_enable_boot_cpu() again.
The code is a fair bit more obscure than one would like, because the
__ref overrides aren't quite powerful enough.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4C476236.1020302@zytor.com>
This is called only from initialization code.
Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279731838-1522-6-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The pointer is only used in xsave.c. Making it static.
Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279731838-1522-5-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The patch introduces the XSTATE_CPUID macro and adds a check that
tests if XSTATE_CPUID exists.
Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279731838-1522-4-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The patch renames xsave_cntxt_init() and __xsave_init() into
xstate_enable_boot_cpu() and xstate_enable() as this names are more
meaningful.
It also removes the duplicate xcr setup for the boot cpu.
Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279731838-1522-3-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
As xsave also supports other than fpu features, it should be
initialized independently of the fpu. This patch moves this out of fpu
initialization.
There is also a lot of cross referencing between fpu and xsave
code. This patch reduces this by making xsave_cntxt_init() and
init_thread_xstate() static functions.
The patch moves the cpu_has_xsave check at the beginning of
xsave_init(). All other checks may removed then.
Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279731838-1522-2-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
UV NMI callback's should not write stack dumps when a kdump is to be written.
When invoking the crash kernel to write a dump, kdump_nmi_shootdown_cpus()
uses NMI's to get all the cpu's to save their register context and halt.
But the NMI interrupt handler runs a callback list. This patch sets a flag
to prevent any of those callbacks from interfering with the halt of the cpu.
For UV, which currently has the only callback to which this is relevant, the
uv_handle_nmi() callback should not do dumping of stacks.
The 'in_crash_kexec' flag is defined as an extern in kdebug.h firstly
because x2apic_uv_x.c includes it. Secondly because some future callback
might need the flag to know that it should not enter the debugger.
(Such a scenario was in fact present in the 2.6.32 kernel, SuSE distribution,
where a call to kdb needed to be avoided.)
Signed-off-by: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <E1ObLvt-0005UZ-Va@eag09.americas.sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Pointed out by Lucas who found the new one in a comment in
setup_percpu.c. And then I fixed the others that I grepped
for.
Reported-by: Lucas <canolucas@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit e534c7c5f8 ("numa: x86_64: use generic percpu var
numa_node_id() implementation") broke numa systems that don't have ram
on node0 when MEMORY_HOTPLUG is enabled, because cpu_up() will call
cpu_to_node() before per_cpu(numa_node) is setup for APs.
When Node0 doesn't have RAM, on x86, cpus already round it to nearest
node with RAM in x86_cpu_to_node_map. and per_cpu(numa_node) is not set
up until in c_init for APs.
When later cpu_up() calling cpu_to_node() will get 0 again, and make it
online even there is no RAM on node0. so later all APs can not booted up,
and later will have panic.
[ 1.611101] On node 0 totalpages: 0
.........
[ 2.608558] On node 0 totalpages: 0
[ 2.612065] Brought up 1 CPUs
[ 2.615199] Total of 1 processors activated (3990.31 BogoMIPS).
...
93.225341] calling loop_init+0x0/0x1a4 @ 1
[ 93.229314] PERCPU: allocation failed, size=80 align=8, failed to populate
[ 93.246539] Pid: 1, comm: swapper Tainted: G W 2.6.35-rc4-tip-yh-04371-gd64e6c4-dirty #354
[ 93.264621] Call Trace:
[ 93.266533] [<ffffffff81125e43>] pcpu_alloc+0x83a/0x8e7
[ 93.270710] [<ffffffff81125f15>] __alloc_percpu+0x10/0x12
[ 93.285849] [<ffffffff8140786c>] alloc_disk_node+0x94/0x16d
[ 93.291811] [<ffffffff81407956>] alloc_disk+0x11/0x13
[ 93.306157] [<ffffffff81503e51>] loop_alloc+0xa7/0x180
[ 93.310538] [<ffffffff8277ef48>] loop_init+0x9b/0x1a4
[ 93.324909] [<ffffffff8277eead>] ? loop_init+0x0/0x1a4
[ 93.329650] [<ffffffff810001f2>] do_one_initcall+0x57/0x136
[ 93.345197] [<ffffffff827486d0>] kernel_init+0x184/0x20e
[ 93.348146] [<ffffffff81034954>] kernel_thread_helper+0x4/0x10
[ 93.365194] [<ffffffff81c7cc3c>] ? restore_args+0x0/0x30
[ 93.369305] [<ffffffff8274854c>] ? kernel_init+0x0/0x20e
[ 93.386011] [<ffffffff81034950>] ? kernel_thread_helper+0x0/0x10
[ 93.392047] loop: out of memory
...
Try to assign per_cpu(numa_node) early
[akpm@linux-foundation.org: tidy up code comment]
Signed-off-by: Yinghai <yinghai@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch moves boot cpu initialization to xsave_init(). Now all cpus
are initialized in one single function.
Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279651857-24639-5-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Boot cpu id is always 0, thus simplifying and unifying boot cpu check.
boot_cpu_id is there for historical reasons and was renamed to
boot_cpu_physical_apicid in patch:
c70dcb7 x86: change boot_cpu_id to boot_cpu_physical_apicid
However, there are some remaining occurrences of boot_cpu_id that are
never touched in the kernel and thus its value is always 0.
Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279651857-24639-3-git-send-email-robert.richter@amd.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Just some dead code, no real bugs.
Found by gcc 4.6 -Wall
Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <201007202219.o6KMJnQ0021072@imap1.linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
My platform makes use of the null_legacy_pic choice and oopses when doing
a shutdown as the shutdown code goes through all the registered sysdevs
and calls their shutdown method which in my case poke on a non-existing
i8259. Imho the i8259 specific sysdev should only be registered if the
i8259 is actually there.
Do not register the sysdev function when the null_legacy_pic is used so
that the i8259 resume, suspend and shutdown functions are not called.
Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
LKML-Reference: <201007202218.o6KMIJ3m020955@imap1.linux-foundation.org>
Cc: Jacob Pan <jacob.jun.pan@intel.com>
Cc: <stable@kernel.org> 2.6.34
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
addon_cpuid_features.c contains exactly two almost completely
unrelated functions, plus has a long and very generic name. Split it
into two files, scattered.c for the scattered feature flags, and
topology.c for the topology information.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <tip-*@git.kernel.org>
xsaveopt is a more optimized form of xsave specifically designed
for the context switch usage. xsaveopt doesn't save the state that's not
modified from the prior xrstor. And if a specific feature state gets
modified to the init state, then xsaveopt just updates the header bit
in the xsave memory layout without updating the corresponding memory
layout.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100719230205.604014179@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>