Powerclamp works by aligning idle time to achieve package level
idle states, aka cstates. As long as one of the package cstates
is available, synchronized idle injection is meaningful.
This patch replaces the CPU whitelist with a check for the required
CPU features and a package cstate counter, so that we don't have to
add a whitelist entry for every new CPU.
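A minimal sketch of such a gate, assuming a capability summary that is filled in elsewhere. The struct, its field names and the specific features (MWAIT, non-stop TSC) are illustrative assumptions; the text above only says that CPU features and a package cstate counter are checked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability summary; the field names are illustrative only. */
struct cpu_caps {
    bool has_mwait;                /* assumed required feature */
    bool has_nonstop_tsc;          /* assumed required feature */
    size_t nr_pkg_cstate_counters; /* package cstate counters that could be read */
};

/* Idle injection is considered meaningful only if every required feature is
 * present and at least one package cstate counter is available. */
static bool idle_injection_supported(const struct cpu_caps *caps)
{
    if (!caps->has_mwait || !caps->has_nonstop_tsc)
        return false;
    return caps->nr_pkg_cstate_counters > 0;
}
```

With a gate of this shape, a new CPU model needs no table update as long as it reports the features and exposes at least one counter.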
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Pull thermal updates from Zhang Rui:
- use int instead of unsigned long to represent temperature to avoid
bogus overheat detection when a negative temperature is reported. From
Sascha Hauer.
- export available thermal governors information to user space via
sysfs. From Wei Ni.
- introduce new thermal driver for Wildcat Point platform controller
hub, which uses PCH thermal sensor and associated critical and hot
trip points. From Tushar Dave.
- add support for Intel Skylake and Denlow platforms in powerclamp
driver.
- some small cleanups in thermal core.
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: Add Intel PCH thermal driver
thermal: Add comment explaining test for critical temperature
thermal: Use IS_ENABLED instead of #ifdef
thermal: remove unnecessary call to thermal_zone_device_set_polling
thermal: trivial: fix typo in comment
thermal: consistently use int for temperatures
thermal: add available policies sysfs attribute
thermal/powerclamp: add cpu id for denlow platform
thermal/powerclamp: add cpu id for Skylake u/y
thermal/powerclamp: add cpu id for skylake h/s
Add support for Intel Denlow UP server platform.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Now that there is no paravirt TSC, the "native" prefix is
inappropriate. The function does RDTSC, so give it the obvious
name: rdtsc().
Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org
[ Ported it to v4.2-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that the ->read_tsc() paravirt hook is gone, rdtscll() is
just a wrapper around native_read_tsc(). Unwrap it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm ML <kvm@vger.kernel.org>
Link: http://lkml.kernel.org/r/d2449ae62c1b1fb90195bcfb19ef4a35883a04dc.1434501121.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"Main excitement here is Peter Zijlstra's lockless rbtree optimization
to speed module address lookup. He found some abusers of the module
lock doing that too.
A little bit of parameter work here too; including Dan Streetman's
breaking up the big param mutex so writing a parameter can load
another module (yeah, really). Unfortunately that broke the usual
suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
appended too"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
modules: only use mod->param_lock if CONFIG_MODULES
param: fix module param locks when !CONFIG_SYSFS.
rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
module: add per-module param_lock
module: make perm const
params: suppress unused variable error, warn once just in case code changes.
modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
kernel/module.c: avoid ifdefs for sig_enforce declaration
kernel/workqueue.c: remove ifdefs over wq_power_efficient
kernel/params.c: export param_ops_bool_enable_only
kernel/params.c: generalize bool_enable_only
kernel/module.c: use generic module param operaters for sig_enforce
kernel/params: constify struct kernel_param_ops uses
sysfs: tightened sysfs permission checks
module: Rework module_addr_{min,max}
module: Use __module_address() for module_address_lookup()
module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
module: Optimize __module_address() using a latched RB-tree
rbtree: Implement generic latch_tree
seqlock: Introduce raw_read_seqcount_latch()
...
Most code already uses const for struct kernel_param_ops;
sweep the kernel for the last offending stragglers. Other than
include/linux/moduleparam.h and kernel/params.c, all other changes
were generated with the following Coccinelle SmPL patch. Merge
conflicts between trees can be handled with Coccinelle.
In the future git could get Coccinelle merge support to deal with
patch --> fail --> grammar --> Coccinelle --> new patch conflicts
automatically for us on patches where the grammar is available and
the patch is of high confidence. Consider this a feature request.
Test compiled on x86_64 against:
* allnoconfig
* allmodconfig
* allyesconfig
@ const_found @
identifier ops;
@@
const struct kernel_param_ops ops = {
};
@ const_not_found depends on !const_found @
identifier ops;
@@
-struct kernel_param_ops ops = {
+const struct kernel_param_ops ops = {
};
Generated-by: Coccinelle SmPL
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Junio C Hamano <gitster@pobox.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: cocci@systeme.lip6.fr
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Package C8 to C10 states were introduced in newer Intel CPUs; we need
to include them in the package c-state residency calculation.
Otherwise, the idle injection target is not accurately maintained by
the closed control loop.
Also clean up the code to make it scale better with a large number
of c-states.
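One way to keep the residency calculation scalable is to drive it from a table, so that adding C8 to C10 means adding rows rather than code. The sketch below assumes the counters have already been sampled; the struct layout and the function name are hypothetical and not taken from the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One sampled package c-state residency counter (illustrative layout). */
struct pkg_cstate_sample {
    const char *name;   /* label only, e.g. "C8" */
    bool present;       /* false if this CPU lacks the counter */
    uint64_t residency; /* raw counter value */
};

/* Sum every counter that is present on this CPU. */
static uint64_t pkg_cstate_total(const struct pkg_cstate_sample *tbl, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].present)
            total += tbl[i].residency;
    }
    return total;
}
```

Leaving a deep state out of the table would under-report residency, which is the inaccuracy in the control loop described above.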
Reported-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Broadwell server has support for package C-states, so idle injection
works as expected on this platform.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Mark the module init / exit functions with __init / __exit accordingly.
This allows making the intel_powerclamp_ids[] array __initconst, too, as
it only gets referenced from powerclamp_probe(). This is safe as
file2alias doesn't care about the section, only about the symbol name for
the MODULE_DEVICE_TABLE alias.
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
This patch enables intel_powerclamp driver to run on the
next-generation Intel(R) Xeon Phi Microarchitecture
code named "Knights Landing"
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Enable Intel Powerclamp driver on Atom* Processor C2000 Product
Family for Microservers (Avoton). Avoton, an SoC for micro-servers,
has package C-states which can be used for idle injection.
Reported-by: Jose Navarro <jose.navarro@intel.com>
Suggested-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Tested-by: Jose Carlos Venegas Munoz <jos.c.venegas.munoz@intel.com>
Signed-off-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Enable Intel Powerclamp driver on Xeon cpu id 0x56; package C-state
is available on this CPU for idle injection.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Pull NOHZ update from Thomas Gleixner:
"Remove the call into the nohz idle code from the fake 'idle' thread in
the powerclamp driver along with the export of those functions which
was smuggled in via the thermal tree. People have tried to hack
around it in the nohz core code, but it just violates all rightful
assumptions of that code about the only valid calling context (i.e.
the proper idle task).
The powerclamp trainwreck will still work, it just won't get the
benefit of long idle sleeps"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/powerclamp: Remove tick_nohz_idle abuse
commit 4dbd27711c "tick: export nohz tick idle symbols for module
use" was merged via the thermal tree without an explicit ack from the
relevant maintainers.
The exports are abused by the intel powerclamp driver which implements
a fake idle state from a sched FIFO task. This causes all kinds of
wreckage in the NOHZ core code which rightfully assumes that
tick_nohz_idle_enter/exit() are only called from the idle task itself.
Recent changes in the NOHZ core led to a failure of the powerclamp
driver and now people try to hack completely broken and backwards
workarounds into the NOHZ core code. This is completely unacceptable
and just papers over the real problem. There are way more subtle
issues lurking around the corner.
The real solution is to fix the powerclamp driver by rewriting it with
a sane concept, but that's beyond the scope of this.
So the only solution for now is to remove the calls into the core NOHZ
code from the powerclamp trainwreck along with the exports.
Fixes: d6d71ee4a1 "PM: Introduce Intel PowerClamp Driver"
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Pan Jacob jun <jacob.jun.pan@intel.com>
Cc: LKP <lkp@01.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Braswell also has package C-states which can be used for idle
injection. This patch adds the Braswell CPU ID to the intel powerclamp
driver.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Pull thermal management updates from Zhang Rui:
"This time, the biggest change is the work of representing hardware
thermal properties in device tree infrastructure.
This work includes the introduction of device tree bindings for
describing the hardware thermal behavior and limits, and also a parser
to read and interpret the data, and build thermal zones and thermal
binding parameters. It also contains three examples on how to use the
new representation on sensor devices, using three different drivers to
accomplish it. One driver is in thermal subsystem, the TI SoC
thermal, and the other two drivers are in hwmon subsystem.
Actually, this would be the first step of the complete work because we
still need to check other potential drivers to be converted and then
validate the proposed API. But the reason why I include it in this
pull request is that, first, this change does not hurt any others
without using this approach, second, the principle and concept of this
change would not break after converting the remaining drivers. BTW,
as you can see, there are several points in this change that do not
belong to thermal subsystem, because it has been suggested by Guenter
R that in such cases, it is recommended to send the complete series
via one single subsystem.
Specifics:
- representing hardware thermal properties in device tree
infrastructure
- fix a regression that the imx thermal driver breaks system suspend.
- introduce ACPI INT3403 thermal driver to retrieve temperature data
from the INT3403 ACPI device object present on some systems.
- introduce debug statement for thermal core and step_wise governor.
- assorted fixes and cleanups for thermal core, cpu cooling, exynos
thermal, intel powerclamp and imx thermal driver"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (34 commits)
thermal: remove const flag from .ops of imx thermal
Thermal: update thermal zone device after setting emul_temp
intel_powerclamp: Fix cstate counter detection.
thermal: imx: add necessary clk operation
Thermal cpu cooling: return error if no valid cpu frequency entry
thermal: fix cpu_cooling max_level behavior
thermal: rcar-thermal: Enable driver compilation with COMPILE_TEST
thermal: debug: add debug statement for core and step_wise
thermal: imx_thermal: add module device table
drivers: thermal: Mark function as static in x86_pkg_temp_thermal.c
thermal:samsung: fix compilation warning
thermal: imx: correct suspend/resume flow
thermal: exynos: fix error return code
Thermal: ACPI INT3403 thermal driver
MAINTAINERS: add thermal bindings entry in thermal domain
arm: dts: make OMAP4460 bandgap node to belong to OCP
arm: dts: make OMAP443x bandgap node to belong to OCP
arm: dts: add cooling properties on omap5 cpu node
arm: dts: add omap5 thermal data
arm: dts: add omap5 CORE thermal data
...
The only valid use of preempt_enable_no_resched() is if the very next
line is schedule() or if we know preemption cannot actually be enabled
by that statement because more preempt_count 'refs' are known to be held.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: rjw@rjwysocki.net
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: rui.zhang@intel.com
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: hpa@zytor.com
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: lenb@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-zcfvacdlvlr63qmnn5i58vuj@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Having an all-zero cstate count doesn't necessarily mean the cstate
counter is not functional.
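The distinction can be sketched as follows: a counter counts as present when the read itself succeeds, whatever value comes back. The reader callback type and the two stand-in readers are hypothetical and exist only to make the sketch self-contained; they are not the driver's interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: returns 0 on success and stores the count in *value. */
typedef int (*read_counter_fn)(uint32_t id, uint64_t *value);

/* Stand-in reader for a counter that exists but has not advanced yet. */
static int read_ok_but_zero(uint32_t id, uint64_t *value)
{
    (void)id;
    *value = 0;
    return 0;
}

/* Stand-in reader for a counter that does not exist on this CPU. */
static int read_fails(uint32_t id, uint64_t *value)
{
    (void)id;
    (void)value;
    return -1;
}

/* A counter is present if any read succeeds; the value is deliberately ignored. */
static bool any_counter_present(const uint32_t *ids, size_t n,
                                read_counter_fn read_counter)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t value;

        if (read_counter(ids[i], &value) == 0)
            return true;
    }
    return false;
}
```

A check that tested the value for non-zero instead would wrongly reject a system that simply has not entered a package cstate yet.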
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
People seem to delight in writing wrong and broken mwait idle routines;
collapse the lot.
This leaves mwait_play_dead() the sole remaining user of __mwait() and
new __mwait() users are probably doing it wrong.
Also remove __sti_mwait() as it's unused.
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jacob Jun Pan <jacob.jun.pan@linux.intel.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Len Brown <lenb@kernel.org>
Cc: Rui Zhang <rui.zhang@intel.com>
Acked-by: Rafael Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131212141654.616820819@infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This will enable intel_powerclamp driver on newer Intel CPUs
including some Ivy Bridge and Haswell processors.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
This patch:
* adds missing kfree() for cpu_clamping_mask
* adds return value checking for alloc_percpu()
* unregisters the hotcpu notifier in the exit path
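The three fixes follow a common init/exit pattern: check every allocation, unwind what was already set up on failure, and undo everything in the exit path. The user-space sketch below uses calloc()/free() and a flag as stand-ins for the kernel allocators and the notifier; all names are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical driver state; the fields stand in for the kernel objects. */
struct clamp_state {
    unsigned long *cpu_mask;  /* stands in for cpu_clamping_mask */
    unsigned int *per_cpu;    /* stands in for the alloc_percpu() area */
    int notifier_registered;  /* stands in for the hotcpu notifier */
};

static int clamp_init(struct clamp_state *st, size_t ncpus)
{
    st->cpu_mask = calloc(ncpus, sizeof(*st->cpu_mask));
    if (!st->cpu_mask)
        return -ENOMEM;

    st->per_cpu = calloc(ncpus, sizeof(*st->per_cpu));
    if (!st->per_cpu)
        goto err_free_mask; /* check the allocation and unwind */

    st->notifier_registered = 1;
    return 0;

err_free_mask:
    free(st->cpu_mask);
    st->cpu_mask = NULL;
    return -ENOMEM;
}

static void clamp_exit(struct clamp_state *st)
{
    st->notifier_registered = 0; /* unregister before freeing the data it uses */
    free(st->per_cpu);
    free(st->cpu_mask);
    st->per_cpu = NULL;
    st->cpu_mask = NULL;
}
```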
Signed-off-by: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
The new intel_powerclamp thermal cooling device driver was merged in
commit 2af78448ff (Pull thermal management updates from Zhang Rui)
without any data conflicts. But there was a more subtle conflict I
missed: the driver uses MAX_USER_RT_PRIO, but commit 8bd75c77b7
("sched/rt: Move rt specific bits into new header file") had moved that
define from <linux/sched.h> to <linux/sched/rt.h>.
Which caused this build failure:
drivers/thermal/intel_powerclamp.c: In function ‘clamp_thread’:
drivers/thermal/intel_powerclamp.c:360:21: error: ‘MAX_USER_RT_PRIO’ undeclared (first use in this function)
drivers/thermal/intel_powerclamp.c:360:21: note: each undeclared identifier is reported only once for each function it appears in
And because I don't do a full "make allmodconfig" build after each pull,
I didn't notice until too late. So now the fix is here, separately from
the merge commit.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This value has already been clamped correctly to 0 through 49 in
powerclamp_set_cur_state() so this patch doesn't actually change
anything. But we should fix it anyway for consistency.
set_target_ratio is used as an offset into an array with
MAX_TARGET_RATIO (50) elements.
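A minimal sketch of the bound in question, assuming only what is stated above: the array has MAX_TARGET_RATIO (50) elements, so valid indices are 0 through 49. The helper name is hypothetical.

```c
#define MAX_TARGET_RATIO 50U

/* Clamp a requested ratio so it is always a valid index into an array of
 * MAX_TARGET_RATIO elements, i.e. 0 through MAX_TARGET_RATIO - 1. */
static unsigned int clamp_target_ratio(unsigned long requested)
{
    if (requested >= MAX_TARGET_RATIO)
        return MAX_TARGET_RATIO - 1;
    return (unsigned int)requested;
}
```

Comparing with `>=` rather than `>` is the point: a value equal to MAX_TARGET_RATIO is already one past the end of the array.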
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
The Intel PowerClamp driver performs synchronized idle injection across
all online CPUs. The goal is to maintain a given package level C-state
ratio.
Compared to other throttling methods that already exist in the kernel,
such as ACPI PAD (taking CPUs offline) and clock modulation, this is often
more efficient in terms of performance per watt.
Please refer to Documentation/thermal/intel_powerclamp.txt for more details.
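The arithmetic behind "maintaining a ratio" can be sketched as below. This is an illustration under stated assumptions, not the driver's code: the ratio is taken as package C-state residency over elapsed TSC ticks in a sampling window, and the injection period is derived from a fixed idle duration. Both function names are hypothetical.

```c
#include <stdint.h>

/* Percentage of the window spent in package C-states.
 * Assumes residency_delta * 100 fits in 64 bits. */
static unsigned int pkg_cstate_ratio(uint64_t residency_delta, uint64_t tsc_delta)
{
    if (tsc_delta == 0)
        return 0;
    return (unsigned int)(residency_delta * 100 / tsc_delta);
}

/* Period between idle injections so that idle_ms of forced idle per period
 * corresponds to target_ratio percent; 0 means no injection. */
static unsigned int injection_period_ms(unsigned int idle_ms, unsigned int target_ratio)
{
    if (target_ratio == 0)
        return 0;
    return idle_ms * 100 / target_ratio;
}
```

For example, 6 ms of forced idle at a 25% target gives a 24 ms period. A control loop would compare the measured ratio against the target and adjust the injection accordingly.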
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>