Commit Graph

126 Commits

Author SHA1 Message Date
Arvind Yadav b2fd708ffa thermal: cpu_cooling: pr_err() strings should end with newlines
pr_err() messages should end with a new-line to avoid other messages
being concatenated.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Javi Merino <javi.merino@kernel.org>
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-10-31 19:32:19 -07:00
Viresh Kumar f19b1a1735 thermal: cpu_cooling: Replace kmalloc with kmalloc_array
Checkpatch reports following:

WARNING: Prefer kmalloc_array over kmalloc with multiply
+	cpufreq_cdev->freq_table = kmalloc(sizeof(*cpufreq_cdev->freq_table) * i,

Fix that.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:33:04 -07:00
Viresh Kumar d72b401583 thermal: cpu_cooling: Rearrange struct cpufreq_cooling_device
This shrinks the size of the structure on arm64 by 8 bytes by avoiding
padding of 4 bytes at two places.

Also add missing doc comment for freq_table

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:33:02 -07:00
Viresh Kumar cb1b631864 thermal: cpu_cooling: 'freq' can't be zero in cpufreq_state2power()
The frequency table shouldn't have any zero frequency entries and so
such a check isn't required. Though it would be better to make sure
'state' is within limits.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:59 -07:00
Viresh Kumar 53ca1ece60 thermal: cpu_cooling: don't store cpu_dev in cpufreq_cdev
'cpu_dev' is used by only one function, get_static_power(), and it
wouldn't be time consuming to get the cpu device structure within it.
This would help removing cpu_dev from struct cpufreq_cooling_device.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:55 -07:00
Viresh Kumar da27f69d6a thermal: cpu_cooling: get_level() can't fail
The frequency passed to get_level() is returned by cpu_power_to_freq()
and it is guaranteed that get_level() can't fail.

Get rid of error code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:52 -07:00
Viresh Kumar 81ee14da10 thermal: cpu_cooling: create structure for idle time stats
We keep two arrays for idle time stats and allocate memory for them
separately. It would be much easier to follow if we create an array of
idle stats structure instead and allocate it once.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:48 -07:00
Viresh Kumar 349d39dc57 thermal: cpu_cooling: merge frequency and power tables
The cpu_cooling driver keeps two tables:

- freq_table: table of frequencies in descending order, built from
  policy->freq_table.

- power_table: table of frequencies and power in ascending order, built
  from OPP table.

If the OPPs are used for the CPU device then both these tables are
actually built using the OPP core and should have the same frequency
entries. And there is no need to keep separate tables for this.

Lets merge them both.

Note that the new table is in descending order of frequencies and so the
'for' loops were required to be fixed at few places to make it work.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:44 -07:00
Viresh Kumar ba76dd9ddd thermal: cpu_cooling: get rid of 'allowed_cpus'
'allowed_cpus' is a copy of policy->related_cpus and can be replaced by
it directly. At some places we are only concerned about online CPUs and
policy->cpus can be used there.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:41 -07:00
Viresh Kumar 02bacb21c6 thermal: cpu_cooling: OPPs are registered for all CPUs
The OPPs are registered for all CPUs of a cpufreq policy now and we
don't need to run the loop in build_dyn_power_table(). Just check for
the policy->cpu and we should be fine.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:37 -07:00
Viresh Kumar b12b651949 thermal: cpu_cooling: store cpufreq policy
The cpufreq policy can be used by the cpu_cooling driver, lets store it
in the cpufreq_cooling_device structure.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:33 -07:00
Viresh Kumar 55d8529313 cpufreq: create cpufreq_table_count_valid_entries()
We need such a routine at two places already, lets create one.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:28 -07:00
Viresh Kumar 4d753aa7b6 thermal: cpu_cooling: use cpufreq_policy to register cooling device
The CPU cooling driver uses the cpufreq policy, to get clip_cpus, the
frequency table, etc. Most of the callers of CPU cooling driver's
registration routines have the cpufreq policy with them, but they only
pass the policy->related_cpus cpumask. The __cpufreq_cooling_register()
routine then gets the policy by itself and uses it.

It would be much better if the callers can pass the policy instead
directly. This also fixes a basic design flaw, where the policy can be
freed while the CPU cooling driver is still active.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:24 -07:00
Viresh Kumar 18f301c934 thermal: cpu_cooling: get rid of a variable in cpufreq_set_cur_state()
'cpu' is used at only one place and there is no need to keep a separate
variable for it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:20 -07:00
Viresh Kumar 3e08b2df12 thermal: cpu_cooling: remove cpufreq_cooling_get_level()
There is only one user of cpufreq_cooling_get_level() and that already
has pointer to the cpufreq_cdev structure. It can directly call
get_level() instead and we can get rid of cpufreq_cooling_get_level().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:32:04 -07:00
Viresh Kumar 04bdbdf93c thermal: cpu_cooling: replace cool_dev with cdev
Objects of "struct thermal_cooling_device" are named a bit
inconsistently. Lets use cdev everywhere.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:31:59 -07:00
Viresh Kumar 1dea432a67 thermal: cpu_cooling: Name cpufreq cooling devices as cpufreq_cdev
Objects of "struct cpufreq_cooling_device" are named a bit
inconsistently. Lets use cpufreq_cdev everywhere. Also note that the
lists containing such devices is renamed similarly too.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:31:55 -07:00
Viresh Kumar fb8ea30821 thermal: cpu_cooling: rearrange globals
Just to make it look better.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:31:49 -07:00
Viresh Kumar 289d72afdd thermal: cpu_cooling: Avoid accessing potentially freed structures
After the lock is dropped, it is possible that the cpufreq_dev gets
freed before we call get_level() and that can cause kernel to crash.

Drop the lock after we are done using the structure.

Cc: 4.2+ <stable@vger.kernel.org> # 4.2+
Fixes: 02373d7c69 ("thermal: cpu_cooling: fix lockdep problems in cpu_cooling")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2017-05-27 17:31:05 -07:00
Viresh Kumar 3ea3217cf9 thermal: cpu_cooling: Check OPP for errors
It is possible for dev_pm_opp_find_freq_exact() to return errors. It was
all fine earlier as dev_pm_opp_get_voltage() had a check within it to
check for invalid OPPs, but dev_pm_opp_put() doesn't have any similar
checks and the callers need to make sure OPP is valid before calling
them.

Also update the later dev_warn_ratelimited() to not print the error
message as the OPP is guaranteed to be valid now.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2017-03-13 10:06:55 +08:00
Viresh Kumar 9aec9082bf thermal: cpu_cooling: Replace dev_warn with dev_err
There isn't much the user can do on seeing these warnings, as the
hardware is actually okay. dev_err suits much better here.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2017-03-13 10:06:47 +08:00
Matthew Wilcox 088db931e0 thermal: Fix potential deadlock in cpu_cooling
cooling_list_lock is covering not just cpufreq_dev_count, but also the
calls to cpufreq_register_notifier() and cpufreq_unregister_notifier().

Since cooling_list_lock is also used within cpufreq_thermal_notifier(),
lockdep reports a potential deadlock. Fix it by testing the condition
under cooling_list_lock and dropping the lock before calling
cpufreq_register_notifier(). And variable cpufreq_dev_count is removed
at the same time, because it's no longer needed after the fix.

Fixes: ae60608962 ("thermal: convert cpu_cooling to use an IDA")
Reported-and-Tested-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2017-03-13 10:06:04 +08:00
Linus Torvalds 544a068f1f Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management updates from Zhang Rui:

 - add thermal driver for R-Car Gen3 thermal sensors.

 - add thermal driver for ZTE' zx2967 family thermal sensors.

 - convert thermal ID allocation from IDR to IDA.

 - fix a possible NULL dereference in imx thermal driver.

 - fix a ti-soc-thermal driver dependency issue so that critical thermal
   control is still available when CPU_THERMAL is not defined.

 - update binding information for QorIQ thermal driver.

 - a couple of cleanups in thermal core, intel_powerclamp, exynos,
   dra752-thermal, mtk-thermal driver.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  powerpc/mpc85xx: Update TMU device tree node for T1023/T1024
  powerpc/mpc85xx: Update TMU device tree node for T1040/T1042
  dt-bindings: Update QorIQ TMU thermal bindings
  thermal: mtk_thermal: Staticise a number of data variables
  thermal: arm: dra752: Remove all TSHUT related definitions
  thermal: arm: dra752: Remove TSHUT configuration
  thermal: ti-soc-thermal: Remove CPU_THERMAL Dependency from TI_THERMAL
  thermal: imx: Fix possible NULL dereference.
  thermal: exynos: Remove parsing unused samsung,tmu_cal_mode property
  thermal: zx2967: add thermal driver for ZTE's zx2967 family
  thermal: use cpumask_var_t for on-stack cpu masks
  dt: bindings: add documentation for zx2967 family thermal sensor
  thermal/intel_powerclamp: Remove set-but-not-used variables
  thermal: rcar_gen3_thermal: Add R-Car Gen3 thermal driver
  thermal: rcar_gen3_thermal: Document the R-Car Gen3
  thermal: convert devfreq_cooling to use an IDA
  thermal: convert cpu_cooling to use an IDA
  thermal: convert clock cooling to use an IDA
  thermal core: convert ID allocation to IDA
2017-03-01 09:54:32 -08:00
Zhang Rui 6fefe19f58 Merge branches 'thermal-core', 'thermal-soc', 'thermal-intel' and 'ida-conversion' into next 2017-02-22 15:35:06 +08:00
Arnd Bergmann d9cc34a6e1 thermal: use cpumask_var_t for on-stack cpu masks
Putting a bare cpumask structure on the stack produces a warning on
large SMP configurations:

drivers/thermal/cpu_cooling.c: In function 'cpufreq_state2power':
drivers/thermal/cpu_cooling.c:644:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
drivers/thermal/cpu_cooling.c: In function '__cpufreq_cooling_register':
drivers/thermal/cpu_cooling.c:898:1: warning: the frame size of 1104 bytes is larger than 1024 bytes [-Wframe-larger-than=]

The recommended workaround is to use cpumask_var_t, which behaves just like
a normal cpu mask in most cases, but turns into a dynamic allocation
when CONFIG_CPUMASK_OFFSTACK is set.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2017-02-10 16:40:47 +08:00
Viresh Kumar 8a31d9d942 PM / OPP: Update OPP users to put reference
This patch updates dev_pm_opp_find_freq_*() routines to get a reference
to the OPPs returned by them.

Also updates the users of dev_pm_opp_find_freq_*() routines to call
dev_pm_opp_put() after they are done using the OPPs.

As it is guaranteed the that OPPs wouldn't get freed while being used,
the RCU read side locking present with the users isn't required anymore.
Drop it as well.

This patch also updates all users of devfreq_recommended_opp() which was
returning an OPP received from the OPP core.

Note that some of the OPP core routines have gained
rcu_read_{lock|unlock}() calls, as those still use RCU specific APIs
within them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> [Devfreq]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 09:22:21 +01:00
Matthew Wilcox ae60608962 thermal: convert cpu_cooling to use an IDA
thermal cpu cooling does not use the ability to look up pointers by ID,
so convert it from using an IDR to the more space-efficient IDA.

The cooling_cpufreq_lock was being used to protect cpufreq_dev_count as
well as the IDR.  Rather than keep the mutex to protect a single integer,
I expanded the scope of cooling_list_lock to also cover cpufreq_dev_count.
We could also convert cpufreq_dev_count into an atomic.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2017-01-04 12:47:28 +08:00
Hugh Kang 0744f1306f thermal: cpu_cooling: Fix wrong comment call function name
The last_load is updated not cpufreq_get_actual_power() function call
but cpufreq_get_requested_power() function call.

Signed-off-by: Inhyuk Kang <hugh.kang@lge.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Javi Merino <javi.merino@arm.com>
Acked-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2016-09-27 09:38:17 +08:00
Brendan Jackman a305a4387a thermal: cpu_cooling: Fix NULL dereference in cpufreq_state2power
Currently all CPU cooling devices share a
`struct thermal_cooling_device_ops` instance. The thermal core uses the
presence of functions in this struct to determine if a cooling device
has a power model (see cdev_is_power_actor). cpu_cooling.c adds the
power model functions to the shared struct when a device is registered
with a power model.

Therefore, if a CPU cooling device is registered using
[of_]cpufreq_power_cooling_register, _all_ devices will be determined to
have a power model, including any registered with
[of_]cpufreq_cooling_register. This can result in cpufreq_state2power
being called on a device where dyn_power_table is NULL.

With this commit, instead of having a shared thermal_cooling_device_ops
which is mutated, we have two versions: one with the power functions and
one without.

Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Cc: Amit Daniel Kachhap <amit.kachhap@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Javi Merino <javi.merino@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2016-08-19 21:32:18 +08:00
Rafael J. Wysocki bb4b9933e2 Merge back earlier cpufreq changes for v4.8. 2016-06-13 23:33:17 +02:00
Viresh Kumar f8bfc116ca cpufreq: Remove cpufreq_frequency_get_table()
Most of the callers of cpufreq_frequency_get_table() already have the
pointer to a valid 'policy' structure and they don't really need to go
through the per-cpu variable first and then a check to validate the
frequency, in order to find the freq-table for the policy.

Directly use the policy->freq_table field instead for them.

Only one user of that API is left after above changes, cpu_cooling.c and
it accesses the freq_table in a racy way as the policy can get freed in
between.

Fix it by using cpufreq_cpu_get() properly.

Since there are no more users of cpufreq_frequency_get_table() left, get
rid of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Javi Merino <javi.merino@arm.com> (cpu_cooling.c)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-06-09 00:58:05 +02:00
Lukasz Luba f840ab18bd thermal: cpu_cooling: fix improper order during initialization
The freq_table array is not populated before calling
thermal_of_cooling_register. The code which populates the freq table was
introduced in commit f6859014.
This should be done before registering new thermal cooling device.
The log shows effects of this wrong decision.
[    2.172614] cpu cpu1: Failed to get voltage for frequency 1984518656000: -34
[    2.220863] cpu cpu0: Failed to get voltage for frequency 1984524416000: -34

Cc: <stable@vger.kernel.org> # 3.19+
Fixes: f6859014c7 ("thermal: cpu_cooling: Store frequencies in descending order")
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Javi Merino <javi.merino@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2016-06-01 22:21:19 +08:00
Javi Merino a53b8394ec thermal: cpu_cooling: fix out of bounds access in time_in_idle
In __cpufreq_cooling_register() we allocate the arrays for time_in_idle
and time_in_idle_timestamp to be as big as the number of cpus in this
cpufreq device.  However, in get_load() we access this array using the
cpu number as index, which can result in an out of bound access.

Index time_in_idle{,_timestamp} using the index in the cpufreq_device's
allowed_cpus mask, as we do for the load_cpu array in
cpufreq_get_requested_power()

Reported-by: Nicolas Boichat <drinkcat@chromium.org>
Cc: Amit Daniel Kachhap <amit.kachhap@gmail.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Tested-by: Nicolas Boichat <drinkcat@chromium.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-02-11 07:13:29 -08:00
Vaishali Thakkar a71544cd93 thermal: cpu_cooling: Remove usage of devm functions
In the function cpufreq_get_requested_power, the memory allocated
for load_cpu is live within the function only. And after the
allocation it is immediately freed with devm_kfree. There is no
need to allocate memory for load_cpu with devm function so replace
devm_kcalloc with kcalloc and devm_kfree with kfree.

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2015-10-10 11:32:21 +08:00
Javi Merino eba4f88d5a thermal: cpu_cooling: free power table on error or when unregistering
The power table is not being freed on error from cpufreq_cooling
register or when unregistering.  Free it.

Fixes: c36cf07176 ("thermal: cpu_cooling: implement the power cooling device API")
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-09-13 20:34:05 -07:00
Javi Merino 459ac37506 thermal: cpu_cooling: don't call kcalloc() under rcu_read_lock
build_dyn_power_table() allocates the power table while holding
rcu_read_lock.  kcalloc using GFP_KERNEL may sleep, so it can't be
called in an RCU read-side path.

Move the rcu protection to the part of the function that really needs
it: the part that handles the dev_pm_opp pointer received from
dev_pm_opp_find_freq_ceil().  In the unlikely case that there is an OPP
added to the cpu while this function is running, return -EAGAIN.

Fixes: c36cf07176 ("thermal: cpu_cooling: implement the power cooling device API")
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-09-13 20:33:17 -07:00
Viresh Kumar 1afb9c539d thermal/cpu_cooling: update policy limits if clipped_freq < policy->max
policy->max is the maximum allowed frequency defined by user and
clipped_freq is the maximum that thermal constraints allow.

If clipped_freq is lower than policy->max, then we need to readjust
policy->max.

But, if clipped_freq is greater than policy->max, we don't need to do
anything. We used to call cpufreq_verify_within_limits() in this case,
but it doesn't change anything in this case.

Lets skip this unnecessary call and write a comment that explains this.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-08-14 18:26:23 -07:00
Viresh Kumar abcbcc25cb thermal/cpu_cooling: rename max_freq as clipped_freq in notifier
That's what it is for, lets name it properly.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-08-14 18:26:23 -07:00
Viresh Kumar 59f0d21883 thermal/cpu_cooling: rename cpufreq_val as clipped_freq
That's what it is for, lets name it properly.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-08-14 18:26:22 -07:00
Viresh Kumar a24af233a1 thermal/cpu_cooling: convert 'switch' block to 'if' block in notifier
We just need to take care of single event here and there is no need to
increase indentation level of most of the code (which causes lines
longer that 80 columns to break).

Kill the switch block.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-08-14 18:26:22 -07:00
Viresh Kumar 166529c9b6 thermal/cpu_cooling: quit early after updating policy
If a valid cpufreq_dev is found for policy->cpu, we should update the
policy and quit the for loop. There is no need to keep traversing the
list of cpufreq_dev's.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-08-14 18:26:22 -07:00
Viresh Kumar 76fd38ce21 thermal/cpu_cooling: No need to initialize max_freq to 0
Its always set before getting used, don't initialize it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-08-14 18:26:22 -07:00
Russell King 02373d7c69 thermal: cpu_cooling: fix lockdep problems in cpu_cooling
A recent change to the cpu_cooling code introduced a AB-BA deadlock
scenario between the cpufreq_policy_notifier_list rwsem and the
cooling_cpufreq_lock.  This is caused by cooling_cpufreq_lock being held
before the registration/removal of the notifier block (an operation
which takes the rwsem), and the notifier code itself which takes the
locks in the reverse order:

======================================================
[ INFO: possible circular locking dependency detected ]
3.18.0+ #1453 Not tainted
-------------------------------------------------------
rc.local/770 is trying to acquire lock:
 (cooling_cpufreq_lock){+.+.+.}, at: [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc

but task is already holding lock:
 ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>]  __blocking_notifier_call_chain+0x34/0x68

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 ((cpufreq_policy_notifier_list).rwsem){++++.+}:
       [<c06bc3b0>] down_write+0x44/0x9c
       [<c0043444>] blocking_notifier_chain_register+0x28/0xd8
       [<c04ad610>] cpufreq_register_notifier+0x68/0x90
       [<c04abe4c>] __cpufreq_cooling_register.part.1+0x120/0x180
       [<c04abf44>] __cpufreq_cooling_register+0x98/0xa4
       [<c04abf8c>] cpufreq_cooling_register+0x18/0x1c
       [<bf0046f8>] imx_thermal_probe+0x1c0/0x470 [imx_thermal]
       [<c037cef8>] platform_drv_probe+0x50/0xac
       [<c037b710>] driver_probe_device+0x114/0x234
       [<c037b8cc>] __driver_attach+0x9c/0xa0
       [<c0379d68>] bus_for_each_dev+0x5c/0x90
       [<c037b204>] driver_attach+0x24/0x28
       [<c037ae7c>] bus_add_driver+0xe0/0x1d8
       [<c037c0cc>] driver_register+0x80/0xfc
       [<c037cd80>] __platform_driver_register+0x50/0x64
       [<bf007018>] 0xbf007018
       [<c0008a5c>] do_one_initcall+0x88/0x1d8
       [<c0095da4>] load_module+0x1768/0x1ef8
       [<c0096614>] SyS_init_module+0xe0/0xf4
       [<c000ec00>] ret_fast_syscall+0x0/0x48

-> #0 (cooling_cpufreq_lock){+.+.+.}:
       [<c00619f8>] lock_acquire+0xb0/0x124
       [<c06ba3b4>] mutex_lock_nested+0x5c/0x3d8
       [<c04abfc4>] cpufreq_thermal_notifier+0x34/0xfc
       [<c0042bf4>] notifier_call_chain+0x4c/0x8c
       [<c0042f20>] __blocking_notifier_call_chain+0x50/0x68
       [<c0042f58>] blocking_notifier_call_chain+0x20/0x28
       [<c04ae62c>] cpufreq_set_policy+0x7c/0x1d0
       [<c04af3cc>] store_scaling_governor+0x74/0x9c
       [<c04ad418>] store+0x90/0xc0
       [<c0175384>] sysfs_kf_write+0x54/0x58
       [<c01746b4>] kernfs_fop_write+0xdc/0x190
       [<c010dcc0>] vfs_write+0xac/0x1b4
       [<c010dfec>] SyS_write+0x44/0x90
       [<c000ec00>] ret_fast_syscall+0x0/0x48

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((cpufreq_policy_notifier_list).rwsem);
                               lock(cooling_cpufreq_lock);
                               lock((cpufreq_policy_notifier_list).rwsem);
  lock(cooling_cpufreq_lock);

 *** DEADLOCK ***

7 locks held by rc.local/770:
 #0:  (sb_writers#6){.+.+.+}, at: [<c010dda0>] vfs_write+0x18c/0x1b4
 #1:  (&of->mutex){+.+.+.}, at: [<c0174678>] kernfs_fop_write+0xa0/0x190
 #2:  (s_active#52){.+.+.+}, at: [<c0174680>] kernfs_fop_write+0xa8/0x190
 #3:  (cpu_hotplug.lock){++++++}, at: [<c0026a60>] get_online_cpus+0x34/0x90
 #4:  (cpufreq_rwsem){.+.+.+}, at: [<c04ad3e0>] store+0x58/0xc0
 #5:  (&policy->rwsem){+.+.+.}, at: [<c04ad3f8>] store+0x70/0xc0
 #6:  ((cpufreq_policy_notifier_list).rwsem){++++.+}, at: [<c0042f04>] __blocking_notifier_call_chain+0x34/0x68

stack backtrace:
CPU: 0 PID: 770 Comm: rc.local Not tainted 3.18.0+ #1453
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<c00121c8>] (dump_backtrace) from [<c0012360>] (show_stack+0x18/0x1c)
 r6:c0b85a80 r5:c0b75630 r4:00000000 r3:00000000
[<c0012348>] (show_stack) from [<c06b6c48>] (dump_stack+0x7c/0x98)
[<c06b6bcc>] (dump_stack) from [<c06b42a4>] (print_circular_bug+0x28c/0x2d8)
 r4:c0b85a80 r3:d0071d40
[<c06b4018>] (print_circular_bug) from [<c00613b0>] (__lock_acquire+0x1acc/0x1bb0)
 r10:c0b50660 r8:c09e6d80 r7:d0071d40 r6:c11d0f0c r5:00000007 r4:d0072240
[<c005f8e4>] (__lock_acquire) from [<c00619f8>] (lock_acquire+0xb0/0x124)
 r10:00000000 r9:c04abfc4 r8:00000000 r7:00000000 r6:00000000 r5:c0a06f0c
 r4:00000000
[<c0061948>] (lock_acquire) from [<c06ba3b4>] (mutex_lock_nested+0x5c/0x3d8)
 r10:ec853800 r9:c0a06ed4 r8:d0071d40 r7:c0a06ed4 r6:c11d0f0c r5:00000000
 r4:c04abfc4
[<c06ba358>] (mutex_lock_nested) from [<c04abfc4>] (cpufreq_thermal_notifier+0x34/0xfc)
 r10:ec853800 r9:ec85380c r8:d00d7d3c r7:c0a06ed4 r6:d00d7d3c r5:00000000
 r4:fffffffe
[<c04abf90>] (cpufreq_thermal_notifier) from [<c0042bf4>] (notifier_call_chain+0x4c/0x8c)
 r7:00000000 r6:00000000 r5:00000000 r4:fffffffe
[<c0042ba8>] (notifier_call_chain) from [<c0042f20>] (__blocking_notifier_call_chain+0x50/0x68)
 r8:c0a072a4 r7:00000000 r6:d00d7d3c r5:ffffffff r4:c0a06fc8 r3:ffffffff
[<c0042ed0>] (__blocking_notifier_call_chain) from [<c0042f58>] (blocking_notifier_call_chain+0x20/0x28)
 r7:ec98b540 r6:c13ebc80 r5:ed76e600 r4:d00d7d3c
[<c0042f38>] (blocking_notifier_call_chain) from [<c04ae62c>] (cpufreq_set_policy+0x7c/0x1d0)
[<c04ae5b0>] (cpufreq_set_policy) from [<c04af3cc>] (store_scaling_governor+0x74/0x9c)
 r7:ec98b540 r6:0000000c r5:ec98b540 r4:ed76e600
[<c04af358>] (store_scaling_governor) from [<c04ad418>] (store+0x90/0xc0)
 r6:0000000c r5:ed76e6d4 r4:ed76e600
[<c04ad388>] (store) from [<c0175384>] (sysfs_kf_write+0x54/0x58)
 r8:0000000c r7:d00d7f78 r6:ec98b540 r5:0000000c r4:ec853800 r3:0000000c
[<c0175330>] (sysfs_kf_write) from [<c01746b4>] (kernfs_fop_write+0xdc/0x190)
 r6:ec98b540 r5:00000000 r4:00000000 r3:c0175330
[<c01745d8>] (kernfs_fop_write) from [<c010dcc0>] (vfs_write+0xac/0x1b4)
 r10:0162aa70 r9:d00d6000 r8:0000000c r7:d00d7f78 r6:0162aa70 r5:0000000c
 r4:eccde500
[<c010dc14>] (vfs_write) from [<c010dfec>] (SyS_write+0x44/0x90)
 r10:0162aa70 r8:0000000c r7:eccde500 r6:eccde500 r5:00000000 r4:00000000
[<c010dfa8>] (SyS_write) from [<c000ec00>] (ret_fast_syscall+0x0/0x48)
 r10:00000000 r8:c000edc4 r7:00000004 r6:000216cc r5:0000000c r4:0162aa70

Solve this by moving to finer grained locking - use one mutex to protect
the cpufreq_dev_list as a whole, and a separate lock to ensure correct
ordering of cpufreq notifier registration and removal.

cooling_list_lock is taken within cooling_cpufreq_lock on
(un)registration to preserve the behavior of the code, i.e. to
atomically add/remove to the list and (un)register the notifier.

Fixes: 2dcd851fe4 ("thermal: cpu_cooling: Update always cpufreq policy with
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-08-14 18:21:41 -07:00
Kapileshwar Singh dd658e0235 thermal: cpu_cooling: Fix power calculation when CPUs are offline
Ensure that the CPU for which the frequency is being requested
is online. If none of the CPUs are online the requested power is
returned as 0.

Acked-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Kapileshwar Singh <kapileshwar.singh@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-05-04 21:27:54 -07:00
Kapileshwar Singh 54b92aae83 thermal: cpu_cooling: Remove cpu_dev update on policy CPU update
It was initially understood that an update to the cpu_device
(cached in cpufreq_cooling_device) was required to ascertain the
correct operating point of the device on a cpufreq policy->cpu update
or creation or deletion of a cpufreq policy.
(e.g. when the existing policy CPU goes offline).

This update is not required and it is possible to ascertain the OPPs
from the leading CPU in a cpufreq domain even if the CPU is hotplugged out.

Fixes: e0128d8ab423 ("thermal: cpu_cooling: implement the power cooling device API")
Acked-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Kapileshwar Singh <kapileshwar.singh@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-05-04 21:27:54 -07:00
Javi Merino 0cdf97e1ad thermal: cpu_cooling: Check memory allocation of power_table
We allocate the power_table in memory but we don't test whether the
allocation succeeded.  Return -ENOMEM if kcalloc() fails.

Fixes: e0128d8ab423 ("thermal: cpu_cooling: implement the power cooling device API")
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-05-04 21:27:54 -07:00
Javi Merino 6828a4711f thermal: add trace events to the power allocator governor
Add trace events for the power allocator governor and the power actor
interface of the cpu cooling device.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-05-04 21:27:52 -07:00
Javi Merino c36cf07176 thermal: cpu_cooling: implement the power cooling device API
Add a basic power model to the cpu cooling device to implement the
power cooling device API.  The power model uses the current frequency,
current load and OPPs for the power calculations.  The cpus must have
registered their OPPs using the OPP library.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Kapileshwar Singh <kapileshwar.singh@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-05-04 21:27:52 -07:00
Zhang Rui 7429b1e0d0 Merge branches 'thermal-core', 'thermal-soc' and 'thermal-int340x' of .git into next 2014-12-24 10:38:30 +08:00
Javi Merino fc4de356e0 thermal: cpu_cooling: document node in struct cpufreq_cooling_device
The node field of struct cpufreq_cooling_device was reintroduced in
2dcd851fe4 (thermal: cpu_cooling: Update always cpufreq policy with
thermal constraints) but without the documentation that it once had.
Add it back so that all the fields of struct cpufreq_cooling_device
are documented.

Cc: Yadwinder Singh Brar <yadi.brar@samsung.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2014-12-21 21:40:13 +08:00