Power management updates for 5.8-rc1

- Rework the system-wide PM driver flags to make them easier to
    understand and use and update their documentation (Rafael Wysocki,
    Alan Stern).
 
  - Allow cpuidle governors to be switched at run time regardless of
    the kernel configuration and update the related documentation
    accordingly (Hanjun Guo).
 
  - Improve the resume device handling in the user space hibernarion
    interface code (Domenico Andreoli).
 
  - Document the intel-speed-select sysfs interface (Srinivas
    Pandruvada).
 
  - Make the ACPI code handing suspend to idle print more debug
    messages to help diagnose issues with it (Rafael Wysocki).
 
  - Fix a helper routine in the cpufreq core and correct a typo in
    the struct cpufreq_driver kerneldoc comment (Rafael Wysocki, Wang
    Wenhu).
 
  - Update cpufreq drivers:
 
    * Make the intel_pstate driver start in the passive mode by
      default on systems without HWP (Rafael Wysocki).
 
    * Add i.MX7ULP support to the imx-cpufreq-dt driver and add
      i.MX7ULP to the cpufreq-dt-platdev blacklist (Peng Fan).
 
    * Convert the qoriq cpufreq driver to a platform one, make the
      platform code create a suitable device object for it and add
      platform dependencies to it (Mian Yousaf Kaukab, Geert
      Uytterhoeven).
 
    * Fix wrong compatible binding in the qcom driver (Ansuel Smith).
 
    * Build the omap driver by default for ARCH_OMAP2PLUS (Anders
      Roxell).
 
    * Add r8a7742 SoC support to the dt cpufreq driver (Lad Prabhakar).
 
  - Update cpuidle core and drivers:
 
    * Fix three reference count leaks in error code paths in the
      cpuidle core (Qiushi Wu).
 
    * Convert Qualcomm SPM to a generic cpuidle driver (Stephan
      Gerhold).
 
    * Fix up the execution order when entering a domain idle state in
      the PSCI driver (Ulf Hansson).
 
  - Fix a reference counting issue related to clock management and
    clean up two oddities in the PM-runtime framework (Rafael Wysocki,
    Andy Shevchenko).
 
  - Add ElkhartLake support to the Intel RAPL power capping driver
    and remove an unused local MSR definition from it (Jacob Pan,
    Sumeet Pawnikar).
 
  - Update devfreq core and drivers:
 
    * Replace strncpy() with strscpy() in the devfreq core and use
      lockdep asserts instead of manual checks for a locked mutex in
      it (Dmitry Osipenko, Krzysztof Kozlowski).
 
    * Add a generic imx bus scaling driver and make it register an
      interconnect device (Leonard Crestez, Gustavo A. R. Silva).
 
    * Make the cpufreq notifier in the tegra30 driver take boosting
      into account and delete an unuseful error message from that
      driver (Dmitry Osipenko, Markus Elfring).
 
  - Remove unneeded semicolon from the cpupower code (Zou Wei).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl7VGjwSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx46gP/jGAXlddFEQswi6qUT3Cff0A9mb8CdcX
 dyKrjX4xxo/wtBIAwSN4achxrgse//ayo2dYTzWRDd31W9Azbv+5F+46XsDRz4hL
 pH29u/E66NMtFWnHCmt78NEJn0FzSa0YBC43ZzwFwKktCK9skYIpGN2z6iuXUBSX
 Q5GHqop3zvDsdKQFBGL62xvUw/AmOTPG7ohIZvqWBN2mbOqEqMcoFHT+aUF/NbLj
 +i14dvTH767eDZGRVASmXWQyljjaRWm+SIw4+m8zT1D1Y3d5IFObuMN+9RQl1Tif
 BYjkgJ2oDDMhCJLW7TBuJB+g7exiyaSQds3nMr2ZR+eZbJipICjU4eehNEKIUopU
 DM17tHQfnwZfS/7YbCx3vYQwLkNq37AJyXS9uqCAIFM+0n4xN4/mIVmgWYISLDTs
 1v9olFxtwMRNpjGGQWPJAO7ebB8Zz9qhQv7pIkSQEfwp93/SzvlVf4vvruTeFN9J
 qqG60cDumXWAm+s43eQHJNn5nOd5ocWv0FBpo/cxqKbzxFVWwdB42Cm0SY+rK2ID
 uHdnc2DJcK2c78UVbz3Cmk4272foJt2zxchqjFXXAZPLrOsFfzmti4B28VxGxjmP
 LG3MhH5sdbF4yl/1aSC1Bnrt+PV9Lus6ut/VKhjwIpw8cqiXgpwSbMoDoaBd9UMQ
 ubGz2rplGAtB
 =APdj
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These rework the system-wide PM driver flags, make runtime switching
  of cpuidle governors easier, improve the user space hibernation
  interface code, add intel-speed-select interface documentation, add
  more debug messages to the ACPI code handling suspend to idle, update
  the cpufreq core and drivers, fix a minor issue in the cpuidle core
  and update two cpuidle drivers, improve the PM-runtime framework,
  update the Intel RAPL power capping driver, update devfreq core and
  drivers, and clean up the cpupower utility.

  Specifics:

   - Rework the system-wide PM driver flags to make them easier to
     understand and use and update their documentation (Rafael Wysocki,
     Alan Stern).

   - Allow cpuidle governors to be switched at run time regardless of
     the kernel configuration and update the related documentation
     accordingly (Hanjun Guo).

   - Improve the resume device handling in the user space hibernarion
     interface code (Domenico Andreoli).

   - Document the intel-speed-select sysfs interface (Srinivas
     Pandruvada).

   - Make the ACPI code handing suspend to idle print more debug
     messages to help diagnose issues with it (Rafael Wysocki).

   - Fix a helper routine in the cpufreq core and correct a typo in the
     struct cpufreq_driver kerneldoc comment (Rafael Wysocki, Wang
     Wenhu).

   - Update cpufreq drivers:

      - Make the intel_pstate driver start in the passive mode by
        default on systems without HWP (Rafael Wysocki).

      - Add i.MX7ULP support to the imx-cpufreq-dt driver and add
        i.MX7ULP to the cpufreq-dt-platdev blacklist (Peng Fan).

      - Convert the qoriq cpufreq driver to a platform one, make the
        platform code create a suitable device object for it and add
        platform dependencies to it (Mian Yousaf Kaukab, Geert
        Uytterhoeven).

      - Fix wrong compatible binding in the qcom driver (Ansuel Smith).

      - Build the omap driver by default for ARCH_OMAP2PLUS (Anders
        Roxell).

      - Add r8a7742 SoC support to the dt cpufreq driver (Lad
        Prabhakar).

   - Update cpuidle core and drivers:

      - Fix three reference count leaks in error code paths in the
        cpuidle core (Qiushi Wu).

      - Convert Qualcomm SPM to a generic cpuidle driver (Stephan
        Gerhold).

      - Fix up the execution order when entering a domain idle state in
        the PSCI driver (Ulf Hansson).

   - Fix a reference counting issue related to clock management and
     clean up two oddities in the PM-runtime framework (Rafael Wysocki,
     Andy Shevchenko).

   - Add ElkhartLake support to the Intel RAPL power capping driver and
     remove an unused local MSR definition from it (Jacob Pan, Sumeet
     Pawnikar).

   - Update devfreq core and drivers:

      - Replace strncpy() with strscpy() in the devfreq core and use
        lockdep asserts instead of manual checks for a locked mutex in
        it (Dmitry Osipenko, Krzysztof Kozlowski).

      - Add a generic imx bus scaling driver and make it register an
        interconnect device (Leonard Crestez, Gustavo A. R. Silva).

      - Make the cpufreq notifier in the tegra30 driver take boosting
        into account and delete an unuseful error message from that
        driver (Dmitry Osipenko, Markus Elfring).

   - Remove unneeded semicolon from the cpupower code (Zou Wei)"

* tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (51 commits)
  cpuidle: Fix three reference count leaks
  PM: runtime: Replace pm_runtime_callbacks_present()
  PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex
  PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR
  PM / devfreq: Replace strncpy with strscpy
  PM / devfreq: imx: Register interconnect device
  PM / devfreq: Add generic imx bus scaling driver
  PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe()
  PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting
  PM: hibernate: Restrict writes to the resume device
  PM: runtime: clk: Fix clk_pm_runtime_get() error path
  cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver
  ACPI: EC: PM: s2idle: Extend GPE dispatching debug message
  ACPI: PM: s2idle: Print type of wakeup debug messages
  powercap: RAPL: remove unused local MSR define
  PM: runtime: Make clear what we do when conditions are wrong in rpm_suspend()
  Documentation: admin-guide: pm: Document intel-speed-select
  PM: hibernate: Split off snapshot dev option
  PM: hibernate: Incorporate concurrency handling
  Documentation: ABI: make current_governer_ro as a candidate for removal
  ...
This commit is contained in:
Linus Torvalds 2020-06-02 13:17:23 -07:00
commit 355ba37d75
70 changed files with 1830 additions and 727 deletions

View File

@ -0,0 +1,9 @@
What: /sys/devices/system/cpu/cpuidle/current_governor_ro
Date: April, 2020
Contact: linux-pm@vger.kernel.org
Description:
current_governor_ro shows current using cpuidle governor, but read only.
with the update that cpuidle governor can be changed at runtime in default,
both current_governor and current_governor_ro co-exist under
/sys/devices/system/cpu/cpuidle/ file, it's duplicate so make
current_governor_ro obselete.

View File

@ -106,10 +106,10 @@ Description: CPU topology files that describe a logical CPU's relationship
See Documentation/admin-guide/cputopology.rst for more information.
What: /sys/devices/system/cpu/cpuidle/current_driver
/sys/devices/system/cpu/cpuidle/current_governer_ro
/sys/devices/system/cpu/cpuidle/available_governors
What: /sys/devices/system/cpu/cpuidle/available_governors
/sys/devices/system/cpu/cpuidle/current_driver
/sys/devices/system/cpu/cpuidle/current_governor
/sys/devices/system/cpu/cpuidle/current_governer_ro
Date: September 2007
Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org>
Description: Discover cpuidle policy and mechanism
@ -119,24 +119,18 @@ Description: Discover cpuidle policy and mechanism
consumption during idle.
Idle policy (governor) is differentiated from idle mechanism
(driver)
current_driver: (RO) displays current idle mechanism
current_governor_ro: (RO) displays current idle policy
With the cpuidle_sysfs_switch boot option enabled (meant for
developer testing), the following three attributes are visible
instead:
current_driver: same as described above
(driver).
available_governors: (RO) displays a space separated list of
available governors
available governors.
current_driver: (RO) displays current idle mechanism.
current_governor: (RW) displays current idle policy. Users can
switch the governor at runtime by writing to this file.
current_governor_ro: (RO) displays current idle policy.
See Documentation/admin-guide/pm/cpuidle.rst and
Documentation/driver-api/pm/cpuidle.rst for more information.

View File

@ -159,17 +159,15 @@ governor uses that information depends on what algorithm is implemented by it
and that is the primary reason for having more than one governor in the
``CPUIdle`` subsystem.
There are three ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_
and ``ladder``. Which of them is used by default depends on the configuration
of the kernel and in particular on whether or not the scheduler tick can be
`stopped by the idle loop <idle-cpus-and-tick_>`_. It is possible to change the
governor at run time if the ``cpuidle_sysfs_switch`` command line parameter has
been passed to the kernel, but that is not safe in general, so it should not be
done on production systems (that may change in the future, though). The name of
the ``CPUIdle`` governor currently used by the kernel can be read from the
:file:`current_governor_ro` (or :file:`current_governor` if
``cpuidle_sysfs_switch`` is present in the kernel command line) file under
:file:`/sys/devices/system/cpu/cpuidle/` in ``sysfs``.
There are four ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_,
``ladder`` and ``haltpoll``. Which of them is used by default depends on the
configuration of the kernel and in particular on whether or not the scheduler
tick can be `stopped by the idle loop <idle-cpus-and-tick_>`_. Available
governors can be read from the :file:`available_governors`, and the governor
can be changed at runtime. The name of the ``CPUIdle`` governor currently
used by the kernel can be read from the :file:`current_governor_ro` or
:file:`current_governor` file under :file:`/sys/devices/system/cpu/cpuidle/`
in ``sysfs``.
Which ``CPUIdle`` driver is used, on the other hand, usually depends on the
platform the kernel is running on, but there are platforms with more than one

View File

@ -0,0 +1,917 @@
.. SPDX-License-Identifier: GPL-2.0
============================================================
Intel(R) Speed Select Technology User Guide
============================================================
The Intel(R) Speed Select Technology (Intel(R) SST) provides a powerful new
collection of features that give more granular control over CPU performance.
With Intel(R) SST, one server can be configured for power and performance for a
variety of diverse workload requirements.
Refer to the links below for an overview of the technology:
- https://www.intel.com/content/www/us/en/architecture-and-technology/speed-select-technology-article.html
- https://builders.intel.com/docs/networkbuilders/intel-speed-select-technology-base-frequency-enhancing-performance.pdf
These capabilities are further enhanced in some of the newer generations of
server platforms where these features can be enumerated and controlled
dynamically without pre-configuring via BIOS setup options. This dynamic
configuration is done via mailbox commands to the hardware. One way to enumerate
and configure these features is by using the Intel Speed Select utility.
This document explains how to use the Intel Speed Select tool to enumerate and
control Intel(R) SST features. This document gives example commands and explains
how these commands change the power and performance profile of the system under
test. Using this tool as an example, customers can replicate the messaging
implemented in the tool in their production software.
intel-speed-select configuration tool
======================================
Most Linux distribution packages may include the "intel-speed-select" tool. If not,
it can be built by downloading the Linux kernel tree from kernel.org. Once
downloaded, the tool can be built without building the full kernel.
From the kernel tree, run the following commands::
# cd tools/power/x86/intel-speed-select/
# make
# make install
Getting Help
------------
To get help with the tool, execute the command below::
# intel-speed-select --help
The top-level help describes arguments and features. Notice that there is a
multi-level help structure in the tool. For example, to get help for the feature "perf-profile"::
# intel-speed-select perf-profile --help
To get help on a command, another level of help is provided. For example for the command info "info"::
# intel-speed-select perf-profile info --help
Summary of platform capability
------------------------------
To check the current platform and driver capaibilities, execute::
#intel-speed-select --info
For example on a test system::
# intel-speed-select --info
Intel(R) Speed Select Technology
Executing on CPU model: X
Platform: API version : 1
Platform: Driver version : 1
Platform: mbox supported : 1
Platform: mmio supported : 1
Intel(R) SST-PP (feature perf-profile) is supported
TDP level change control is unlocked, max level: 4
Intel(R) SST-TF (feature turbo-freq) is supported
Intel(R) SST-BF (feature base-freq) is not supported
Intel(R) SST-CP (feature core-power) is supported
Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP)
------------------------------------------------------------------------
This feature allows configuration of a server dynamically based on workload
performance requirements. This helps users during deployment as they do not have
to choose a specific server configuration statically. This Intel(R) Speed Select
Technology - Performance Profile (Intel(R) SST-PP) feature introduces a mechanism
that allows multiple optimized performance profiles per system. Each profile
defines a set of CPUs that need to be online and rest offline to sustain a
guaranteed base frequency. Once the user issues a command to use a specific
performance profile and meet CPU online/offline requirement, the user can expect
a change in the base frequency dynamically. This feature is called
"perf-profile" when using the Intel Speed Select tool.
Number or performance levels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There can be multiple performance profiles on a system. To get the number of
profiles, execute the command below::
# intel-speed-select perf-profile get-config-levels
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
get-config-levels:4
package-1
die-0
cpu-14
get-config-levels:4
On this system under test, there are 4 performance profiles in addition to the
base performance profile (which is performance level 0).
Lock/Unlock status
~~~~~~~~~~~~~~~~~~
Even if there are multiple performance profiles, it is possible that that they
are locked. If they are locked, users cannot issue a command to change the
performance state. It is possible that there is a BIOS setup to unlock or check
with your system vendor.
To check if the system is locked, execute the following command::
# intel-speed-select perf-profile get-lock-status
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
get-lock-status:0
package-1
die-0
cpu-14
get-lock-status:0
In this case, lock status is 0, which means that the system is unlocked.
Properties of a performance level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To get properties of a specific performance level (For example for the level 0, below), execute the command below::
# intel-speed-select perf-profile info -l 0
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
perf-profile-level-0
cpu-count:28
enable-cpu-mask:000003ff,f0003fff
enable-cpu-list:0,1,2,3,4,5,6,7,8,9,10,11,12,13,28,29,30,31,32,33,34,35,36,37,38,39,40,41
thermal-design-power-ratio:26
base-frequency(MHz):2600
speed-select-turbo-freq:disabled
speed-select-base-freq:disabled
...
...
Here -l option is used to specify a performance level.
If the option -l is omitted, then this command will print information about all
the performance levels. The above command is printing properties of the
performance level 0.
For this performance profile, the list of CPUs displayed by the
"enable-cpu-mask/enable-cpu-list" at the max can be "online." When that
condition is met, then base frequency of 2600 MHz can be maintained. To
understand more, execute "intel-speed-select perf-profile info" for performance
level 4::
# intel-speed-select perf-profile info -l 4
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
perf-profile-level-4
cpu-count:28
enable-cpu-mask:000000fa,f0000faf
enable-cpu-list:0,1,2,3,5,7,8,9,10,11,28,29,30,31,33,35,36,37,38,39
thermal-design-power-ratio:28
base-frequency(MHz):2800
speed-select-turbo-freq:disabled
speed-select-base-freq:unsupported
...
...
There are fewer CPUs in the "enable-cpu-mask/enable-cpu-list". Consequently, if
the user only keeps these CPUs online and the rest "offline," then the base
frequency is increased to 2.8 GHz compared to 2.6 GHz at performance level 0.
Get current performance level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To get the current performance level, execute::
# intel-speed-select perf-profile get-config-current-level
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
get-config-current_level:0
First verify that the base_frequency displayed by the cpufreq sysfs is correct::
# cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
2600000
This matches the base-frequency (MHz) field value displayed from the
"perf-profile info" command for performance level 0(cpufreq frequency is in
KHz).
To check if the average frequency is equal to the base frequency for a 100% busy
workload, disable turbo::
# echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
Then runs a busy workload on all CPUs, for example::
#stress -c 64
To verify the base frequency, run turbostat::
#turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
Package Core CPU Bzy_MHz
- - 2600
0 0 0 2600
0 1 1 2600
0 2 2 2600
0 3 3 2600
0 4 4 2600
. . . .
Changing performance level
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To the change the performance level to 4, execute::
# intel-speed-select -d perf-profile set-config-level -l 4 -o
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
perf-profile
set_tdp_level:success
In the command above, "-o" is optional. If it is specified, then it will also
offline CPUs which are not present in the enable_cpu_mask for this performance
level.
Now if the base_frequency is checked::
#cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
2800000
Which shows that the base frequency now increased from 2600 MHz at performance
level 0 to 2800 MHz at performance level 4. As a result, any workload, which can
use fewer CPUs, can see a boost of 200 MHz compared to performance level 0.
Check presence of other Intel(R) SST features
---------------------------------------------
Each of the performance profiles also specifies weather there is support of
other two Intel(R) SST features (Intel(R) Speed Select Technology - Base Frequency
(Intel(R) SST-BF) and Intel(R) Speed Select Technology - Turbo Frequency (Intel
SST-TF)).
For example, from the output of "perf-profile info" above, for level 0 and level
4:
For level 0::
speed-select-turbo-freq:disabled
speed-select-base-freq:disabled
For level 4::
speed-select-turbo-freq:disabled
speed-select-base-freq:unsupported
Given these results, the "speed-select-base-freq" (Intel(R) SST-BF) in level 4
changed from "disabled" to "unsupported" compared to performance level 0.
This means that at performance level 4, the "speed-select-base-freq" feature is
not supported. However, at performance level 0, this feature is "supported", but
currently "disabled", meaning the user has not activated this feature. Whereas
"speed-select-turbo-freq" (Intel(R) SST-TF) is supported at both performance
levels, but currently not activated by the user.
The Intel(R) SST-BF and the Intel(R) SST-TF features are built on a foundation
technology called Intel(R) Speed Select Technology - Core Power (Intel(R) SST-CP).
The platform firmware enables this feature when Intel(R) SST-BF or Intel(R) SST-TF
is supported on a platform.
Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP)
---------------------------------------------------------------
Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP) is an interface that
allows users to define per core priority. This defines a mechanism to distribute
power among cores when there is a power constrained scenario. This defines a
class of service (CLOS) configuration.
The user can configure up to 4 class of service configurations. Each CLOS group
configuration allows definitions of parameters, which affects how the frequency
can be limited and power is distributed. Each CPU core can be tied to a class of
service and hence an associated priority. The granularity is at core level not
at per CPU level.
Enable CLOS based prioritization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To use CLOS based prioritization feature, firmware must be informed to enable
and use a priority type. There is a default per platform priority type, which
can be changed with optional command line parameter.
To enable and check the options, execute::
# intel-speed-select core-power enable --help
Intel(R) Speed Select Technology
Executing on CPU model: X
Enable core-power for a package/die
Clos Enable: Specify priority type with [--priority|-p]
0: Proportional, 1: Ordered
There are two types of priority types:
- Ordered
Priority for ordered throttling is defined based on the index of the assigned
CLOS group. Where CLOS0 gets highest priority (throttled last).
Priority order is:
CLOS0 > CLOS1 > CLOS2 > CLOS3.
- Proportional
When proportional priority is used, there is an additional parameter called
frequency_weight, which can be specified per CLOS group. The goal of
proportional priority is to provide each core with the requested min., then
distribute all remaining (excess/deficit) budgets in proportion to a defined
weight. This proportional priority can be configured using "core-power config"
command.
To enable with the platform default priority type, execute::
# intel-speed-select core-power enable
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
core-power
enable:success
package-1
die-0
cpu-6
core-power
enable:success
The scope of this enable is per package or die scoped when a package contains
multiple dies. To check if CLOS is enabled and get priority type, "core-power
info" command can be used. For example to check the status of core-power feature
on CPU 0, execute::
# intel-speed-select -c 0 core-power info
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
core-power
support-status:supported
enable-status:enabled
clos-enable-status:enabled
priority-type:proportional
package-1
die-0
cpu-24
core-power
support-status:supported
enable-status:enabled
clos-enable-status:enabled
priority-type:proportional
Configuring CLOS groups
~~~~~~~~~~~~~~~~~~~~~~~
Each CLOS group has its own attributes including min, max, freq_weight and
desired. These parameters can be configured with "core-power config" command.
Defaults will be used if user skips setting a parameter except clos id, which is
mandatory. To check core-power config options, execute::
# intel-speed-select core-power config --help
Intel(R) Speed Select Technology
Executing on CPU model: X
Set core-power configuration for one of the four clos ids
Specify targeted clos id with [--clos|-c]
Specify clos Proportional Priority [--weight|-w]
Specify clos min in MHz with [--min|-n]
Specify clos max in MHz with [--max|-m]
For example::
# intel-speed-select core-power config -c 0
Intel(R) Speed Select Technology
Executing on CPU model: X
clos epp is not specified, default: 0
clos frequency weight is not specified, default: 0
clos min is not specified, default: 0 MHz
clos max is not specified, default: 25500 MHz
clos desired is not specified, default: 0
package-0
die-0
cpu-0
core-power
config:success
package-1
die-0
cpu-6
core-power
config:success
The user has the option to change defaults. For example, the user can change the
"min" and set the base frequency to always get guaranteed base frequency.
Get the current CLOS configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To check the current configuration, "core-power get-config" can be used. For
example, to get the configuration of CLOS 0::
# intel-speed-select core-power get-config -c 0
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
core-power
clos:0
epp:0
clos-proportional-priority:0
clos-min:0 MHz
clos-max:Max Turbo frequency
clos-desired:0 MHz
package-1
die-0
cpu-24
core-power
clos:0
epp:0
clos-proportional-priority:0
clos-min:0 MHz
clos-max:Max Turbo frequency
clos-desired:0 MHz
Associating a CPU with a CLOS group
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To associate a CPU to a CLOS group "core-power assoc" command can be used::
# intel-speed-select core-power assoc --help
Intel(R) Speed Select Technology
Executing on CPU model: X
Associate a clos id to a CPU
Specify targeted clos id with [--clos|-c]
For example to associate CPU 10 to CLOS group 3, execute::
# intel-speed-select -c 10 core-power assoc -c 3
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-10
core-power
assoc:success
Once a CPU is associated, its sibling CPUs are also associated to a CLOS group.
Once associated, avoid changing Linux "cpufreq" subsystem scaling frequency
limits.
To check the existing association for a CPU, "core-power get-assoc" command can
be used. For example, to get association of CPU 10, execute::
# intel-speed-select -c 10 core-power get-assoc
Intel(R) Speed Select Technology
Executing on CPU model: X
package-1
die-0
cpu-10
get-assoc
clos:3
This shows that CPU 10 is part of a CLOS group 3.
Disable CLOS based prioritization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To disable, execute::
# intel-speed-select core-power disable
Some features like Intel(R) SST-TF can only be enabled when CLOS based prioritization
is enabled. For this reason, disabling while Intel(R) SST-TF is enabled can cause
Intel(R) SST-TF to fail. This will cause the "disable" command to display an error
if Intel(R) SST-TF is already enabled. In turn, to disable, the Intel(R) SST-TF
feature must be disabled first.
Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF)
-------------------------------------------------------------------
The Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF) feature lets
the user control base frequency. If some critical workload threads demand
constant high guaranteed performance, then this feature can be used to execute
the thread at higher base frequency on specific sets of CPUs (high priority
CPUs) at the cost of lower base frequency (low priority CPUs) on other CPUs.
This feature does not require offline of the low priority CPUs.
The support of Intel(R) SST-BF depends on the Intel(R) Speed Select Technology -
Performance Profile (Intel(R) SST-PP) performance level configuration. It is
possible that only certain performance levels support Intel(R) SST-BF. It is also
possible that only base performance level (level = 0) has support of Intel
SST-BF. Consequently, first select the desired performance level to enable this
feature.
In the system under test here, Intel(R) SST-BF is supported at the base
performance level 0, but currently disabled. For example for the level 0::
# intel-speed-select -c 0 perf-profile info -l 0
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
perf-profile-level-0
...
speed-select-base-freq:disabled
...
Before enabling Intel(R) SST-BF and measuring its impact on a workload
performance, execute some workload and measure performance and get a baseline
performance to compare against.
Here the user wants more guaranteed performance. For this reason, it is likely
that turbo is disabled. To disable turbo, execute::
#echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
Based on the output of the "intel-speed-select perf-profile info -l 0" base
frequency of guaranteed frequency 2600 MHz.
Measure baseline performance for comparison
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To compare, pick a multi-threaded workload where each thread can be scheduled on
separate CPUs. "Hackbench pipe" test is a good example on how to improve
performance using Intel(R) SST-BF.
Below, the workload is measuring average scheduler wakeup latency, so a lower
number means better performance::
# taskset -c 3,4 perf bench -r 100 sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 6.102 [sec]
6.102445 usecs/op
163868 ops/sec
While running the above test, if we take turbostat output, it will show us that
2 of the CPUs are busy and reaching max. frequency (which would be the base
frequency as the turbo is disabled). The turbostat output::
#turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
Package Core CPU Bzy_MHz
0 0 0 1000
0 1 1 1005
0 2 2 1000
0 3 3 2600
0 4 4 2600
0 5 5 1000
0 6 6 1000
0 7 7 1005
0 8 8 1005
0 9 9 1000
0 10 10 1000
0 11 11 995
0 12 12 1000
0 13 13 1000
From the above turbostat output, both CPU 3 and 4 are very busy and reaching
full guaranteed frequency of 2600 MHz.
Intel(R) SST-BF Capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To get capabilities of Intel(R) SST-BF for the current performance level 0,
execute::
# intel-speed-select base-freq info -l 0
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
speed-select-base-freq
high-priority-base-frequency(MHz):3000
high-priority-cpu-mask:00000216,00002160
high-priority-cpu-list:5,6,8,13,33,34,36,41
low-priority-base-frequency(MHz):2400
tjunction-temperature(C):125
thermal-design-power(W):205
The above capabilities show that there are some CPUs on this system that can
offer base frequency of 3000 MHz compared to the standard base frequency at this
performance levels. Nevertheless, these CPUs are fixed, and they are presented
via high-priority-cpu-list/high-priority-cpu-mask. But if this Intel(R) SST-BF
feature is selected, the low priorities CPUs (which are not in
high-priority-cpu-list) can only offer up to 2400 MHz. As a result, if this
clipping of low priority CPUs is acceptable, then the user can enable Intel
SST-BF feature particularly for the above "sched pipe" workload since only two
CPUs are used, they can be scheduled on high priority CPUs and can get boost of
400 MHz.
Enable Intel(R) SST-BF
~~~~~~~~~~~~~~~~~~~~~~
To enable Intel(R) SST-BF feature, execute::
# intel-speed-select base-freq enable -a
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
base-freq
enable:success
package-1
die-0
cpu-14
base-freq
enable:success
In this case, -a option is optional. This not only enables Intel(R) SST-BF, but it
also adjusts the priority of cores using Intel(R) Speed Select Technology Core
Power (Intel(R) SST-CP) features. This option sets the minimum performance of each
Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP) class to
maximum performance so that the hardware will give maximum performance possible
for each CPU.
If -a option is not used, then the following steps are required before enabling
Intel(R) SST-BF:
- Discover Intel(R) SST-BF and note low and high priority base frequency
- Note the high prioity CPU list
- Enable CLOS using core-power feature set
- Configure CLOS parameters. Use CLOS.min to set to minimum performance
- Subscribe desired CPUs to CLOS groups
With this configuration, if the same workload is executed by pinning the
workload to high priority CPUs (CPU 5 and 6 in this case)::
#taskset -c 5,6 perf bench -r 100 sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 5.627 [sec]
5.627922 usecs/op
177685 ops/sec
This way, by enabling Intel(R) SST-BF, the performance of this benchmark is
improved (latency reduced) by 7.79%. From the turbostat output, it can be
observed that the high priority CPUs reached 3000 MHz compared to 2600 MHz.
The turbostat output::
#turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
Package Core CPU Bzy_MHz
0 0 0 2151
0 1 1 2166
0 2 2 2175
0 3 3 2175
0 4 4 2175
0 5 5 3000
0 6 6 3000
0 7 7 2180
0 8 8 2662
0 9 9 2176
0 10 10 2175
0 11 11 2176
0 12 12 2176
0 13 13 2661
Disable Intel(R) SST-BF
~~~~~~~~~~~~~~~~~~~~~~~
To disable the Intel(R) SST-BF feature, execute::
# intel-speed-select base-freq disable -a
Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF)
--------------------------------------------------------------------
This feature enables the ability to set different "All core turbo ratio limits"
to cores based on the priority. By using this feature, some cores can be
configured to get higher turbo frequency by designating them as high priority at
the cost of lower or no turbo frequency on the low priority cores.
For this reason, this feature is only useful when system is busy utilizing all
CPUs, but the user wants some configurable option to get high performance on
some CPUs.
The support of Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF)
depends on the Intel(R) Speed Select Technology - Performance Profile (Intel
SST-PP) performance level configuration. It is possible that only a certain
performance level supports Intel(R) SST-TF. It is also possible that only the base
performance level (level = 0) has the support of Intel(R) SST-TF. Hence, first
select the desired performance level to enable this feature.
In the system under test here, Intel(R) SST-TF is supported at the base
performance level 0, but currently disabled::
# intel-speed-select -c 0 perf-profile info -l 0
Intel(R) Speed Select Technology
package-0
die-0
cpu-0
perf-profile-level-0
...
...
speed-select-turbo-freq:disabled
...
...
To check if performance can be improved using Intel(R) SST-TF feature, get the turbo
frequency properties with Intel(R) SST-TF enabled and compare to the base turbo
capability of this system.
Get Base turbo capability
~~~~~~~~~~~~~~~~~~~~~~~~~
To get the base turbo capability of performance level 0, execute::
# intel-speed-select perf-profile info -l 0
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
perf-profile-level-0
...
...
turbo-ratio-limits-sse
bucket-0
core-count:2
max-turbo-frequency(MHz):3200
bucket-1
core-count:4
max-turbo-frequency(MHz):3100
bucket-2
core-count:6
max-turbo-frequency(MHz):3100
bucket-3
core-count:8
max-turbo-frequency(MHz):3100
bucket-4
core-count:10
max-turbo-frequency(MHz):3100
bucket-5
core-count:12
max-turbo-frequency(MHz):3100
bucket-6
core-count:14
max-turbo-frequency(MHz):3100
bucket-7
core-count:16
max-turbo-frequency(MHz):3100
Based on the data above, when all the CPUS are busy, the max. frequency of 3100
MHz can be achieved. If there is some busy workload on cpu 0 - 11 (e.g. stress)
and on CPU 12 and 13, execute "hackbench pipe" workload::
# taskset -c 12,13 perf bench -r 100 sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 5.705 [sec]
5.705488 usecs/op
175269 ops/sec
The turbostat output::
#turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
Package Core CPU Bzy_MHz
0 0 0 3000
0 1 1 3000
0 2 2 3000
0 3 3 3000
0 4 4 3000
0 5 5 3100
0 6 6 3100
0 7 7 3000
0 8 8 3100
0 9 9 3000
0 10 10 3000
0 11 11 3000
0 12 12 3100
0 13 13 3100
Based on turbostat output, the performance is limited by frequency cap of 3100
MHz. To check if the hackbench performance can be improved for CPU 12 and CPU
13, first check the capability of the Intel(R) SST-TF feature for this performance
level.
Get Intel(R) SST-TF Capability
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To get the capability, the "turbo-freq info" command can be used::
# intel-speed-select turbo-freq info -l 0
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-0
speed-select-turbo-freq
bucket-0
high-priority-cores-count:2
high-priority-max-frequency(MHz):3200
high-priority-max-avx2-frequency(MHz):3200
high-priority-max-avx512-frequency(MHz):3100
bucket-1
high-priority-cores-count:4
high-priority-max-frequency(MHz):3100
high-priority-max-avx2-frequency(MHz):3000
high-priority-max-avx512-frequency(MHz):2900
bucket-2
high-priority-cores-count:6
high-priority-max-frequency(MHz):3100
high-priority-max-avx2-frequency(MHz):3000
high-priority-max-avx512-frequency(MHz):2900
speed-select-turbo-freq-clip-frequencies
low-priority-max-frequency(MHz):2600
low-priority-max-avx2-frequency(MHz):2400
low-priority-max-avx512-frequency(MHz):2100
Based on the output above, there is an Intel(R) SST-TF bucket for which there are
two high priority cores. If only two high priority cores are set, then max.
turbo frequency on those cores can be increased to 3200 MHz. This is 100 MHz
more than the base turbo capability for all cores.
In turn, for the hackbench workload, two CPUs can be set as high priority and
rest as low priority. One side effect is that once enabled, the low priority
cores will be clipped to a lower frequency of 2600 MHz.
Enable Intel(R) SST-TF
~~~~~~~~~~~~~~~~~~~~~~
To enable Intel(R) SST-TF, execute::
# intel-speed-select -c 12,13 turbo-freq enable -a
Intel(R) Speed Select Technology
Executing on CPU model: X
package-0
die-0
cpu-12
turbo-freq
enable:success
package-0
die-0
cpu-13
turbo-freq
enable:success
package--1
die-0
cpu-63
turbo-freq --auto
enable:success
In this case, the option "-a" is optional. If set, it enables Intel(R) SST-TF
feature and also sets the CPUs to high and and low priority using Intel Speed
Select Technology Core Power (Intel(R) SST-CP) features. The CPU numbers passed
with "-c" arguments are marked as high priority, including its siblings.
If -a option is not used, then the following steps are required before enabling
Intel(R) SST-TF:
- Discover Intel(R) SST-TF and note buckets of high priority cores and maximum frequency
- Enable CLOS using core-power feature set - Configure CLOS parameters
- Subscribe desired CPUs to CLOS groups making sure that high priority cores are set to the maximum frequency
If the same hackbench workload is executed, schedule hackbench threads on high
priority CPUs::
#taskset -c 12,13 perf bench -r 100 sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 5.510 [sec]
5.510165 usecs/op
180826 ops/sec
This improved performance by around 3.3% improvement on a busy system. Here the
turbostat output will show that the CPU 12 and CPU 13 are getting 100 MHz boost.
The turbostat output::
#turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
Package Core CPU Bzy_MHz
...
0 12 12 3200
0 13 13 3200

View File

@ -62,9 +62,10 @@ on the capabilities of the processor.
Active Mode
-----------
This is the default operation mode of ``intel_pstate``. If it works in this
mode, the ``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq``
policies contains the string "intel_pstate".
This is the default operation mode of ``intel_pstate`` for processors with
hardware-managed P-states (HWP) support. If it works in this mode, the
``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
contains the string "intel_pstate".
In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
provides its own scaling algorithms for P-state selection. Those algorithms
@ -138,12 +139,13 @@ internal P-state selection logic to be less performance-focused.
Active Mode Without HWP
~~~~~~~~~~~~~~~~~~~~~~~
This is the default operation mode for processors that do not support the HWP
feature. It also is used by default with the ``intel_pstate=no_hwp`` argument
in the kernel command line. However, in this mode ``intel_pstate`` may refuse
to work with the given processor if it does not recognize it. [Note that
``intel_pstate`` will never refuse to work with any processor with the HWP
feature enabled.]
This operation mode is optional for processors that do not support the HWP
feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
the command line. The active mode is used in those cases if the
``intel_pstate=active`` argument is passed to the kernel in the command line.
In this mode ``intel_pstate`` may refuse to work with processors that are not
recognized by it. [Note that ``intel_pstate`` will never refuse to work with
any processor with the HWP feature enabled.]
In this mode ``intel_pstate`` registers utilization update callbacks with the
CPU scheduler in order to run a P-state selection algorithm, either
@ -188,10 +190,14 @@ is not set.
Passive Mode
------------
This mode is used if the ``intel_pstate=passive`` argument is passed to the
kernel in the command line (it implies the ``intel_pstate=no_hwp`` setting too).
Like in the active mode without HWP support, in this mode ``intel_pstate`` may
refuse to work with the given processor if it does not recognize it.
This is the default operation mode of ``intel_pstate`` for processors without
hardware-managed P-states (HWP) support. It is always used if the
``intel_pstate=passive`` argument is passed to the kernel in the command line
regardless of whether or not the given processor supports HWP. [Note that the
``intel_pstate=no_hwp`` setting implies ``intel_pstate=passive`` if it is used
without ``intel_pstate=active``.] Like in the active mode without HWP support,
in this mode ``intel_pstate`` may refuse to work with processors that are not
recognized by it.
If the driver works in this mode, the ``scaling_driver`` policy attribute in
``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".

View File

@ -13,3 +13,4 @@ Working-State Power Management
intel_pstate
cpufreq_drivers
intel_epb
intel-speed-select

View File

@ -68,9 +68,8 @@ only one in the list (that is, the list was empty before) or the value of its
governor currently in use, or the name of the new governor was passed to the
kernel as the value of the ``cpuidle.governor=`` command line parameter, the new
governor will be used from that point on (there can be only one ``CPUIdle``
governor in use at a time). Also, if ``cpuidle_sysfs_switch`` is passed to the
kernel in the command line, user space can choose the ``CPUIdle`` governor to
use at run time via ``sysfs``.
governor in use at a time). Also, user space can choose the ``CPUIdle``
governor to use at run time via ``sysfs``.
Once registered, ``CPUIdle`` governors cannot be unregistered, so it is not
practical to put them into loadable kernel modules.

View File

@ -349,7 +349,7 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
PM core will skip the ``suspend``, ``suspend_late`` and
``suspend_noirq`` phases as well as all of the corresponding phases of
the subsequent device resume for all of these devices. In that case,
the ``->complete`` callback will be invoked directly after the
the ``->complete`` callback will be the next one invoked after the
``->prepare`` callback and is entirely responsible for putting the
device into a consistent state as appropriate.
@ -361,9 +361,9 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
runtime PM disabled.
This feature also can be controlled by device drivers by using the
``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
management flags. [Typically, they are set at the time the driver is
probed against the device in question by passing them to the
``DPM_FLAG_NO_DIRECT_COMPLETE`` and ``DPM_FLAG_SMART_PREPARE`` driver
power management flags. [Typically, they are set at the time the driver
is probed against the device in question by passing them to the
:c:func:`dev_pm_set_driver_flags` helper function.] If the first of
these flags is set, the PM core will not apply the direct-complete
procedure described above to the given device and, consequenty, to any
@ -383,11 +383,15 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
``->suspend`` methods provided by subsystems (bus types and PM domains
in particular) must follow an additional rule regarding what can be done
to the devices before their drivers' ``->suspend`` methods are called.
Namely, they can only resume the devices from runtime suspend by
calling :c:func:`pm_runtime_resume` for them, if that is necessary, and
Namely, they may resume the devices from runtime suspend by
calling :c:func:`pm_runtime_resume` for them, if that is necessary, but
they must not update the state of the devices in any other way at that
time (in case the drivers need to resume the devices from runtime
suspend in their ``->suspend`` methods).
suspend in their ``->suspend`` methods). In fact, the PM core prevents
subsystems or drivers from putting devices into runtime suspend at
these times by calling :c:func:`pm_runtime_get_noresume` before issuing
the ``->prepare`` callback (and calling :c:func:`pm_runtime_put` after
issuing the ``->complete`` callback).
3. For a number of devices it is convenient to split suspend into the
"quiesce device" and "save device state" phases, in which cases
@ -459,22 +463,22 @@ When resuming from freeze, standby or memory sleep, the phases are:
Note, however, that new children may be registered below the device as
soon as the ``->resume`` callbacks occur; it's not necessary to wait
until the ``complete`` phase with that.
until the ``complete`` phase runs.
Moreover, if the preceding ``->prepare`` callback returned a positive
number, the device may have been left in runtime suspend throughout the
whole system suspend and resume (the ``suspend``, ``suspend_late``,
``suspend_noirq`` phases of system suspend and the ``resume_noirq``,
``resume_early``, ``resume`` phases of system resume may have been
skipped for it). In that case, the ``->complete`` callback is entirely
whole system suspend and resume (its ``->suspend``, ``->suspend_late``,
``->suspend_noirq``, ``->resume_noirq``,
``->resume_early``, and ``->resume`` callbacks may have been
skipped). In that case, the ``->complete`` callback is entirely
responsible for putting the device into a consistent state after system
suspend if necessary. [For example, it may need to queue up a runtime
resume request for the device for this purpose.] To check if that is
the case, the ``->complete`` callback can consult the device's
``power.direct_complete`` flag. Namely, if that flag is set when the
``->complete`` callback is being run, it has been called directly after
the preceding ``->prepare`` and special actions may be required
to make the device work correctly afterward.
``power.direct_complete`` flag. If that flag is set when the
``->complete`` callback is being run then the direct-complete mechanism
was used, and special actions may be required to make the device work
correctly afterward.
At the end of these phases, drivers should be as functional as they were before
suspending: I/O can be performed using DMA and IRQs, and the relevant clocks are
@ -575,10 +579,12 @@ and the phases are similar.
The ``->poweroff``, ``->poweroff_late`` and ``->poweroff_noirq`` callbacks
should do essentially the same things as the ``->suspend``, ``->suspend_late``
and ``->suspend_noirq`` callbacks, respectively. The only notable difference is
and ``->suspend_noirq`` callbacks, respectively. A notable difference is
that they need not store the device register values, because the registers
should already have been stored during the ``freeze``, ``freeze_late`` or
``freeze_noirq`` phases.
``freeze_noirq`` phases. Also, on many machines the firmware will power-down
the entire system, so it is not necessary for the callback to put the device in
a low-power state.
Leaving Hibernation
@ -764,70 +770,119 @@ device driver in question.
If it is necessary to resume a device from runtime suspend during a system-wide
transition into a sleep state, that can be done by calling
:c:func:`pm_runtime_resume` for it from the ``->suspend`` callback (or its
couterpart for transitions related to hibernation) of either the device's driver
or a subsystem responsible for it (for example, a bus type or a PM domain).
That is guaranteed to work by the requirement that subsystems must not change
the state of devices (possibly except for resuming them from runtime suspend)
:c:func:`pm_runtime_resume` from the ``->suspend`` callback (or the ``->freeze``
or ``->poweroff`` callback for transitions related to hibernation) of either the
device's driver or its subsystem (for example, a bus type or a PM domain).
However, subsystems must not otherwise change the runtime status of devices
from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
invoking device drivers' ``->suspend`` callbacks (or equivalent).
.. _smart_suspend_flag:
The ``DPM_FLAG_SMART_SUSPEND`` Driver Flag
------------------------------------------
Some bus types and PM domains have a policy to resume all devices from runtime
suspend upfront in their ``->suspend`` callbacks, but that may not be really
necessary if the driver of the device can cope with runtime-suspended devices.
The driver can indicate that by setting ``DPM_FLAG_SMART_SUSPEND`` in
:c:member:`power.driver_flags` at the probe time, by passing it to the
:c:func:`dev_pm_set_driver_flags` helper. That also may cause middle-layer code
necessary if the device's driver can cope with runtime-suspended devices.
The driver can indicate this by setting ``DPM_FLAG_SMART_SUSPEND`` in
:c:member:`power.driver_flags` at probe time, with the assistance of the
:c:func:`dev_pm_set_driver_flags` helper routine.
Setting that flag causes the PM core and middle-layer code
(bus types, PM domains etc.) to skip the ``->suspend_late`` and
``->suspend_noirq`` callbacks provided by the driver if the device remains in
runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
has been disabled for it, under the assumption that its state should not change
after that point until the system-wide transition is over (the PM core itself
does that for devices whose "noirq", "late" and "early" system-wide PM callbacks
are executed directly by it). If that happens, the driver's system-wide resume
callbacks, if present, may still be invoked during the subsequent system-wide
resume transition and the device's runtime power management status may be set
to "active" before enabling runtime PM for it, so the driver must be prepared to
cope with the invocation of its system-wide resume callbacks back-to-back with
its ``->runtime_suspend`` one (without the intervening ``->runtime_resume`` and
so on) and the final state of the device must reflect the "active" runtime PM
status in that case.
runtime suspend throughout those phases of the system-wide suspend (and
similarly for the "freeze" and "poweroff" parts of system hibernation).
[Otherwise the same driver
callback might be executed twice in a row for the same device, which would not
be valid in general.] If the middle-layer system-wide PM callbacks are present
for the device then they are responsible for skipping these driver callbacks;
if not then the PM core skips them. The subsystem callback routines can
determine whether they need to skip the driver callbacks by testing the return
value from the :c:func:`dev_pm_skip_suspend` helper function.
In addition, with ``DPM_FLAG_SMART_SUSPEND`` set, the driver's ``->thaw_noirq``
and ``->thaw_early`` callbacks are skipped in hibernation if the device remained
in runtime suspend throughout the preceding "freeze" transition. Again, if the
middle-layer callbacks are present for the device, they are responsible for
doing this, otherwise the PM core takes care of it.
The ``DPM_FLAG_MAY_SKIP_RESUME`` Driver Flag
--------------------------------------------
During system-wide resume from a sleep state it's easiest to put devices into
the full-power state, as explained in :file:`Documentation/power/runtime_pm.rst`.
[Refer to that document for more information regarding this particular issue as
well as for information on the device runtime power management framework in
general.]
However, it often is desirable to leave devices in suspend after system
transitions to the working state, especially if those devices had been in
general.] However, it often is desirable to leave devices in suspend after
system transitions to the working state, especially if those devices had been in
runtime suspend before the preceding system-wide suspend (or analogous)
transition. Device drivers can use the ``DPM_FLAG_LEAVE_SUSPENDED`` flag to
indicate to the PM core (and middle-layer code) that they prefer the specific
devices handled by them to be left suspended and they have no problems with
skipping their system-wide resume callbacks for this reason. Whether or not the
devices will actually be left in suspend may depend on their state before the
given system suspend-resume cycle and on the type of the system transition under
way. In particular, devices are not left suspended if that transition is a
restore from hibernation, as device states are not guaranteed to be reflected
by the information stored in the hibernation image in that case.
transition.
The middle-layer code involved in the handling of the device is expected to
indicate to the PM core if the device may be left in suspend by setting its
:c:member:`power.may_skip_resume` status bit which is checked by the PM core
during the "noirq" phase of the preceding system-wide suspend (or analogous)
transition. The middle layer is then responsible for handling the device as
appropriate in its "noirq" resume callback, which is executed regardless of
whether or not the device is left suspended, but the other resume callbacks
(except for ``->complete``) will be skipped automatically by the PM core if the
device really can be left in suspend.
To that end, device drivers can use the ``DPM_FLAG_MAY_SKIP_RESUME`` flag to
indicate to the PM core and middle-layer code that they allow their "noirq" and
"early" resume callbacks to be skipped if the device can be left in suspend
after system-wide PM transitions to the working state. Whether or not that is
the case generally depends on the state of the device before the given system
suspend-resume cycle and on the type of the system transition under way.
In particular, the "thaw" and "restore" transitions related to hibernation are
not affected by ``DPM_FLAG_MAY_SKIP_RESUME`` at all. [All callbacks are
issued during the "restore" transition regardless of the flag settings,
and whether or not any driver callbacks
are skipped during the "thaw" transition depends whether or not the
``DPM_FLAG_SMART_SUSPEND`` flag is set (see `above <smart_suspend_flag_>`_).
In addition, a device is not allowed to remain in runtime suspend if any of its
children will be returned to full power.]
For devices whose "noirq", "late" and "early" driver callbacks are invoked
directly by the PM core, all of the system-wide resume callbacks are skipped if
``DPM_FLAG_LEAVE_SUSPENDED`` is set and the device is in runtime suspend during
the ``suspend_noirq`` (or analogous) phase or the transition under way is a
proper system suspend (rather than anything related to hibernation) and the
device's wakeup settings are suitable for runtime PM (that is, it cannot
generate wakeup signals at all or it is allowed to wake up the system from
sleep).
The ``DPM_FLAG_MAY_SKIP_RESUME`` flag is taken into account in combination with
the :c:member:`power.may_skip_resume` status bit set by the PM core during the
"suspend" phase of suspend-type transitions. If the driver or the middle layer
has a reason to prevent the driver's "noirq" and "early" resume callbacks from
being skipped during the subsequent system resume transition, it should
clear :c:member:`power.may_skip_resume` in its ``->suspend``, ``->suspend_late``
or ``->suspend_noirq`` callback. [Note that the drivers setting
``DPM_FLAG_SMART_SUSPEND`` need to clear :c:member:`power.may_skip_resume` in
their ``->suspend`` callback in case the other two are skipped.]
Setting the :c:member:`power.may_skip_resume` status bit along with the
``DPM_FLAG_MAY_SKIP_RESUME`` flag is necessary, but generally not sufficient,
for the driver's "noirq" and "early" resume callbacks to be skipped. Whether or
not they should be skipped can be determined by evaluating the
:c:func:`dev_pm_skip_resume` helper function.
If that function returns ``true``, the driver's "noirq" and "early" resume
callbacks should be skipped and the device's runtime PM status will be set to
"suspended" by the PM core. Otherwise, if the device was runtime-suspended
during the preceding system-wide suspend transition and its
``DPM_FLAG_SMART_SUSPEND`` is set, its runtime PM status will be set to
"active" by the PM core. [Hence, the drivers that do not set
``DPM_FLAG_SMART_SUSPEND`` should not expect the runtime PM status of their
devices to be changed from "suspended" to "active" by the PM core during
system-wide resume-type transitions.]
If the ``DPM_FLAG_MAY_SKIP_RESUME`` flag is not set for a device, but
``DPM_FLAG_SMART_SUSPEND`` is set and the driver's "late" and "noirq" suspend
callbacks are skipped, its system-wide "noirq" and "early" resume callbacks, if
present, are invoked as usual and the device's runtime PM status is set to
"active" by the PM core before enabling runtime PM for it. In that case, the
driver must be prepared to cope with the invocation of its system-wide resume
callbacks back-to-back with its ``->runtime_suspend`` one (without the
intervening ``->runtime_resume`` and system-wide suspend callbacks) and the
final state of the device must reflect the "active" runtime PM status in that
case. [Note that this is not a problem at all if the driver's
``->suspend_late`` callback pointer points to the same function as its
``->runtime_suspend`` one and its ``->resume_early`` callback pointer points to
the same function as the ``->runtime_resume`` one, while none of the other
system-wide suspend-resume callbacks of the driver are present, for example.]
Likewise, if ``DPM_FLAG_MAY_SKIP_RESUME`` is set for a device, its driver's
system-wide "noirq" and "early" resume callbacks may be skipped while its "late"
and "noirq" suspend callbacks may have been executed (in principle, regardless
of whether or not ``DPM_FLAG_SMART_SUSPEND`` is set). In that case, the driver
needs to be able to cope with the invocation of its ``->runtime_resume``
callback back-to-back with its "late" and "noirq" suspend ones. [For instance,
that is not a concern if the driver sets both ``DPM_FLAG_SMART_SUSPEND`` and
``DPM_FLAG_MAY_SKIP_RESUME`` and uses the same pair of suspend/resume callback
functions for runtime PM and system-wide suspend/resume.]

View File

@ -1004,41 +1004,39 @@ including the PCI bus type. The flags should be set once at the driver probe
time with the help of the dev_pm_set_driver_flags() function and they should not
be updated directly afterwards.
The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete
mechanism allowing device suspend/resume callbacks to be skipped if the device
is in runtime suspend when the system suspend starts. That also affects all of
the ancestors of the device, so this flag should only be used if absolutely
necessary.
The DPM_FLAG_NO_DIRECT_COMPLETE flag prevents the PM core from using the
direct-complete mechanism allowing device suspend/resume callbacks to be skipped
if the device is in runtime suspend when the system suspend starts. That also
affects all of the ancestors of the device, so this flag should only be used if
absolutely necessary.
The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a
positive value from pci_pm_prepare() if the ->prepare callback provided by the
The DPM_FLAG_SMART_PREPARE flag causes the PCI bus type to return a positive
value from pci_pm_prepare() only if the ->prepare callback provided by the
driver of the device returns a positive value. That allows the driver to opt
out from using the direct-complete mechanism dynamically.
out from using the direct-complete mechanism dynamically (whereas setting
DPM_FLAG_NO_DIRECT_COMPLETE means permanent opt-out).
The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's
perspective the device can be safely left in runtime suspend during system
suspend. That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff()
to skip resuming the device from runtime suspend unless there are PCI-specific
reasons for doing that. Also, it causes pci_pm_suspend_late/noirq(),
pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq() to return early
if the device remains in runtime suspend in the beginning of the "late" phase
of the system-wide transition under way. Moreover, if the device is in
runtime suspend in pci_pm_resume_noirq() or pci_pm_restore_noirq(), its runtime
power management status will be changed to "active" (as it is going to be put
into D0 going forward), but if it is in runtime suspend in pci_pm_thaw_noirq(),
the function will set the power.direct_complete flag for it (to make the PM core
skip the subsequent "thaw" callbacks for it) and return.
to avoid resuming the device from runtime suspend unless there are PCI-specific
reasons for doing that. Also, it causes pci_pm_suspend_late/noirq() and
pci_pm_poweroff_late/noirq() to return early if the device remains in runtime
suspend during the "late" phase of the system-wide transition under way.
Moreover, if the device is in runtime suspend in pci_pm_resume_noirq() or
pci_pm_restore_noirq(), its runtime PM status will be changed to "active" (as it
is going to be put into D0 going forward).
Setting the DPM_FLAG_LEAVE_SUSPENDED flag means that the driver prefers the
device to be left in suspend after system-wide transitions to the working state.
This flag is checked by the PM core, but the PCI bus type informs the PM core
which devices may be left in suspend from its perspective (that happens during
the "noirq" phase of system-wide suspend and analogous transitions) and next it
uses the dev_pm_may_skip_resume() helper to decide whether or not to return from
pci_pm_resume_noirq() early, as the PM core will skip the remaining resume
callbacks for the device during the transition under way and will set its
runtime PM status to "suspended" if dev_pm_may_skip_resume() returns "true" for
it.
Setting the DPM_FLAG_MAY_SKIP_RESUME flag means that the driver allows its
"noirq" and "early" resume callbacks to be skipped if the device can be left
in suspend after a system-wide transition into the working state. This flag is
taken into consideration by the PM core along with the power.may_skip_resume
status bit of the device which is set by pci_pm_suspend_noirq() in certain
situations. If the PM core determines that the driver's "noirq" and "early"
resume callbacks should be skipped, the dev_pm_skip_resume() helper function
will return "true" and that will cause pci_pm_resume_noirq() and
pci_pm_resume_early() to return upfront without touching the device and
executing the driver callbacks.
3.2. Device Runtime Power Management
------------------------------------

View File

@ -2237,6 +2237,7 @@ F: drivers/*/qcom*
F: drivers/*/qcom/
F: drivers/bluetooth/btqcomsmd.c
F: drivers/clocksource/timer-qcom.c
F: drivers/cpuidle/cpuidle-qcom-spm.c
F: drivers/extcon/extcon-qcom*
F: drivers/i2c/busses/i2c-qcom-geni.c
F: drivers/i2c/busses/i2c-qup.c

View File

@ -1041,7 +1041,7 @@ static int acpi_lpss_do_suspend_late(struct device *dev)
{
int ret;
if (dev_pm_smart_suspend_and_suspended(dev))
if (dev_pm_skip_suspend(dev))
return 0;
ret = pm_generic_suspend_late(dev);
@ -1093,6 +1093,9 @@ static int acpi_lpss_resume_early(struct device *dev)
if (pdata->dev_desc->resume_from_noirq)
return 0;
if (dev_pm_skip_resume(dev))
return 0;
return acpi_lpss_do_resume_early(dev);
}
@ -1102,12 +1105,9 @@ static int acpi_lpss_resume_noirq(struct device *dev)
int ret;
/* Follow acpi_subsys_resume_noirq(). */
if (dev_pm_may_skip_resume(dev))
if (dev_pm_skip_resume(dev))
return 0;
if (dev_pm_smart_suspend_and_suspended(dev))
pm_runtime_set_active(dev);
ret = pm_generic_resume_noirq(dev);
if (ret)
return ret;
@ -1169,7 +1169,7 @@ static int acpi_lpss_poweroff_late(struct device *dev)
{
struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
if (dev_pm_smart_suspend_and_suspended(dev))
if (dev_pm_skip_suspend(dev))
return 0;
if (pdata->dev_desc->resume_from_noirq)
@ -1182,7 +1182,7 @@ static int acpi_lpss_poweroff_noirq(struct device *dev)
{
struct lpss_private_data *pdata = acpi_driver_data(ACPI_COMPANION(dev));
if (dev_pm_smart_suspend_and_suspended(dev))
if (dev_pm_skip_suspend(dev))
return 0;
if (pdata->dev_desc->resume_from_noirq) {

View File

@ -624,7 +624,7 @@ static int acpi_tad_probe(struct platform_device *pdev)
*/
device_init_wakeup(dev, true);
dev_pm_set_driver_flags(dev, DPM_FLAG_SMART_SUSPEND |
DPM_FLAG_LEAVE_SUSPENDED);
DPM_FLAG_MAY_SKIP_RESUME);
/*
* The platform bus type layer tells the ACPI PM domain powers up the
* device, so set the runtime PM status of it to "active".

View File

@ -1084,7 +1084,7 @@ int acpi_subsys_suspend_late(struct device *dev)
{
int ret;
if (dev_pm_smart_suspend_and_suspended(dev))
if (dev_pm_skip_suspend(dev))
return 0;
ret = pm_generic_suspend_late(dev);
@ -1100,10 +1100,8 @@ int acpi_subsys_suspend_noirq(struct device *dev)
{
int ret;
if (dev_pm_smart_suspend_and_suspended(dev)) {
dev->power.may_skip_resume = true;
if (dev_pm_skip_suspend(dev))
return 0;
}
ret = pm_generic_suspend_noirq(dev);
if (ret)
@ -1116,8 +1114,8 @@ int acpi_subsys_suspend_noirq(struct device *dev)
* acpi_subsys_complete() to take care of fixing up the device's state
* anyway, if need be.
*/
dev->power.may_skip_resume = device_may_wakeup(dev) ||
!device_can_wakeup(dev);
if (device_can_wakeup(dev) && !device_may_wakeup(dev))
dev->power.may_skip_resume = false;
return 0;
}
@ -1129,17 +1127,9 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend_noirq);
*/
static int acpi_subsys_resume_noirq(struct device *dev)
{
if (dev_pm_may_skip_resume(dev))
if (dev_pm_skip_resume(dev))
return 0;
/*
* Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
* during system suspend, so update their runtime PM status to "active"
* as they will be put into D0 going forward.
*/
if (dev_pm_smart_suspend_and_suspended(dev))
pm_runtime_set_active(dev);
return pm_generic_resume_noirq(dev);
}
@ -1153,7 +1143,12 @@ static int acpi_subsys_resume_noirq(struct device *dev)
*/
static int acpi_subsys_resume_early(struct device *dev)
{
int ret = acpi_dev_resume(dev);
int ret;
if (dev_pm_skip_resume(dev))
return 0;
ret = acpi_dev_resume(dev);
return ret ? ret : pm_generic_resume_early(dev);
}
@ -1218,7 +1213,7 @@ static int acpi_subsys_poweroff_late(struct device *dev)
{
int ret;
if (dev_pm_smart_suspend_and_suspended(dev))
if (dev_pm_skip_suspend(dev))
return 0;
ret = pm_generic_poweroff_late(dev);
@ -1234,7 +1229,7 @@ static int acpi_subsys_poweroff_late(struct device *dev)
*/
static int acpi_subsys_poweroff_noirq(struct device *dev)
{
if (dev_pm_smart_suspend_and_suspended(dev))
if (dev_pm_skip_suspend(dev))
return 0;
return pm_generic_poweroff_noirq(dev);

View File

@ -2017,7 +2017,7 @@ bool acpi_ec_dispatch_gpe(void)
*/
ret = acpi_dispatch_gpe(NULL, first_ec->gpe);
if (ret == ACPI_INTERRUPT_HANDLED) {
pm_pr_dbg("EC GPE dispatched\n");
pm_pr_dbg("ACPI EC GPE dispatched\n");
/* Flush the event and query workqueues. */
acpi_ec_flush_work();

View File

@ -992,23 +992,31 @@ static bool acpi_s2idle_wake(void)
* wakeup is pending anyway and the SCI is not the source of
* it).
*/
if (irqd_is_wakeup_armed(irq_get_irq_data(acpi_sci_irq)))
if (irqd_is_wakeup_armed(irq_get_irq_data(acpi_sci_irq))) {
pm_pr_dbg("Wakeup unrelated to ACPI SCI\n");
return true;
}
/*
* If the status bit of any enabled fixed event is set, the
* wakeup is regarded as valid.
*/
if (acpi_any_fixed_event_status_set())
if (acpi_any_fixed_event_status_set()) {
pm_pr_dbg("ACPI fixed event wakeup\n");
return true;
}
/* Check wakeups from drivers sharing the SCI. */
if (acpi_check_wakeup_handlers())
if (acpi_check_wakeup_handlers()) {
pm_pr_dbg("ACPI custom handler wakeup\n");
return true;
}
/* Check non-EC GPE wakeups and dispatch the EC GPE. */
if (acpi_ec_dispatch_gpe())
if (acpi_ec_dispatch_gpe()) {
pm_pr_dbg("ACPI non-EC GPE wakeup\n");
return true;
}
/*
* Cancel the SCI wakeup and process all pending events in case
@ -1027,8 +1035,10 @@ static bool acpi_s2idle_wake(void)
* are pending here, they must be resulting from the processing
* of EC events above or coming from somewhere else.
*/
if (pm_wakeup_pending())
if (pm_wakeup_pending()) {
pm_pr_dbg("Wakeup after ACPI Notify sync\n");
return true;
}
rearm_wake_irq(acpi_sci_irq);
}

View File

@ -562,72 +562,26 @@ static void dpm_watchdog_clear(struct dpm_watchdog *wd)
/*------------------------- Resume routines -------------------------*/
/**
* suspend_event - Return a "suspend" message for given "resume" one.
* @resume_msg: PM message representing a system-wide resume transition.
*/
static pm_message_t suspend_event(pm_message_t resume_msg)
{
switch (resume_msg.event) {
case PM_EVENT_RESUME:
return PMSG_SUSPEND;
case PM_EVENT_THAW:
case PM_EVENT_RESTORE:
return PMSG_FREEZE;
case PM_EVENT_RECOVER:
return PMSG_HIBERNATE;
}
return PMSG_ON;
}
/**
* dev_pm_may_skip_resume - System-wide device resume optimization check.
* dev_pm_skip_resume - System-wide device resume optimization check.
* @dev: Target device.
*
* Checks whether or not the device may be left in suspend after a system-wide
* transition to the working state.
* Return:
* - %false if the transition under way is RESTORE.
* - Return value of dev_pm_skip_suspend() if the transition under way is THAW.
* - The logical negation of %power.must_resume otherwise (that is, when the
* transition under way is RESUME).
*/
bool dev_pm_may_skip_resume(struct device *dev)
bool dev_pm_skip_resume(struct device *dev)
{
return !dev->power.must_resume && pm_transition.event != PM_EVENT_RESTORE;
if (pm_transition.event == PM_EVENT_RESTORE)
return false;
if (pm_transition.event == PM_EVENT_THAW)
return dev_pm_skip_suspend(dev);
return !dev->power.must_resume;
}
static pm_callback_t dpm_subsys_resume_noirq_cb(struct device *dev,
pm_message_t state,
const char **info_p)
{
pm_callback_t callback;
const char *info;
if (dev->pm_domain) {
info = "noirq power domain ";
callback = pm_noirq_op(&dev->pm_domain->ops, state);
} else if (dev->type && dev->type->pm) {
info = "noirq type ";
callback = pm_noirq_op(dev->type->pm, state);
} else if (dev->class && dev->class->pm) {
info = "noirq class ";
callback = pm_noirq_op(dev->class->pm, state);
} else if (dev->bus && dev->bus->pm) {
info = "noirq bus ";
callback = pm_noirq_op(dev->bus->pm, state);
} else {
return NULL;
}
if (info_p)
*info_p = info;
return callback;
}
static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev,
pm_message_t state,
const char **info_p);
static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
pm_message_t state,
const char **info_p);
/**
* device_resume_noirq - Execute a "noirq resume" callback for given device.
* @dev: Device to handle.
@ -639,8 +593,8 @@ static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
*/
static int device_resume_noirq(struct device *dev, pm_message_t state, bool async)
{
pm_callback_t callback;
const char *info;
pm_callback_t callback = NULL;
const char *info = NULL;
bool skip_resume;
int error = 0;
@ -656,37 +610,41 @@ static int device_resume_noirq(struct device *dev, pm_message_t state, bool asyn
if (!dpm_wait_for_superior(dev, async))
goto Out;
skip_resume = dev_pm_may_skip_resume(dev);
skip_resume = dev_pm_skip_resume(dev);
/*
* If the driver callback is skipped below or by the middle layer
* callback and device_resume_early() also skips the driver callback for
* this device later, it needs to appear as "suspended" to PM-runtime,
* so change its status accordingly.
*
* Otherwise, the device is going to be resumed, so set its PM-runtime
* status to "active", but do that only if DPM_FLAG_SMART_SUSPEND is set
* to avoid confusing drivers that don't use it.
*/
if (skip_resume)
pm_runtime_set_suspended(dev);
else if (dev_pm_skip_suspend(dev))
pm_runtime_set_active(dev);
callback = dpm_subsys_resume_noirq_cb(dev, state, &info);
if (dev->pm_domain) {
info = "noirq power domain ";
callback = pm_noirq_op(&dev->pm_domain->ops, state);
} else if (dev->type && dev->type->pm) {
info = "noirq type ";
callback = pm_noirq_op(dev->type->pm, state);
} else if (dev->class && dev->class->pm) {
info = "noirq class ";
callback = pm_noirq_op(dev->class->pm, state);
} else if (dev->bus && dev->bus->pm) {
info = "noirq bus ";
callback = pm_noirq_op(dev->bus->pm, state);
}
if (callback)
goto Run;
if (skip_resume)
goto Skip;
if (dev_pm_smart_suspend_and_suspended(dev)) {
pm_message_t suspend_msg = suspend_event(state);
/*
* If "freeze" callbacks have been skipped during a transition
* related to hibernation, the subsequent "thaw" callbacks must
* be skipped too or bad things may happen. Otherwise, resume
* callbacks are going to be run for the device, so its runtime
* PM status must be changed to reflect the new state after the
* transition under way.
*/
if (!dpm_subsys_suspend_late_cb(dev, suspend_msg, NULL) &&
!dpm_subsys_suspend_noirq_cb(dev, suspend_msg, NULL)) {
if (state.event == PM_EVENT_THAW) {
skip_resume = true;
goto Skip;
} else {
pm_runtime_set_active(dev);
}
}
}
if (dev->driver && dev->driver->pm) {
info = "noirq driver ";
callback = pm_noirq_op(dev->driver->pm, state);
@ -698,20 +656,6 @@ static int device_resume_noirq(struct device *dev, pm_message_t state, bool asyn
Skip:
dev->power.is_noirq_suspended = false;
if (skip_resume) {
/* Make the next phases of resume skip the device. */
dev->power.is_late_suspended = false;
dev->power.is_suspended = false;
/*
* The device is going to be left in suspend, but it might not
* have been in runtime suspend before the system suspended, so
* its runtime PM status needs to be updated to avoid confusing
* the runtime PM framework when runtime PM is enabled for the
* device again.
*/
pm_runtime_set_suspended(dev);
}
Out:
complete_all(&dev->power.completion);
TRACE_RESUME(error);
@ -810,35 +754,6 @@ void dpm_resume_noirq(pm_message_t state)
cpuidle_resume();
}
static pm_callback_t dpm_subsys_resume_early_cb(struct device *dev,
pm_message_t state,
const char **info_p)
{
pm_callback_t callback;
const char *info;
if (dev->pm_domain) {
info = "early power domain ";
callback = pm_late_early_op(&dev->pm_domain->ops, state);
} else if (dev->type && dev->type->pm) {
info = "early type ";
callback = pm_late_early_op(dev->type->pm, state);
} else if (dev->class && dev->class->pm) {
info = "early class ";
callback = pm_late_early_op(dev->class->pm, state);
} else if (dev->bus && dev->bus->pm) {
info = "early bus ";
callback = pm_late_early_op(dev->bus->pm, state);
} else {
return NULL;
}
if (info_p)
*info_p = info;
return callback;
}
/**
* device_resume_early - Execute an "early resume" callback for given device.
* @dev: Device to handle.
@ -849,8 +764,8 @@ static pm_callback_t dpm_subsys_resume_early_cb(struct device *dev,
*/
static int device_resume_early(struct device *dev, pm_message_t state, bool async)
{
pm_callback_t callback;
const char *info;
pm_callback_t callback = NULL;
const char *info = NULL;
int error = 0;
TRACE_DEVICE(dev);
@ -865,17 +780,37 @@ static int device_resume_early(struct device *dev, pm_message_t state, bool asyn
if (!dpm_wait_for_superior(dev, async))
goto Out;
callback = dpm_subsys_resume_early_cb(dev, state, &info);
if (dev->pm_domain) {
info = "early power domain ";
callback = pm_late_early_op(&dev->pm_domain->ops, state);
} else if (dev->type && dev->type->pm) {
info = "early type ";
callback = pm_late_early_op(dev->type->pm, state);
} else if (dev->class && dev->class->pm) {
info = "early class ";
callback = pm_late_early_op(dev->class->pm, state);
} else if (dev->bus && dev->bus->pm) {
info = "early bus ";
callback = pm_late_early_op(dev->bus->pm, state);
}
if (callback)
goto Run;
if (!callback && dev->driver && dev->driver->pm) {
if (dev_pm_skip_resume(dev))
goto Skip;
if (dev->driver && dev->driver->pm) {
info = "early driver ";
callback = pm_late_early_op(dev->driver->pm, state);
}
Run:
error = dpm_run_callback(callback, dev, state, info);
Skip:
dev->power.is_late_suspended = false;
Out:
Out:
TRACE_RESUME(error);
pm_runtime_enable(dev);
@ -1245,61 +1180,6 @@ static void dpm_superior_set_must_resume(struct device *dev)
device_links_read_unlock(idx);
}
static pm_callback_t dpm_subsys_suspend_noirq_cb(struct device *dev,
pm_message_t state,
const char **info_p)
{
pm_callback_t callback;
const char *info;
if (dev->pm_domain) {
info = "noirq power domain ";
callback = pm_noirq_op(&dev->pm_domain->ops, state);
} else if (dev->type && dev->type->pm) {
info = "noirq type ";
callback = pm_noirq_op(dev->type->pm, state);
} else if (dev->class && dev->class->pm) {
info = "noirq class ";
callback = pm_noirq_op(dev->class->pm, state);
} else if (dev->bus && dev->bus->pm) {
info = "noirq bus ";
callback = pm_noirq_op(dev->bus->pm, state);
} else {
return NULL;
}
if (info_p)
*info_p = info;
return callback;
}
static bool device_must_resume(struct device *dev, pm_message_t state,
bool no_subsys_suspend_noirq)
{
pm_message_t resume_msg = resume_event(state);
/*
* If all of the device driver's "noirq", "late" and "early" callbacks
* are invoked directly by the core, the decision to allow the device to
* stay in suspend can be based on its current runtime PM status and its
* wakeup settings.
*/
if (no_subsys_suspend_noirq &&
!dpm_subsys_suspend_late_cb(dev, state, NULL) &&
!dpm_subsys_resume_early_cb(dev, resume_msg, NULL) &&
!dpm_subsys_resume_noirq_cb(dev, resume_msg, NULL))
return !pm_runtime_status_suspended(dev) &&
(resume_msg.event != PM_EVENT_RESUME ||
(device_can_wakeup(dev) && !device_may_wakeup(dev)));
/*
* The only safe strategy here is to require that if the device may not
* be left in suspend, resume callbacks must be invoked for it.
*/
return !dev->power.may_skip_resume;
}
/**
* __device_suspend_noirq - Execute a "noirq suspend" callback for given device.
* @dev: Device to handle.
@ -1311,9 +1191,8 @@ static bool device_must_resume(struct device *dev, pm_message_t state,
*/
static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool async)
{
pm_callback_t callback;
const char *info;
bool no_subsys_cb = false;
pm_callback_t callback = NULL;
const char *info = NULL;
int error = 0;
TRACE_DEVICE(dev);
@ -1327,13 +1206,23 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
if (dev->power.syscore || dev->power.direct_complete)
goto Complete;
callback = dpm_subsys_suspend_noirq_cb(dev, state, &info);
if (dev->pm_domain) {
info = "noirq power domain ";
callback = pm_noirq_op(&dev->pm_domain->ops, state);
} else if (dev->type && dev->type->pm) {
info = "noirq type ";
callback = pm_noirq_op(dev->type->pm, state);
} else if (dev->class && dev->class->pm) {
info = "noirq class ";
callback = pm_noirq_op(dev->class->pm, state);
} else if (dev->bus && dev->bus->pm) {
info = "noirq bus ";
callback = pm_noirq_op(dev->bus->pm, state);
}
if (callback)
goto Run;
no_subsys_cb = !dpm_subsys_suspend_late_cb(dev, state, NULL);
if (dev_pm_smart_suspend_and_suspended(dev) && no_subsys_cb)
if (dev_pm_skip_suspend(dev))
goto Skip;
if (dev->driver && dev->driver->pm) {
@ -1351,13 +1240,16 @@ static int __device_suspend_noirq(struct device *dev, pm_message_t state, bool a
Skip:
dev->power.is_noirq_suspended = true;
if (dev_pm_test_driver_flags(dev, DPM_FLAG_LEAVE_SUSPENDED)) {
dev->power.must_resume = dev->power.must_resume ||
atomic_read(&dev->power.usage_count) > 1 ||
device_must_resume(dev, state, no_subsys_cb);
} else {
/*
* Skipping the resume of devices that were in use right before the
* system suspend (as indicated by their PM-runtime usage counters)
* would be suboptimal. Also resume them if doing that is not allowed
* to be skipped.
*/
if (atomic_read(&dev->power.usage_count) > 1 ||
!(dev_pm_test_driver_flags(dev, DPM_FLAG_MAY_SKIP_RESUME) &&
dev->power.may_skip_resume))
dev->power.must_resume = true;
}
if (dev->power.must_resume)
dpm_superior_set_must_resume(dev);
@ -1474,35 +1366,6 @@ static void dpm_propagate_wakeup_to_parent(struct device *dev)
spin_unlock_irq(&parent->power.lock);
}
static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
pm_message_t state,
const char **info_p)
{
pm_callback_t callback;
const char *info;
if (dev->pm_domain) {
info = "late power domain ";
callback = pm_late_early_op(&dev->pm_domain->ops, state);
} else if (dev->type && dev->type->pm) {
info = "late type ";
callback = pm_late_early_op(dev->type->pm, state);
} else if (dev->class && dev->class->pm) {
info = "late class ";
callback = pm_late_early_op(dev->class->pm, state);
} else if (dev->bus && dev->bus->pm) {
info = "late bus ";
callback = pm_late_early_op(dev->bus->pm, state);
} else {
return NULL;
}
if (info_p)
*info_p = info;
return callback;
}
/**
* __device_suspend_late - Execute a "late suspend" callback for given device.
* @dev: Device to handle.
@ -1513,8 +1376,8 @@ static pm_callback_t dpm_subsys_suspend_late_cb(struct device *dev,
*/
static int __device_suspend_late(struct device *dev, pm_message_t state, bool async)
{
pm_callback_t callback;
const char *info;
pm_callback_t callback = NULL;
const char *info = NULL;
int error = 0;
TRACE_DEVICE(dev);
@ -1535,12 +1398,23 @@ static int __device_suspend_late(struct device *dev, pm_message_t state, bool as
if (dev->power.syscore || dev->power.direct_complete)
goto Complete;
callback = dpm_subsys_suspend_late_cb(dev, state, &info);
if (dev->pm_domain) {
info = "late power domain ";
callback = pm_late_early_op(&dev->pm_domain->ops, state);
} else if (dev->type && dev->type->pm) {
info = "late type ";
callback = pm_late_early_op(dev->type->pm, state);
} else if (dev->class && dev->class->pm) {
info = "late class ";
callback = pm_late_early_op(dev->class->pm, state);
} else if (dev->bus && dev->bus->pm) {
info = "late bus ";
callback = pm_late_early_op(dev->bus->pm, state);
}
if (callback)
goto Run;
if (dev_pm_smart_suspend_and_suspended(dev) &&
!dpm_subsys_suspend_noirq_cb(dev, state, NULL))
if (dev_pm_skip_suspend(dev))
goto Skip;
if (dev->driver && dev->driver->pm) {
@ -1766,7 +1640,7 @@ static int __device_suspend(struct device *dev, pm_message_t state, bool async)
dev->power.direct_complete = false;
}
dev->power.may_skip_resume = false;
dev->power.may_skip_resume = true;
dev->power.must_resume = false;
dpm_watchdog_set(&wd, dev);
@ -1970,7 +1844,7 @@ static int device_prepare(struct device *dev, pm_message_t state)
spin_lock_irq(&dev->power.lock);
dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
(ret > 0 || dev->power.no_pm_callbacks) &&
!dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
!dev_pm_test_driver_flags(dev, DPM_FLAG_NO_DIRECT_COMPLETE);
spin_unlock_irq(&dev->power.lock);
return 0;
}
@ -2128,7 +2002,7 @@ void device_pm_check_callbacks(struct device *dev)
spin_unlock_irq(&dev->power.lock);
}
bool dev_pm_smart_suspend_and_suspended(struct device *dev)
bool dev_pm_skip_suspend(struct device *dev)
{
return dev_pm_test_driver_flags(dev, DPM_FLAG_SMART_SUSPEND) &&
pm_runtime_status_suspended(dev);

View File

@ -523,13 +523,11 @@ static int rpm_suspend(struct device *dev, int rpmflags)
repeat:
retval = rpm_check_suspend_allowed(dev);
if (retval < 0)
; /* Conditions are wrong. */
goto out; /* Conditions are wrong. */
/* Synchronous suspends are not allowed in the RPM_RESUMING state. */
else if (dev->power.runtime_status == RPM_RESUMING &&
!(rpmflags & RPM_ASYNC))
if (dev->power.runtime_status == RPM_RESUMING && !(rpmflags & RPM_ASYNC))
retval = -EAGAIN;
if (retval)
goto out;

View File

@ -666,7 +666,7 @@ int dpm_sysfs_add(struct device *dev)
if (rc)
return rc;
if (pm_runtime_callbacks_present(dev)) {
if (!pm_runtime_has_no_callbacks(dev)) {
rc = sysfs_merge_group(&dev->kobj, &pm_runtime_attr_group);
if (rc)
goto err_out;
@ -709,7 +709,7 @@ int dpm_sysfs_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
if (rc)
return rc;
if (pm_runtime_callbacks_present(dev)) {
if (!pm_runtime_has_no_callbacks(dev)) {
rc = sysfs_group_change_owner(
&dev->kobj, &pm_runtime_attr_group, kuid, kgid);
if (rc)

View File

@ -95,6 +95,7 @@ struct clockgen {
};
static struct clockgen clockgen;
static bool add_cpufreq_dev __initdata;
static void cg_out(struct clockgen *cg, u32 val, u32 __iomem *reg)
{
@ -1019,7 +1020,7 @@ static void __init create_muxes(struct clockgen *cg)
}
}
static void __init clockgen_init(struct device_node *np);
static void __init _clockgen_init(struct device_node *np, bool legacy);
/*
* Legacy nodes may get probed before the parent clockgen node.
@ -1030,7 +1031,7 @@ static void __init clockgen_init(struct device_node *np);
static void __init legacy_init_clockgen(struct device_node *np)
{
if (!clockgen.node)
clockgen_init(of_get_parent(np));
_clockgen_init(of_get_parent(np), true);
}
/* Legacy node */
@ -1447,7 +1448,7 @@ static bool __init has_erratum_a4510(void)
}
#endif
static void __init clockgen_init(struct device_node *np)
static void __init _clockgen_init(struct device_node *np, bool legacy)
{
int i, ret;
bool is_old_ls1021a = false;
@ -1516,12 +1517,35 @@ static void __init clockgen_init(struct device_node *np)
__func__, np, ret);
}
/* Don't create cpufreq device for legacy clockgen blocks */
add_cpufreq_dev = !legacy;
return;
err:
iounmap(clockgen.regs);
clockgen.regs = NULL;
}
static void __init clockgen_init(struct device_node *np)
{
_clockgen_init(np, false);
}
static int __init clockgen_cpufreq_init(void)
{
struct platform_device *pdev;
if (add_cpufreq_dev) {
pdev = platform_device_register_simple("qoriq-cpufreq", -1,
NULL, 0);
if (IS_ERR(pdev))
pr_err("Couldn't register qoriq-cpufreq err=%ld\n",
PTR_ERR(pdev));
}
return 0;
}
device_initcall(clockgen_cpufreq_init);
CLK_OF_DECLARE(qoriq_clockgen_1, "fsl,qoriq-clockgen-1.0", clockgen_init);
CLK_OF_DECLARE(qoriq_clockgen_2, "fsl,qoriq-clockgen-2.0", clockgen_init);
CLK_OF_DECLARE(qoriq_clockgen_b4420, "fsl,b4420-clockgen", clockgen_init);

View File

@ -114,7 +114,11 @@ static int clk_pm_runtime_get(struct clk_core *core)
return 0;
ret = pm_runtime_get_sync(core->dev);
return ret < 0 ? ret : 0;
if (ret < 0) {
pm_runtime_put_noidle(core->dev);
return ret;
}
return 0;
}
static void clk_pm_runtime_put(struct clk_core *core)

View File

@ -323,7 +323,8 @@ endif
config QORIQ_CPUFREQ
tristate "CPU frequency scaling driver for Freescale QorIQ SoCs"
depends on OF && COMMON_CLK && (PPC_E500MC || ARM || ARM64)
depends on OF && COMMON_CLK
depends on PPC_E500MC || SOC_LS1021A || ARCH_LAYERSCAPE || COMPILE_TEST
select CLK_QORIQ
help
This adds the CPUFreq driver support for Freescale QorIQ SoCs

View File

@ -317,6 +317,7 @@ config ARM_TEGRA186_CPUFREQ
config ARM_TI_CPUFREQ
bool "Texas Instruments CPUFreq support"
depends on ARCH_OMAP2PLUS
default ARCH_OMAP2PLUS
help
This driver enables valid OPPs on the running platform based on
values contained within the SoC in use. Enable this in order to

View File

@ -53,6 +53,7 @@ static const struct of_device_id whitelist[] __initconst = {
{ .compatible = "renesas,r7s72100", },
{ .compatible = "renesas,r8a73a4", },
{ .compatible = "renesas,r8a7740", },
{ .compatible = "renesas,r8a7742", },
{ .compatible = "renesas,r8a7743", },
{ .compatible = "renesas,r8a7744", },
{ .compatible = "renesas,r8a7745", },
@ -105,6 +106,7 @@ static const struct of_device_id blacklist[] __initconst = {
{ .compatible = "calxeda,highbank", },
{ .compatible = "calxeda,ecx-2000", },
{ .compatible = "fsl,imx7ulp", },
{ .compatible = "fsl,imx7d", },
{ .compatible = "fsl,imx8mq", },
{ .compatible = "fsl,imx8mm", },

View File

@ -2535,26 +2535,27 @@ EXPORT_SYMBOL_GPL(cpufreq_update_limits);
static int cpufreq_boost_set_sw(int state)
{
struct cpufreq_policy *policy;
int ret = -EINVAL;
for_each_active_policy(policy) {
int ret;
if (!policy->freq_table)
continue;
return -ENXIO;
ret = cpufreq_frequency_table_cpuinfo(policy,
policy->freq_table);
if (ret) {
pr_err("%s: Policy frequency update failed\n",
__func__);
break;
return ret;
}
ret = freq_qos_update_request(policy->max_freq_req, policy->max);
if (ret < 0)
break;
return ret;
}
return ret;
return 0;
}
int cpufreq_boost_trigger_state(int state)

View File

@ -3,7 +3,9 @@
* Copyright 2019 NXP
*/
#include <linux/clk.h>
#include <linux/cpu.h>
#include <linux/cpufreq.h>
#include <linux/err.h>
#include <linux/init.h>
#include <linux/kernel.h>
@ -12,8 +14,11 @@
#include <linux/of.h>
#include <linux/platform_device.h>
#include <linux/pm_opp.h>
#include <linux/regulator/consumer.h>
#include <linux/slab.h>
#include "cpufreq-dt.h"
#define OCOTP_CFG3_SPEED_GRADE_SHIFT 8
#define OCOTP_CFG3_SPEED_GRADE_MASK (0x3 << 8)
#define IMX8MN_OCOTP_CFG3_SPEED_GRADE_MASK (0xf << 8)
@ -22,20 +27,92 @@
#define IMX8MP_OCOTP_CFG3_MKT_SEGMENT_SHIFT 5
#define IMX8MP_OCOTP_CFG3_MKT_SEGMENT_MASK (0x3 << 5)
#define IMX7ULP_MAX_RUN_FREQ 528000
/* cpufreq-dt device registered by imx-cpufreq-dt */
static struct platform_device *cpufreq_dt_pdev;
static struct opp_table *cpufreq_opp_table;
static struct device *cpu_dev;
enum IMX7ULP_CPUFREQ_CLKS {
ARM,
CORE,
SCS_SEL,
HSRUN_CORE,
HSRUN_SCS_SEL,
FIRC,
};
static struct clk_bulk_data imx7ulp_clks[] = {
{ .id = "arm" },
{ .id = "core" },
{ .id = "scs_sel" },
{ .id = "hsrun_core" },
{ .id = "hsrun_scs_sel" },
{ .id = "firc" },
};
static unsigned int imx7ulp_get_intermediate(struct cpufreq_policy *policy,
unsigned int index)
{
return clk_get_rate(imx7ulp_clks[FIRC].clk);
}
static int imx7ulp_target_intermediate(struct cpufreq_policy *policy,
unsigned int index)
{
unsigned int newfreq = policy->freq_table[index].frequency;
clk_set_parent(imx7ulp_clks[SCS_SEL].clk, imx7ulp_clks[FIRC].clk);
clk_set_parent(imx7ulp_clks[HSRUN_SCS_SEL].clk, imx7ulp_clks[FIRC].clk);
if (newfreq > IMX7ULP_MAX_RUN_FREQ)
clk_set_parent(imx7ulp_clks[ARM].clk,
imx7ulp_clks[HSRUN_CORE].clk);
else
clk_set_parent(imx7ulp_clks[ARM].clk, imx7ulp_clks[CORE].clk);
return 0;
}
static struct cpufreq_dt_platform_data imx7ulp_data = {
.target_intermediate = imx7ulp_target_intermediate,
.get_intermediate = imx7ulp_get_intermediate,
};
static int imx_cpufreq_dt_probe(struct platform_device *pdev)
{
struct device *cpu_dev = get_cpu_device(0);
struct platform_device *dt_pdev;
u32 cell_value, supported_hw[2];
int speed_grade, mkt_segment;
int ret;
cpu_dev = get_cpu_device(0);
if (!of_find_property(cpu_dev->of_node, "cpu-supply", NULL))
return -ENODEV;
if (of_machine_is_compatible("fsl,imx7ulp")) {
ret = clk_bulk_get(cpu_dev, ARRAY_SIZE(imx7ulp_clks),
imx7ulp_clks);
if (ret)
return ret;
dt_pdev = platform_device_register_data(NULL, "cpufreq-dt",
-1, &imx7ulp_data,
sizeof(imx7ulp_data));
if (IS_ERR(dt_pdev)) {
clk_bulk_put(ARRAY_SIZE(imx7ulp_clks), imx7ulp_clks);
ret = PTR_ERR(dt_pdev);
dev_err(&pdev->dev, "Failed to register cpufreq-dt: %d\n", ret);
return ret;
}
cpufreq_dt_pdev = dt_pdev;
return 0;
}
ret = nvmem_cell_read_u32(cpu_dev, "speed_grade", &cell_value);
if (ret)
return ret;
@ -98,7 +175,10 @@ static int imx_cpufreq_dt_probe(struct platform_device *pdev)
static int imx_cpufreq_dt_remove(struct platform_device *pdev)
{
platform_device_unregister(cpufreq_dt_pdev);
dev_pm_opp_put_supported_hw(cpufreq_opp_table);
if (!of_machine_is_compatible("fsl,imx7ulp"))
dev_pm_opp_put_supported_hw(cpufreq_opp_table);
else
clk_bulk_put(ARRAY_SIZE(imx7ulp_clks), imx7ulp_clks);
return 0;
}

View File

@ -2771,6 +2771,8 @@ static int __init intel_pstate_init(void)
pr_info("Invalid MSRs\n");
return -ENODEV;
}
/* Without HWP start in the passive mode. */
default_driver = &intel_cpufreq;
hwp_cpu_matched:
/*
@ -2816,7 +2818,6 @@ static int __init intel_pstate_setup(char *str)
if (!strcmp(str, "disable")) {
no_load = 1;
} else if (!strcmp(str, "passive")) {
pr_info("Passive mode enabled\n");
default_driver = &intel_cpufreq;
no_hwp = 1;
}

View File

@ -277,7 +277,7 @@ static int qcom_cpufreq_probe(struct platform_device *pdev)
if (!np)
return -ENOENT;
ret = of_device_is_compatible(np, "operating-points-v2-qcom-cpu");
ret = of_device_is_compatible(np, "operating-points-v2-kryo-cpu");
if (!ret) {
of_node_put(np);
return -ENOENT;

View File

@ -18,6 +18,7 @@
#include <linux/of.h>
#include <linux/slab.h>
#include <linux/smp.h>
#include <linux/platform_device.h>
/**
* struct cpu_data
@ -29,12 +30,6 @@ struct cpu_data {
struct cpufreq_frequency_table *table;
};
/*
* Don't use cpufreq on this SoC -- used when the SoC would have otherwise
* matched a more generic compatible.
*/
#define SOC_BLACKLIST 1
/**
* struct soc_data - SoC specific data
* @flags: SOC_xxx
@ -264,64 +259,51 @@ static struct cpufreq_driver qoriq_cpufreq_driver = {
.attr = cpufreq_generic_attr,
};
static const struct soc_data blacklist = {
.flags = SOC_BLACKLIST,
};
static const struct of_device_id node_matches[] __initconst = {
static const struct of_device_id qoriq_cpufreq_blacklist[] = {
/* e6500 cannot use cpufreq due to erratum A-008083 */
{ .compatible = "fsl,b4420-clockgen", &blacklist },
{ .compatible = "fsl,b4860-clockgen", &blacklist },
{ .compatible = "fsl,t2080-clockgen", &blacklist },
{ .compatible = "fsl,t4240-clockgen", &blacklist },
{ .compatible = "fsl,ls1012a-clockgen", },
{ .compatible = "fsl,ls1021a-clockgen", },
{ .compatible = "fsl,ls1028a-clockgen", },
{ .compatible = "fsl,ls1043a-clockgen", },
{ .compatible = "fsl,ls1046a-clockgen", },
{ .compatible = "fsl,ls1088a-clockgen", },
{ .compatible = "fsl,ls2080a-clockgen", },
{ .compatible = "fsl,lx2160a-clockgen", },
{ .compatible = "fsl,p4080-clockgen", },
{ .compatible = "fsl,qoriq-clockgen-1.0", },
{ .compatible = "fsl,qoriq-clockgen-2.0", },
{ .compatible = "fsl,b4420-clockgen", },
{ .compatible = "fsl,b4860-clockgen", },
{ .compatible = "fsl,t2080-clockgen", },
{ .compatible = "fsl,t4240-clockgen", },
{}
};
static int __init qoriq_cpufreq_init(void)
static int qoriq_cpufreq_probe(struct platform_device *pdev)
{
int ret;
struct device_node *np;
const struct of_device_id *match;
const struct soc_data *data;
struct device_node *np;
np = of_find_matching_node(NULL, node_matches);
if (!np)
return -ENODEV;
match = of_match_node(node_matches, np);
data = match->data;
of_node_put(np);
if (data && data->flags & SOC_BLACKLIST)
np = of_find_matching_node(NULL, qoriq_cpufreq_blacklist);
if (np) {
dev_info(&pdev->dev, "Disabling due to erratum A-008083");
return -ENODEV;
}
ret = cpufreq_register_driver(&qoriq_cpufreq_driver);
if (!ret)
pr_info("Freescale QorIQ CPU frequency scaling driver\n");
if (ret)
return ret;
return ret;
dev_info(&pdev->dev, "Freescale QorIQ CPU frequency scaling driver\n");
return 0;
}
module_init(qoriq_cpufreq_init);
static void __exit qoriq_cpufreq_exit(void)
static int qoriq_cpufreq_remove(struct platform_device *pdev)
{
cpufreq_unregister_driver(&qoriq_cpufreq_driver);
}
module_exit(qoriq_cpufreq_exit);
return 0;
}
static struct platform_driver qoriq_cpufreq_platform_driver = {
.driver = {
.name = "qoriq-cpufreq",
},
.probe = qoriq_cpufreq_probe,
.remove = qoriq_cpufreq_remove,
};
module_platform_driver(qoriq_cpufreq_platform_driver);
MODULE_ALIAS("platform:qoriq-cpufreq");
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Tang Yuantian <Yuantian.Tang@freescale.com>");
MODULE_DESCRIPTION("cpufreq driver for Freescale QorIQ series SoCs");

View File

@ -94,3 +94,16 @@ config ARM_TEGRA_CPUIDLE
select ARM_CPU_SUSPEND
help
Select this to enable cpuidle for NVIDIA Tegra20/30/114/124 SoCs.
config ARM_QCOM_SPM_CPUIDLE
bool "CPU Idle Driver for Qualcomm Subsystem Power Manager (SPM)"
depends on (ARCH_QCOM || COMPILE_TEST) && !ARM64
select ARM_CPU_SUSPEND
select CPU_IDLE_MULTIPLE_DRIVERS
select DT_IDLE_STATES
select QCOM_SCM
help
Select this to enable cpuidle for Qualcomm processors.
The Subsystem Power Manager (SPM) controls low power modes for the
CPU and L2 cores. It interface with various system drivers to put
the cores in low power modes.

View File

@ -25,6 +25,7 @@ obj-$(CONFIG_ARM_PSCI_CPUIDLE) += cpuidle_psci.o
cpuidle_psci-y := cpuidle-psci.o
cpuidle_psci-$(CONFIG_PM_GENERIC_DOMAINS_OF) += cpuidle-psci-domain.o
obj-$(CONFIG_ARM_TEGRA_CPUIDLE) += cpuidle-tegra.o
obj-$(CONFIG_ARM_QCOM_SPM_CPUIDLE) += cpuidle-qcom-spm.o
###############################################################################
# MIPS drivers

View File

@ -58,6 +58,10 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
u32 state;
int ret;
ret = cpu_pm_enter();
if (ret)
return -1;
/* Do runtime PM to manage a hierarchical CPU toplogy. */
pm_runtime_put_sync_suspend(pd_dev);
@ -65,10 +69,12 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
if (!state)
state = states[idx];
ret = psci_enter_state(idx, state);
ret = psci_cpu_suspend_enter(state) ? -1 : idx;
pm_runtime_get_sync(pd_dev);
cpu_pm_exit();
/* Clear the domain state to start fresh when back from idle. */
psci_set_domain_state(0);
return ret;

View File

@ -19,10 +19,11 @@
#include <linux/cpu_pm.h>
#include <linux/qcom_scm.h>
#include <asm/cpuidle.h>
#include <asm/proc-fns.h>
#include <asm/suspend.h>
#include "dt_idle_states.h"
#define MAX_PMIC_DATA 2
#define MAX_SEQ_DATA 64
#define SPM_CTL_INDEX 0x7f
@ -62,6 +63,7 @@ struct spm_reg_data {
};
struct spm_driver_data {
struct cpuidle_driver cpuidle_driver;
void __iomem *reg_base;
const struct spm_reg_data *reg_data;
};
@ -107,11 +109,6 @@ static const struct spm_reg_data spm_reg_8064_cpu = {
.start_index[PM_SLEEP_MODE_SPC] = 2,
};
static DEFINE_PER_CPU(struct spm_driver_data *, cpu_spm_drv);
typedef int (*idle_fn)(void);
static DEFINE_PER_CPU(idle_fn*, qcom_idle_ops);
static inline void spm_register_write(struct spm_driver_data *drv,
enum spm_reg reg, u32 val)
{
@ -172,10 +169,9 @@ static int qcom_pm_collapse(unsigned long int unused)
return -1;
}
static int qcom_cpu_spc(void)
static int qcom_cpu_spc(struct spm_driver_data *drv)
{
int ret;
struct spm_driver_data *drv = __this_cpu_read(cpu_spm_drv);
spm_set_low_power_mode(drv, PM_SLEEP_MODE_SPC);
ret = cpu_suspend(0, qcom_pm_collapse);
@ -190,94 +186,49 @@ static int qcom_cpu_spc(void)
return ret;
}
static int qcom_idle_enter(unsigned long index)
static int spm_enter_idle_state(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int idx)
{
return __this_cpu_read(qcom_idle_ops)[index]();
struct spm_driver_data *data = container_of(drv, struct spm_driver_data,
cpuidle_driver);
return CPU_PM_CPU_IDLE_ENTER_PARAM(qcom_cpu_spc, idx, data);
}
static const struct of_device_id qcom_idle_state_match[] __initconst = {
{ .compatible = "qcom,idle-state-spc", .data = qcom_cpu_spc },
static struct cpuidle_driver qcom_spm_idle_driver = {
.name = "qcom_spm",
.owner = THIS_MODULE,
.states[0] = {
.enter = spm_enter_idle_state,
.exit_latency = 1,
.target_residency = 1,
.power_usage = UINT_MAX,
.name = "WFI",
.desc = "ARM WFI",
}
};
static const struct of_device_id qcom_idle_state_match[] = {
{ .compatible = "qcom,idle-state-spc", .data = spm_enter_idle_state },
{ },
};
static int __init qcom_cpuidle_init(struct device_node *cpu_node, int cpu)
static int spm_cpuidle_init(struct cpuidle_driver *drv, int cpu)
{
const struct of_device_id *match_id;
struct device_node *state_node;
int i;
int state_count = 1;
idle_fn idle_fns[CPUIDLE_STATE_MAX];
idle_fn *fns;
cpumask_t mask;
bool use_scm_power_down = false;
int ret;
if (!qcom_scm_is_available())
return -EPROBE_DEFER;
memcpy(drv, &qcom_spm_idle_driver, sizeof(*drv));
drv->cpumask = (struct cpumask *)cpumask_of(cpu);
for (i = 0; ; i++) {
state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
if (!state_node)
break;
/* Parse idle states from device tree */
ret = dt_init_idle_driver(drv, qcom_idle_state_match, 1);
if (ret <= 0)
return ret ? : -ENODEV;
if (!of_device_is_available(state_node))
continue;
if (i == CPUIDLE_STATE_MAX) {
pr_warn("%s: cpuidle states reached max possible\n",
__func__);
break;
}
match_id = of_match_node(qcom_idle_state_match, state_node);
if (!match_id)
return -ENODEV;
idle_fns[state_count] = match_id->data;
/* Check if any of the states allow power down */
if (match_id->data == qcom_cpu_spc)
use_scm_power_down = true;
state_count++;
}
if (state_count == 1)
goto check_spm;
fns = devm_kcalloc(get_cpu_device(cpu), state_count, sizeof(*fns),
GFP_KERNEL);
if (!fns)
return -ENOMEM;
for (i = 1; i < state_count; i++)
fns[i] = idle_fns[i];
if (use_scm_power_down) {
/* We have atleast one power down mode */
cpumask_clear(&mask);
cpumask_set_cpu(cpu, &mask);
qcom_scm_set_warm_boot_addr(cpu_resume_arm, &mask);
}
per_cpu(qcom_idle_ops, cpu) = fns;
/*
* SPM probe for the cpu should have happened by now, if the
* SPM device does not exist, return -ENXIO to indicate that the
* cpu does not support idle states.
*/
check_spm:
return per_cpu(cpu_spm_drv, cpu) ? 0 : -ENXIO;
/* We have atleast one power down mode */
return qcom_scm_set_warm_boot_addr(cpu_resume_arm, drv->cpumask);
}
static const struct cpuidle_ops qcom_cpuidle_ops __initconst = {
.suspend = qcom_idle_enter,
.init = qcom_cpuidle_init,
};
CPUIDLE_METHOD_OF_DECLARE(qcom_idle_v1, "qcom,kpss-acc-v1", &qcom_cpuidle_ops);
CPUIDLE_METHOD_OF_DECLARE(qcom_idle_v2, "qcom,kpss-acc-v2", &qcom_cpuidle_ops);
static struct spm_driver_data *spm_get_drv(struct platform_device *pdev,
int *spm_cpu)
{
@ -323,11 +274,15 @@ static int spm_dev_probe(struct platform_device *pdev)
struct resource *res;
const struct of_device_id *match_id;
void __iomem *addr;
int cpu;
int cpu, ret;
if (!qcom_scm_is_available())
return -EPROBE_DEFER;
drv = spm_get_drv(pdev, &cpu);
if (!drv)
return -EINVAL;
platform_set_drvdata(pdev, drv);
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
drv->reg_base = devm_ioremap_resource(&pdev->dev, res);
@ -340,6 +295,10 @@ static int spm_dev_probe(struct platform_device *pdev)
drv->reg_data = match_id->data;
ret = spm_cpuidle_init(&drv->cpuidle_driver, cpu);
if (ret)
return ret;
/* Write the SPM sequences first.. */
addr = drv->reg_base + drv->reg_data->reg_offset[SPM_REG_SEQ_ENTRY];
__iowrite32_copy(addr, drv->reg_data->seq,
@ -362,13 +321,20 @@ static int spm_dev_probe(struct platform_device *pdev)
/* Set up Standby as the default low power mode */
spm_set_low_power_mode(drv, PM_SLEEP_MODE_STBY);
per_cpu(cpu_spm_drv, cpu) = drv;
return cpuidle_register(&drv->cpuidle_driver, NULL);
}
static int spm_dev_remove(struct platform_device *pdev)
{
struct spm_driver_data *drv = platform_get_drvdata(pdev);
cpuidle_unregister(&drv->cpuidle_driver);
return 0;
}
static struct platform_driver spm_driver = {
.probe = spm_dev_probe,
.remove = spm_dev_remove,
.driver = {
.name = "saw",
.of_match_table = spm_match_table,

View File

@ -18,14 +18,6 @@
#include "cpuidle.h"
static unsigned int sysfs_switch;
static int __init cpuidle_sysfs_setup(char *unused)
{
sysfs_switch = 1;
return 1;
}
__setup("cpuidle_sysfs_switch", cpuidle_sysfs_setup);
static ssize_t show_available_governors(struct device *dev,
struct device_attribute *attr,
char *buf)
@ -35,10 +27,10 @@ static ssize_t show_available_governors(struct device *dev,
mutex_lock(&cpuidle_lock);
list_for_each_entry(tmp, &cpuidle_governors, governor_list) {
if (i >= (ssize_t) ((PAGE_SIZE/sizeof(char)) -
CPUIDLE_NAME_LEN - 2))
if (i >= (ssize_t) (PAGE_SIZE - (CPUIDLE_NAME_LEN + 2)))
goto out;
i += scnprintf(&buf[i], CPUIDLE_NAME_LEN, "%s ", tmp->name);
i += scnprintf(&buf[i], CPUIDLE_NAME_LEN + 1, "%s ", tmp->name);
}
out:
@ -85,58 +77,43 @@ static ssize_t store_current_governor(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
char gov_name[CPUIDLE_NAME_LEN];
int ret = -EINVAL;
size_t len = count;
char gov_name[CPUIDLE_NAME_LEN + 1];
int ret;
struct cpuidle_governor *gov;
if (!len || len >= sizeof(gov_name))
ret = sscanf(buf, "%" __stringify(CPUIDLE_NAME_LEN) "s", gov_name);
if (ret != 1)
return -EINVAL;
memcpy(gov_name, buf, len);
gov_name[len] = '\0';
if (gov_name[len - 1] == '\n')
gov_name[--len] = '\0';
mutex_lock(&cpuidle_lock);
ret = -EINVAL;
list_for_each_entry(gov, &cpuidle_governors, governor_list) {
if (strlen(gov->name) == len && !strcmp(gov->name, gov_name)) {
if (!strncmp(gov->name, gov_name, CPUIDLE_NAME_LEN)) {
ret = cpuidle_switch_governor(gov);
break;
}
}
mutex_unlock(&cpuidle_lock);
if (ret)
return ret;
else
return count;
return ret ? ret : count;
}
static DEVICE_ATTR(available_governors, 0444, show_available_governors, NULL);
static DEVICE_ATTR(current_driver, 0444, show_current_driver, NULL);
static DEVICE_ATTR(current_governor, 0644, show_current_governor,
store_current_governor);
static DEVICE_ATTR(current_governor_ro, 0444, show_current_governor, NULL);
static struct attribute *cpuidle_default_attrs[] = {
static struct attribute *cpuidle_attrs[] = {
&dev_attr_available_governors.attr,
&dev_attr_current_driver.attr,
&dev_attr_current_governor.attr,
&dev_attr_current_governor_ro.attr,
NULL
};
static DEVICE_ATTR(available_governors, 0444, show_available_governors, NULL);
static DEVICE_ATTR(current_governor, 0644, show_current_governor,
store_current_governor);
static struct attribute *cpuidle_switch_attrs[] = {
&dev_attr_available_governors.attr,
&dev_attr_current_driver.attr,
&dev_attr_current_governor.attr,
NULL
};
static struct attribute_group cpuidle_attr_group = {
.attrs = cpuidle_default_attrs,
.attrs = cpuidle_attrs,
.name = "cpuidle",
};
@ -146,9 +123,6 @@ static struct attribute_group cpuidle_attr_group = {
*/
int cpuidle_add_interface(struct device *dev)
{
if (sysfs_switch)
cpuidle_attr_group.attrs = cpuidle_switch_attrs;
return sysfs_create_group(&dev->kobj, &cpuidle_attr_group);
}
@ -167,11 +141,6 @@ struct cpuidle_attr {
ssize_t (*store)(struct cpuidle_device *, const char *, size_t count);
};
#define define_one_ro(_name, show) \
static struct cpuidle_attr attr_##_name = __ATTR(_name, 0444, show, NULL)
#define define_one_rw(_name, show, store) \
static struct cpuidle_attr attr_##_name = __ATTR(_name, 0644, show, store)
#define attr_to_cpuidleattr(a) container_of(a, struct cpuidle_attr, attr)
struct cpuidle_device_kobj {
@ -431,12 +400,12 @@ static inline void cpuidle_remove_s2idle_attr_group(struct cpuidle_state_kobj *k
#define attr_to_stateattr(a) container_of(a, struct cpuidle_state_attr, attr)
static ssize_t cpuidle_state_show(struct kobject *kobj, struct attribute *attr,
char * buf)
char *buf)
{
int ret = -EIO;
struct cpuidle_state *state = kobj_to_state(kobj);
struct cpuidle_state_usage *state_usage = kobj_to_state_usage(kobj);
struct cpuidle_state_attr * cattr = attr_to_stateattr(attr);
struct cpuidle_state_attr *cattr = attr_to_stateattr(attr);
if (cattr->show)
ret = cattr->show(state, state_usage, buf);
@ -515,7 +484,7 @@ static int cpuidle_add_state_sysfs(struct cpuidle_device *device)
ret = kobject_init_and_add(&kobj->kobj, &ktype_state_cpuidle,
&kdev->kobj, "state%d", i);
if (ret) {
kfree(kobj);
kobject_put(&kobj->kobj);
goto error_state;
}
cpuidle_add_s2idle_attr_group(kobj);
@ -646,7 +615,7 @@ static int cpuidle_add_driver_sysfs(struct cpuidle_device *dev)
ret = kobject_init_and_add(&kdrv->kobj, &ktype_driver_cpuidle,
&kdev->kobj, "driver");
if (ret) {
kfree(kdrv);
kobject_put(&kdrv->kobj);
return ret;
}
@ -740,7 +709,7 @@ int cpuidle_add_sysfs(struct cpuidle_device *dev)
error = kobject_init_and_add(&kdev->kobj, &ktype_cpuidle, &cpu_dev->kobj,
"cpuidle");
if (error) {
kfree(kdev);
kobject_put(&kdev->kobj);
return error;
}

View File

@ -91,6 +91,14 @@ config ARM_EXYNOS_BUS_DEVFREQ
and adjusts the operating frequencies and voltages with OPP support.
This does not yet operate with optimal voltages.
config ARM_IMX_BUS_DEVFREQ
tristate "i.MX Generic Bus DEVFREQ Driver"
depends on ARCH_MXC || COMPILE_TEST
select DEVFREQ_GOV_USERSPACE
help
This adds the generic DEVFREQ driver for i.MX interconnects. It
allows adjusting NIC/NOC frequency.
config ARM_IMX8M_DDRC_DEVFREQ
tristate "i.MX8M DDRC DEVFREQ Driver"
depends on (ARCH_MXC && HAVE_ARM_SMCCC) || \

View File

@ -9,6 +9,7 @@ obj-$(CONFIG_DEVFREQ_GOV_PASSIVE) += governor_passive.o
# DEVFREQ Drivers
obj-$(CONFIG_ARM_EXYNOS_BUS_DEVFREQ) += exynos-bus.o
obj-$(CONFIG_ARM_IMX_BUS_DEVFREQ) += imx-bus.o
obj-$(CONFIG_ARM_IMX8M_DDRC_DEVFREQ) += imx8m-ddrc.o
obj-$(CONFIG_ARM_RK3399_DMC_DEVFREQ) += rk3399_dmc.o
obj-$(CONFIG_ARM_TEGRA_DEVFREQ) += tegra30-devfreq.o

View File

@ -60,12 +60,12 @@ static struct devfreq *find_device_devfreq(struct device *dev)
{
struct devfreq *tmp_devfreq;
lockdep_assert_held(&devfreq_list_lock);
if (IS_ERR_OR_NULL(dev)) {
pr_err("DEVFREQ: %s: Invalid parameters\n", __func__);
return ERR_PTR(-EINVAL);
}
WARN(!mutex_is_locked(&devfreq_list_lock),
"devfreq_list_lock must be locked.");
list_for_each_entry(tmp_devfreq, &devfreq_list, node) {
if (tmp_devfreq->dev.parent == dev)
@ -258,12 +258,12 @@ static struct devfreq_governor *find_devfreq_governor(const char *name)
{
struct devfreq_governor *tmp_governor;
lockdep_assert_held(&devfreq_list_lock);
if (IS_ERR_OR_NULL(name)) {
pr_err("DEVFREQ: %s: Invalid parameters\n", __func__);
return ERR_PTR(-EINVAL);
}
WARN(!mutex_is_locked(&devfreq_list_lock),
"devfreq_list_lock must be locked.");
list_for_each_entry(tmp_governor, &devfreq_governor_list, node) {
if (!strncmp(tmp_governor->name, name, DEVFREQ_NAME_LEN))
@ -289,12 +289,12 @@ static struct devfreq_governor *try_then_request_governor(const char *name)
struct devfreq_governor *governor;
int err = 0;
lockdep_assert_held(&devfreq_list_lock);
if (IS_ERR_OR_NULL(name)) {
pr_err("DEVFREQ: %s: Invalid parameters\n", __func__);
return ERR_PTR(-EINVAL);
}
WARN(!mutex_is_locked(&devfreq_list_lock),
"devfreq_list_lock must be locked.");
governor = find_devfreq_governor(name);
if (IS_ERR(governor)) {
@ -392,10 +392,7 @@ int update_devfreq(struct devfreq *devfreq)
int err = 0;
u32 flags = 0;
if (!mutex_is_locked(&devfreq->lock)) {
WARN(true, "devfreq->lock must be locked by the caller.\n");
return -EINVAL;
}
lockdep_assert_held(&devfreq->lock);
if (!devfreq->governor)
return -EINVAL;
@ -768,7 +765,7 @@ struct devfreq *devfreq_add_device(struct device *dev,
devfreq->dev.release = devfreq_dev_release;
INIT_LIST_HEAD(&devfreq->node);
devfreq->profile = profile;
strncpy(devfreq->governor_name, governor_name, DEVFREQ_NAME_LEN);
strscpy(devfreq->governor_name, governor_name, DEVFREQ_NAME_LEN);
devfreq->previous_freq = profile->initial_freq;
devfreq->last_status.current_frequency = profile->initial_freq;
devfreq->data = data;

179
drivers/devfreq/imx-bus.c Normal file
View File

@ -0,0 +1,179 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright 2019 NXP
*/
#include <linux/clk.h>
#include <linux/devfreq.h>
#include <linux/device.h>
#include <linux/module.h>
#include <linux/of_device.h>
#include <linux/pm_opp.h>
#include <linux/platform_device.h>
#include <linux/slab.h>
struct imx_bus {
struct devfreq_dev_profile profile;
struct devfreq *devfreq;
struct clk *clk;
struct platform_device *icc_pdev;
};
static int imx_bus_target(struct device *dev,
unsigned long *freq, u32 flags)
{
struct dev_pm_opp *new_opp;
int ret;
new_opp = devfreq_recommended_opp(dev, freq, flags);
if (IS_ERR(new_opp)) {
ret = PTR_ERR(new_opp);
dev_err(dev, "failed to get recommended opp: %d\n", ret);
return ret;
}
dev_pm_opp_put(new_opp);
return dev_pm_opp_set_rate(dev, *freq);
}
static int imx_bus_get_cur_freq(struct device *dev, unsigned long *freq)
{
struct imx_bus *priv = dev_get_drvdata(dev);
*freq = clk_get_rate(priv->clk);
return 0;
}
static int imx_bus_get_dev_status(struct device *dev,
struct devfreq_dev_status *stat)
{
struct imx_bus *priv = dev_get_drvdata(dev);
stat->busy_time = 0;
stat->total_time = 0;
stat->current_frequency = clk_get_rate(priv->clk);
return 0;
}
static void imx_bus_exit(struct device *dev)
{
struct imx_bus *priv = dev_get_drvdata(dev);
dev_pm_opp_of_remove_table(dev);
platform_device_unregister(priv->icc_pdev);
}
/* imx_bus_init_icc() - register matching icc provider if required */
static int imx_bus_init_icc(struct device *dev)
{
struct imx_bus *priv = dev_get_drvdata(dev);
const char *icc_driver_name;
if (!of_get_property(dev->of_node, "#interconnect-cells", 0))
return 0;
if (!IS_ENABLED(CONFIG_INTERCONNECT_IMX)) {
dev_warn(dev, "imx interconnect drivers disabled\n");
return 0;
}
icc_driver_name = of_device_get_match_data(dev);
if (!icc_driver_name) {
dev_err(dev, "unknown interconnect driver\n");
return 0;
}
priv->icc_pdev = platform_device_register_data(
dev, icc_driver_name, -1, NULL, 0);
if (IS_ERR(priv->icc_pdev)) {
dev_err(dev, "failed to register icc provider %s: %ld\n",
icc_driver_name, PTR_ERR(priv->icc_pdev));
return PTR_ERR(priv->icc_pdev);
}
return 0;
}
static int imx_bus_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
struct imx_bus *priv;
const char *gov = DEVFREQ_GOV_USERSPACE;
int ret;
priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
/*
* Fetch the clock to adjust but don't explicitly enable.
*
* For imx bus clock clk_set_rate is safe no matter if the clock is on
* or off and some peripheral side-buses might be off unless enabled by
* drivers for devices on those specific buses.
*
* Rate adjustment on a disabled bus clock just takes effect later.
*/
priv->clk = devm_clk_get(dev, NULL);
if (IS_ERR(priv->clk)) {
ret = PTR_ERR(priv->clk);
dev_err(dev, "failed to fetch clk: %d\n", ret);
return ret;
}
platform_set_drvdata(pdev, priv);
ret = dev_pm_opp_of_add_table(dev);
if (ret < 0) {
dev_err(dev, "failed to get OPP table\n");
return ret;
}
priv->profile.polling_ms = 1000;
priv->profile.target = imx_bus_target;
priv->profile.get_dev_status = imx_bus_get_dev_status;
priv->profile.exit = imx_bus_exit;
priv->profile.get_cur_freq = imx_bus_get_cur_freq;
priv->profile.initial_freq = clk_get_rate(priv->clk);
priv->devfreq = devm_devfreq_add_device(dev, &priv->profile,
gov, NULL);
if (IS_ERR(priv->devfreq)) {
ret = PTR_ERR(priv->devfreq);
dev_err(dev, "failed to add devfreq device: %d\n", ret);
goto err;
}
ret = imx_bus_init_icc(dev);
if (ret)
goto err;
return 0;
err:
dev_pm_opp_of_remove_table(dev);
return ret;
}
static const struct of_device_id imx_bus_of_match[] = {
{ .compatible = "fsl,imx8mq-noc", .data = "imx8mq-interconnect", },
{ .compatible = "fsl,imx8mm-noc", .data = "imx8mm-interconnect", },
{ .compatible = "fsl,imx8mn-noc", .data = "imx8mn-interconnect", },
{ .compatible = "fsl,imx8m-noc", },
{ .compatible = "fsl,imx8m-nic", },
{ /* sentinel */ },
};
MODULE_DEVICE_TABLE(of, imx_bus_of_match);
static struct platform_driver imx_bus_platdrv = {
.probe = imx_bus_probe,
.driver = {
.name = "imx-bus-devfreq",
.of_match_table = of_match_ptr(imx_bus_of_match),
},
};
module_platform_driver(imx_bus_platdrv);
MODULE_DESCRIPTION("Generic i.MX bus frequency scaling driver");
MODULE_AUTHOR("Leonard Crestez <leonard.crestez@nxp.com>");
MODULE_LICENSE("GPL v2");

View File

@ -420,7 +420,7 @@ tegra_actmon_cpufreq_contribution(struct tegra_devfreq *tegra,
static_cpu_emc_freq = actmon_cpu_to_emc_rate(tegra, cpu_freq);
if (dev_freq >= static_cpu_emc_freq)
if (dev_freq + actmon_dev->boost_freq >= static_cpu_emc_freq)
return 0;
return static_cpu_emc_freq;
@ -807,10 +807,9 @@ static int tegra_devfreq_probe(struct platform_device *pdev)
}
err = platform_get_irq(pdev, 0);
if (err < 0) {
dev_err(&pdev->dev, "Failed to get IRQ: %d\n", err);
if (err < 0)
return err;
}
tegra->irq = err;
irq_set_status_flags(tegra->irq, IRQ_NOAUTOEN);

View File

@ -191,7 +191,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
}
if (adev->runpm) {
dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NEVER_SKIP);
dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NO_DIRECT_COMPLETE);
pm_runtime_use_autosuspend(dev->dev);
pm_runtime_set_autosuspend_delay(dev->dev, 5000);
pm_runtime_set_active(dev->dev);

View File

@ -549,7 +549,7 @@ void intel_runtime_pm_enable(struct intel_runtime_pm *rpm)
* becaue the HDA driver may require us to enable the audio power
* domain during system suspend.
*/
dev_pm_set_driver_flags(kdev, DPM_FLAG_NEVER_SKIP);
dev_pm_set_driver_flags(kdev, DPM_FLAG_NO_DIRECT_COMPLETE);
pm_runtime_set_autosuspend_delay(kdev, 10000); /* 10s */
pm_runtime_mark_last_busy(kdev);

View File

@ -158,7 +158,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
}
if (radeon_is_px(dev)) {
dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NEVER_SKIP);
dev_pm_set_driver_flags(dev->dev, DPM_FLAG_NO_DIRECT_COMPLETE);
pm_runtime_use_autosuspend(dev->dev);
pm_runtime_set_autosuspend_delay(dev->dev, 5000);
pm_runtime_set_active(dev->dev);

View File

@ -357,12 +357,12 @@ static int dw_i2c_plat_probe(struct platform_device *pdev)
if (dev->flags & ACCESS_NO_IRQ_SUSPEND) {
dev_pm_set_driver_flags(&pdev->dev,
DPM_FLAG_SMART_PREPARE |
DPM_FLAG_LEAVE_SUSPENDED);
DPM_FLAG_MAY_SKIP_RESUME);
} else {
dev_pm_set_driver_flags(&pdev->dev,
DPM_FLAG_SMART_PREPARE |
DPM_FLAG_SMART_SUSPEND |
DPM_FLAG_LEAVE_SUSPENDED);
DPM_FLAG_MAY_SKIP_RESUME);
}
/* The code below assumes runtime PM to be disabled. */

View File

@ -241,7 +241,7 @@ static int mei_me_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
* MEI requires to resume from runtime suspend mode
* in order to perform link reset flow upon system suspend.
*/
dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NO_DIRECT_COMPLETE);
/*
* ME maps runtime suspend/resume to D0i states,

View File

@ -128,7 +128,7 @@ static int mei_txe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
* MEI requires to resume from runtime suspend mode
* in order to perform link reset flow upon system suspend.
*/
dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NO_DIRECT_COMPLETE);
/*
* TXE maps runtime suspend/resume to own power gating states,

View File

@ -7549,7 +7549,7 @@ static int e1000_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
e1000_print_device_info(adapter);
dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NO_DIRECT_COMPLETE);
if (pci_dev_run_wake(pdev) && hw->mac.type < e1000_pch_cnp)
pm_runtime_put_noidle(&pdev->dev);

View File

@ -3445,7 +3445,7 @@ static int igb_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
}
}
dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NO_DIRECT_COMPLETE);
pm_runtime_put_noidle(&pdev->dev);
return 0;

View File

@ -4825,7 +4825,7 @@ static int igc_probe(struct pci_dev *pdev,
pcie_print_link_status(pdev);
netdev_info(netdev, "MAC: %pM\n", netdev->dev_addr);
dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NEVER_SKIP);
dev_pm_set_driver_flags(&pdev->dev, DPM_FLAG_NO_DIRECT_COMPLETE);
pm_runtime_put_noidle(&pdev->dev);

View File

@ -275,7 +275,7 @@ static int pciehp_suspend(struct pcie_device *dev)
* If the port is already runtime suspended we can keep it that
* way.
*/
if (dev_pm_smart_suspend_and_suspended(&dev->port->dev))
if (dev_pm_skip_suspend(&dev->port->dev))
return 0;
pciehp_disable_interrupt(dev);

View File

@ -776,7 +776,7 @@ static int pci_pm_suspend(struct device *dev)
static int pci_pm_suspend_late(struct device *dev)
{
if (dev_pm_smart_suspend_and_suspended(dev))
if (dev_pm_skip_suspend(dev))
return 0;
pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev));
@ -789,10 +789,8 @@ static int pci_pm_suspend_noirq(struct device *dev)
struct pci_dev *pci_dev = to_pci_dev(dev);
const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
if (dev_pm_smart_suspend_and_suspended(dev)) {
dev->power.may_skip_resume = true;
if (dev_pm_skip_suspend(dev))
return 0;
}
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend_late(dev, PMSG_SUSPEND);
@ -880,8 +878,8 @@ static int pci_pm_suspend_noirq(struct device *dev)
* pci_pm_complete() to take care of fixing up the device's state
* anyway, if need be.
*/
dev->power.may_skip_resume = device_may_wakeup(dev) ||
!device_can_wakeup(dev);
if (device_can_wakeup(dev) && !device_may_wakeup(dev))
dev->power.may_skip_resume = false;
return 0;
}
@ -893,17 +891,9 @@ static int pci_pm_resume_noirq(struct device *dev)
pci_power_t prev_state = pci_dev->current_state;
bool skip_bus_pm = pci_dev->skip_bus_pm;
if (dev_pm_may_skip_resume(dev))
if (dev_pm_skip_resume(dev))
return 0;
/*
* Devices with DPM_FLAG_SMART_SUSPEND may be left in runtime suspend
* during system suspend, so update their runtime PM status to "active"
* as they are going to be put into D0 shortly.
*/
if (dev_pm_smart_suspend_and_suspended(dev))
pm_runtime_set_active(dev);
/*
* In the suspend-to-idle case, devices left in D0 during suspend will
* stay in D0, so it is not necessary to restore or update their
@ -928,6 +918,14 @@ static int pci_pm_resume_noirq(struct device *dev)
return 0;
}
static int pci_pm_resume_early(struct device *dev)
{
if (dev_pm_skip_resume(dev))
return 0;
return pm_generic_resume_early(dev);
}
static int pci_pm_resume(struct device *dev)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
@ -961,6 +959,7 @@ static int pci_pm_resume(struct device *dev)
#define pci_pm_suspend_late NULL
#define pci_pm_suspend_noirq NULL
#define pci_pm_resume NULL
#define pci_pm_resume_early NULL
#define pci_pm_resume_noirq NULL
#endif /* !CONFIG_SUSPEND */
@ -1127,7 +1126,7 @@ static int pci_pm_poweroff(struct device *dev)
static int pci_pm_poweroff_late(struct device *dev)
{
if (dev_pm_smart_suspend_and_suspended(dev))
if (dev_pm_skip_suspend(dev))
return 0;
pci_fixup_device(pci_fixup_suspend, to_pci_dev(dev));
@ -1140,7 +1139,7 @@ static int pci_pm_poweroff_noirq(struct device *dev)
struct pci_dev *pci_dev = to_pci_dev(dev);
const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
if (dev_pm_smart_suspend_and_suspended(dev))
if (dev_pm_skip_suspend(dev))
return 0;
if (pci_has_legacy_pm_support(pci_dev))
@ -1358,6 +1357,7 @@ static const struct dev_pm_ops pci_dev_pm_ops = {
.suspend = pci_pm_suspend,
.suspend_late = pci_pm_suspend_late,
.resume = pci_pm_resume,
.resume_early = pci_pm_resume_early,
.freeze = pci_pm_freeze,
.thaw = pci_pm_thaw,
.poweroff = pci_pm_poweroff,

View File

@ -115,7 +115,7 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
pci_save_state(dev);
dev_pm_set_driver_flags(&dev->dev, DPM_FLAG_NEVER_SKIP |
dev_pm_set_driver_flags(&dev->dev, DPM_FLAG_NO_DIRECT_COMPLETE |
DPM_FLAG_SMART_SUSPEND);
if (pci_bridge_d3_possible(dev)) {

View File

@ -26,9 +26,6 @@
#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
/* Local defines */
#define MSR_PLATFORM_POWER_LIMIT 0x0000065C
/* bitmasks for RAPL MSRs, used by primitive access functions */
#define ENERGY_STATUS_MASK 0xffffffff
@ -989,6 +986,7 @@ static const struct x86_cpu_id rapl_ids[] __initconst = {
X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, &rapl_defaults_core),
X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT_PLUS, &rapl_defaults_core),
X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT_D, &rapl_defaults_core),
X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT, &rapl_defaults_core),
X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D, &rapl_defaults_core),
X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_L, &rapl_defaults_core),

View File

@ -80,16 +80,6 @@ config QCOM_PDR_HELPERS
tristate
select QCOM_QMI_HELPERS
config QCOM_PM
bool "Qualcomm Power Management"
depends on ARCH_QCOM && !ARM64
select ARM_CPU_SUSPEND
select QCOM_SCM
help
QCOM Platform specific power driver to manage cores and L2 low power
modes. It interface with various system drivers to put the cores in
low power modes.
config QCOM_QMI_HELPERS
tristate
depends on NET

View File

@ -8,7 +8,6 @@ obj-$(CONFIG_QCOM_GSBI) += qcom_gsbi.o
obj-$(CONFIG_QCOM_MDT_LOADER) += mdt_loader.o
obj-$(CONFIG_QCOM_OCMEM) += ocmem.o
obj-$(CONFIG_QCOM_PDR_HELPERS) += pdr_interface.o
obj-$(CONFIG_QCOM_PM) += spm.o
obj-$(CONFIG_QCOM_QMI_HELPERS) += qmi_helpers.o
qmi_helpers-y += qmi_encdec.o qmi_interface.o
obj-$(CONFIG_QCOM_RMTFS_MEM) += rmtfs_mem.o

View File

@ -2022,8 +2022,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (bdev_read_only(I_BDEV(bd_inode)))
return -EPERM;
/* uswsusp needs write permission to the swap */
if (IS_SWAPFILE(bd_inode) && !hibernation_available())
if (IS_SWAPFILE(bd_inode) && !is_hibernate_resume_dev(bd_inode))
return -ETXTBSY;
if (!iov_iter_count(from))

View File

@ -330,7 +330,7 @@ struct cpufreq_driver {
*
* get_intermediate should return a stable intermediate frequency
* platform wants to switch to and target_intermediate() should set CPU
* to to that frequency, before jumping to the frequency corresponding
* to that frequency, before jumping to the frequency corresponding
* to 'index'. Core will take care of sending notifications and driver
* doesn't have to handle them in target_intermediate() or
* target_index().

View File

@ -544,31 +544,17 @@ struct pm_subsys_data {
* These flags can be set by device drivers at the probe time. They need not be
* cleared by the drivers as the driver core will take care of that.
*
* NEVER_SKIP: Do not skip all system suspend/resume callbacks for the device.
* SMART_PREPARE: Check the return value of the driver's ->prepare callback.
* SMART_SUSPEND: No need to resume the device from runtime suspend.
* LEAVE_SUSPENDED: Avoid resuming the device during system resume if possible.
* NO_DIRECT_COMPLETE: Do not apply direct-complete optimization to the device.
* SMART_PREPARE: Take the driver ->prepare callback return value into account.
* SMART_SUSPEND: Avoid resuming the device from runtime suspend.
* MAY_SKIP_RESUME: Allow driver "noirq" and "early" callbacks to be skipped.
*
* Setting SMART_PREPARE instructs bus types and PM domains which may want
* system suspend/resume callbacks to be skipped for the device to return 0 from
* their ->prepare callbacks if the driver's ->prepare callback returns 0 (in
* other words, the system suspend/resume callbacks can only be skipped for the
* device if its driver doesn't object against that). This flag has no effect
* if NEVER_SKIP is set.
*
* Setting SMART_SUSPEND instructs bus types and PM domains which may want to
* runtime resume the device upfront during system suspend that doing so is not
* necessary from the driver's perspective. It also may cause them to skip
* invocations of the ->suspend_late and ->suspend_noirq callbacks provided by
* the driver if they decide to leave the device in runtime suspend.
*
* Setting LEAVE_SUSPENDED informs the PM core and middle-layer code that the
* driver prefers the device to be left in suspend after system resume.
* See Documentation/driver-api/pm/devices.rst for details.
*/
#define DPM_FLAG_NEVER_SKIP BIT(0)
#define DPM_FLAG_NO_DIRECT_COMPLETE BIT(0)
#define DPM_FLAG_SMART_PREPARE BIT(1)
#define DPM_FLAG_SMART_SUSPEND BIT(2)
#define DPM_FLAG_LEAVE_SUSPENDED BIT(3)
#define DPM_FLAG_MAY_SKIP_RESUME BIT(3)
struct dev_pm_info {
pm_message_t power_state;
@ -758,8 +744,8 @@ extern int pm_generic_poweroff_late(struct device *dev);
extern int pm_generic_poweroff(struct device *dev);
extern void pm_generic_complete(struct device *dev);
extern bool dev_pm_may_skip_resume(struct device *dev);
extern bool dev_pm_smart_suspend_and_suspended(struct device *dev);
extern bool dev_pm_skip_resume(struct device *dev);
extern bool dev_pm_skip_suspend(struct device *dev);
#else /* !CONFIG_PM_SLEEP */

View File

@ -102,9 +102,9 @@ static inline bool pm_runtime_enabled(struct device *dev)
return !dev->power.disable_depth;
}
static inline bool pm_runtime_callbacks_present(struct device *dev)
static inline bool pm_runtime_has_no_callbacks(struct device *dev)
{
return !dev->power.no_callbacks;
return dev->power.no_callbacks;
}
static inline void pm_runtime_mark_last_busy(struct device *dev)

View File

@ -466,6 +466,12 @@ static inline bool system_entering_hibernation(void) { return false; }
static inline bool hibernation_available(void) { return false; }
#endif /* CONFIG_HIBERNATION */
#ifdef CONFIG_HIBERNATION_SNAPSHOT_DEV
int is_hibernate_resume_dev(const struct inode *);
#else
static inline int is_hibernate_resume_dev(const struct inode *i) { return 0; }
#endif
/* Hibernation and suspend events */
#define PM_HIBERNATION_PREPARE 0x0001 /* Going to hibernate */
#define PM_POST_HIBERNATION 0x0002 /* Hibernation finished */

View File

@ -80,6 +80,18 @@ config HIBERNATION
For more information take a look at <file:Documentation/power/swsusp.rst>.
config HIBERNATION_SNAPSHOT_DEV
bool "Userspace snapshot device"
depends on HIBERNATION
default y
---help---
Device used by the uswsusp tools.
Say N if no snapshotting from userspace is needed, this also
reduces the attack surface of the kernel.
If in doubt, say Y.
config PM_STD_PARTITION
string "Default resume partition"
depends on HIBERNATION

View File

@ -10,7 +10,8 @@ obj-$(CONFIG_VT_CONSOLE_SLEEP) += console.o
obj-$(CONFIG_FREEZER) += process.o
obj-$(CONFIG_SUSPEND) += suspend.o
obj-$(CONFIG_PM_TEST_SUSPEND) += suspend_test.o
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o user.o
obj-$(CONFIG_HIBERNATION) += hibernate.o snapshot.o swap.o
obj-$(CONFIG_HIBERNATION_SNAPSHOT_DEV) += user.o
obj-$(CONFIG_PM_AUTOSLEEP) += autosleep.o
obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o

View File

@ -67,6 +67,18 @@ bool freezer_test_done;
static const struct platform_hibernation_ops *hibernation_ops;
static atomic_t hibernate_atomic = ATOMIC_INIT(1);
bool hibernate_acquire(void)
{
return atomic_add_unless(&hibernate_atomic, -1, 0);
}
void hibernate_release(void)
{
atomic_inc(&hibernate_atomic);
}
bool hibernation_available(void)
{
return nohibernate == 0 && !security_locked_down(LOCKDOWN_HIBERNATION);
@ -704,7 +716,7 @@ int hibernate(void)
lock_system_sleep();
/* The snapshot device should not be opened while we're running */
if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
if (!hibernate_acquire()) {
error = -EBUSY;
goto Unlock;
}
@ -775,7 +787,7 @@ int hibernate(void)
Exit:
__pm_notifier_call_chain(PM_POST_HIBERNATION, nr_calls, NULL);
pm_restore_console();
atomic_inc(&snapshot_device_available);
hibernate_release();
Unlock:
unlock_system_sleep();
pr_info("hibernation exit\n");
@ -880,7 +892,7 @@ static int software_resume(void)
goto Unlock;
/* The snapshot device should not be opened while we're running */
if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
if (!hibernate_acquire()) {
error = -EBUSY;
swsusp_close(FMODE_READ);
goto Unlock;
@ -911,7 +923,7 @@ static int software_resume(void)
__pm_notifier_call_chain(PM_POST_RESTORE, nr_calls, NULL);
pm_restore_console();
pr_info("resume failed (%d)\n", error);
atomic_inc(&snapshot_device_available);
hibernate_release();
/* For success case, the suspend path will release the lock */
Unlock:
mutex_unlock(&system_transition_mutex);

View File

@ -154,8 +154,8 @@ extern int snapshot_write_next(struct snapshot_handle *handle);
extern void snapshot_write_finalize(struct snapshot_handle *handle);
extern int snapshot_image_loaded(struct snapshot_handle *handle);
/* If unset, the snapshot device cannot be open. */
extern atomic_t snapshot_device_available;
extern bool hibernate_acquire(void);
extern void hibernate_release(void);
extern sector_t alloc_swapdev_block(int swap);
extern void free_all_swap_pages(int swap);

View File

@ -35,9 +35,13 @@ static struct snapshot_data {
bool ready;
bool platform_support;
bool free_bitmaps;
struct inode *bd_inode;
} snapshot_state;
atomic_t snapshot_device_available = ATOMIC_INIT(1);
int is_hibernate_resume_dev(const struct inode *bd_inode)
{
return hibernation_available() && snapshot_state.bd_inode == bd_inode;
}
static int snapshot_open(struct inode *inode, struct file *filp)
{
@ -49,13 +53,13 @@ static int snapshot_open(struct inode *inode, struct file *filp)
lock_system_sleep();
if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
if (!hibernate_acquire()) {
error = -EBUSY;
goto Unlock;
}
if ((filp->f_flags & O_ACCMODE) == O_RDWR) {
atomic_inc(&snapshot_device_available);
hibernate_release();
error = -ENOSYS;
goto Unlock;
}
@ -92,11 +96,12 @@ static int snapshot_open(struct inode *inode, struct file *filp)
__pm_notifier_call_chain(PM_POST_RESTORE, nr_calls, NULL);
}
if (error)
atomic_inc(&snapshot_device_available);
hibernate_release();
data->frozen = false;
data->ready = false;
data->platform_support = false;
data->bd_inode = NULL;
Unlock:
unlock_system_sleep();
@ -112,6 +117,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
swsusp_free();
data = filp->private_data;
data->bd_inode = NULL;
free_all_swap_pages(data->swap);
if (data->frozen) {
pm_restore_gfp_mask();
@ -122,7 +128,7 @@ static int snapshot_release(struct inode *inode, struct file *filp)
}
pm_notifier_call_chain(data->mode == O_RDONLY ?
PM_POST_HIBERNATION : PM_POST_RESTORE);
atomic_inc(&snapshot_device_available);
hibernate_release();
unlock_system_sleep();
@ -204,6 +210,7 @@ struct compat_resume_swap_area {
static int snapshot_set_swap_area(struct snapshot_data *data,
void __user *argp)
{
struct block_device *bdev;
sector_t offset;
dev_t swdev;
@ -234,9 +241,12 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
data->swap = -1;
return -EINVAL;
}
data->swap = swap_type_of(swdev, offset, NULL);
data->swap = swap_type_of(swdev, offset, &bdev);
if (data->swap < 0)
return -ENODEV;
data->bd_inode = bdev->bd_inode;
bdput(bdev);
return 0;
}

View File

@ -62,7 +62,7 @@ int cmd_info(int argc, char **argv)
default:
print_wrong_arg_exit();
}
};
}
if (!params.params)
params.params = 0x7;

View File

@ -72,7 +72,7 @@ int cmd_set(int argc, char **argv)
default:
print_wrong_arg_exit();
}
};
}
if (!params.params)
print_wrong_arg_exit();

View File

@ -117,7 +117,7 @@ static int amd_fam14h_get_pci_info(struct cstate *state,
break;
default:
return -1;
};
}
return 0;
}

View File

@ -53,7 +53,7 @@ static int cpuidle_start(void)
dprint("CPU %d - State: %d - Val: %llu\n",
cpu, state, previous_count[cpu][state]);
}
};
}
return 0;
}
@ -72,7 +72,7 @@ static int cpuidle_stop(void)
dprint("CPU %d - State: %d - Val: %llu\n",
cpu, state, previous_count[cpu][state]);
}
};
}
return 0;
}
@ -172,7 +172,7 @@ static struct cpuidle_monitor *cpuidle_register(void)
cpuidle_cstates[num].id = num;
cpuidle_cstates[num].get_count_percent =
cpuidle_get_count_percent;
};
}
/* Free this at program termination */
previous_count = malloc(sizeof(long long *) * cpu_count);

View File

@ -79,7 +79,7 @@ static int hsw_ext_get_count(enum intel_hsw_ext_id id, unsigned long long *val,
break;
default:
return -1;
};
}
if (read_msr(cpu, msr, val))
return -1;
return 0;

View File

@ -91,7 +91,7 @@ static int nhm_get_count(enum intel_nhm_id id, unsigned long long *val,
break;
default:
return -1;
};
}
if (read_msr(cpu, msr, val))
return -1;

View File

@ -77,7 +77,7 @@ static int snb_get_count(enum intel_snb_id id, unsigned long long *val,
break;
default:
return -1;
};
}
if (read_msr(cpu, msr, val))
return -1;
return 0;