From 7e5705c635ecfccde559ebbbe1eaf05b5cc60529 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor
Date: Fri, 27 Sep 2019 09:26:42 -0700
Subject: [PATCH 01/83] tools/power/cpupower: Fix initializer override in hsw_ext_cstates

When building cpupower with clang, the following warning appears:

utils/idle_monitor/hsw_ext_idle.c:42:16: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
                .desc = N_("Processor Package C2"),
                           ^~~~~~~~~~~~~~~~~~~~~~
./utils/helpers/helpers.h:25:33: note: expanded from macro 'N_'
 #define N_(String) gettext_noop(String)
                                 ^~~~~~
./utils/helpers/helpers.h:23:30: note: expanded from macro 'gettext_noop'
 #define gettext_noop(String) String
                              ^~~~~~
utils/idle_monitor/hsw_ext_idle.c:41:16: note: previous initialization is here
                .desc = N_("Processor Package C9"),
                           ^~~~~~~~~~~~~~~~~~~~~~
./utils/helpers/helpers.h:25:33: note: expanded from macro 'N_'
 #define N_(String) gettext_noop(String)
                                 ^~~~~~
./utils/helpers/helpers.h:23:30: note: expanded from macro 'gettext_noop'
 #define gettext_noop(String) String
                              ^~~~~~
1 warning generated.

This appears to be a copy and paste or merge mistake because the name
and id fields both have PC9 in them, not PC2. Remove the second
assignment to fix the warning.
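
For reference, the behavior behind the warning can be reproduced outside
cpupower. The following is only a minimal sketch: struct demo_cstate,
demo_pc9 and later_desc_wins() are hypothetical names made up for this
illustration and are not part of cpupower. It assumes a C99 or later
compiler; clang reports -Winitializer-overrides on it, which is the
point of the example.

```c
#include <string.h>

/* Hypothetical, cut-down stand-in for cpupower's cstate_t. */
struct demo_cstate {
	const char *name;
	const char *desc;
};

/*
 * .desc is designated twice. The language allows this and the later
 * initializer silently overrides the earlier one, so the entry named
 * "PC9" ends up described as C2.
 */
static const struct demo_cstate demo_pc9 = {
	.name = "PC9",
	.desc = "Processor Package C9",
	.desc = "Processor Package C2",
};

/* Returns 1 if the later .desc initializer took effect, 0 otherwise. */
static int later_desc_wins(void)
{
	return strcmp(demo_pc9.desc, "Processor Package C2") == 0;
}
```

Calling later_desc_wins() returns 1, i.e. before this fix the PC9 entry
carried the "Processor Package C2" description.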
Fixes: 7ee767b69b68 ("cpupower: Add Haswell family 0x45 specific idle monitor to show PC8,9,10 states")
Link: https://github.com/ClangBuiltLinux/linux/issues/718
Signed-off-by: Nathan Chancellor
Signed-off-by: Shuah Khan
---
 tools/power/cpupower/utils/idle_monitor/hsw_ext_idle.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/power/cpupower/utils/idle_monitor/hsw_ext_idle.c b/tools/power/cpupower/utils/idle_monitor/hsw_ext_idle.c
index 7c7451d3f494..58dbdfd4fa13 100644
--- a/tools/power/cpupower/utils/idle_monitor/hsw_ext_idle.c
+++ b/tools/power/cpupower/utils/idle_monitor/hsw_ext_idle.c
@@ -39,7 +39,6 @@ static cstate_t hsw_ext_cstates[HSW_EXT_CSTATE_COUNT] = {
 	{
 		.name			= "PC9",
 		.desc			= N_("Processor Package C9"),
-		.desc			= N_("Processor Package C2"),
 		.id			= PC9,
 		.range			= RANGE_PACKAGE,
 		.get_count_percent	= hsw_ext_get_count_percent,

From fef4ac873369fcfe98f255ad905cfd055e755f22 Mon Sep 17 00:00:00 2001
From: Todd Brandt
Date: Thu, 19 Sep 2019 12:09:12 -0700
Subject: [PATCH 02/83] pm-graph info added to MAINTAINERS

Signed-off-by: Todd Brandt
Reviewed-by: Len Brown
Signed-off-by: Rafael J. Wysocki
---
 MAINTAINERS | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 55199ef7fa74..5cab726841b2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12993,6 +12993,15 @@ L: linux-scsi@vger.kernel.org
 S:	Supported
 F:	drivers/scsi/pm8001/
 
+PM-GRAPH UTILITY
+M:	"Todd E Brandt"
+L:	linux-pm@vger.kernel.org
+W:	https://01.org/pm-graph
+B:	https://bugzilla.kernel.org/buglist.cgi?component=pm-graph&product=Tools
+T:	git git://github.com/intel/pm-graph
+S:	Supported
+F:	tools/power/pm-graph
+
 PNP SUPPORT
 M:	"Rafael J. Wysocki"
 S:	Maintained

From b4bc9f9e27edd8de76d44675c8f0c6c2ccb6b22c Mon Sep 17 00:00:00 2001
From: "H. Nikolaus Schaller"
Date: Wed, 11 Sep 2019 19:47:07 +0200
Subject: [PATCH 03/83] cpufreq: ti-cpufreq: add support for omap34xx and omap36xx

This adds code and tables to read the silicon revision and
eFuse (speed binned / 720 MHz grade) bits for selecting
opp-v2 table entries.

Since these bits are not always part of the syscon register
range (like for am33xx, am43, dra7), we add code to directly
read the register values using ioremap() if syscon access fails.

The format of the opp-supported-hw entries is that it has
two 32 bit bitfields. E.g.:

	opp-supported-hw = <0xffffffff 3>

The first value is matched against the bit position of the
silicon revision which is (see TRM)

omap34xx:
	BIT(0)	ES1.0
	BIT(1)	ES2.0
	BIT(2)	ES2.1
	BIT(3)	ES3.0
	BIT(4)	ES3.1
	BIT(7)	ES3.1.2

omap36xx:
	BIT(0)	ES1.0
	BIT(1)	ES1.1
	BIT(2)	ES1.2

The second value is matched against the speed grade eFuse:

	BIT(0)	no high speed OPP
	BIT(1)	high speed OPP

This means for the example above that it is always enabled while e.g.

	opp-supported-hw = <0x1 2>

enables the OPP only for ES1.0 BIT(0) and if the high speed eFuse is
set BIT(1).

Signed-off-by: H. 
Nikolaus Schaller Reviewed-by: Tony Lindgren Tested-by: Adam Ford Signed-off-by: Viresh Kumar --- drivers/cpufreq/ti-cpufreq.c | 91 +++++++++++++++++++++++++++++++++++- 1 file changed, 89 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/ti-cpufreq.c b/drivers/cpufreq/ti-cpufreq.c index aeaa883a8c9d..13d426559333 100644 --- a/drivers/cpufreq/ti-cpufreq.c +++ b/drivers/cpufreq/ti-cpufreq.c @@ -31,6 +31,11 @@ #define DRA7_EFUSE_OD_MPU_OPP BIT(1) #define DRA7_EFUSE_HIGH_MPU_OPP BIT(2) +#define OMAP3_CONTROL_DEVICE_STATUS 0x4800244C +#define OMAP3_CONTROL_IDCODE 0x4830A204 +#define OMAP34xx_ProdID_SKUID 0x4830A20C +#define OMAP3_SYSCON_BASE (0x48000000 + 0x2000 + 0x270) + #define VERSION_COUNT 2 struct ti_cpufreq_data; @@ -85,6 +90,13 @@ static unsigned long dra7_efuse_xlate(struct ti_cpufreq_data *opp_data, return calculated_efuse; } +static unsigned long omap3_efuse_xlate(struct ti_cpufreq_data *opp_data, + unsigned long efuse) +{ + /* OPP enable bit ("Speed Binned") */ + return BIT(efuse); +} + static struct ti_cpufreq_soc_data am3x_soc_data = { .efuse_xlate = amx3_efuse_xlate, .efuse_fallback = AM33XX_800M_ARM_MPU_MAX_FREQ, @@ -112,6 +124,56 @@ static struct ti_cpufreq_soc_data dra7_soc_data = { .multi_regulator = true, }; +/* + * OMAP35x TRM (SPRUF98K): + * CONTROL_IDCODE (0x4830 A204) describes Silicon revisions. + * Control OMAP Status Register 15:0 (Address 0x4800 244C) + * to separate between omap3503, omap3515, omap3525, omap3530 + * and feature presence. + * There are encodings for versions limited to 400/266MHz + * but we ignore. + * Not clear if this also holds for omap34xx. + * some eFuse values e.g. CONTROL_FUSE_OPP1_VDD1 + * are stored in the SYSCON register range + * Register 0x4830A20C [ProdID.SKUID] [0:3] + * 0x0 for normal 600/430MHz device. + * 0x8 for 720/520MHz device. + * Not clear what omap34xx value is. 
+ */ + +static struct ti_cpufreq_soc_data omap34xx_soc_data = { + .efuse_xlate = omap3_efuse_xlate, + .efuse_offset = OMAP34xx_ProdID_SKUID - OMAP3_SYSCON_BASE, + .efuse_shift = 3, + .efuse_mask = BIT(3), + .rev_offset = OMAP3_CONTROL_IDCODE - OMAP3_SYSCON_BASE, + .multi_regulator = false, +}; + +/* + * AM/DM37x TRM (SPRUGN4M) + * CONTROL_IDCODE (0x4830 A204) describes Silicon revisions. + * Control Device Status Register 15:0 (Address 0x4800 244C) + * to separate between am3703, am3715, dm3725, dm3730 + * and feature presence. + * Speed Binned = Bit 9 + * 0 800/600 MHz + * 1 1000/800 MHz + * some eFuse values e.g. CONTROL_FUSE_OPP 1G_VDD1 + * are stored in the SYSCON register range. + * There is no 0x4830A20C [ProdID.SKUID] register (exists but + * seems to always read as 0). + */ + +static struct ti_cpufreq_soc_data omap36xx_soc_data = { + .efuse_xlate = omap3_efuse_xlate, + .efuse_offset = OMAP3_CONTROL_DEVICE_STATUS - OMAP3_SYSCON_BASE, + .efuse_shift = 9, + .efuse_mask = BIT(9), + .rev_offset = OMAP3_CONTROL_IDCODE - OMAP3_SYSCON_BASE, + .multi_regulator = false, +}; + /** * ti_cpufreq_get_efuse() - Parse and return efuse value present on SoC * @opp_data: pointer to ti_cpufreq_data context @@ -128,7 +190,17 @@ static int ti_cpufreq_get_efuse(struct ti_cpufreq_data *opp_data, ret = regmap_read(opp_data->syscon, opp_data->soc_data->efuse_offset, &efuse); - if (ret) { + if (ret == -EIO) { + /* not a syscon register! */ + void __iomem *regs = ioremap(OMAP3_SYSCON_BASE + + opp_data->soc_data->efuse_offset, 4); + + if (!regs) + return -ENOMEM; + efuse = readl(regs); + iounmap(regs); + } + else if (ret) { dev_err(dev, "Failed to read the efuse value from syscon: %d\n", ret); @@ -159,7 +231,17 @@ static int ti_cpufreq_get_rev(struct ti_cpufreq_data *opp_data, ret = regmap_read(opp_data->syscon, opp_data->soc_data->rev_offset, &revision); - if (ret) { + if (ret == -EIO) { + /* not a syscon register! 
*/
+		void __iomem *regs = ioremap(OMAP3_SYSCON_BASE +
+				opp_data->soc_data->rev_offset, 4);
+
+		if (!regs)
+			return -ENOMEM;
+		revision = readl(regs);
+		iounmap(regs);
+	}
+	else if (ret) {
 		dev_err(dev,
 			"Failed to read the revision number from syscon: %d\n",
 			ret);
@@ -191,6 +273,11 @@ static const struct of_device_id ti_cpufreq_of_match[] = {
 	{ .compatible = "ti,am33xx", .data = &am3x_soc_data, },
 	{ .compatible = "ti,am43", .data = &am4x_soc_data, },
 	{ .compatible = "ti,dra7", .data = &dra7_soc_data },
+	{ .compatible = "ti,omap34xx", .data = &omap34xx_soc_data, },
+	{ .compatible = "ti,omap36xx", .data = &omap36xx_soc_data, },
+	/* legacy */
+	{ .compatible = "ti,omap3430", .data = &omap34xx_soc_data, },
+	{ .compatible = "ti,omap3630", .data = &omap36xx_soc_data, },
 	{},
 };
 

From b7dbe349e1eb5a1c07b58da83d8ee60030682a3a Mon Sep 17 00:00:00 2001
From: "H. Nikolaus Schaller"
Date: Wed, 11 Sep 2019 19:47:08 +0200
Subject: [PATCH 04/83] ARM: dts: omap34xx & omap36xx: replace opp-v1 tables by opp-v2 for

With the driver installed, we can change the opp-v1 table format to
opp-v2.

In addition, move omap3 from whitelist to blacklist in
cpufreq-dt-platdev in the same patch, because doing either first breaks
operation and may cause trouble when bisecting.

We can also remove the opp-v1 table for omap3-n950-n9 since its 1GHz
capability is now automatically detected.

We also fix a wrong OPP4 voltage for omap3430 which must be
0.6V + 54*12.5mV = 1275mV. Otherwise the twl4030 driver will reject
this OPP.

Note: the high speed OPPs that were not available in the opp-v1 tables
are tagged "turbo-mode;" which means they are not automatically
activated by the governors or cpu-freq. To enable them, write

	echo 1 >/sys/devices/system/cpu/cpufreq/boost

Note: to hard disable an OPP in a board.dts file use e.g.

	&cpu0_opp_table {
		/delete-node/ opp1g-1000000000; /* do not use */
	};

or alternatively:

	&cpu0_opp_table {
		opp1g-1000000000 {
			status = "disabled"; /* do not use */
		};
	};

Signed-off-by: H. 
Nikolaus Schaller Acked-by: Tony Lindgren Tested-by: Adam Ford Signed-off-by: Viresh Kumar --- arch/arm/boot/dts/omap3-n950-n9.dtsi | 7 --- arch/arm/boot/dts/omap34xx.dtsi | 66 ++++++++++++++++++++++++---- arch/arm/boot/dts/omap36xx.dtsi | 54 +++++++++++++++++++---- drivers/cpufreq/cpufreq-dt-platdev.c | 2 +- 4 files changed, 104 insertions(+), 25 deletions(-) diff --git a/arch/arm/boot/dts/omap3-n950-n9.dtsi b/arch/arm/boot/dts/omap3-n950-n9.dtsi index 6681d4519e97..a075b63f3087 100644 --- a/arch/arm/boot/dts/omap3-n950-n9.dtsi +++ b/arch/arm/boot/dts/omap3-n950-n9.dtsi @@ -11,13 +11,6 @@ / { cpus { cpu@0 { cpu0-supply = <&vcc>; - operating-points = < - /* kHz uV */ - 300000 1012500 - 600000 1200000 - 800000 1325000 - 1000000 1375000 - >; }; }; diff --git a/arch/arm/boot/dts/omap34xx.dtsi b/arch/arm/boot/dts/omap34xx.dtsi index 7b09cbee8bb8..c4dd9801840d 100644 --- a/arch/arm/boot/dts/omap34xx.dtsi +++ b/arch/arm/boot/dts/omap34xx.dtsi @@ -16,19 +16,67 @@ / { cpus { cpu: cpu@0 { - /* OMAP343x/OMAP35xx variants OPP1-5 */ - operating-points = < - /* kHz uV */ - 125000 975000 - 250000 1075000 - 500000 1200000 - 550000 1270000 - 600000 1350000 - >; + /* OMAP343x/OMAP35xx variants OPP1-6 */ + operating-points-v2 = <&cpu0_opp_table>; + clock-latency = <300000>; /* From legacy driver */ }; }; + /* see Documentation/devicetree/bindings/opp/opp.txt */ + cpu0_opp_table: opp-table { + compatible = "operating-points-v2-ti-cpu"; + syscon = <&scm_conf>; + + opp1-125000000 { + opp-hz = /bits/ 64 <125000000>; + /* + * we currently only select the max voltage from table + * Table 3-3 of the omap3530 Data sheet (SPRS507F). 
+ * Format is: + */ + opp-microvolt = <975000 975000 975000>; + /* + * first value is silicon revision bit mask + * second one 720MHz Device Identification bit mask + */ + opp-supported-hw = <0xffffffff 3>; + }; + + opp2-250000000 { + opp-hz = /bits/ 64 <250000000>; + opp-microvolt = <1075000 1075000 1075000>; + opp-supported-hw = <0xffffffff 3>; + opp-suspend; + }; + + opp3-500000000 { + opp-hz = /bits/ 64 <500000000>; + opp-microvolt = <1200000 1200000 1200000>; + opp-supported-hw = <0xffffffff 3>; + }; + + opp4-550000000 { + opp-hz = /bits/ 64 <550000000>; + opp-microvolt = <1275000 1275000 1275000>; + opp-supported-hw = <0xffffffff 3>; + }; + + opp5-600000000 { + opp-hz = /bits/ 64 <600000000>; + opp-microvolt = <1350000 1350000 1350000>; + opp-supported-hw = <0xffffffff 3>; + }; + + opp6-720000000 { + opp-hz = /bits/ 64 <720000000>; + opp-microvolt = <1350000 1350000 1350000>; + /* only high-speed grade omap3530 devices */ + opp-supported-hw = <0xffffffff 2>; + turbo-mode; + }; + }; + ocp@68000000 { omap3_pmx_core2: pinmux@480025d8 { compatible = "ti,omap3-padconf", "pinctrl-single"; diff --git a/arch/arm/boot/dts/omap36xx.dtsi b/arch/arm/boot/dts/omap36xx.dtsi index 1e552f08f120..2fcd0c5d72ba 100644 --- a/arch/arm/boot/dts/omap36xx.dtsi +++ b/arch/arm/boot/dts/omap36xx.dtsi @@ -19,15 +19,53 @@ aliases { }; cpus { - /* OMAP3630/OMAP37xx 'standard device' variants OPP50 to OPP130 */ + /* OMAP3630/OMAP37xx variants OPP50 to OPP130 and OPP1G */ cpu: cpu@0 { - operating-points = < - /* kHz uV */ - 300000 1012500 - 600000 1200000 - 800000 1325000 - >; - clock-latency = <300000>; /* From legacy driver */ + operating-points-v2 = <&cpu0_opp_table>; + + clock-latency = <300000>; /* From omap-cpufreq driver */ + }; + }; + + /* see Documentation/devicetree/bindings/opp/opp.txt */ + cpu0_opp_table: opp-table { + compatible = "operating-points-v2-ti-cpu"; + syscon = <&scm_conf>; + + opp50-300000000 { + opp-hz = /bits/ 64 <300000000>; + /* + * we currently only select the 
max voltage from table + * Table 4-19 of the DM3730 Data sheet (SPRS685B) + * Format is: + */ + opp-microvolt = <1012500 1012500 1012500>; + /* + * first value is silicon revision bit mask + * second one is "speed binned" bit mask + */ + opp-supported-hw = <0xffffffff 3>; + opp-suspend; + }; + + opp100-600000000 { + opp-hz = /bits/ 64 <600000000>; + opp-microvolt = <1200000 1200000 1200000>; + opp-supported-hw = <0xffffffff 3>; + }; + + opp130-800000000 { + opp-hz = /bits/ 64 <800000000>; + opp-microvolt = <1325000 1325000 1325000>; + opp-supported-hw = <0xffffffff 3>; + }; + + opp1g-1000000000 { + opp-hz = /bits/ 64 <1000000000>; + opp-microvolt = <1375000 1375000 1375000>; + /* only on am/dm37x with speed-binned bit set */ + opp-supported-hw = <0xffffffff 2>; + turbo-mode; }; }; diff --git a/drivers/cpufreq/cpufreq-dt-platdev.c b/drivers/cpufreq/cpufreq-dt-platdev.c index bca8d1f47fd2..54bc76743b1f 100644 --- a/drivers/cpufreq/cpufreq-dt-platdev.c +++ b/drivers/cpufreq/cpufreq-dt-platdev.c @@ -86,7 +86,6 @@ static const struct of_device_id whitelist[] __initconst = { { .compatible = "st-ericsson,u9540", }, { .compatible = "ti,omap2", }, - { .compatible = "ti,omap3", }, { .compatible = "ti,omap4", }, { .compatible = "ti,omap5", }, @@ -137,6 +136,7 @@ static const struct of_device_id blacklist[] __initconst = { { .compatible = "ti,am33xx", }, { .compatible = "ti,am43", }, { .compatible = "ti,dra7", }, + { .compatible = "ti,omap3", }, { } }; From b552904a73a3bb9a22b8b1db652a6f9285d535f3 Mon Sep 17 00:00:00 2001 From: "H. Nikolaus Schaller" Date: Wed, 11 Sep 2019 19:47:09 +0200 Subject: [PATCH 05/83] DTS: bindings: omap: update bindings documentation * clarify that we now need either "ti,omap3430" or "ti,omap3630" or "ti,am3517" for omap3 chips * clarify that "ti,omap3" has no default * clarify that AM33x is not an "ti,omap3" * clarify that the list of boards is incomplete * remove some "ti,am33xx", "ti,omap3" * add some missing "ti,omap4" Signed-off-by: H. 
Nikolaus Schaller Acked-by: Tony Lindgren Tested-by: Adam Ford Signed-off-by: Viresh Kumar --- .../devicetree/bindings/arm/omap/omap.txt | 30 +++++++++++-------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/Documentation/devicetree/bindings/arm/omap/omap.txt b/Documentation/devicetree/bindings/arm/omap/omap.txt index b301f753ed2c..e77635c5422c 100644 --- a/Documentation/devicetree/bindings/arm/omap/omap.txt +++ b/Documentation/devicetree/bindings/arm/omap/omap.txt @@ -43,7 +43,7 @@ SoC Families: - OMAP2 generic - defaults to OMAP2420 compatible = "ti,omap2" -- OMAP3 generic - defaults to OMAP3430 +- OMAP3 generic compatible = "ti,omap3" - OMAP4 generic - defaults to OMAP4430 compatible = "ti,omap4" @@ -51,6 +51,8 @@ SoC Families: compatible = "ti,omap5" - DRA7 generic - defaults to DRA742 compatible = "ti,dra7" +- AM33x generic + compatible = "ti,am33xx" - AM43x generic - defaults to AM4372 compatible = "ti,am43" @@ -63,12 +65,14 @@ SoCs: - OMAP3430 compatible = "ti,omap3430", "ti,omap3" + legacy: "ti,omap34xx" - please do not use any more - AM3517 compatible = "ti,am3517", "ti,omap3" - OMAP3630 - compatible = "ti,omap36xx", "ti,omap3" -- AM33xx - compatible = "ti,am33xx", "ti,omap3" + compatible = "ti,omap3630", "ti,omap3" + legacy: "ti,omap36xx" - please do not use any more +- AM335x + compatible = "ti,am33xx" - OMAP4430 compatible = "ti,omap4430", "ti,omap4" @@ -110,19 +114,19 @@ SoCs: - AM4372 compatible = "ti,am4372", "ti,am43" -Boards: +Boards (incomplete list of examples): - OMAP3 BeagleBoard : Low cost community board - compatible = "ti,omap3-beagle", "ti,omap3" + compatible = "ti,omap3-beagle", "ti,omap3430", "ti,omap3" - OMAP3 Tobi with Overo : Commercial expansion board with daughter board - compatible = "gumstix,omap3-overo-tobi", "gumstix,omap3-overo", "ti,omap3" + compatible = "gumstix,omap3-overo-tobi", "gumstix,omap3-overo", "ti,omap3430", "ti,omap3" - OMAP4 SDP : Software Development Board - compatible = "ti,omap4-sdp", 
"ti,omap4430" + compatible = "ti,omap4-sdp", "ti,omap4430", "ti,omap4" - OMAP4 PandaBoard : Low cost community board - compatible = "ti,omap4-panda", "ti,omap4430" + compatible = "ti,omap4-panda", "ti,omap4430", "ti,omap4" - OMAP4 DuoVero with Parlor : Commercial expansion board with daughter board compatible = "gumstix,omap4-duovero-parlor", "gumstix,omap4-duovero", "ti,omap4430", "ti,omap4"; @@ -134,16 +138,16 @@ Boards: compatible = "variscite,var-dvk-om44", "variscite,var-som-om44", "ti,omap4460", "ti,omap4"; - OMAP3 EVM : Software Development Board for OMAP35x, AM/DM37x - compatible = "ti,omap3-evm", "ti,omap3" + compatible = "ti,omap3-evm", "ti,omap3630", "ti,omap3" - AM335X EVM : Software Development Board for AM335x - compatible = "ti,am335x-evm", "ti,am33xx", "ti,omap3" + compatible = "ti,am335x-evm", "ti,am33xx" - AM335X Bone : Low cost community board - compatible = "ti,am335x-bone", "ti,am33xx", "ti,omap3" + compatible = "ti,am335x-bone", "ti,am33xx" - AM3359 ICEv2 : Low cost Industrial Communication Engine EVM. - compatible = "ti,am3359-icev2", "ti,am33xx", "ti,omap3" + compatible = "ti,am3359-icev2", "ti,am33xx" - AM335X OrionLXm : Substation Automation Platform compatible = "novatech,am335x-lxm", "ti,am33xx" From 6ddf6c91e6f884e55d3bb3dcf84ead5bfed273ce Mon Sep 17 00:00:00 2001 From: "H. Nikolaus Schaller" Date: Wed, 11 Sep 2019 19:47:10 +0200 Subject: [PATCH 06/83] ARM: dts: omap3: bulk convert compatible to be explicitly ti,omap3430 or ti,omap3630 or ti,am3517 For the ti-cpufreq driver we need a clear separation between omap34 and omap36 families since they have different silicon revisions and efuses. So far ti,omap3630/ti,omap36xx is just an additional flag to ti,omap3 while omap34 has no required entry. Therefore we can not match omap34 boards properly. This needs to add ti,omap3430 and ti,omap3630 where it is missing. 
We also clean up some instances of missing ti,am3517 so that we can
rely on seeing either one of:

	ti,am3517
	ti,omap3430
	ti,omap3630

in addition to ti,omap3.

We leave ti,omap34xx and ti,omap36xx untouched for compatibility.

The script to do the conversion is:

manually fix am3517_mt_ventoux.dts

find arch/arm/boot/dts -name '*.dts*' -exec fgrep -q '"ti,omap34xx"' {} \; ! -exec fgrep -q '"ti,omap3430"' {} \; -exec sed -i '' 's/"ti,omap34xx"/"ti,omap3430", "ti,omap34xx"/' {} \;

find arch/arm/boot/dts -name '*.dts*' -exec fgrep -q '"ti,omap36xx"' {} \; ! -exec fgrep -q '"ti,omap3630"' {} \; -exec sed -i '' 's/"ti,omap36xx"/"ti,omap3630", "ti,omap36xx"/' {} \;

find arch/arm/boot/dts \( -name 'omap*.dts*' -o -name 'logic*.dts*' \) -exec fgrep -q '"ti,omap3"' {} \; ! -exec fgrep -q '"ti,omap3630"' {} \; ! -exec fgrep -q '"ti,omap36xx"' {} \; ! -exec fgrep -q '"ti,am3517"' {} \; ! -exec fgrep -q '"ti,omap34xx"' {} \; ! -exec fgrep -q '"ti,omap3430"' {} \; -exec sed -i '' 's/"ti,omap3"/"ti,omap3430", "ti,omap3"/' {} \;

So if your out-of-tree omap3 board does not show any OPPs, please check
the compatibility entry and update if needed.

Signed-off-by: H. 
Nikolaus Schaller Acked-by: Tony Lindgren Tested-by: Adam Ford Signed-off-by: Viresh Kumar --- arch/arm/boot/dts/am3517_mt_ventoux.dts | 2 +- arch/arm/boot/dts/logicpd-som-lv-35xx-devkit.dts | 2 +- arch/arm/boot/dts/logicpd-torpedo-35xx-devkit.dts | 2 +- arch/arm/boot/dts/omap3-beagle-xm.dts | 2 +- arch/arm/boot/dts/omap3-beagle.dts | 2 +- arch/arm/boot/dts/omap3-cm-t3530.dts | 2 +- arch/arm/boot/dts/omap3-cm-t3730.dts | 2 +- arch/arm/boot/dts/omap3-devkit8000-lcd43.dts | 2 +- arch/arm/boot/dts/omap3-devkit8000-lcd70.dts | 2 +- arch/arm/boot/dts/omap3-devkit8000.dts | 2 +- arch/arm/boot/dts/omap3-gta04.dtsi | 2 +- arch/arm/boot/dts/omap3-ha-lcd.dts | 2 +- arch/arm/boot/dts/omap3-ha.dts | 2 +- arch/arm/boot/dts/omap3-igep0020-rev-f.dts | 2 +- arch/arm/boot/dts/omap3-igep0020.dts | 2 +- arch/arm/boot/dts/omap3-igep0030-rev-g.dts | 2 +- arch/arm/boot/dts/omap3-igep0030.dts | 2 +- arch/arm/boot/dts/omap3-ldp.dts | 2 +- arch/arm/boot/dts/omap3-lilly-a83x.dtsi | 2 +- arch/arm/boot/dts/omap3-lilly-dbb056.dts | 2 +- arch/arm/boot/dts/omap3-n9.dts | 2 +- arch/arm/boot/dts/omap3-n950.dts | 2 +- arch/arm/boot/dts/omap3-overo-storm-alto35.dts | 2 +- arch/arm/boot/dts/omap3-overo-storm-chestnut43.dts | 2 +- arch/arm/boot/dts/omap3-overo-storm-gallop43.dts | 2 +- arch/arm/boot/dts/omap3-overo-storm-palo35.dts | 2 +- arch/arm/boot/dts/omap3-overo-storm-palo43.dts | 2 +- arch/arm/boot/dts/omap3-overo-storm-summit.dts | 2 +- arch/arm/boot/dts/omap3-overo-storm-tobi.dts | 2 +- arch/arm/boot/dts/omap3-overo-storm-tobiduo.dts | 2 +- arch/arm/boot/dts/omap3-pandora-1ghz.dts | 2 +- arch/arm/boot/dts/omap3-sbc-t3530.dts | 2 +- arch/arm/boot/dts/omap3-sbc-t3730.dts | 2 +- arch/arm/boot/dts/omap3-sniper.dts | 2 +- arch/arm/boot/dts/omap3-thunder.dts | 2 +- arch/arm/boot/dts/omap3-zoom3.dts | 2 +- arch/arm/boot/dts/omap3430-sdp.dts | 2 +- 37 files changed, 37 insertions(+), 37 deletions(-) diff --git a/arch/arm/boot/dts/am3517_mt_ventoux.dts b/arch/arm/boot/dts/am3517_mt_ventoux.dts index 
e507e4ae0d88..e7d7124a34ba 100644 --- a/arch/arm/boot/dts/am3517_mt_ventoux.dts +++ b/arch/arm/boot/dts/am3517_mt_ventoux.dts @@ -8,7 +8,7 @@ / { model = "TeeJet Mt.Ventoux"; - compatible = "teejet,mt_ventoux", "ti,omap3"; + compatible = "teejet,mt_ventoux", "ti,am3517", "ti,omap3"; memory@80000000 { device_type = "memory"; diff --git a/arch/arm/boot/dts/logicpd-som-lv-35xx-devkit.dts b/arch/arm/boot/dts/logicpd-som-lv-35xx-devkit.dts index f7a841a28865..2a0a98fe67f0 100644 --- a/arch/arm/boot/dts/logicpd-som-lv-35xx-devkit.dts +++ b/arch/arm/boot/dts/logicpd-som-lv-35xx-devkit.dts @@ -9,5 +9,5 @@ / { model = "LogicPD Zoom OMAP35xx SOM-LV Development Kit"; - compatible = "logicpd,dm3730-som-lv-devkit", "ti,omap3"; + compatible = "logicpd,dm3730-som-lv-devkit", "ti,omap3430", "ti,omap3"; }; diff --git a/arch/arm/boot/dts/logicpd-torpedo-35xx-devkit.dts b/arch/arm/boot/dts/logicpd-torpedo-35xx-devkit.dts index 7675bc3fa868..57bae2aa910e 100644 --- a/arch/arm/boot/dts/logicpd-torpedo-35xx-devkit.dts +++ b/arch/arm/boot/dts/logicpd-torpedo-35xx-devkit.dts @@ -9,5 +9,5 @@ / { model = "LogicPD Zoom OMAP35xx Torpedo Development Kit"; - compatible = "logicpd,dm3730-torpedo-devkit", "ti,omap3"; + compatible = "logicpd,dm3730-torpedo-devkit", "ti,omap3430", "ti,omap3"; }; diff --git a/arch/arm/boot/dts/omap3-beagle-xm.dts b/arch/arm/boot/dts/omap3-beagle-xm.dts index 1aa99fc1487a..125ed933ca75 100644 --- a/arch/arm/boot/dts/omap3-beagle-xm.dts +++ b/arch/arm/boot/dts/omap3-beagle-xm.dts @@ -8,7 +8,7 @@ / { model = "TI OMAP3 BeagleBoard xM"; - compatible = "ti,omap3-beagle-xm", "ti,omap36xx", "ti,omap3"; + compatible = "ti,omap3-beagle-xm", "ti,omap3630", "ti,omap36xx", "ti,omap3"; cpus { cpu@0 { diff --git a/arch/arm/boot/dts/omap3-beagle.dts b/arch/arm/boot/dts/omap3-beagle.dts index e3df3c166902..4ed3f93f5841 100644 --- a/arch/arm/boot/dts/omap3-beagle.dts +++ b/arch/arm/boot/dts/omap3-beagle.dts @@ -8,7 +8,7 @@ / { model = "TI OMAP3 BeagleBoard"; - compatible = 
"ti,omap3-beagle", "ti,omap3"; + compatible = "ti,omap3-beagle", "ti,omap3430", "ti,omap3"; cpus { cpu@0 { diff --git a/arch/arm/boot/dts/omap3-cm-t3530.dts b/arch/arm/boot/dts/omap3-cm-t3530.dts index 76e52c78cbb4..32dbaeaed147 100644 --- a/arch/arm/boot/dts/omap3-cm-t3530.dts +++ b/arch/arm/boot/dts/omap3-cm-t3530.dts @@ -9,7 +9,7 @@ / { model = "CompuLab CM-T3530"; - compatible = "compulab,omap3-cm-t3530", "ti,omap34xx", "ti,omap3"; + compatible = "compulab,omap3-cm-t3530", "ti,omap3430", "ti,omap34xx", "ti,omap3"; /* Regulator to trigger the reset signal of the Wifi module */ mmc2_sdio_reset: regulator-mmc2-sdio-reset { diff --git a/arch/arm/boot/dts/omap3-cm-t3730.dts b/arch/arm/boot/dts/omap3-cm-t3730.dts index 6e944dfa0f3d..683819bf0915 100644 --- a/arch/arm/boot/dts/omap3-cm-t3730.dts +++ b/arch/arm/boot/dts/omap3-cm-t3730.dts @@ -9,7 +9,7 @@ / { model = "CompuLab CM-T3730"; - compatible = "compulab,omap3-cm-t3730", "ti,omap36xx", "ti,omap3"; + compatible = "compulab,omap3-cm-t3730", "ti,omap3630", "ti,omap36xx", "ti,omap3"; wl12xx_vmmc2: wl12xx_vmmc2 { compatible = "regulator-fixed"; diff --git a/arch/arm/boot/dts/omap3-devkit8000-lcd43.dts b/arch/arm/boot/dts/omap3-devkit8000-lcd43.dts index a80fc60bc773..afed85078ad8 100644 --- a/arch/arm/boot/dts/omap3-devkit8000-lcd43.dts +++ b/arch/arm/boot/dts/omap3-devkit8000-lcd43.dts @@ -11,7 +11,7 @@ #include "omap3-devkit8000-lcd-common.dtsi" / { model = "TimLL OMAP3 Devkit8000 with 4.3'' LCD panel"; - compatible = "timll,omap3-devkit8000", "ti,omap3"; + compatible = "timll,omap3-devkit8000", "ti,omap3430", "ti,omap3"; lcd0: display { panel-timing { diff --git a/arch/arm/boot/dts/omap3-devkit8000-lcd70.dts b/arch/arm/boot/dts/omap3-devkit8000-lcd70.dts index 0753776071f8..07c51a105c0d 100644 --- a/arch/arm/boot/dts/omap3-devkit8000-lcd70.dts +++ b/arch/arm/boot/dts/omap3-devkit8000-lcd70.dts @@ -11,7 +11,7 @@ #include "omap3-devkit8000-lcd-common.dtsi" / { model = "TimLL OMAP3 Devkit8000 with 7.0'' LCD panel"; - 
compatible = "timll,omap3-devkit8000", "ti,omap3"; + compatible = "timll,omap3-devkit8000", "ti,omap3430", "ti,omap3"; lcd0: display { panel-timing { diff --git a/arch/arm/boot/dts/omap3-devkit8000.dts b/arch/arm/boot/dts/omap3-devkit8000.dts index faafc48d8f61..162d0726b008 100644 --- a/arch/arm/boot/dts/omap3-devkit8000.dts +++ b/arch/arm/boot/dts/omap3-devkit8000.dts @@ -7,7 +7,7 @@ #include "omap3-devkit8000-common.dtsi" / { model = "TimLL OMAP3 Devkit8000"; - compatible = "timll,omap3-devkit8000", "ti,omap3"; + compatible = "timll,omap3-devkit8000", "ti,omap3430", "ti,omap3"; aliases { display1 = &dvi0; diff --git a/arch/arm/boot/dts/omap3-gta04.dtsi b/arch/arm/boot/dts/omap3-gta04.dtsi index d01fc8744fd7..f65ecc2db29a 100644 --- a/arch/arm/boot/dts/omap3-gta04.dtsi +++ b/arch/arm/boot/dts/omap3-gta04.dtsi @@ -11,7 +11,7 @@ / { model = "OMAP3 GTA04"; - compatible = "ti,omap3-gta04", "ti,omap36xx", "ti,omap3"; + compatible = "ti,omap3-gta04", "ti,omap3630", "ti,omap36xx", "ti,omap3"; cpus { cpu@0 { diff --git a/arch/arm/boot/dts/omap3-ha-lcd.dts b/arch/arm/boot/dts/omap3-ha-lcd.dts index badb9b3c8897..c9ecbc45c8e2 100644 --- a/arch/arm/boot/dts/omap3-ha-lcd.dts +++ b/arch/arm/boot/dts/omap3-ha-lcd.dts @@ -8,7 +8,7 @@ / { model = "TI OMAP3 HEAD acoustics LCD-baseboard with TAO3530 SOM"; - compatible = "headacoustics,omap3-ha-lcd", "technexion,omap3-tao3530", "ti,omap34xx", "ti,omap3"; + compatible = "headacoustics,omap3-ha-lcd", "technexion,omap3-tao3530", "ti,omap3430", "ti,omap34xx", "ti,omap3"; }; &omap3_pmx_core { diff --git a/arch/arm/boot/dts/omap3-ha.dts b/arch/arm/boot/dts/omap3-ha.dts index a5365252bfbe..35c4e15abeb7 100644 --- a/arch/arm/boot/dts/omap3-ha.dts +++ b/arch/arm/boot/dts/omap3-ha.dts @@ -8,7 +8,7 @@ / { model = "TI OMAP3 HEAD acoustics baseboard with TAO3530 SOM"; - compatible = "headacoustics,omap3-ha", "technexion,omap3-tao3530", "ti,omap34xx", "ti,omap3"; + compatible = "headacoustics,omap3-ha", "technexion,omap3-tao3530", "ti,omap3430", 
"ti,omap34xx", "ti,omap3"; }; &omap3_pmx_core { diff --git a/arch/arm/boot/dts/omap3-igep0020-rev-f.dts b/arch/arm/boot/dts/omap3-igep0020-rev-f.dts index 03dcd05fb8a0..d134ce1cffc0 100644 --- a/arch/arm/boot/dts/omap3-igep0020-rev-f.dts +++ b/arch/arm/boot/dts/omap3-igep0020-rev-f.dts @@ -10,7 +10,7 @@ / { model = "IGEPv2 Rev. F (TI OMAP AM/DM37x)"; - compatible = "isee,omap3-igep0020-rev-f", "ti,omap36xx", "ti,omap3"; + compatible = "isee,omap3-igep0020-rev-f", "ti,omap3630", "ti,omap36xx", "ti,omap3"; /* Regulator to trigger the WL_EN signal of the Wifi module */ lbep5clwmc_wlen: regulator-lbep5clwmc-wlen { diff --git a/arch/arm/boot/dts/omap3-igep0020.dts b/arch/arm/boot/dts/omap3-igep0020.dts index 6d0519e3dfd0..e341535a7162 100644 --- a/arch/arm/boot/dts/omap3-igep0020.dts +++ b/arch/arm/boot/dts/omap3-igep0020.dts @@ -10,7 +10,7 @@ / { model = "IGEPv2 Rev. C (TI OMAP AM/DM37x)"; - compatible = "isee,omap3-igep0020", "ti,omap36xx", "ti,omap3"; + compatible = "isee,omap3-igep0020", "ti,omap3630", "ti,omap36xx", "ti,omap3"; vmmcsdio_fixed: fixedregulator-mmcsdio { compatible = "regulator-fixed"; diff --git a/arch/arm/boot/dts/omap3-igep0030-rev-g.dts b/arch/arm/boot/dts/omap3-igep0030-rev-g.dts index 060acd1e803a..9ca1d0f61964 100644 --- a/arch/arm/boot/dts/omap3-igep0030-rev-g.dts +++ b/arch/arm/boot/dts/omap3-igep0030-rev-g.dts @@ -10,7 +10,7 @@ / { model = "IGEP COM MODULE Rev. G (TI OMAP AM/DM37x)"; - compatible = "isee,omap3-igep0030-rev-g", "ti,omap36xx", "ti,omap3"; + compatible = "isee,omap3-igep0030-rev-g", "ti,omap3630", "ti,omap36xx", "ti,omap3"; /* Regulator to trigger the WL_EN signal of the Wifi module */ lbep5clwmc_wlen: regulator-lbep5clwmc-wlen { diff --git a/arch/arm/boot/dts/omap3-igep0030.dts b/arch/arm/boot/dts/omap3-igep0030.dts index 25170bd3c573..32f31035daa2 100644 --- a/arch/arm/boot/dts/omap3-igep0030.dts +++ b/arch/arm/boot/dts/omap3-igep0030.dts @@ -10,7 +10,7 @@ / { model = "IGEP COM MODULE Rev. 
E (TI OMAP AM/DM37x)"; - compatible = "isee,omap3-igep0030", "ti,omap36xx", "ti,omap3"; + compatible = "isee,omap3-igep0030", "ti,omap3630", "ti,omap36xx", "ti,omap3"; vmmcsdio_fixed: fixedregulator-mmcsdio { compatible = "regulator-fixed"; diff --git a/arch/arm/boot/dts/omap3-ldp.dts b/arch/arm/boot/dts/omap3-ldp.dts index 9a5fde2d9bce..ec9ba04ef43b 100644 --- a/arch/arm/boot/dts/omap3-ldp.dts +++ b/arch/arm/boot/dts/omap3-ldp.dts @@ -10,7 +10,7 @@ / { model = "TI OMAP3430 LDP (Zoom1 Labrador)"; - compatible = "ti,omap3-ldp", "ti,omap3"; + compatible = "ti,omap3-ldp", "ti,omap3430", "ti,omap3"; memory@80000000 { device_type = "memory"; diff --git a/arch/arm/boot/dts/omap3-lilly-a83x.dtsi b/arch/arm/boot/dts/omap3-lilly-a83x.dtsi index c22833d4e568..73d477898ec2 100644 --- a/arch/arm/boot/dts/omap3-lilly-a83x.dtsi +++ b/arch/arm/boot/dts/omap3-lilly-a83x.dtsi @@ -7,7 +7,7 @@ / { model = "INCOstartec LILLY-A83X module (DM3730)"; - compatible = "incostartec,omap3-lilly-a83x", "ti,omap36xx", "ti,omap3"; + compatible = "incostartec,omap3-lilly-a83x", "ti,omap3630", "ti,omap36xx", "ti,omap3"; chosen { bootargs = "console=ttyO0,115200n8 vt.global_cursor_default=0 consoleblank=0"; diff --git a/arch/arm/boot/dts/omap3-lilly-dbb056.dts b/arch/arm/boot/dts/omap3-lilly-dbb056.dts index fec335400074..ecb4ef738e07 100644 --- a/arch/arm/boot/dts/omap3-lilly-dbb056.dts +++ b/arch/arm/boot/dts/omap3-lilly-dbb056.dts @@ -8,7 +8,7 @@ / { model = "INCOstartec LILLY-DBB056 (DM3730)"; - compatible = "incostartec,omap3-lilly-dbb056", "incostartec,omap3-lilly-a83x", "ti,omap36xx", "ti,omap3"; + compatible = "incostartec,omap3-lilly-dbb056", "incostartec,omap3-lilly-a83x", "ti,omap3630", "ti,omap36xx", "ti,omap3"; }; &twl { diff --git a/arch/arm/boot/dts/omap3-n9.dts b/arch/arm/boot/dts/omap3-n9.dts index 74c0ff2350d3..2495a696cec6 100644 --- a/arch/arm/boot/dts/omap3-n9.dts +++ b/arch/arm/boot/dts/omap3-n9.dts @@ -12,7 +12,7 @@ / { model = "Nokia N9"; - compatible = "nokia,omap3-n9", 
"ti,omap36xx", "ti,omap3"; + compatible = "nokia,omap3-n9", "ti,omap3630", "ti,omap36xx", "ti,omap3"; }; &i2c2 { diff --git a/arch/arm/boot/dts/omap3-n950.dts b/arch/arm/boot/dts/omap3-n950.dts index 9886bf8b90ab..31d47a1fad84 100644 --- a/arch/arm/boot/dts/omap3-n950.dts +++ b/arch/arm/boot/dts/omap3-n950.dts @@ -12,7 +12,7 @@ / { model = "Nokia N950"; - compatible = "nokia,omap3-n950", "ti,omap36xx", "ti,omap3"; + compatible = "nokia,omap3-n950", "ti,omap3630", "ti,omap36xx", "ti,omap3"; keys { compatible = "gpio-keys"; diff --git a/arch/arm/boot/dts/omap3-overo-storm-alto35.dts b/arch/arm/boot/dts/omap3-overo-storm-alto35.dts index 18338576c41d..7f04dfad8203 100644 --- a/arch/arm/boot/dts/omap3-overo-storm-alto35.dts +++ b/arch/arm/boot/dts/omap3-overo-storm-alto35.dts @@ -14,5 +14,5 @@ / { model = "OMAP36xx/AM37xx/DM37xx Gumstix Overo on Alto35"; - compatible = "gumstix,omap3-overo-alto35", "gumstix,omap3-overo", "ti,omap36xx", "ti,omap3"; + compatible = "gumstix,omap3-overo-alto35", "gumstix,omap3-overo", "ti,omap3630", "ti,omap36xx", "ti,omap3"; }; diff --git a/arch/arm/boot/dts/omap3-overo-storm-chestnut43.dts b/arch/arm/boot/dts/omap3-overo-storm-chestnut43.dts index f204c8af8281..bc5a04e03336 100644 --- a/arch/arm/boot/dts/omap3-overo-storm-chestnut43.dts +++ b/arch/arm/boot/dts/omap3-overo-storm-chestnut43.dts @@ -14,7 +14,7 @@ / { model = "OMAP36xx/AM37xx/DM37xx Gumstix Overo on Chestnut43"; - compatible = "gumstix,omap3-overo-chestnut43", "gumstix,omap3-overo", "ti,omap36xx", "ti,omap3"; + compatible = "gumstix,omap3-overo-chestnut43", "gumstix,omap3-overo", "ti,omap3630", "ti,omap36xx", "ti,omap3"; }; &omap3_pmx_core2 { diff --git a/arch/arm/boot/dts/omap3-overo-storm-gallop43.dts b/arch/arm/boot/dts/omap3-overo-storm-gallop43.dts index c633f7cee68e..065c31cbf0e2 100644 --- a/arch/arm/boot/dts/omap3-overo-storm-gallop43.dts +++ b/arch/arm/boot/dts/omap3-overo-storm-gallop43.dts @@ -14,7 +14,7 @@ / { model = "OMAP36xx/AM37xx/DM37xx Gumstix Overo on 
Gallop43"; - compatible = "gumstix,omap3-overo-gallop43", "gumstix,omap3-overo", "ti,omap36xx", "ti,omap3"; + compatible = "gumstix,omap3-overo-gallop43", "gumstix,omap3-overo", "ti,omap3630", "ti,omap36xx", "ti,omap3"; }; &omap3_pmx_core2 { diff --git a/arch/arm/boot/dts/omap3-overo-storm-palo35.dts b/arch/arm/boot/dts/omap3-overo-storm-palo35.dts index fb88ebc9858c..e38c1c51392c 100644 --- a/arch/arm/boot/dts/omap3-overo-storm-palo35.dts +++ b/arch/arm/boot/dts/omap3-overo-storm-palo35.dts @@ -14,7 +14,7 @@ / { model = "OMAP36xx/AM37xx/DM37xx Gumstix Overo on Palo35"; - compatible = "gumstix,omap3-overo-palo35", "gumstix,omap3-overo", "ti,omap36xx", "ti,omap3"; + compatible = "gumstix,omap3-overo-palo35", "gumstix,omap3-overo", "ti,omap3630", "ti,omap36xx", "ti,omap3"; }; &omap3_pmx_core2 { diff --git a/arch/arm/boot/dts/omap3-overo-storm-palo43.dts b/arch/arm/boot/dts/omap3-overo-storm-palo43.dts index 76cca00d97b6..e6dc23159c4d 100644 --- a/arch/arm/boot/dts/omap3-overo-storm-palo43.dts +++ b/arch/arm/boot/dts/omap3-overo-storm-palo43.dts @@ -14,7 +14,7 @@ / { model = "OMAP36xx/AM37xx/DM37xx Gumstix Overo on Palo43"; - compatible = "gumstix,omap3-overo-palo43", "gumstix,omap3-overo", "ti,omap36xx", "ti,omap3"; + compatible = "gumstix,omap3-overo-palo43", "gumstix,omap3-overo", "ti,omap3630", "ti,omap36xx", "ti,omap3"; }; &omap3_pmx_core2 { diff --git a/arch/arm/boot/dts/omap3-overo-storm-summit.dts b/arch/arm/boot/dts/omap3-overo-storm-summit.dts index cc081a9e4c1e..587c08ce282d 100644 --- a/arch/arm/boot/dts/omap3-overo-storm-summit.dts +++ b/arch/arm/boot/dts/omap3-overo-storm-summit.dts @@ -14,7 +14,7 @@ / { model = "OMAP36xx/AM37xx/DM37xx Gumstix Overo on Summit"; - compatible = "gumstix,omap3-overo-summit", "gumstix,omap3-overo", "ti,omap36xx", "ti,omap3"; + compatible = "gumstix,omap3-overo-summit", "gumstix,omap3-overo", "ti,omap3630", "ti,omap36xx", "ti,omap3"; }; &omap3_pmx_core2 { diff --git a/arch/arm/boot/dts/omap3-overo-storm-tobi.dts 
b/arch/arm/boot/dts/omap3-overo-storm-tobi.dts index 1de41c0826e0..f57de6010994 100644 --- a/arch/arm/boot/dts/omap3-overo-storm-tobi.dts +++ b/arch/arm/boot/dts/omap3-overo-storm-tobi.dts @@ -14,6 +14,6 @@ / { model = "OMAP36xx/AM37xx/DM37xx Gumstix Overo on Tobi"; - compatible = "gumstix,omap3-overo-tobi", "gumstix,omap3-overo", "ti,omap36xx", "ti,omap3"; + compatible = "gumstix,omap3-overo-tobi", "gumstix,omap3-overo", "ti,omap3630", "ti,omap36xx", "ti,omap3"; }; diff --git a/arch/arm/boot/dts/omap3-overo-storm-tobiduo.dts b/arch/arm/boot/dts/omap3-overo-storm-tobiduo.dts index 9ed13118ed8e..281af6c113be 100644 --- a/arch/arm/boot/dts/omap3-overo-storm-tobiduo.dts +++ b/arch/arm/boot/dts/omap3-overo-storm-tobiduo.dts @@ -14,5 +14,5 @@ / { model = "OMAP36xx/AM37xx/DM37xx Gumstix Overo on TobiDuo"; - compatible = "gumstix,omap3-overo-tobiduo", "gumstix,omap3-overo", "ti,omap36xx", "ti,omap3"; + compatible = "gumstix,omap3-overo-tobiduo", "gumstix,omap3-overo", "ti,omap3630", "ti,omap36xx", "ti,omap3"; }; diff --git a/arch/arm/boot/dts/omap3-pandora-1ghz.dts b/arch/arm/boot/dts/omap3-pandora-1ghz.dts index 81b957f33c9f..ea509956d7ac 100644 --- a/arch/arm/boot/dts/omap3-pandora-1ghz.dts +++ b/arch/arm/boot/dts/omap3-pandora-1ghz.dts @@ -16,7 +16,7 @@ / { model = "Pandora Handheld Console 1GHz"; - compatible = "openpandora,omap3-pandora-1ghz", "ti,omap36xx", "ti,omap3"; + compatible = "openpandora,omap3-pandora-1ghz", "ti,omap3630", "ti,omap36xx", "ti,omap3"; }; &omap3_pmx_core2 { diff --git a/arch/arm/boot/dts/omap3-sbc-t3530.dts b/arch/arm/boot/dts/omap3-sbc-t3530.dts index ae96002abb3b..24bf3fd86641 100644 --- a/arch/arm/boot/dts/omap3-sbc-t3530.dts +++ b/arch/arm/boot/dts/omap3-sbc-t3530.dts @@ -8,7 +8,7 @@ / { model = "CompuLab SBC-T3530 with CM-T3530"; - compatible = "compulab,omap3-sbc-t3530", "compulab,omap3-cm-t3530", "ti,omap34xx", "ti,omap3"; + compatible = "compulab,omap3-sbc-t3530", "compulab,omap3-cm-t3530", "ti,omap3430", "ti,omap34xx", "ti,omap3"; 
aliases { display0 = &dvi0; diff --git a/arch/arm/boot/dts/omap3-sbc-t3730.dts b/arch/arm/boot/dts/omap3-sbc-t3730.dts index 7de6df16fc17..eb3893b9535e 100644 --- a/arch/arm/boot/dts/omap3-sbc-t3730.dts +++ b/arch/arm/boot/dts/omap3-sbc-t3730.dts @@ -8,7 +8,7 @@ / { model = "CompuLab SBC-T3730 with CM-T3730"; - compatible = "compulab,omap3-sbc-t3730", "compulab,omap3-cm-t3730", "ti,omap36xx", "ti,omap3"; + compatible = "compulab,omap3-sbc-t3730", "compulab,omap3-cm-t3730", "ti,omap3630", "ti,omap36xx", "ti,omap3"; aliases { display0 = &dvi0; diff --git a/arch/arm/boot/dts/omap3-sniper.dts b/arch/arm/boot/dts/omap3-sniper.dts index 40a87330e8c3..b6879cdc5c13 100644 --- a/arch/arm/boot/dts/omap3-sniper.dts +++ b/arch/arm/boot/dts/omap3-sniper.dts @@ -9,7 +9,7 @@ / { model = "LG Optimus Black"; - compatible = "lg,omap3-sniper", "ti,omap36xx", "ti,omap3"; + compatible = "lg,omap3-sniper", "ti,omap3630", "ti,omap36xx", "ti,omap3"; cpus { cpu@0 { diff --git a/arch/arm/boot/dts/omap3-thunder.dts b/arch/arm/boot/dts/omap3-thunder.dts index 6276e7079b36..64221e3b3477 100644 --- a/arch/arm/boot/dts/omap3-thunder.dts +++ b/arch/arm/boot/dts/omap3-thunder.dts @@ -8,7 +8,7 @@ / { model = "TI OMAP3 Thunder baseboard with TAO3530 SOM"; - compatible = "technexion,omap3-thunder", "technexion,omap3-tao3530", "ti,omap34xx", "ti,omap3"; + compatible = "technexion,omap3-thunder", "technexion,omap3-tao3530", "ti,omap3430", "ti,omap34xx", "ti,omap3"; }; &omap3_pmx_core { diff --git a/arch/arm/boot/dts/omap3-zoom3.dts b/arch/arm/boot/dts/omap3-zoom3.dts index db3a2fe84e99..d240e39f2151 100644 --- a/arch/arm/boot/dts/omap3-zoom3.dts +++ b/arch/arm/boot/dts/omap3-zoom3.dts @@ -9,7 +9,7 @@ / { model = "TI Zoom3"; - compatible = "ti,omap3-zoom3", "ti,omap36xx", "ti,omap3"; + compatible = "ti,omap3-zoom3", "ti,omap3630", "ti,omap36xx", "ti,omap3"; cpus { cpu@0 { diff --git a/arch/arm/boot/dts/omap3430-sdp.dts b/arch/arm/boot/dts/omap3430-sdp.dts index 0abd61108a53..7bfde8aac7ae 100644 --- 
a/arch/arm/boot/dts/omap3430-sdp.dts +++ b/arch/arm/boot/dts/omap3430-sdp.dts @@ -8,7 +8,7 @@ / { model = "TI OMAP3430 SDP"; - compatible = "ti,omap3430-sdp", "ti,omap3"; + compatible = "ti,omap3430-sdp", "ti,omap3430", "ti,omap3"; memory@80000000 { device_type = "memory"; From 42e52616f382a455d64629b156685d5adc21ce85 Mon Sep 17 00:00:00 2001 From: "H. Nikolaus Schaller" Date: Wed, 11 Sep 2019 19:47:11 +0200 Subject: [PATCH 07/83] cpufreq: ti-cpufreq: omap36xx use "cpu0","vbb" if run in multi_regulator mode In preparation for using the multi_regulator capability of this driver for handling the ABB LDO for OPP1G of the omap36xx we have to take care that the (legacy) vdd-supply name is cpu0-supply = <&vcc>; To do this we add another field to the SoC description table which optionally can specify a list of regulator names. For omap36xx we define "cpu0-supply" and "vbb-supply". The default remains "vdd-supply" and "vbb-supply". Signed-off-by: H. Nikolaus Schaller Acked-by: Tony Lindgren Acked-by: Rob Herring Tested-by: Adam Ford Signed-off-by: Viresh Kumar --- .../devicetree/bindings/cpufreq/ti-cpufreq.txt | 6 +++++- drivers/cpufreq/ti-cpufreq.c | 12 ++++++++++-- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt b/Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt index 0c38e4b8fc51..1758051798fe 100644 --- a/Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt +++ b/Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt @@ -15,12 +15,16 @@ In 'cpus' nodes: In 'operating-points-v2' table: - compatible: Should be - - 'operating-points-v2-ti-cpu' for am335x, am43xx, and dra7xx/am57xx SoCs + - 'operating-points-v2-ti-cpu' for am335x, am43xx, and dra7xx/am57xx, + omap34xx, omap36xx and am3517 SoCs - syscon: A phandle pointing to a syscon node representing the control module register space of the SoC. 
Optional properties: -------------------- +- "vdd-supply", "vbb-supply": to define two regulators for dra7xx +- "cpu0-supply", "vbb-supply": to define two regulators for omap36xx + For each opp entry in 'operating-points-v2' table: - opp-supported-hw: Two bitfields indicating: 1. Which revision of the SoC the OPP is supported by diff --git a/drivers/cpufreq/ti-cpufreq.c b/drivers/cpufreq/ti-cpufreq.c index 13d426559333..a161098652e2 100644 --- a/drivers/cpufreq/ti-cpufreq.c +++ b/drivers/cpufreq/ti-cpufreq.c @@ -41,6 +41,7 @@ struct ti_cpufreq_data; struct ti_cpufreq_soc_data { + const char * const *reg_names; unsigned long (*efuse_xlate)(struct ti_cpufreq_data *opp_data, unsigned long efuse); unsigned long efuse_fallback; @@ -165,7 +166,10 @@ static struct ti_cpufreq_soc_data omap34xx_soc_data = { * seems to always read as 0). */ +static const char * const omap3_reg_names[] = {"cpu0", "vbb"}; + static struct ti_cpufreq_soc_data omap36xx_soc_data = { + .reg_names = omap3_reg_names, .efuse_xlate = omap3_efuse_xlate, .efuse_offset = OMAP3_CONTROL_DEVICE_STATUS - OMAP3_SYSCON_BASE, .efuse_shift = 9, @@ -299,7 +303,7 @@ static int ti_cpufreq_probe(struct platform_device *pdev) const struct of_device_id *match; struct opp_table *ti_opp_table; struct ti_cpufreq_data *opp_data; - const char * const reg_names[] = {"vdd", "vbb"}; + const char * const default_reg_names[] = {"vdd", "vbb"}; int ret; match = dev_get_platdata(&pdev->dev); @@ -355,9 +359,13 @@ static int ti_cpufreq_probe(struct platform_device *pdev) opp_data->opp_table = ti_opp_table; if (opp_data->soc_data->multi_regulator) { + const char * const *reg_names = default_reg_names; + + if (opp_data->soc_data->reg_names) + reg_names = opp_data->soc_data->reg_names; ti_opp_table = dev_pm_opp_set_regulators(opp_data->cpu_dev, reg_names, - ARRAY_SIZE(reg_names)); + ARRAY_SIZE(default_reg_names)); if (IS_ERR(ti_opp_table)) { dev_pm_opp_put_supported_hw(opp_data->opp_table); ret = PTR_ERR(ti_opp_table); From 
341afbc9ea3983a2261c9e495e0b66b36b5dda20 Mon Sep 17 00:00:00 2001 From: "H. Nikolaus Schaller" Date: Wed, 11 Sep 2019 19:47:12 +0200 Subject: [PATCH 08/83] ARM: dts: omap36xx: using OPP1G needs to control the abb_ldo See DM3730, DM3725 data sheet (SPRS685B) footnote (6) in Table 4-19 which says that ABB must be switched to FBB mode when using the OPP1G. The LDO definition abb_mpu_iva already exists, so we need to add plumbing for vbb-supply = <&abb_mpu_iva> and define two voltage vectors for each OPP so that the abb LDO is also updated by the ti-cpufreq driver. We also must switch the ti_cpufreq_soc_data to multi_regulator. Note: reading out the abb regulator voltage to verify that it transitions can be done by cat /sys/devices/platform/68000000.ocp/483072f0.regulator-abb-mpu/regulator/regulator.*/microvolts Likewise, read the twl4030 provided VDD voltage by cat /sys/devices/platform/68000000.ocp/48070000.i2c/i2c-0/0-0048/48070000.i2c:twl@48:regulator-vdd1/regulator/regulator.*/microvolts Note: to check if the ABB FBB is enabled/disabled, check registers PRM_LDO_ABB_CTRL 0x483072F4 bit 3:0 1=bypass 5=FBB PRM_LDO_ABB_SETUP 0x483072F0 0x00=bypass 0x11=FBB e.g. /dev/mem opened. Memory mapped at address 0xb6fe4000. Value at address 0x483072F4 (0xb6fe42f4): 0x3205 /dev/mem opened. Memory mapped at address 0xb6f89000. Value at address 0x483072F4 (0xb6f892f4): 0x3201 Note: omap34xx and am3517 have/need no comparable LDO or mechanism. Suggested-by: Adam Ford Signed-off-by: H.
Nikolaus Schaller Acked-by: Tony Lindgren Tested-by: Adam Ford Signed-off-by: Viresh Kumar --- arch/arm/boot/dts/omap36xx.dtsi | 21 ++++++++++++++++----- drivers/cpufreq/ti-cpufreq.c | 2 +- 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/arch/arm/boot/dts/omap36xx.dtsi b/arch/arm/boot/dts/omap36xx.dtsi index 2fcd0c5d72ba..c618cb257d00 100644 --- a/arch/arm/boot/dts/omap36xx.dtsi +++ b/arch/arm/boot/dts/omap36xx.dtsi @@ -23,6 +23,7 @@ cpus { cpu: cpu@0 { operating-points-v2 = <&cpu0_opp_table>; + vbb-supply = <&abb_mpu_iva>; clock-latency = <300000>; /* From omap-cpufreq driver */ }; }; @@ -37,9 +38,11 @@ opp50-300000000 { /* * we currently only select the max voltage from table * Table 4-19 of the DM3730 Data sheet (SPRS685B) - * Format is: + * Format is: cpu0-supply: + * vbb-supply: */ - opp-microvolt = <1012500 1012500 1012500>; + opp-microvolt = <1012500 1012500 1012500>, + <1012500 1012500 1012500>; /* * first value is silicon revision bit mask * second one is "speed binned" bit mask @@ -50,25 +53,33 @@ opp50-300000000 { opp100-600000000 { opp-hz = /bits/ 64 <600000000>; - opp-microvolt = <1200000 1200000 1200000>; + opp-microvolt = <1200000 1200000 1200000>, + <1200000 1200000 1200000>; opp-supported-hw = <0xffffffff 3>; }; opp130-800000000 { opp-hz = /bits/ 64 <800000000>; - opp-microvolt = <1325000 1325000 1325000>; + opp-microvolt = <1325000 1325000 1325000>, + <1325000 1325000 1325000>; opp-supported-hw = <0xffffffff 3>; }; opp1g-1000000000 { opp-hz = /bits/ 64 <1000000000>; - opp-microvolt = <1375000 1375000 1375000>; + opp-microvolt = <1375000 1375000 1375000>, + <1375000 1375000 1375000>; /* only on am/dm37x with speed-binned bit set */ opp-supported-hw = <0xffffffff 2>; turbo-mode; }; }; + opp_supply_mpu_iva: opp_supply { + compatible = "ti,omap-opp-supply"; + ti,absolute-max-voltage-uv = <1375000>; + }; + ocp@68000000 { uart4: serial@49042000 { compatible = "ti,omap3-uart"; diff --git a/drivers/cpufreq/ti-cpufreq.c 
b/drivers/cpufreq/ti-cpufreq.c index a161098652e2..c7e5d3d43118 100644 --- a/drivers/cpufreq/ti-cpufreq.c +++ b/drivers/cpufreq/ti-cpufreq.c @@ -175,7 +175,7 @@ static struct ti_cpufreq_soc_data omap36xx_soc_data = { .efuse_shift = 9, .efuse_mask = BIT(9), .rev_offset = OMAP3_CONTROL_IDCODE - OMAP3_SYSCON_BASE, - .multi_regulator = false, + .multi_regulator = true, }; /** From 3fbeef397212046cc514fe9fcd07e67e6ca32163 Mon Sep 17 00:00:00 2001 From: Adam Ford Date: Wed, 11 Sep 2019 19:47:13 +0200 Subject: [PATCH 09/83] cpufreq: ti-cpufreq: Add support for AM3517 The AM3517 only lists 600MHz @ 1.2V, but judging by the register value 0x4830A204 = 1b86 802f, it seems like am3517 might be a derivative of the omap36 whose OPPs would be OPP50 (300 MHz) and OPP100 (600 MHz). This patch simply adds the am3517 to the compatible table, with a structure similar to a mix of the omap34xx and omap36xx structures. Signed-off-by: Adam Ford Acked-by: Tony Lindgren Tested-by: Adam Ford Signed-off-by: H. Nikolaus Schaller Signed-off-by: Viresh Kumar --- drivers/cpufreq/ti-cpufreq.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/drivers/cpufreq/ti-cpufreq.c b/drivers/cpufreq/ti-cpufreq.c index c7e5d3d43118..557cb513bf7f 100644 --- a/drivers/cpufreq/ti-cpufreq.c +++ b/drivers/cpufreq/ti-cpufreq.c @@ -178,6 +178,21 @@ static struct ti_cpufreq_soc_data omap36xx_soc_data = { .multi_regulator = true, }; +/* + * AM3517 is quite similar to AM/DM37x except that it has no + * high speed grade eFuse and no abb ldo + */ + +static struct ti_cpufreq_soc_data am3517_soc_data = { + .efuse_xlate = omap3_efuse_xlate, + .efuse_offset = OMAP3_CONTROL_DEVICE_STATUS - OMAP3_SYSCON_BASE, + .efuse_shift = 0, + .efuse_mask = 0, + .rev_offset = OMAP3_CONTROL_IDCODE - OMAP3_SYSCON_BASE, + .multi_regulator = false, +}; + + /** * ti_cpufreq_get_efuse() - Parse and return efuse value present on SoC * @opp_data: pointer to ti_cpufreq_data context @@ -275,6 +290,7 @@ static int ti_cpufreq_setup_syscon_register(struct
ti_cpufreq_data *opp_data) static const struct of_device_id ti_cpufreq_of_match[] = { { .compatible = "ti,am33xx", .data = &am3x_soc_data, }, + { .compatible = "ti,am3517", .data = &am3517_soc_data, }, { .compatible = "ti,am43", .data = &am4x_soc_data, }, { .compatible = "ti,dra7", .data = &dra7_soc_data }, { .compatible = "ti,omap34xx", .data = &omap34xx_soc_data, }, From 09865094536c759d84aa0b9ce9a27ffed1f2bd9e Mon Sep 17 00:00:00 2001 From: Adam Ford Date: Wed, 11 Sep 2019 19:47:14 +0200 Subject: [PATCH 10/83] ARM: dts: Add OPP-V2 table for AM3517 The AM3517 only lists 600MHz @ 1.2V, but judging by the register value 0x4830A204 = 1b86 802f, it seems like am3517 might be a derivative of the omap36 whose OPPs would be OPP50 (300 MHz) and OPP100 (600 MHz). This patch sets up the OPP50 and OPP100 tables at 300MHz and 600MHz for the AM3517, each with an operating voltage of 1.2V. Signed-off-by: Adam Ford Acked-by: Tony Lindgren Tested-by: Adam Ford Signed-off-by: H. Nikolaus Schaller Signed-off-by: Viresh Kumar --- arch/arm/boot/dts/am3517.dtsi | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/arch/arm/boot/dts/am3517.dtsi b/arch/arm/boot/dts/am3517.dtsi index bf3002009b00..76f819f4ba48 100644 --- a/arch/arm/boot/dts/am3517.dtsi +++ b/arch/arm/boot/dts/am3517.dtsi @@ -16,6 +16,37 @@ aliases { can = &hecc; }; + cpus { + cpu: cpu@0 { + /* Based on OMAP3630 variants OPP50 and OPP100 */ + operating-points-v2 = <&cpu0_opp_table>; + + clock-latency = <300000>; /* From legacy driver */ + }; + }; + + cpu0_opp_table: opp-table { + compatible = "operating-points-v2-ti-cpu"; + syscon = <&scm_conf>; + /* + * AM3517 TRM only lists 600MHz @ 1.2V, but omap36xx + * appear to operate at 300MHz as well.
Since AM3517 only + * lists one operating voltage, it will remain fixed at 1.2V + */ + opp50-300000000 { + opp-hz = /bits/ 64 <300000000>; + opp-microvolt = <1200000>; + opp-supported-hw = <0xffffffff 0xffffffff>; + opp-suspend; + }; + + opp100-600000000 { + opp-hz = /bits/ 64 <600000000>; + opp-microvolt = <1200000>; + opp-supported-hw = <0xffffffff 0xffffffff>; + }; + }; + ocp@68000000 { am35x_otg_hs: am35x_otg_hs@5c040000 { compatible = "ti,omap3-musb"; From 069ce2ef1a6dd84cbd4d897b333e30f825e021f0 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 10 Oct 2019 23:32:17 +0200 Subject: [PATCH 11/83] cpuidle: teo: Ignore disabled idle states that are too deep Prevent disabled CPU idle states with target residencies beyond the anticipated idle duration from being taken into account by the TEO governor. Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems") Signed-off-by: Rafael J. Wysocki Cc: 5.1+ # 5.1+ --- drivers/cpuidle/governors/teo.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index b5a0e498f798..8806db95a913 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -257,6 +257,13 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, struct cpuidle_state_usage *su = &dev->states_usage[i]; if (s->disabled || su->disable) { + /* + * Ignore disabled states with target residencies beyond + * the anticipated idle duration. + */ + if (s->target_residency > duration_us) + continue; + /* * If the "early hits" metric of a disabled state is * greater than the current maximum, it should be taken From 4f690bb8ce4cc5d3fabe3a8e9c2401de1554cdc1 Mon Sep 17 00:00:00 2001 From: "Rafael J.
Wysocki" Date: Thu, 10 Oct 2019 23:32:59 +0200 Subject: [PATCH 12/83] cpuidle: teo: Rename local variable in teo_select() Rename a local variable in teo_select() in preparation for subsequent code modifications, no intentional impact. Signed-off-by: Rafael J. Wysocki Cc: 5.1+ # 5.1+ --- drivers/cpuidle/governors/teo.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index 8806db95a913..de3139b17a50 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -233,7 +233,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, { struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); int latency_req = cpuidle_governor_latency_req(dev->cpu); - unsigned int duration_us, count; + unsigned int duration_us, early_hits; int max_early_idx, constraint_idx, idx, i; ktime_t delta_tick; @@ -247,7 +247,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick); duration_us = ktime_to_us(cpu_data->sleep_length_ns); - count = 0; + early_hits = 0; max_early_idx = -1; constraint_idx = drv->state_count; idx = -1; @@ -270,12 +270,12 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * into account, because it would be a mistake to select * a deeper state with lower "early hits" metric. The * index cannot be changed to point to it, however, so - * just increase the max count alone and let the index - * still point to a shallower idle state. + * just increase the "early hits" count alone and let + * the index still point to a shallower idle state. 
*/ if (max_early_idx >= 0 && - count < cpu_data->states[i].early_hits) - count = cpu_data->states[i].early_hits; + early_hits < cpu_data->states[i].early_hits) + early_hits = cpu_data->states[i].early_hits; continue; } @@ -291,10 +291,10 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, idx = i; - if (count < cpu_data->states[i].early_hits && + if (early_hits < cpu_data->states[i].early_hits && !(tick_nohz_tick_stopped() && drv->states[i].target_residency < TICK_USEC)) { - count = cpu_data->states[i].early_hits; + early_hits = cpu_data->states[i].early_hits; max_early_idx = i; } } @@ -323,10 +323,9 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, if (idx < 0) { idx = 0; /* No states enabled. Must use 0. */ } else if (idx > 0) { + unsigned int count = 0; u64 sum = 0; - count = 0; - /* * Count and sum the most recent idle duration values less than * the current expected idle duration value. From e43dcf20215f0287ea113102617ca04daa76b70e Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 10 Oct 2019 23:36:15 +0200 Subject: [PATCH 13/83] cpuidle: teo: Consider hits and misses metrics of disabled states The TEO governor uses idle duration "bins" defined in accordance with the CPU idle states table provided by the driver, so that each "bin" covers the idle duration range between the target residency of the idle state corresponding to it and the target residency of the closest deeper idle state. The governor collects statistics for each bin regardless of whether or not the idle state corresponding to it is currently enabled. In particular, the "hits" and "misses" metrics measure the likelihood of a situation in which both the time till the next timer (sleep length) and the idle duration measured after wakeup fall into the given bin. 
Namely, if the "hits" value is greater than the "misses" one, that situation is more likely than the one in which the sleep length falls into the given bin, but the idle duration measured after wakeup falls into a bin corresponding to one of the shallower idle states. If the idle state corresponding to the given bin is disabled, it cannot be selected and if it turns out to be the one that should be selected, a shallower idle state needs to be used instead of it. Nevertheless, the metrics collected for the bin corresponding to it are still valid and need to be taken into account as though that state had not been disabled. For this reason, make teo_select() always use the "hits" and "misses" values of the idle duration range that the sleep length falls into, even if the specific idle state corresponding to it is disabled, and if the "hits" value is greater than the "misses" one, select the closest enabled shallower idle state in that case. Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems") Signed-off-by: Rafael J.
Wysocki Cc: 5.1+ # 5.1+ --- drivers/cpuidle/governors/teo.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index de3139b17a50..5a0f60ea4ab9 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -233,7 +233,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, { struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); int latency_req = cpuidle_governor_latency_req(dev->cpu); - unsigned int duration_us, early_hits; + unsigned int duration_us, hits, misses, early_hits; int max_early_idx, constraint_idx, idx, i; ktime_t delta_tick; @@ -247,6 +247,8 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick); duration_us = ktime_to_us(cpu_data->sleep_length_ns); + hits = 0; + misses = 0; early_hits = 0; max_early_idx = -1; constraint_idx = drv->state_count; @@ -264,6 +266,17 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, if (s->target_residency > duration_us) continue; + /* + * This state is disabled, so the range of idle duration + * values corresponding to it is covered by the current + * candidate state, but still the "hits" and "misses" + * metrics of the disabled state need to be used to + * decide whether or not the state covering the range in + * question is good enough. 
+ */ + hits = cpu_data->states[i].hits; + misses = cpu_data->states[i].misses; + /* * If the "early hits" metric of a disabled state is * greater than the current maximum, it should be taken @@ -280,8 +293,11 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, continue; } - if (idx < 0) + if (idx < 0) { idx = i; /* first enabled state */ + hits = cpu_data->states[i].hits; + misses = cpu_data->states[i].misses; + } if (s->target_residency > duration_us) break; @@ -290,6 +306,8 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, constraint_idx = i; idx = i; + hits = cpu_data->states[i].hits; + misses = cpu_data->states[i].misses; if (early_hits < cpu_data->states[i].early_hits && !(tick_nohz_tick_stopped() && @@ -307,8 +325,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * "early hits" metric, but if that cannot be determined, just use the * state selected so far. */ - if (cpu_data->states[idx].hits <= cpu_data->states[idx].misses && - max_early_idx >= 0) { + if (hits <= misses && max_early_idx >= 0) { idx = max_early_idx; duration_us = drv->states[idx].target_residency; } From 159e48560f51d9c2aa02d762a18cd24f7868ab27 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 10 Oct 2019 23:37:39 +0200 Subject: [PATCH 14/83] cpuidle: teo: Fix "early hits" handling for disabled idle states The TEO governor uses idle duration "bins" defined in accordance with the CPU idle states table provided by the driver, so that each "bin" covers the idle duration range between the target residency of the idle state corresponding to it and the target residency of the closest deeper idle state. The governor collects statistics for each bin regardless of whether or not the idle state corresponding to it is currently enabled. 
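As an illustration only (not part of this patch or of teo.c), the "bin" layout described above can be sketched in plain user-space C. The helper names sketch_duration_to_bin() and sketch_classify(), the enum and the residency values used with them are hypothetical. The sketch assumes the target residencies are sorted from the shallowest to the deepest state, treats the last bin as open-ended, puts durations below the first target residency into bin 0, and simplifies the "hits"/"misses" bookkeeping to a single comparison of bin indices.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from teo.c: return the index of the
 * "bin" that an idle duration falls into.  Bin i covers the range from
 * target_residency_us[i] up to (but not including) the target residency
 * of the closest deeper state, target_residency_us[i + 1].
 */
int sketch_duration_to_bin(const unsigned int *target_residency_us,
			   size_t state_count, unsigned int duration_us)
{
	int bin = 0;
	size_t i;

	for (i = 0; i < state_count; i++) {
		/* The first state that is too deep ends the search. */
		if (target_residency_us[i] > duration_us)
			break;
		bin = (int)i;
	}
	return bin;
}

/* Simplified outcome for the bin that the sleep length falls into. */
enum sketch_outcome { SKETCH_HIT, SKETCH_MISS };

/*
 * Hypothetical helper: a "hit" means the measured idle duration landed
 * in the same bin as the sleep length; a "miss" means it landed in a
 * bin corresponding to a shallower state.
 */
enum sketch_outcome sketch_classify(const unsigned int *target_residency_us,
				    size_t state_count,
				    unsigned int sleep_length_us,
				    unsigned int measured_us)
{
	int sleep_bin = sketch_duration_to_bin(target_residency_us,
					       state_count, sleep_length_us);
	int measured_bin = sketch_duration_to_bin(target_residency_us,
						  state_count, measured_us);

	return measured_bin >= sleep_bin ? SKETCH_HIT : SKETCH_MISS;
}
```

With made-up target residencies of 1, 20, 100 and 800 us, a sleep length of 500 us falls into the bin of the third state regardless of whether that state is enabled, which is why the statistics of a disabled state remain meaningful.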
In particular, the "early hits" metric measures the likelihood of a situation in which the idle duration measured after wakeup falls into the given bin, but the time till the next timer (sleep length) falls into a bin corresponding to one of the deeper idle states. It is used when the "hits" and "misses" metrics indicate that the state "matching" the sleep length should not be selected, so that the state with the maximum "early hits" value is selected instead of it. If the idle state corresponding to the given bin is disabled, it cannot be selected and if it turns out to be the one that should be selected, a shallower idle state needs to be used instead of it. Nevertheless, the metrics collected for the bin corresponding to it are still valid and need to be taken into account as though that state had not been disabled. As far as the "early hits" metric is concerned, teo_select() tries to take disabled states into account, but the state index corresponding to the maximum "early hits" value computed by it may be incorrect. Namely, it always uses the index of the previous maximum "early hits" state then, but there may be enabled idle states closer to the disabled one in question. In particular, if the current candidate state (whose index is the idx value) is closer to the disabled one and the "early hits" value of the disabled state is greater than the current maximum, the index of the current candidate state (idx) should replace the "maximum early hits state" index. Modify the code to handle that case correctly. Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems") Reported-by: Doug Smythies Signed-off-by: Rafael J.
Wysocki Cc: 5.1+ # 5.1+ --- drivers/cpuidle/governors/teo.c | 35 ++++++++++++++++++++++++--------- 1 file changed, 26 insertions(+), 9 deletions(-) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index 5a0f60ea4ab9..b9b9156618e6 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -277,18 +277,35 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, hits = cpu_data->states[i].hits; misses = cpu_data->states[i].misses; + if (early_hits >= cpu_data->states[i].early_hits || + idx < 0) + continue; + /* - * If the "early hits" metric of a disabled state is - * greater than the current maximum, it should be taken - * into account, because it would be a mistake to select - * a deeper state with lower "early hits" metric. The - * index cannot be changed to point to it, however, so - * just increase the "early hits" count alone and let - * the index still point to a shallower idle state. + * If the current candidate state has been the one with + * the maximum "early hits" metric so far, the "early + * hits" metric of the disabled state replaces the + * current "early hits" count to avoid selecting a + * deeper state with lower "early hits" metric. */ - if (max_early_idx >= 0 && - early_hits < cpu_data->states[i].early_hits) + if (max_early_idx == idx) { early_hits = cpu_data->states[i].early_hits; + continue; + } + + /* + * The current candidate state is closer to the disabled + * one than the current maximum "early hits" state, so + * replace the latter with it, but in case the maximum + * "early hits" state index has not been set so far, + * check if the current candidate state is not too + * shallow for that role. 
+ */ + if (!(tick_nohz_tick_stopped() && + drv->states[idx].target_residency < TICK_USEC)) { + early_hits = cpu_data->states[i].early_hits; + max_early_idx = idx; + } continue; } From da6043fe85eb5ec621e34a92540735dcebbea134 Mon Sep 17 00:00:00 2001 From: Andy Whitcroft Date: Wed, 25 Sep 2019 15:39:12 +0100 Subject: [PATCH 15/83] PM / hibernate: memory_bm_find_bit(): Tighten node optimisation When looking for a bit by number we make use of the cached result from the preceding lookup to speed up operation. Firstly we check if the requested pfn is within the cached zone and, if not, look up the new zone. We then check if the offset for that pfn falls within the existing cached node. This happens regardless of whether the node is within the zone we are now scanning. With certain memory layouts it is possible for this to trigger falsely, creating a temporary alias for the pfn to a different bit. This leads the hibernation code to free memory that it was never allocated, with the expected fallout. Ensure the zone we are scanning matches the cached zone before considering the cached node. Deep thanks go to Andrea for many, many, many hours of hacking and testing that went into cornering this bug. Reported-by: Andrea Righi Tested-by: Andrea Righi Signed-off-by: Andy Whitcroft Signed-off-by: Rafael J. Wysocki --- kernel/power/snapshot.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index 83105874f255..26b9168321e7 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -734,8 +734,15 @@ static int memory_bm_find_bit(struct memory_bitmap *bm, unsigned long pfn, * We have found the zone. Now walk the radix tree to find the leaf node * for our PFN. */ + + /* + * If the zone we wish to scan is the current zone and the + * pfn falls into the current node then we do not need to walk + * the tree. 
+ */ node = bm->cur.node; - if (((pfn - zone->start_pfn) & ~BM_BLOCK_MASK) == bm->cur.node_pfn) + if (zone == bm->cur.zone && + ((pfn - zone->start_pfn) & ~BM_BLOCK_MASK) == bm->cur.node_pfn) goto node_found; node = zone->rtree; From 1b82a4b5d331ba3814a53f2fc289c2d2716bd3fd Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Fri, 18 Oct 2019 11:37:45 +0100 Subject: [PATCH 16/83] cpufreq: scpi: remove stale/outdated comment about the driver Commit 343a8d17fa8d ("cpufreq: scpi: remove arm_big_little dependency") removed the arm_big_little dependency from the scpi driver, which no longer provides any ops to the arm_big_little cpufreq driver. Let's remove that stale comment. Acked-by: Nicolas Pitre Signed-off-by: Sudeep Holla Signed-off-by: Viresh Kumar --- drivers/cpufreq/scpi-cpufreq.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/cpufreq/scpi-cpufreq.c b/drivers/cpufreq/scpi-cpufreq.c index 2b51e0718c9f..20d1f85d5f5a 100644 --- a/drivers/cpufreq/scpi-cpufreq.c +++ b/drivers/cpufreq/scpi-cpufreq.c @@ -1,8 +1,6 @@ /* * System Control and Power Interface (SCPI) based CPUFreq Interface driver * - * It provides necessary ops to arm_big_little cpufreq driver. - * * Copyright (C) 2015 ARM Ltd. * Sudeep Holla * From a0f950d3a0addc9552233aa2ffbdc086aa02106a Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 21 Oct 2019 11:20:34 +0100 Subject: [PATCH 17/83] cpufreq: merge arm_big_little and vexpress-spc The arm_big_little cpufreq driver was designed as a generic big little driver that could be used by any platform and make use of the bL switcher. Over the years, alternative solutions such as EAS have been designed and merged to deal with bL/HMP systems. Also, since no driver other than Vexpress SPC made use of the generic arm_big_little cpufreq driver, we can merge them together as the vexpress-spc driver, used only on the Vexpress TC2 (CA15_CA7) platform. 
Acked-by: Nicolas Pitre Signed-off-by: Sudeep Holla Signed-off-by: Viresh Kumar --- MAINTAINERS | 5 +- drivers/cpufreq/Kconfig.arm | 12 +- drivers/cpufreq/Makefile | 2 - drivers/cpufreq/arm_big_little.c | 658 ------------------------ drivers/cpufreq/arm_big_little.h | 43 -- drivers/cpufreq/vexpress-spc-cpufreq.c | 660 ++++++++++++++++++++++++- 6 files changed, 651 insertions(+), 729 deletions(-) delete mode 100644 drivers/cpufreq/arm_big_little.c delete mode 100644 drivers/cpufreq/arm_big_little.h diff --git a/MAINTAINERS b/MAINTAINERS index 296de2b51c83..07f5d8dc3027 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4273,14 +4273,13 @@ F: include/linux/cpufreq.h F: include/linux/sched/cpufreq.h F: tools/testing/selftests/cpufreq/ -CPU FREQUENCY DRIVERS - ARM BIG LITTLE +CPU FREQUENCY DRIVERS - VEXPRESS SPC ARM BIG LITTLE M: Viresh Kumar M: Sudeep Holla L: linux-pm@vger.kernel.org W: http://www.arm.com/products/processors/technologies/biglittleprocessing.php S: Maintained -F: drivers/cpufreq/arm_big_little.h -F: drivers/cpufreq/arm_big_little.c +F: drivers/cpufreq/vexpress-spc-cpufreq.c CPU POWER MONITORING SUBSYSTEM M: Thomas Renninger diff --git a/drivers/cpufreq/Kconfig.arm b/drivers/cpufreq/Kconfig.arm index a905796f7f85..3858d86cf409 100644 --- a/drivers/cpufreq/Kconfig.arm +++ b/drivers/cpufreq/Kconfig.arm @@ -49,14 +49,6 @@ config ARM_ARMADA_8K_CPUFREQ If in doubt, say N. -# big LITTLE core layer and glue drivers -config ARM_BIG_LITTLE_CPUFREQ - tristate "Generic ARM big LITTLE CPUfreq driver" - depends on ARM_CPU_TOPOLOGY && HAVE_CLK - select PM_OPP - help - This enables the Generic CPUfreq driver for ARM big.LITTLE platforms. 
- config ARM_SCPI_CPUFREQ tristate "SCPI based CPUfreq driver" depends on ARM_SCPI_PROTOCOL && COMMON_CLK_SCPI @@ -69,7 +61,9 @@ config ARM_SCPI_CPUFREQ config ARM_VEXPRESS_SPC_CPUFREQ tristate "Versatile Express SPC based CPUfreq driver" - depends on ARM_BIG_LITTLE_CPUFREQ && ARCH_VEXPRESS_SPC + depends on ARM_CPU_TOPOLOGY && HAVE_CLK + depends on ARCH_VEXPRESS_SPC + select PM_OPP help This add the CPUfreq driver support for Versatile Express big.LITTLE platforms using SPC for power management. diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile index 9a9f5ccd13d9..f6670c4abbb0 100644 --- a/drivers/cpufreq/Makefile +++ b/drivers/cpufreq/Makefile @@ -47,8 +47,6 @@ obj-$(CONFIG_X86_SFI_CPUFREQ) += sfi-cpufreq.o ################################################################################## # ARM SoC drivers -obj-$(CONFIG_ARM_BIG_LITTLE_CPUFREQ) += arm_big_little.o - obj-$(CONFIG_ARM_ARMADA_37XX_CPUFREQ) += armada-37xx-cpufreq.o obj-$(CONFIG_ARM_ARMADA_8K_CPUFREQ) += armada-8k-cpufreq.o obj-$(CONFIG_ARM_BRCMSTB_AVS_CPUFREQ) += brcmstb-avs-cpufreq.o diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c deleted file mode 100644 index 7fe52fcddcf1..000000000000 --- a/drivers/cpufreq/arm_big_little.c +++ /dev/null @@ -1,658 +0,0 @@ -/* - * ARM big.LITTLE Platforms CPUFreq support - * - * Copyright (C) 2013 ARM Ltd. - * Sudeep KarkadaNagesha - * - * Copyright (C) 2013 Linaro. - * Viresh Kumar - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. 
- */ - -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "arm_big_little.h" - -/* Currently we support only two clusters */ -#define A15_CLUSTER 0 -#define A7_CLUSTER 1 -#define MAX_CLUSTERS 2 - -#ifdef CONFIG_BL_SWITCHER -#include -static bool bL_switching_enabled; -#define is_bL_switching_enabled() bL_switching_enabled -#define set_switching_enabled(x) (bL_switching_enabled = (x)) -#else -#define is_bL_switching_enabled() false -#define set_switching_enabled(x) do { } while (0) -#define bL_switch_request(...) do { } while (0) -#define bL_switcher_put_enabled() do { } while (0) -#define bL_switcher_get_enabled() do { } while (0) -#endif - -#define ACTUAL_FREQ(cluster, freq) ((cluster == A7_CLUSTER) ? freq << 1 : freq) -#define VIRT_FREQ(cluster, freq) ((cluster == A7_CLUSTER) ? freq >> 1 : freq) - -static struct thermal_cooling_device *cdev[MAX_CLUSTERS]; -static const struct cpufreq_arm_bL_ops *arm_bL_ops; -static struct clk *clk[MAX_CLUSTERS]; -static struct cpufreq_frequency_table *freq_table[MAX_CLUSTERS + 1]; -static atomic_t cluster_usage[MAX_CLUSTERS + 1]; - -static unsigned int clk_big_min; /* (Big) clock frequencies */ -static unsigned int clk_little_max; /* Maximum clock frequency (Little) */ - -static DEFINE_PER_CPU(unsigned int, physical_cluster); -static DEFINE_PER_CPU(unsigned int, cpu_last_req_freq); - -static struct mutex cluster_lock[MAX_CLUSTERS]; - -static inline int raw_cpu_to_cluster(int cpu) -{ - return topology_physical_package_id(cpu); -} - -static inline int cpu_to_cluster(int cpu) -{ - return is_bL_switching_enabled() ? 
- MAX_CLUSTERS : raw_cpu_to_cluster(cpu); -} - -static unsigned int find_cluster_maxfreq(int cluster) -{ - int j; - u32 max_freq = 0, cpu_freq; - - for_each_online_cpu(j) { - cpu_freq = per_cpu(cpu_last_req_freq, j); - - if ((cluster == per_cpu(physical_cluster, j)) && - (max_freq < cpu_freq)) - max_freq = cpu_freq; - } - - pr_debug("%s: cluster: %d, max freq: %d\n", __func__, cluster, - max_freq); - - return max_freq; -} - -static unsigned int clk_get_cpu_rate(unsigned int cpu) -{ - u32 cur_cluster = per_cpu(physical_cluster, cpu); - u32 rate = clk_get_rate(clk[cur_cluster]) / 1000; - - /* For switcher we use virtual A7 clock rates */ - if (is_bL_switching_enabled()) - rate = VIRT_FREQ(cur_cluster, rate); - - pr_debug("%s: cpu: %d, cluster: %d, freq: %u\n", __func__, cpu, - cur_cluster, rate); - - return rate; -} - -static unsigned int bL_cpufreq_get_rate(unsigned int cpu) -{ - if (is_bL_switching_enabled()) { - pr_debug("%s: freq: %d\n", __func__, per_cpu(cpu_last_req_freq, - cpu)); - - return per_cpu(cpu_last_req_freq, cpu); - } else { - return clk_get_cpu_rate(cpu); - } -} - -static unsigned int -bL_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate) -{ - u32 new_rate, prev_rate; - int ret; - bool bLs = is_bL_switching_enabled(); - - mutex_lock(&cluster_lock[new_cluster]); - - if (bLs) { - prev_rate = per_cpu(cpu_last_req_freq, cpu); - per_cpu(cpu_last_req_freq, cpu) = rate; - per_cpu(physical_cluster, cpu) = new_cluster; - - new_rate = find_cluster_maxfreq(new_cluster); - new_rate = ACTUAL_FREQ(new_cluster, new_rate); - } else { - new_rate = rate; - } - - pr_debug("%s: cpu: %d, old cluster: %d, new cluster: %d, freq: %d\n", - __func__, cpu, old_cluster, new_cluster, new_rate); - - ret = clk_set_rate(clk[new_cluster], new_rate * 1000); - if (!ret) { - /* - * FIXME: clk_set_rate hasn't returned an error here however it - * may be that clk_change_rate failed due to hardware or - * firmware issues and wasn't able to report that due to the - * 
current design of the clk core layer. To work around this - * problem we will read back the clock rate and check it is - * correct. This needs to be removed once clk core is fixed. - */ - if (clk_get_rate(clk[new_cluster]) != new_rate * 1000) - ret = -EIO; - } - - if (WARN_ON(ret)) { - pr_err("clk_set_rate failed: %d, new cluster: %d\n", ret, - new_cluster); - if (bLs) { - per_cpu(cpu_last_req_freq, cpu) = prev_rate; - per_cpu(physical_cluster, cpu) = old_cluster; - } - - mutex_unlock(&cluster_lock[new_cluster]); - - return ret; - } - - mutex_unlock(&cluster_lock[new_cluster]); - - /* Recalc freq for old cluster when switching clusters */ - if (old_cluster != new_cluster) { - pr_debug("%s: cpu: %d, old cluster: %d, new cluster: %d\n", - __func__, cpu, old_cluster, new_cluster); - - /* Switch cluster */ - bL_switch_request(cpu, new_cluster); - - mutex_lock(&cluster_lock[old_cluster]); - - /* Set freq of old cluster if there are cpus left on it */ - new_rate = find_cluster_maxfreq(old_cluster); - new_rate = ACTUAL_FREQ(old_cluster, new_rate); - - if (new_rate) { - pr_debug("%s: Updating rate of old cluster: %d, to freq: %d\n", - __func__, old_cluster, new_rate); - - if (clk_set_rate(clk[old_cluster], new_rate * 1000)) - pr_err("%s: clk_set_rate failed: %d, old cluster: %d\n", - __func__, ret, old_cluster); - } - mutex_unlock(&cluster_lock[old_cluster]); - } - - return 0; -} - -/* Set clock frequency */ -static int bL_cpufreq_set_target(struct cpufreq_policy *policy, - unsigned int index) -{ - u32 cpu = policy->cpu, cur_cluster, new_cluster, actual_cluster; - unsigned int freqs_new; - int ret; - - cur_cluster = cpu_to_cluster(cpu); - new_cluster = actual_cluster = per_cpu(physical_cluster, cpu); - - freqs_new = freq_table[cur_cluster][index].frequency; - - if (is_bL_switching_enabled()) { - if ((actual_cluster == A15_CLUSTER) && - (freqs_new < clk_big_min)) { - new_cluster = A7_CLUSTER; - } else if ((actual_cluster == A7_CLUSTER) && - (freqs_new > clk_little_max)) { - 
new_cluster = A15_CLUSTER; - } - } - - ret = bL_cpufreq_set_rate(cpu, actual_cluster, new_cluster, freqs_new); - - if (!ret) { - arch_set_freq_scale(policy->related_cpus, freqs_new, - policy->cpuinfo.max_freq); - } - - return ret; -} - -static inline u32 get_table_count(struct cpufreq_frequency_table *table) -{ - int count; - - for (count = 0; table[count].frequency != CPUFREQ_TABLE_END; count++) - ; - - return count; -} - -/* get the minimum frequency in the cpufreq_frequency_table */ -static inline u32 get_table_min(struct cpufreq_frequency_table *table) -{ - struct cpufreq_frequency_table *pos; - uint32_t min_freq = ~0; - cpufreq_for_each_entry(pos, table) - if (pos->frequency < min_freq) - min_freq = pos->frequency; - return min_freq; -} - -/* get the maximum frequency in the cpufreq_frequency_table */ -static inline u32 get_table_max(struct cpufreq_frequency_table *table) -{ - struct cpufreq_frequency_table *pos; - uint32_t max_freq = 0; - cpufreq_for_each_entry(pos, table) - if (pos->frequency > max_freq) - max_freq = pos->frequency; - return max_freq; -} - -static int merge_cluster_tables(void) -{ - int i, j, k = 0, count = 1; - struct cpufreq_frequency_table *table; - - for (i = 0; i < MAX_CLUSTERS; i++) - count += get_table_count(freq_table[i]); - - table = kcalloc(count, sizeof(*table), GFP_KERNEL); - if (!table) - return -ENOMEM; - - freq_table[MAX_CLUSTERS] = table; - - /* Add in reverse order to get freqs in increasing order */ - for (i = MAX_CLUSTERS - 1; i >= 0; i--) { - for (j = 0; freq_table[i][j].frequency != CPUFREQ_TABLE_END; - j++) { - table[k].frequency = VIRT_FREQ(i, - freq_table[i][j].frequency); - pr_debug("%s: index: %d, freq: %d\n", __func__, k, - table[k].frequency); - k++; - } - } - - table[k].driver_data = k; - table[k].frequency = CPUFREQ_TABLE_END; - - pr_debug("%s: End, table: %p, count: %d\n", __func__, table, k); - - return 0; -} - -static void _put_cluster_clk_and_freq_table(struct device *cpu_dev, - const struct cpumask 
*cpumask) -{ - u32 cluster = raw_cpu_to_cluster(cpu_dev->id); - - if (!freq_table[cluster]) - return; - - clk_put(clk[cluster]); - dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table[cluster]); - if (arm_bL_ops->free_opp_table) - arm_bL_ops->free_opp_table(cpumask); - dev_dbg(cpu_dev, "%s: cluster: %d\n", __func__, cluster); -} - -static void put_cluster_clk_and_freq_table(struct device *cpu_dev, - const struct cpumask *cpumask) -{ - u32 cluster = cpu_to_cluster(cpu_dev->id); - int i; - - if (atomic_dec_return(&cluster_usage[cluster])) - return; - - if (cluster < MAX_CLUSTERS) - return _put_cluster_clk_and_freq_table(cpu_dev, cpumask); - - for_each_present_cpu(i) { - struct device *cdev = get_cpu_device(i); - if (!cdev) { - pr_err("%s: failed to get cpu%d device\n", __func__, i); - return; - } - - _put_cluster_clk_and_freq_table(cdev, cpumask); - } - - /* free virtual table */ - kfree(freq_table[cluster]); -} - -static int _get_cluster_clk_and_freq_table(struct device *cpu_dev, - const struct cpumask *cpumask) -{ - u32 cluster = raw_cpu_to_cluster(cpu_dev->id); - int ret; - - if (freq_table[cluster]) - return 0; - - ret = arm_bL_ops->init_opp_table(cpumask); - if (ret) { - dev_err(cpu_dev, "%s: init_opp_table failed, cpu: %d, err: %d\n", - __func__, cpu_dev->id, ret); - goto out; - } - - ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table[cluster]); - if (ret) { - dev_err(cpu_dev, "%s: failed to init cpufreq table, cpu: %d, err: %d\n", - __func__, cpu_dev->id, ret); - goto free_opp_table; - } - - clk[cluster] = clk_get(cpu_dev, NULL); - if (!IS_ERR(clk[cluster])) { - dev_dbg(cpu_dev, "%s: clk: %p & freq table: %p, cluster: %d\n", - __func__, clk[cluster], freq_table[cluster], - cluster); - return 0; - } - - dev_err(cpu_dev, "%s: Failed to get clk for cpu: %d, cluster: %d\n", - __func__, cpu_dev->id, cluster); - ret = PTR_ERR(clk[cluster]); - dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table[cluster]); - -free_opp_table: - if (arm_bL_ops->free_opp_table) - 
arm_bL_ops->free_opp_table(cpumask); -out: - dev_err(cpu_dev, "%s: Failed to get data for cluster: %d\n", __func__, - cluster); - return ret; -} - -static int get_cluster_clk_and_freq_table(struct device *cpu_dev, - const struct cpumask *cpumask) -{ - u32 cluster = cpu_to_cluster(cpu_dev->id); - int i, ret; - - if (atomic_inc_return(&cluster_usage[cluster]) != 1) - return 0; - - if (cluster < MAX_CLUSTERS) { - ret = _get_cluster_clk_and_freq_table(cpu_dev, cpumask); - if (ret) - atomic_dec(&cluster_usage[cluster]); - return ret; - } - - /* - * Get data for all clusters and fill virtual cluster with a merge of - * both - */ - for_each_present_cpu(i) { - struct device *cdev = get_cpu_device(i); - if (!cdev) { - pr_err("%s: failed to get cpu%d device\n", __func__, i); - return -ENODEV; - } - - ret = _get_cluster_clk_and_freq_table(cdev, cpumask); - if (ret) - goto put_clusters; - } - - ret = merge_cluster_tables(); - if (ret) - goto put_clusters; - - /* Assuming 2 cluster, set clk_big_min and clk_little_max */ - clk_big_min = get_table_min(freq_table[0]); - clk_little_max = VIRT_FREQ(1, get_table_max(freq_table[1])); - - pr_debug("%s: cluster: %d, clk_big_min: %d, clk_little_max: %d\n", - __func__, cluster, clk_big_min, clk_little_max); - - return 0; - -put_clusters: - for_each_present_cpu(i) { - struct device *cdev = get_cpu_device(i); - if (!cdev) { - pr_err("%s: failed to get cpu%d device\n", __func__, i); - return -ENODEV; - } - - _put_cluster_clk_and_freq_table(cdev, cpumask); - } - - atomic_dec(&cluster_usage[cluster]); - - return ret; -} - -/* Per-CPU initialization */ -static int bL_cpufreq_init(struct cpufreq_policy *policy) -{ - u32 cur_cluster = cpu_to_cluster(policy->cpu); - struct device *cpu_dev; - int ret; - - cpu_dev = get_cpu_device(policy->cpu); - if (!cpu_dev) { - pr_err("%s: failed to get cpu%d device\n", __func__, - policy->cpu); - return -ENODEV; - } - - if (cur_cluster < MAX_CLUSTERS) { - int cpu; - - cpumask_copy(policy->cpus, 
topology_core_cpumask(policy->cpu)); - - for_each_cpu(cpu, policy->cpus) - per_cpu(physical_cluster, cpu) = cur_cluster; - } else { - /* Assumption: during init, we are always running on A15 */ - per_cpu(physical_cluster, policy->cpu) = A15_CLUSTER; - } - - ret = get_cluster_clk_and_freq_table(cpu_dev, policy->cpus); - if (ret) - return ret; - - policy->freq_table = freq_table[cur_cluster]; - policy->cpuinfo.transition_latency = - arm_bL_ops->get_transition_latency(cpu_dev); - - dev_pm_opp_of_register_em(policy->cpus); - - if (is_bL_switching_enabled()) - per_cpu(cpu_last_req_freq, policy->cpu) = clk_get_cpu_rate(policy->cpu); - - dev_info(cpu_dev, "%s: CPU %d initialized\n", __func__, policy->cpu); - return 0; -} - -static int bL_cpufreq_exit(struct cpufreq_policy *policy) -{ - struct device *cpu_dev; - int cur_cluster = cpu_to_cluster(policy->cpu); - - if (cur_cluster < MAX_CLUSTERS) { - cpufreq_cooling_unregister(cdev[cur_cluster]); - cdev[cur_cluster] = NULL; - } - - cpu_dev = get_cpu_device(policy->cpu); - if (!cpu_dev) { - pr_err("%s: failed to get cpu%d device\n", __func__, - policy->cpu); - return -ENODEV; - } - - put_cluster_clk_and_freq_table(cpu_dev, policy->related_cpus); - dev_dbg(cpu_dev, "%s: Exited, cpu: %d\n", __func__, policy->cpu); - - return 0; -} - -static void bL_cpufreq_ready(struct cpufreq_policy *policy) -{ - int cur_cluster = cpu_to_cluster(policy->cpu); - - /* Do not register a cpu_cooling device if we are in IKS mode */ - if (cur_cluster >= MAX_CLUSTERS) - return; - - cdev[cur_cluster] = of_cpufreq_cooling_register(policy); -} - -static struct cpufreq_driver bL_cpufreq_driver = { - .name = "arm-big-little", - .flags = CPUFREQ_STICKY | - CPUFREQ_HAVE_GOVERNOR_PER_POLICY | - CPUFREQ_NEED_INITIAL_FREQ_CHECK, - .verify = cpufreq_generic_frequency_table_verify, - .target_index = bL_cpufreq_set_target, - .get = bL_cpufreq_get_rate, - .init = bL_cpufreq_init, - .exit = bL_cpufreq_exit, - .ready = bL_cpufreq_ready, - .attr = 
cpufreq_generic_attr, -}; - -#ifdef CONFIG_BL_SWITCHER -static int bL_cpufreq_switcher_notifier(struct notifier_block *nfb, - unsigned long action, void *_arg) -{ - pr_debug("%s: action: %ld\n", __func__, action); - - switch (action) { - case BL_NOTIFY_PRE_ENABLE: - case BL_NOTIFY_PRE_DISABLE: - cpufreq_unregister_driver(&bL_cpufreq_driver); - break; - - case BL_NOTIFY_POST_ENABLE: - set_switching_enabled(true); - cpufreq_register_driver(&bL_cpufreq_driver); - break; - - case BL_NOTIFY_POST_DISABLE: - set_switching_enabled(false); - cpufreq_register_driver(&bL_cpufreq_driver); - break; - - default: - return NOTIFY_DONE; - } - - return NOTIFY_OK; -} - -static struct notifier_block bL_switcher_notifier = { - .notifier_call = bL_cpufreq_switcher_notifier, -}; - -static int __bLs_register_notifier(void) -{ - return bL_switcher_register_notifier(&bL_switcher_notifier); -} - -static int __bLs_unregister_notifier(void) -{ - return bL_switcher_unregister_notifier(&bL_switcher_notifier); -} -#else -static int __bLs_register_notifier(void) { return 0; } -static int __bLs_unregister_notifier(void) { return 0; } -#endif - -int bL_cpufreq_register(const struct cpufreq_arm_bL_ops *ops) -{ - int ret, i; - - if (arm_bL_ops) { - pr_debug("%s: Already registered: %s, exiting\n", __func__, - arm_bL_ops->name); - return -EBUSY; - } - - if (!ops || !strlen(ops->name) || !ops->init_opp_table || - !ops->get_transition_latency) { - pr_err("%s: Invalid arm_bL_ops, exiting\n", __func__); - return -ENODEV; - } - - arm_bL_ops = ops; - - set_switching_enabled(bL_switcher_get_enabled()); - - for (i = 0; i < MAX_CLUSTERS; i++) - mutex_init(&cluster_lock[i]); - - ret = cpufreq_register_driver(&bL_cpufreq_driver); - if (ret) { - pr_info("%s: Failed registering platform driver: %s, err: %d\n", - __func__, ops->name, ret); - arm_bL_ops = NULL; - } else { - ret = __bLs_register_notifier(); - if (ret) { - cpufreq_unregister_driver(&bL_cpufreq_driver); - arm_bL_ops = NULL; - } else { - pr_info("%s: 
Registered platform driver: %s\n", - __func__, ops->name); - } - } - - bL_switcher_put_enabled(); - return ret; -} -EXPORT_SYMBOL_GPL(bL_cpufreq_register); - -void bL_cpufreq_unregister(const struct cpufreq_arm_bL_ops *ops) -{ - if (arm_bL_ops != ops) { - pr_err("%s: Registered with: %s, can't unregister, exiting\n", - __func__, arm_bL_ops->name); - return; - } - - bL_switcher_get_enabled(); - __bLs_unregister_notifier(); - cpufreq_unregister_driver(&bL_cpufreq_driver); - bL_switcher_put_enabled(); - pr_info("%s: Un-registered platform driver: %s\n", __func__, - arm_bL_ops->name); - arm_bL_ops = NULL; -} -EXPORT_SYMBOL_GPL(bL_cpufreq_unregister); - -MODULE_AUTHOR("Viresh Kumar "); -MODULE_DESCRIPTION("Generic ARM big LITTLE cpufreq driver"); -MODULE_LICENSE("GPL v2"); diff --git a/drivers/cpufreq/arm_big_little.h b/drivers/cpufreq/arm_big_little.h deleted file mode 100644 index 88a176e466c8..000000000000 --- a/drivers/cpufreq/arm_big_little.h +++ /dev/null @@ -1,43 +0,0 @@ -/* - * ARM big.LITTLE platform's CPUFreq header file - * - * Copyright (C) 2013 ARM Ltd. - * Sudeep KarkadaNagesha - * - * Copyright (C) 2013 Linaro. - * Viresh Kumar - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - */ -#ifndef CPUFREQ_ARM_BIG_LITTLE_H -#define CPUFREQ_ARM_BIG_LITTLE_H - -#include -#include -#include - -struct cpufreq_arm_bL_ops { - char name[CPUFREQ_NAME_LEN]; - - /* - * This must set opp table for cpu_dev in a similar way as done by - * dev_pm_opp_of_add_table(). 
- */ - int (*init_opp_table)(const struct cpumask *cpumask); - - /* Optional */ - int (*get_transition_latency)(struct device *cpu_dev); - void (*free_opp_table)(const struct cpumask *cpumask); -}; - -int bL_cpufreq_register(const struct cpufreq_arm_bL_ops *ops); -void bL_cpufreq_unregister(const struct cpufreq_arm_bL_ops *ops); - -#endif /* CPUFREQ_ARM_BIG_LITTLE_H */ diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c index 53237289e606..622dc42e42b1 100644 --- a/drivers/cpufreq/vexpress-spc-cpufreq.c +++ b/drivers/cpufreq/vexpress-spc-cpufreq.c @@ -1,31 +1,660 @@ +// SPDX-License-Identifier: GPL-2.0 /* * Versatile Express SPC CPUFreq Interface driver * - * It provides necessary ops to arm_big_little cpufreq driver. + * Copyright (C) 2013 - 2019 ARM Ltd. + * Sudeep Holla * - * Copyright (C) 2013 ARM Ltd. - * Sudeep KarkadaNagesha - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. + * Copyright (C) 2013 Linaro. + * Viresh Kumar */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include #include #include +#include +#include +#include #include +#include +#include #include #include +#include +#include #include -#include "arm_big_little.h" +struct cpufreq_arm_bL_ops { + char name[CPUFREQ_NAME_LEN]; + + /* + * This must set opp table for cpu_dev in a similar way as done by + * dev_pm_opp_of_add_table(). 
+ */ + int (*init_opp_table)(const struct cpumask *cpumask); + + /* Optional */ + int (*get_transition_latency)(struct device *cpu_dev); + void (*free_opp_table)(const struct cpumask *cpumask); +}; + +/* Currently we support only two clusters */ +#define A15_CLUSTER 0 +#define A7_CLUSTER 1 +#define MAX_CLUSTERS 2 + +#ifdef CONFIG_BL_SWITCHER +#include +static bool bL_switching_enabled; +#define is_bL_switching_enabled() bL_switching_enabled +#define set_switching_enabled(x) (bL_switching_enabled = (x)) +#else +#define is_bL_switching_enabled() false +#define set_switching_enabled(x) do { } while (0) +#define bL_switch_request(...) do { } while (0) +#define bL_switcher_put_enabled() do { } while (0) +#define bL_switcher_get_enabled() do { } while (0) +#endif + +#define ACTUAL_FREQ(cluster, freq) ((cluster == A7_CLUSTER) ? freq << 1 : freq) +#define VIRT_FREQ(cluster, freq) ((cluster == A7_CLUSTER) ? freq >> 1 : freq) + +static struct thermal_cooling_device *cdev[MAX_CLUSTERS]; +static const struct cpufreq_arm_bL_ops *arm_bL_ops; +static struct clk *clk[MAX_CLUSTERS]; +static struct cpufreq_frequency_table *freq_table[MAX_CLUSTERS + 1]; +static atomic_t cluster_usage[MAX_CLUSTERS + 1]; + +static unsigned int clk_big_min; /* (Big) clock frequencies */ +static unsigned int clk_little_max; /* Maximum clock frequency (Little) */ + +static DEFINE_PER_CPU(unsigned int, physical_cluster); +static DEFINE_PER_CPU(unsigned int, cpu_last_req_freq); + +static struct mutex cluster_lock[MAX_CLUSTERS]; + +static inline int raw_cpu_to_cluster(int cpu) +{ + return topology_physical_package_id(cpu); +} + +static inline int cpu_to_cluster(int cpu) +{ + return is_bL_switching_enabled() ? 
+ MAX_CLUSTERS : raw_cpu_to_cluster(cpu); +} + +static unsigned int find_cluster_maxfreq(int cluster) +{ + int j; + u32 max_freq = 0, cpu_freq; + + for_each_online_cpu(j) { + cpu_freq = per_cpu(cpu_last_req_freq, j); + + if ((cluster == per_cpu(physical_cluster, j)) && + (max_freq < cpu_freq)) + max_freq = cpu_freq; + } + + pr_debug("%s: cluster: %d, max freq: %d\n", __func__, cluster, + max_freq); + + return max_freq; +} + +static unsigned int clk_get_cpu_rate(unsigned int cpu) +{ + u32 cur_cluster = per_cpu(physical_cluster, cpu); + u32 rate = clk_get_rate(clk[cur_cluster]) / 1000; + + /* For switcher we use virtual A7 clock rates */ + if (is_bL_switching_enabled()) + rate = VIRT_FREQ(cur_cluster, rate); + + pr_debug("%s: cpu: %d, cluster: %d, freq: %u\n", __func__, cpu, + cur_cluster, rate); + + return rate; +} + +static unsigned int bL_cpufreq_get_rate(unsigned int cpu) +{ + if (is_bL_switching_enabled()) { + pr_debug("%s: freq: %d\n", __func__, per_cpu(cpu_last_req_freq, + cpu)); + + return per_cpu(cpu_last_req_freq, cpu); + } else { + return clk_get_cpu_rate(cpu); + } +} + +static unsigned int +bL_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate) +{ + u32 new_rate, prev_rate; + int ret; + bool bLs = is_bL_switching_enabled(); + + mutex_lock(&cluster_lock[new_cluster]); + + if (bLs) { + prev_rate = per_cpu(cpu_last_req_freq, cpu); + per_cpu(cpu_last_req_freq, cpu) = rate; + per_cpu(physical_cluster, cpu) = new_cluster; + + new_rate = find_cluster_maxfreq(new_cluster); + new_rate = ACTUAL_FREQ(new_cluster, new_rate); + } else { + new_rate = rate; + } + + pr_debug("%s: cpu: %d, old cluster: %d, new cluster: %d, freq: %d\n", + __func__, cpu, old_cluster, new_cluster, new_rate); + + ret = clk_set_rate(clk[new_cluster], new_rate * 1000); + if (!ret) { + /* + * FIXME: clk_set_rate hasn't returned an error here however it + * may be that clk_change_rate failed due to hardware or + * firmware issues and wasn't able to report that due to the + * 
current design of the clk core layer. To work around this + * problem we will read back the clock rate and check it is + * correct. This needs to be removed once clk core is fixed. + */ + if (clk_get_rate(clk[new_cluster]) != new_rate * 1000) + ret = -EIO; + } + + if (WARN_ON(ret)) { + pr_err("clk_set_rate failed: %d, new cluster: %d\n", ret, + new_cluster); + if (bLs) { + per_cpu(cpu_last_req_freq, cpu) = prev_rate; + per_cpu(physical_cluster, cpu) = old_cluster; + } + + mutex_unlock(&cluster_lock[new_cluster]); + + return ret; + } + + mutex_unlock(&cluster_lock[new_cluster]); + + /* Recalc freq for old cluster when switching clusters */ + if (old_cluster != new_cluster) { + pr_debug("%s: cpu: %d, old cluster: %d, new cluster: %d\n", + __func__, cpu, old_cluster, new_cluster); + + /* Switch cluster */ + bL_switch_request(cpu, new_cluster); + + mutex_lock(&cluster_lock[old_cluster]); + + /* Set freq of old cluster if there are cpus left on it */ + new_rate = find_cluster_maxfreq(old_cluster); + new_rate = ACTUAL_FREQ(old_cluster, new_rate); + + if (new_rate) { + pr_debug("%s: Updating rate of old cluster: %d, to freq: %d\n", + __func__, old_cluster, new_rate); + + if (clk_set_rate(clk[old_cluster], new_rate * 1000)) + pr_err("%s: clk_set_rate failed: %d, old cluster: %d\n", + __func__, ret, old_cluster); + } + mutex_unlock(&cluster_lock[old_cluster]); + } + + return 0; +} + +/* Set clock frequency */ +static int bL_cpufreq_set_target(struct cpufreq_policy *policy, + unsigned int index) +{ + u32 cpu = policy->cpu, cur_cluster, new_cluster, actual_cluster; + unsigned int freqs_new; + int ret; + + cur_cluster = cpu_to_cluster(cpu); + new_cluster = actual_cluster = per_cpu(physical_cluster, cpu); + + freqs_new = freq_table[cur_cluster][index].frequency; + + if (is_bL_switching_enabled()) { + if ((actual_cluster == A15_CLUSTER) && + (freqs_new < clk_big_min)) { + new_cluster = A7_CLUSTER; + } else if ((actual_cluster == A7_CLUSTER) && + (freqs_new > clk_little_max)) { + 
new_cluster = A15_CLUSTER; + } + } + + ret = bL_cpufreq_set_rate(cpu, actual_cluster, new_cluster, freqs_new); + + if (!ret) { + arch_set_freq_scale(policy->related_cpus, freqs_new, + policy->cpuinfo.max_freq); + } + + return ret; +} + +static inline u32 get_table_count(struct cpufreq_frequency_table *table) +{ + int count; + + for (count = 0; table[count].frequency != CPUFREQ_TABLE_END; count++) + ; + + return count; +} + +/* get the minimum frequency in the cpufreq_frequency_table */ +static inline u32 get_table_min(struct cpufreq_frequency_table *table) +{ + struct cpufreq_frequency_table *pos; + uint32_t min_freq = ~0; + cpufreq_for_each_entry(pos, table) + if (pos->frequency < min_freq) + min_freq = pos->frequency; + return min_freq; +} + +/* get the maximum frequency in the cpufreq_frequency_table */ +static inline u32 get_table_max(struct cpufreq_frequency_table *table) +{ + struct cpufreq_frequency_table *pos; + uint32_t max_freq = 0; + cpufreq_for_each_entry(pos, table) + if (pos->frequency > max_freq) + max_freq = pos->frequency; + return max_freq; +} + +static int merge_cluster_tables(void) +{ + int i, j, k = 0, count = 1; + struct cpufreq_frequency_table *table; + + for (i = 0; i < MAX_CLUSTERS; i++) + count += get_table_count(freq_table[i]); + + table = kcalloc(count, sizeof(*table), GFP_KERNEL); + if (!table) + return -ENOMEM; + + freq_table[MAX_CLUSTERS] = table; + + /* Add in reverse order to get freqs in increasing order */ + for (i = MAX_CLUSTERS - 1; i >= 0; i--) { + for (j = 0; freq_table[i][j].frequency != CPUFREQ_TABLE_END; + j++) { + table[k].frequency = VIRT_FREQ(i, + freq_table[i][j].frequency); + pr_debug("%s: index: %d, freq: %d\n", __func__, k, + table[k].frequency); + k++; + } + } + + table[k].driver_data = k; + table[k].frequency = CPUFREQ_TABLE_END; + + pr_debug("%s: End, table: %p, count: %d\n", __func__, table, k); + + return 0; +} + +static void _put_cluster_clk_and_freq_table(struct device *cpu_dev, + const struct cpumask 
*cpumask) +{ + u32 cluster = raw_cpu_to_cluster(cpu_dev->id); + + if (!freq_table[cluster]) + return; + + clk_put(clk[cluster]); + dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table[cluster]); + if (arm_bL_ops->free_opp_table) + arm_bL_ops->free_opp_table(cpumask); + dev_dbg(cpu_dev, "%s: cluster: %d\n", __func__, cluster); +} + +static void put_cluster_clk_and_freq_table(struct device *cpu_dev, + const struct cpumask *cpumask) +{ + u32 cluster = cpu_to_cluster(cpu_dev->id); + int i; + + if (atomic_dec_return(&cluster_usage[cluster])) + return; + + if (cluster < MAX_CLUSTERS) + return _put_cluster_clk_and_freq_table(cpu_dev, cpumask); + + for_each_present_cpu(i) { + struct device *cdev = get_cpu_device(i); + if (!cdev) { + pr_err("%s: failed to get cpu%d device\n", __func__, i); + return; + } + + _put_cluster_clk_and_freq_table(cdev, cpumask); + } + + /* free virtual table */ + kfree(freq_table[cluster]); +} + +static int _get_cluster_clk_and_freq_table(struct device *cpu_dev, + const struct cpumask *cpumask) +{ + u32 cluster = raw_cpu_to_cluster(cpu_dev->id); + int ret; + + if (freq_table[cluster]) + return 0; + + ret = arm_bL_ops->init_opp_table(cpumask); + if (ret) { + dev_err(cpu_dev, "%s: init_opp_table failed, cpu: %d, err: %d\n", + __func__, cpu_dev->id, ret); + goto out; + } + + ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table[cluster]); + if (ret) { + dev_err(cpu_dev, "%s: failed to init cpufreq table, cpu: %d, err: %d\n", + __func__, cpu_dev->id, ret); + goto free_opp_table; + } + + clk[cluster] = clk_get(cpu_dev, NULL); + if (!IS_ERR(clk[cluster])) { + dev_dbg(cpu_dev, "%s: clk: %p & freq table: %p, cluster: %d\n", + __func__, clk[cluster], freq_table[cluster], + cluster); + return 0; + } + + dev_err(cpu_dev, "%s: Failed to get clk for cpu: %d, cluster: %d\n", + __func__, cpu_dev->id, cluster); + ret = PTR_ERR(clk[cluster]); + dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table[cluster]); + +free_opp_table: + if (arm_bL_ops->free_opp_table) + 
arm_bL_ops->free_opp_table(cpumask); +out: + dev_err(cpu_dev, "%s: Failed to get data for cluster: %d\n", __func__, + cluster); + return ret; +} + +static int get_cluster_clk_and_freq_table(struct device *cpu_dev, + const struct cpumask *cpumask) +{ + u32 cluster = cpu_to_cluster(cpu_dev->id); + int i, ret; + + if (atomic_inc_return(&cluster_usage[cluster]) != 1) + return 0; + + if (cluster < MAX_CLUSTERS) { + ret = _get_cluster_clk_and_freq_table(cpu_dev, cpumask); + if (ret) + atomic_dec(&cluster_usage[cluster]); + return ret; + } + + /* + * Get data for all clusters and fill virtual cluster with a merge of + * both + */ + for_each_present_cpu(i) { + struct device *cdev = get_cpu_device(i); + if (!cdev) { + pr_err("%s: failed to get cpu%d device\n", __func__, i); + return -ENODEV; + } + + ret = _get_cluster_clk_and_freq_table(cdev, cpumask); + if (ret) + goto put_clusters; + } + + ret = merge_cluster_tables(); + if (ret) + goto put_clusters; + + /* Assuming 2 cluster, set clk_big_min and clk_little_max */ + clk_big_min = get_table_min(freq_table[0]); + clk_little_max = VIRT_FREQ(1, get_table_max(freq_table[1])); + + pr_debug("%s: cluster: %d, clk_big_min: %d, clk_little_max: %d\n", + __func__, cluster, clk_big_min, clk_little_max); + + return 0; + +put_clusters: + for_each_present_cpu(i) { + struct device *cdev = get_cpu_device(i); + if (!cdev) { + pr_err("%s: failed to get cpu%d device\n", __func__, i); + return -ENODEV; + } + + _put_cluster_clk_and_freq_table(cdev, cpumask); + } + + atomic_dec(&cluster_usage[cluster]); + + return ret; +} + +/* Per-CPU initialization */ +static int bL_cpufreq_init(struct cpufreq_policy *policy) +{ + u32 cur_cluster = cpu_to_cluster(policy->cpu); + struct device *cpu_dev; + int ret; + + cpu_dev = get_cpu_device(policy->cpu); + if (!cpu_dev) { + pr_err("%s: failed to get cpu%d device\n", __func__, + policy->cpu); + return -ENODEV; + } + + if (cur_cluster < MAX_CLUSTERS) { + int cpu; + + cpumask_copy(policy->cpus, 
topology_core_cpumask(policy->cpu)); + + for_each_cpu(cpu, policy->cpus) + per_cpu(physical_cluster, cpu) = cur_cluster; + } else { + /* Assumption: during init, we are always running on A15 */ + per_cpu(physical_cluster, policy->cpu) = A15_CLUSTER; + } + + ret = get_cluster_clk_and_freq_table(cpu_dev, policy->cpus); + if (ret) + return ret; + + policy->freq_table = freq_table[cur_cluster]; + policy->cpuinfo.transition_latency = + arm_bL_ops->get_transition_latency(cpu_dev); + + dev_pm_opp_of_register_em(policy->cpus); + + if (is_bL_switching_enabled()) + per_cpu(cpu_last_req_freq, policy->cpu) = clk_get_cpu_rate(policy->cpu); + + dev_info(cpu_dev, "%s: CPU %d initialized\n", __func__, policy->cpu); + return 0; +} + +static int bL_cpufreq_exit(struct cpufreq_policy *policy) +{ + struct device *cpu_dev; + int cur_cluster = cpu_to_cluster(policy->cpu); + + if (cur_cluster < MAX_CLUSTERS) { + cpufreq_cooling_unregister(cdev[cur_cluster]); + cdev[cur_cluster] = NULL; + } + + cpu_dev = get_cpu_device(policy->cpu); + if (!cpu_dev) { + pr_err("%s: failed to get cpu%d device\n", __func__, + policy->cpu); + return -ENODEV; + } + + put_cluster_clk_and_freq_table(cpu_dev, policy->related_cpus); + dev_dbg(cpu_dev, "%s: Exited, cpu: %d\n", __func__, policy->cpu); + + return 0; +} + +static void bL_cpufreq_ready(struct cpufreq_policy *policy) +{ + int cur_cluster = cpu_to_cluster(policy->cpu); + + /* Do not register a cpu_cooling device if we are in IKS mode */ + if (cur_cluster >= MAX_CLUSTERS) + return; + + cdev[cur_cluster] = of_cpufreq_cooling_register(policy); +} + +static struct cpufreq_driver bL_cpufreq_driver = { + .name = "arm-big-little", + .flags = CPUFREQ_STICKY | + CPUFREQ_HAVE_GOVERNOR_PER_POLICY | + CPUFREQ_NEED_INITIAL_FREQ_CHECK, + .verify = cpufreq_generic_frequency_table_verify, + .target_index = bL_cpufreq_set_target, + .get = bL_cpufreq_get_rate, + .init = bL_cpufreq_init, + .exit = bL_cpufreq_exit, + .ready = bL_cpufreq_ready, + .attr = 
cpufreq_generic_attr, +}; + +#ifdef CONFIG_BL_SWITCHER +static int bL_cpufreq_switcher_notifier(struct notifier_block *nfb, + unsigned long action, void *_arg) +{ + pr_debug("%s: action: %ld\n", __func__, action); + + switch (action) { + case BL_NOTIFY_PRE_ENABLE: + case BL_NOTIFY_PRE_DISABLE: + cpufreq_unregister_driver(&bL_cpufreq_driver); + break; + + case BL_NOTIFY_POST_ENABLE: + set_switching_enabled(true); + cpufreq_register_driver(&bL_cpufreq_driver); + break; + + case BL_NOTIFY_POST_DISABLE: + set_switching_enabled(false); + cpufreq_register_driver(&bL_cpufreq_driver); + break; + + default: + return NOTIFY_DONE; + } + + return NOTIFY_OK; +} + +static struct notifier_block bL_switcher_notifier = { + .notifier_call = bL_cpufreq_switcher_notifier, +}; + +static int __bLs_register_notifier(void) +{ + return bL_switcher_register_notifier(&bL_switcher_notifier); +} + +static int __bLs_unregister_notifier(void) +{ + return bL_switcher_unregister_notifier(&bL_switcher_notifier); +} +#else +static int __bLs_register_notifier(void) { return 0; } +static int __bLs_unregister_notifier(void) { return 0; } +#endif + +int bL_cpufreq_register(const struct cpufreq_arm_bL_ops *ops) +{ + int ret, i; + + if (arm_bL_ops) { + pr_debug("%s: Already registered: %s, exiting\n", __func__, + arm_bL_ops->name); + return -EBUSY; + } + + if (!ops || !strlen(ops->name) || !ops->init_opp_table || + !ops->get_transition_latency) { + pr_err("%s: Invalid arm_bL_ops, exiting\n", __func__); + return -ENODEV; + } + + arm_bL_ops = ops; + + set_switching_enabled(bL_switcher_get_enabled()); + + for (i = 0; i < MAX_CLUSTERS; i++) + mutex_init(&cluster_lock[i]); + + ret = cpufreq_register_driver(&bL_cpufreq_driver); + if (ret) { + pr_info("%s: Failed registering platform driver: %s, err: %d\n", + __func__, ops->name, ret); + arm_bL_ops = NULL; + } else { + ret = __bLs_register_notifier(); + if (ret) { + cpufreq_unregister_driver(&bL_cpufreq_driver); + arm_bL_ops = NULL; + } else { + pr_info("%s: 
Registered platform driver: %s\n", + __func__, ops->name); + } + } + + bL_switcher_put_enabled(); + return ret; +} + +void bL_cpufreq_unregister(const struct cpufreq_arm_bL_ops *ops) +{ + if (arm_bL_ops != ops) { + pr_err("%s: Registered with: %s, can't unregister, exiting\n", + __func__, arm_bL_ops->name); + return; + } + + bL_switcher_get_enabled(); + __bLs_unregister_notifier(); + cpufreq_unregister_driver(&bL_cpufreq_driver); + bL_switcher_put_enabled(); + pr_info("%s: Un-registered platform driver: %s\n", __func__, + arm_bL_ops->name); + arm_bL_ops = NULL; +} static int ve_spc_init_opp_table(const struct cpumask *cpumask) { @@ -68,4 +697,7 @@ static struct platform_driver ve_spc_cpufreq_platdrv = { }; module_platform_driver(ve_spc_cpufreq_platdrv); -MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Viresh Kumar "); +MODULE_AUTHOR("Sudeep Holla "); +MODULE_DESCRIPTION("Vexpress SPC ARM big LITTLE cpufreq driver"); +MODULE_LICENSE("GPL v2"); From 1f1b4650e0be8178fa303a78889ceda6b4385223 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Fri, 18 Oct 2019 11:37:47 +0100 Subject: [PATCH 18/83] cpufreq: vexpress-spc: drop unnecessary cpufreq_arm_bL_ops abstraction cpufreq_arm_bL_ops is no longer needed after merging the generic arm_big_little and vexpress-spc drivers. Remove it along with the unused bL_cpufreq_{,un}register routines and rename some bL_* functions to ve_spc_*. 
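As a rough illustration of the refactoring this patch describes (not the driver's actual code), the sketch below contrasts dispatching through a single-implementer ops table with folding that implementation into the caller. Every `demo_*` name is hypothetical, and the integer `opp_count` parameter is an assumed stand-in for the result of dev_pm_opp_get_opp_count(); only the "OPP count <= 0" check and the 1 ms latency constant are taken from the diff.

```c
#include <stddef.h>

/* Hypothetical stand-in for the removed cpufreq_arm_bL_ops table. */
struct demo_ops {
	const char *name;
	int (*init_opp_table)(int opp_count);
	int (*get_transition_latency)(void);
};

/* Returns non-zero on failure, mirroring the "OPP count <= 0" check. */
static int demo_init_opp_table(int opp_count)
{
	return opp_count <= 0;
}

static int demo_get_transition_latency(void)
{
	return 1000000; /* 1 ms, expressed in nanoseconds */
}

static const struct demo_ops demo_default_ops = {
	.name = "demo",
	.init_opp_table = demo_init_opp_table,
	.get_transition_latency = demo_get_transition_latency,
};

/* Before: each step is dispatched through the ops table. */
static int demo_setup_via_ops(const struct demo_ops *ops, int opp_count,
			      int *latency)
{
	if (!ops || !ops->init_opp_table || !ops->get_transition_latency)
		return -1;

	if (ops->init_opp_table(opp_count))
		return 1;

	*latency = ops->get_transition_latency();
	return 0;
}

/* After: the only implementation is folded into the caller. */
static int demo_setup_direct(int opp_count, int *latency)
{
	if (opp_count <= 0)
		return 1;

	*latency = 1000000; /* 1 ms */
	return 0;
}
```

Both helpers behave the same for a valid ops table; the direct form simply has no registration state or NULL checks left to maintain, which is presumably why the indirection was judged unnecessary once only one user remained.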
Acked-by: Nicolas Pitre Signed-off-by: Sudeep Holla Signed-off-by: Viresh Kumar --- drivers/cpufreq/vexpress-spc-cpufreq.c | 154 +++++++------------------ 1 file changed, 40 insertions(+), 114 deletions(-) diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c index 622dc42e42b1..3eaeefea66b9 100644 --- a/drivers/cpufreq/vexpress-spc-cpufreq.c +++ b/drivers/cpufreq/vexpress-spc-cpufreq.c @@ -26,20 +26,6 @@ #include #include -struct cpufreq_arm_bL_ops { - char name[CPUFREQ_NAME_LEN]; - - /* - * This must set opp table for cpu_dev in a similar way as done by - * dev_pm_opp_of_add_table(). - */ - int (*init_opp_table)(const struct cpumask *cpumask); - - /* Optional */ - int (*get_transition_latency)(struct device *cpu_dev); - void (*free_opp_table)(const struct cpumask *cpumask); -}; - /* Currently we support only two clusters */ #define A15_CLUSTER 0 #define A7_CLUSTER 1 @@ -62,7 +48,6 @@ static bool bL_switching_enabled; #define VIRT_FREQ(cluster, freq) ((cluster == A7_CLUSTER) ? 
freq >> 1 : freq) static struct thermal_cooling_device *cdev[MAX_CLUSTERS]; -static const struct cpufreq_arm_bL_ops *arm_bL_ops; static struct clk *clk[MAX_CLUSTERS]; static struct cpufreq_frequency_table *freq_table[MAX_CLUSTERS + 1]; static atomic_t cluster_usage[MAX_CLUSTERS + 1]; @@ -120,7 +105,7 @@ static unsigned int clk_get_cpu_rate(unsigned int cpu) return rate; } -static unsigned int bL_cpufreq_get_rate(unsigned int cpu) +static unsigned int ve_spc_cpufreq_get_rate(unsigned int cpu) { if (is_bL_switching_enabled()) { pr_debug("%s: freq: %d\n", __func__, per_cpu(cpu_last_req_freq, @@ -133,7 +118,7 @@ static unsigned int bL_cpufreq_get_rate(unsigned int cpu) } static unsigned int -bL_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate) +ve_spc_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate) { u32 new_rate, prev_rate; int ret; @@ -213,8 +198,8 @@ bL_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate) } /* Set clock frequency */ -static int bL_cpufreq_set_target(struct cpufreq_policy *policy, - unsigned int index) +static int ve_spc_cpufreq_set_target(struct cpufreq_policy *policy, + unsigned int index) { u32 cpu = policy->cpu, cur_cluster, new_cluster, actual_cluster; unsigned int freqs_new; @@ -235,7 +220,8 @@ static int bL_cpufreq_set_target(struct cpufreq_policy *policy, } } - ret = bL_cpufreq_set_rate(cpu, actual_cluster, new_cluster, freqs_new); + ret = ve_spc_cpufreq_set_rate(cpu, actual_cluster, new_cluster, + freqs_new); if (!ret) { arch_set_freq_scale(policy->related_cpus, freqs_new, @@ -321,8 +307,6 @@ static void _put_cluster_clk_and_freq_table(struct device *cpu_dev, clk_put(clk[cluster]); dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table[cluster]); - if (arm_bL_ops->free_opp_table) - arm_bL_ops->free_opp_table(cpumask); dev_dbg(cpu_dev, "%s: cluster: %d\n", __func__, cluster); } @@ -361,18 +345,19 @@ static int _get_cluster_clk_and_freq_table(struct device *cpu_dev, if 
(freq_table[cluster]) return 0; - ret = arm_bL_ops->init_opp_table(cpumask); - if (ret) { - dev_err(cpu_dev, "%s: init_opp_table failed, cpu: %d, err: %d\n", - __func__, cpu_dev->id, ret); + /* + * platform specific SPC code must initialise the opp table + * so just check if the OPP count is non-zero + */ + ret = dev_pm_opp_get_opp_count(cpu_dev) <= 0; + if (ret) goto out; - } ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table[cluster]); if (ret) { dev_err(cpu_dev, "%s: failed to init cpufreq table, cpu: %d, err: %d\n", __func__, cpu_dev->id, ret); - goto free_opp_table; + goto out; } clk[cluster] = clk_get(cpu_dev, NULL); @@ -388,9 +373,6 @@ static int _get_cluster_clk_and_freq_table(struct device *cpu_dev, ret = PTR_ERR(clk[cluster]); dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table[cluster]); -free_opp_table: - if (arm_bL_ops->free_opp_table) - arm_bL_ops->free_opp_table(cpumask); out: dev_err(cpu_dev, "%s: Failed to get data for cluster: %d\n", __func__, cluster); @@ -459,7 +441,7 @@ static int get_cluster_clk_and_freq_table(struct device *cpu_dev, } /* Per-CPU initialization */ -static int bL_cpufreq_init(struct cpufreq_policy *policy) +static int ve_spc_cpufreq_init(struct cpufreq_policy *policy) { u32 cur_cluster = cpu_to_cluster(policy->cpu); struct device *cpu_dev; @@ -489,8 +471,7 @@ static int bL_cpufreq_init(struct cpufreq_policy *policy) return ret; policy->freq_table = freq_table[cur_cluster]; - policy->cpuinfo.transition_latency = - arm_bL_ops->get_transition_latency(cpu_dev); + policy->cpuinfo.transition_latency = 1000000; /* 1 ms */ dev_pm_opp_of_register_em(policy->cpus); @@ -501,7 +482,7 @@ static int bL_cpufreq_init(struct cpufreq_policy *policy) return 0; } -static int bL_cpufreq_exit(struct cpufreq_policy *policy) +static int ve_spc_cpufreq_exit(struct cpufreq_policy *policy) { struct device *cpu_dev; int cur_cluster = cpu_to_cluster(policy->cpu); @@ -524,7 +505,7 @@ static int bL_cpufreq_exit(struct cpufreq_policy *policy) return 0; } 
-static void bL_cpufreq_ready(struct cpufreq_policy *policy) +static void ve_spc_cpufreq_ready(struct cpufreq_policy *policy) { int cur_cluster = cpu_to_cluster(policy->cpu); @@ -535,17 +516,17 @@ static void bL_cpufreq_ready(struct cpufreq_policy *policy) cdev[cur_cluster] = of_cpufreq_cooling_register(policy); } -static struct cpufreq_driver bL_cpufreq_driver = { - .name = "arm-big-little", +static struct cpufreq_driver ve_spc_cpufreq_driver = { + .name = "vexpress-spc", .flags = CPUFREQ_STICKY | CPUFREQ_HAVE_GOVERNOR_PER_POLICY | CPUFREQ_NEED_INITIAL_FREQ_CHECK, .verify = cpufreq_generic_frequency_table_verify, - .target_index = bL_cpufreq_set_target, - .get = bL_cpufreq_get_rate, - .init = bL_cpufreq_init, - .exit = bL_cpufreq_exit, - .ready = bL_cpufreq_ready, + .target_index = ve_spc_cpufreq_set_target, + .get = ve_spc_cpufreq_get_rate, + .init = ve_spc_cpufreq_init, + .exit = ve_spc_cpufreq_exit, + .ready = ve_spc_cpufreq_ready, .attr = cpufreq_generic_attr, }; @@ -558,17 +539,17 @@ static int bL_cpufreq_switcher_notifier(struct notifier_block *nfb, switch (action) { case BL_NOTIFY_PRE_ENABLE: case BL_NOTIFY_PRE_DISABLE: - cpufreq_unregister_driver(&bL_cpufreq_driver); + cpufreq_unregister_driver(&ve_spc_cpufreq_driver); break; case BL_NOTIFY_POST_ENABLE: set_switching_enabled(true); - cpufreq_register_driver(&bL_cpufreq_driver); + cpufreq_register_driver(&ve_spc_cpufreq_driver); break; case BL_NOTIFY_POST_DISABLE: set_switching_enabled(false); - cpufreq_register_driver(&bL_cpufreq_driver); + cpufreq_register_driver(&ve_spc_cpufreq_driver); break; default: @@ -596,95 +577,40 @@ static int __bLs_register_notifier(void) { return 0; } static int __bLs_unregister_notifier(void) { return 0; } #endif -int bL_cpufreq_register(const struct cpufreq_arm_bL_ops *ops) +static int ve_spc_cpufreq_probe(struct platform_device *pdev) { int ret, i; - if (arm_bL_ops) { - pr_debug("%s: Already registered: %s, exiting\n", __func__, - arm_bL_ops->name); - return -EBUSY; - } - - 
if (!ops || !strlen(ops->name) || !ops->init_opp_table || - !ops->get_transition_latency) { - pr_err("%s: Invalid arm_bL_ops, exiting\n", __func__); - return -ENODEV; - } - - arm_bL_ops = ops; - set_switching_enabled(bL_switcher_get_enabled()); for (i = 0; i < MAX_CLUSTERS; i++) mutex_init(&cluster_lock[i]); - ret = cpufreq_register_driver(&bL_cpufreq_driver); + ret = cpufreq_register_driver(&ve_spc_cpufreq_driver); if (ret) { pr_info("%s: Failed registering platform driver: %s, err: %d\n", - __func__, ops->name, ret); - arm_bL_ops = NULL; + __func__, ve_spc_cpufreq_driver.name, ret); } else { ret = __bLs_register_notifier(); - if (ret) { - cpufreq_unregister_driver(&bL_cpufreq_driver); - arm_bL_ops = NULL; - } else { + if (ret) + cpufreq_unregister_driver(&ve_spc_cpufreq_driver); + else pr_info("%s: Registered platform driver: %s\n", - __func__, ops->name); - } + __func__, ve_spc_cpufreq_driver.name); } bL_switcher_put_enabled(); return ret; } -void bL_cpufreq_unregister(const struct cpufreq_arm_bL_ops *ops) -{ - if (arm_bL_ops != ops) { - pr_err("%s: Registered with: %s, can't unregister, exiting\n", - __func__, arm_bL_ops->name); - return; - } - - bL_switcher_get_enabled(); - __bLs_unregister_notifier(); - cpufreq_unregister_driver(&bL_cpufreq_driver); - bL_switcher_put_enabled(); - pr_info("%s: Un-registered platform driver: %s\n", __func__, - arm_bL_ops->name); - arm_bL_ops = NULL; -} - -static int ve_spc_init_opp_table(const struct cpumask *cpumask) -{ - struct device *cpu_dev = get_cpu_device(cpumask_first(cpumask)); - /* - * platform specific SPC code must initialise the opp table - * so just check if the OPP count is non-zero - */ - return dev_pm_opp_get_opp_count(cpu_dev) <= 0; -} - -static int ve_spc_get_transition_latency(struct device *cpu_dev) -{ - return 1000000; /* 1 ms */ -} - -static const struct cpufreq_arm_bL_ops ve_spc_cpufreq_ops = { - .name = "vexpress-spc", - .get_transition_latency = ve_spc_get_transition_latency, - .init_opp_table = 
ve_spc_init_opp_table, -}; - -static int ve_spc_cpufreq_probe(struct platform_device *pdev) -{ - return bL_cpufreq_register(&ve_spc_cpufreq_ops); -} - static int ve_spc_cpufreq_remove(struct platform_device *pdev) { - bL_cpufreq_unregister(&ve_spc_cpufreq_ops); + bL_switcher_get_enabled(); + __bLs_unregister_notifier(); + cpufreq_unregister_driver(&ve_spc_cpufreq_driver); + bL_switcher_put_enabled(); + pr_info("%s: Un-registered platform driver: %s\n", __func__, + ve_spc_cpufreq_driver.name); return 0; } From 09402d5725bf8c521cbe162e1037f19b30e2afaa Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Fri, 18 Oct 2019 11:37:48 +0100 Subject: [PATCH 19/83] cpufreq: vexpress-spc: remove lots of debug messages This driver has been used and tested for years now and the extensive debug/log messages in the driver are not really required anymore. Get rid of those unnecessary log messages. Acked-by: Nicolas Pitre Signed-off-by: Sudeep Holla Signed-off-by: Viresh Kumar --- drivers/cpufreq/vexpress-spc-cpufreq.c | 69 ++++++-------------------- 1 file changed, 14 insertions(+), 55 deletions(-) diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c index 3eaeefea66b9..132610424747 100644 --- a/drivers/cpufreq/vexpress-spc-cpufreq.c +++ b/drivers/cpufreq/vexpress-spc-cpufreq.c @@ -84,9 +84,6 @@ static unsigned int find_cluster_maxfreq(int cluster) max_freq = cpu_freq; } - pr_debug("%s: cluster: %d, max freq: %d\n", __func__, cluster, - max_freq); - return max_freq; } @@ -99,22 +96,15 @@ static unsigned int clk_get_cpu_rate(unsigned int cpu) if (is_bL_switching_enabled()) rate = VIRT_FREQ(cur_cluster, rate); - pr_debug("%s: cpu: %d, cluster: %d, freq: %u\n", __func__, cpu, - cur_cluster, rate); - return rate; } static unsigned int ve_spc_cpufreq_get_rate(unsigned int cpu) { - if (is_bL_switching_enabled()) { - pr_debug("%s: freq: %d\n", __func__, per_cpu(cpu_last_req_freq, - cpu)); - + if (is_bL_switching_enabled()) return 
per_cpu(cpu_last_req_freq, cpu); - } else { + else return clk_get_cpu_rate(cpu); - } } static unsigned int @@ -137,9 +127,6 @@ ve_spc_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate) new_rate = rate; } - pr_debug("%s: cpu: %d, old cluster: %d, new cluster: %d, freq: %d\n", - __func__, cpu, old_cluster, new_cluster, new_rate); - ret = clk_set_rate(clk[new_cluster], new_rate * 1000); if (!ret) { /* @@ -155,8 +142,6 @@ ve_spc_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate) } if (WARN_ON(ret)) { - pr_err("clk_set_rate failed: %d, new cluster: %d\n", ret, - new_cluster); if (bLs) { per_cpu(cpu_last_req_freq, cpu) = prev_rate; per_cpu(physical_cluster, cpu) = old_cluster; @@ -171,9 +156,6 @@ ve_spc_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate) /* Recalc freq for old cluster when switching clusters */ if (old_cluster != new_cluster) { - pr_debug("%s: cpu: %d, old cluster: %d, new cluster: %d\n", - __func__, cpu, old_cluster, new_cluster); - /* Switch cluster */ bL_switch_request(cpu, new_cluster); @@ -183,13 +165,10 @@ ve_spc_cpufreq_set_rate(u32 cpu, u32 old_cluster, u32 new_cluster, u32 rate) new_rate = find_cluster_maxfreq(old_cluster); new_rate = ACTUAL_FREQ(old_cluster, new_rate); - if (new_rate) { - pr_debug("%s: Updating rate of old cluster: %d, to freq: %d\n", - __func__, old_cluster, new_rate); - - if (clk_set_rate(clk[old_cluster], new_rate * 1000)) - pr_err("%s: clk_set_rate failed: %d, old cluster: %d\n", - __func__, ret, old_cluster); + if (new_rate && + clk_set_rate(clk[old_cluster], new_rate * 1000)) { + pr_err("%s: clk_set_rate failed: %d, old cluster: %d\n", + __func__, ret, old_cluster); } mutex_unlock(&cluster_lock[old_cluster]); } @@ -283,8 +262,6 @@ static int merge_cluster_tables(void) j++) { table[k].frequency = VIRT_FREQ(i, freq_table[i][j].frequency); - pr_debug("%s: index: %d, freq: %d\n", __func__, k, - table[k].frequency); k++; } } @@ -292,8 +269,6 @@ static int 
merge_cluster_tables(void) table[k].driver_data = k; table[k].frequency = CPUFREQ_TABLE_END; - pr_debug("%s: End, table: %p, count: %d\n", __func__, table, k); - return 0; } @@ -307,7 +282,6 @@ static void _put_cluster_clk_and_freq_table(struct device *cpu_dev, clk_put(clk[cluster]); dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table[cluster]); - dev_dbg(cpu_dev, "%s: cluster: %d\n", __func__, cluster); } static void put_cluster_clk_and_freq_table(struct device *cpu_dev, @@ -324,10 +298,9 @@ static void put_cluster_clk_and_freq_table(struct device *cpu_dev, for_each_present_cpu(i) { struct device *cdev = get_cpu_device(i); - if (!cdev) { - pr_err("%s: failed to get cpu%d device\n", __func__, i); + + if (!cdev) return; - } _put_cluster_clk_and_freq_table(cdev, cpumask); } @@ -354,19 +327,12 @@ static int _get_cluster_clk_and_freq_table(struct device *cpu_dev, goto out; ret = dev_pm_opp_init_cpufreq_table(cpu_dev, &freq_table[cluster]); - if (ret) { - dev_err(cpu_dev, "%s: failed to init cpufreq table, cpu: %d, err: %d\n", - __func__, cpu_dev->id, ret); + if (ret) goto out; - } clk[cluster] = clk_get(cpu_dev, NULL); - if (!IS_ERR(clk[cluster])) { - dev_dbg(cpu_dev, "%s: clk: %p & freq table: %p, cluster: %d\n", - __func__, clk[cluster], freq_table[cluster], - cluster); + if (!IS_ERR(clk[cluster])) return 0; - } dev_err(cpu_dev, "%s: Failed to get clk for cpu: %d, cluster: %d\n", __func__, cpu_dev->id, cluster); @@ -401,10 +367,9 @@ static int get_cluster_clk_and_freq_table(struct device *cpu_dev, */ for_each_present_cpu(i) { struct device *cdev = get_cpu_device(i); - if (!cdev) { - pr_err("%s: failed to get cpu%d device\n", __func__, i); + + if (!cdev) return -ENODEV; - } ret = _get_cluster_clk_and_freq_table(cdev, cpumask); if (ret) @@ -419,18 +384,14 @@ static int get_cluster_clk_and_freq_table(struct device *cpu_dev, clk_big_min = get_table_min(freq_table[0]); clk_little_max = VIRT_FREQ(1, get_table_max(freq_table[1])); - pr_debug("%s: cluster: %d, clk_big_min: 
%d, clk_little_max: %d\n", - __func__, cluster, clk_big_min, clk_little_max); - return 0; put_clusters: for_each_present_cpu(i) { struct device *cdev = get_cpu_device(i); - if (!cdev) { - pr_err("%s: failed to get cpu%d device\n", __func__, i); + + if (!cdev) return -ENODEV; - } _put_cluster_clk_and_freq_table(cdev, cpumask); } @@ -500,8 +461,6 @@ static int ve_spc_cpufreq_exit(struct cpufreq_policy *policy) } put_cluster_clk_and_freq_table(cpu_dev, policy->related_cpus); - dev_dbg(cpu_dev, "%s: Exited, cpu: %d\n", __func__, policy->cpu); - return 0; } From e318d2c8f32d409f304ece12e50a759b2ed78d1b Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Fri, 18 Oct 2019 11:37:49 +0100 Subject: [PATCH 20/83] cpufreq: vexpress-spc: fix some coding style issues Fix the following checkpatch checks/warnings: CHECK: Unnecessary parentheses around the code CHECK: Alignment should match open parenthesis CHECK: Prefer kernel type 'u32' over 'uint32_t' WARNING: Missing a blank line after declarations Acked-by: Nicolas Pitre Signed-off-by: Sudeep Holla Signed-off-by: Viresh Kumar --- drivers/cpufreq/vexpress-spc-cpufreq.c | 36 +++++++++++++------------- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c index 132610424747..3259498d7eaa 100644 --- a/drivers/cpufreq/vexpress-spc-cpufreq.c +++ b/drivers/cpufreq/vexpress-spc-cpufreq.c @@ -79,8 +79,8 @@ static unsigned int find_cluster_maxfreq(int cluster) for_each_online_cpu(j) { cpu_freq = per_cpu(cpu_last_req_freq, j); - if ((cluster == per_cpu(physical_cluster, j)) && - (max_freq < cpu_freq)) + if (cluster == per_cpu(physical_cluster, j) && + max_freq < cpu_freq) max_freq = cpu_freq; } @@ -190,13 +190,11 @@ static int ve_spc_cpufreq_set_target(struct cpufreq_policy *policy, freqs_new = freq_table[cur_cluster][index].frequency; if (is_bL_switching_enabled()) { - if ((actual_cluster == A15_CLUSTER) && - (freqs_new < clk_big_min)) { + if 
(actual_cluster == A15_CLUSTER && freqs_new < clk_big_min) new_cluster = A7_CLUSTER; - } else if ((actual_cluster == A7_CLUSTER) && - (freqs_new > clk_little_max)) { + else if (actual_cluster == A7_CLUSTER && + freqs_new > clk_little_max) new_cluster = A15_CLUSTER; - } } ret = ve_spc_cpufreq_set_rate(cpu, actual_cluster, new_cluster, @@ -224,7 +222,8 @@ static inline u32 get_table_count(struct cpufreq_frequency_table *table) static inline u32 get_table_min(struct cpufreq_frequency_table *table) { struct cpufreq_frequency_table *pos; - uint32_t min_freq = ~0; + u32 min_freq = ~0; + cpufreq_for_each_entry(pos, table) if (pos->frequency < min_freq) min_freq = pos->frequency; @@ -235,7 +234,8 @@ static inline u32 get_table_min(struct cpufreq_frequency_table *table) static inline u32 get_table_max(struct cpufreq_frequency_table *table) { struct cpufreq_frequency_table *pos; - uint32_t max_freq = 0; + u32 max_freq = 0; + cpufreq_for_each_entry(pos, table) if (pos->frequency > max_freq) max_freq = pos->frequency; @@ -259,10 +259,9 @@ static int merge_cluster_tables(void) /* Add in reverse order to get freqs in increasing order */ for (i = MAX_CLUSTERS - 1; i >= 0; i--) { for (j = 0; freq_table[i][j].frequency != CPUFREQ_TABLE_END; - j++) { - table[k].frequency = VIRT_FREQ(i, - freq_table[i][j].frequency); - k++; + j++, k++) { + table[k].frequency = + VIRT_FREQ(i, freq_table[i][j].frequency); } } @@ -335,13 +334,13 @@ static int _get_cluster_clk_and_freq_table(struct device *cpu_dev, return 0; dev_err(cpu_dev, "%s: Failed to get clk for cpu: %d, cluster: %d\n", - __func__, cpu_dev->id, cluster); + __func__, cpu_dev->id, cluster); ret = PTR_ERR(clk[cluster]); dev_pm_opp_free_cpufreq_table(cpu_dev, &freq_table[cluster]); out: dev_err(cpu_dev, "%s: Failed to get data for cluster: %d\n", __func__, - cluster); + cluster); return ret; } @@ -411,7 +410,7 @@ static int ve_spc_cpufreq_init(struct cpufreq_policy *policy) cpu_dev = get_cpu_device(policy->cpu); if (!cpu_dev) { 
pr_err("%s: failed to get cpu%d device\n", __func__, - policy->cpu); + policy->cpu); return -ENODEV; } @@ -437,7 +436,8 @@ static int ve_spc_cpufreq_init(struct cpufreq_policy *policy) dev_pm_opp_of_register_em(policy->cpus); if (is_bL_switching_enabled()) - per_cpu(cpu_last_req_freq, policy->cpu) = clk_get_cpu_rate(policy->cpu); + per_cpu(cpu_last_req_freq, policy->cpu) = + clk_get_cpu_rate(policy->cpu); dev_info(cpu_dev, "%s: CPU %d initialized\n", __func__, policy->cpu); return 0; @@ -456,7 +456,7 @@ static int ve_spc_cpufreq_exit(struct cpufreq_policy *policy) cpu_dev = get_cpu_device(policy->cpu); if (!cpu_dev) { pr_err("%s: failed to get cpu%d device\n", __func__, - policy->cpu); + policy->cpu); return -ENODEV; } From af44d180e3de4cb411ce327b147ea3513f0bbbcb Mon Sep 17 00:00:00 2001 From: Anson Huang Date: Tue, 22 Oct 2019 16:33:19 +0800 Subject: [PATCH 21/83] cpufreq: imx-cpufreq-dt: Correct i.MX8MN's default speed grade value i.MX8MN has a different speed grade definition from i.MX8MQ/i.MX8MM. When fuses are NOT written, the default speed_grade should be set to the minimum available OPP defined in DT, which is 1.2GHz; the corresponding speed_grade value is 0xb. Fixes: 5b8010ba70d5 ("cpufreq: imx-cpufreq-dt: Add i.MX8MN support") Signed-off-by: Anson Huang Signed-off-by: Viresh Kumar --- drivers/cpufreq/imx-cpufreq-dt.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/cpufreq/imx-cpufreq-dt.c b/drivers/cpufreq/imx-cpufreq-dt.c index 35db14cf3102..85a6efd6b68f 100644 --- a/drivers/cpufreq/imx-cpufreq-dt.c +++ b/drivers/cpufreq/imx-cpufreq-dt.c @@ -44,19 +44,19 @@ static int imx_cpufreq_dt_probe(struct platform_device *pdev) mkt_segment = (cell_value & OCOTP_CFG3_MKT_SEGMENT_MASK) >> OCOTP_CFG3_MKT_SEGMENT_SHIFT; /* - * Early samples without fuses written report "0 0" which means - * consumer segment and minimum speed grading. 
- * - * According to datasheet minimum speed grading is not supported for - * consumer parts so clamp to 1 to avoid warning for "no OPPs" + * Early samples without fuses written report "0 0" which may NOT + * match any OPP defined in DT. So clamp to minimum OPP defined in + * DT to avoid warning for "no OPPs". * * Applies to i.MX8M series SoCs. */ - if (mkt_segment == 0 && speed_grade == 0 && ( - of_machine_is_compatible("fsl,imx8mm") || - of_machine_is_compatible("fsl,imx8mn") || - of_machine_is_compatible("fsl,imx8mq"))) - speed_grade = 1; + if (mkt_segment == 0 && speed_grade == 0) { + if (of_machine_is_compatible("fsl,imx8mm") || + of_machine_is_compatible("fsl,imx8mq")) + speed_grade = 1; + if (of_machine_is_compatible("fsl,imx8mn")) + speed_grade = 0xb; + } supported_hw[0] = BIT(speed_grade); supported_hw[1] = BIT(mkt_segment); From e458eb97df7aa0865066efc9fb70bbdfab319b59 Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Tue, 22 Oct 2019 17:09:06 -0700 Subject: [PATCH 22/83] cpufreq: s3c64xx: Remove pointless NULL check in s3c64xx_cpufreq_driver_init When building with Clang + -Wtautological-pointer-compare: drivers/cpufreq/s3c64xx-cpufreq.c:152:6: warning: comparison of array 's3c64xx_freq_table' equal to a null pointer is always false [-Wtautological-pointer-compare] if (s3c64xx_freq_table == NULL) { ^~~~~~~~~~~~~~~~~~ ~~~~ 1 warning generated. The definition of s3c64xx_freq_table is surrounded by an ifdef directive for CONFIG_CPU_S3C6410, which is always true for this driver because it depends on it in drivers/cpufreq/Kconfig.arm (and if it weren't, there would be a build error because s3c64xx_freq_table would not be a defined symbol). Resolve this warning by removing the unnecessary NULL check because it is always false as Clang notes. While we are at it, remove the unnecessary ifdef conditional because it is always true. 
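To illustrate why the check can go: an array object always has a non-null address, so comparing the array with NULL can never be true. The following is a minimal sketch in plain C, not the driver code. The struct, the table contents and the EXAMPLE_TABLE_END sentinel are hypothetical stand-ins for struct cpufreq_frequency_table and CPUFREQ_TABLE_END. If an emptiness check were ever wanted, it would have to inspect the first entry rather than the array address:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CPUFREQ_TABLE_END; not the kernel definition. */
#define EXAMPLE_TABLE_END (~1u)

/* Hypothetical, simplified stand-in for struct cpufreq_frequency_table. */
struct example_freq_entry {
	unsigned int flags;
	unsigned int driver_data;
	unsigned int frequency;
};

/* Defined unconditionally, so the symbol and its address always exist. */
static struct example_freq_entry example_freq_table[] = {
	{ 0, 0, 66000 },
	{ 0, 1, 100000 },
	{ 0, 2, 133000 },
	{ 0, 0, EXAMPLE_TABLE_END },
};

/* Count the entries that precede the end-of-table sentinel. */
static size_t example_table_count(const struct example_freq_entry *table)
{
	size_t n = 0;

	while (table[n].frequency != EXAMPLE_TABLE_END)
		n++;

	return n;
}

/* A table is empty when its very first entry is the sentinel. */
static bool example_table_is_empty(const struct example_freq_entry *table)
{
	return table[0].frequency == EXAMPLE_TABLE_END;
}
```

The driver itself needs neither helper, since its table is a non-empty static array; the sketch only shows what a non-tautological check would look like.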
Fixes: b3748ddd8056 ("[ARM] S3C64XX: Initial support for DVFS") Link: https://github.com/ClangBuiltLinux/linux/issues/748 Signed-off-by: Nathan Chancellor Signed-off-by: Viresh Kumar --- drivers/cpufreq/s3c64xx-cpufreq.c | 7 ------- 1 file changed, 7 deletions(-) diff --git a/drivers/cpufreq/s3c64xx-cpufreq.c b/drivers/cpufreq/s3c64xx-cpufreq.c index af0c00dabb22..c6bdfc308e99 100644 --- a/drivers/cpufreq/s3c64xx-cpufreq.c +++ b/drivers/cpufreq/s3c64xx-cpufreq.c @@ -19,7 +19,6 @@ static struct regulator *vddarm; static unsigned long regulator_latency; -#ifdef CONFIG_CPU_S3C6410 struct s3c64xx_dvfs { unsigned int vddarm_min; unsigned int vddarm_max; @@ -48,7 +47,6 @@ static struct cpufreq_frequency_table s3c64xx_freq_table[] = { { 0, 4, 800000 }, { 0, 0, CPUFREQ_TABLE_END }, }; -#endif static int s3c64xx_cpufreq_set_target(struct cpufreq_policy *policy, unsigned int index) @@ -149,11 +147,6 @@ static int s3c64xx_cpufreq_driver_init(struct cpufreq_policy *policy) if (policy->cpu != 0) return -EINVAL; - if (s3c64xx_freq_table == NULL) { - pr_err("No frequency information for this CPU\n"); - return -ENODEV; - } - policy->clk = clk_get(NULL, "armclk"); if (IS_ERR(policy->clk)) { pr_err("Unable to obtain ARMCLK: %ld\n", From 4a6e135238798144ca8a2eab65018521c66240da Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 23 Oct 2019 12:08:10 +0100 Subject: [PATCH 23/83] cpufreq: vexpress-spc: use macros instead of hardcoded values for cluster ids A15 and A7 cluster identifiers are fixed to 0 and 1 respectively. Macros exist for these and are used in most places except this instance. Let's use the macros instead of hardcoded values for cluster ids here as well. Cc: Viresh Kumar Cc: "Rafael J.
Wysocki" Signed-off-by: Sudeep Holla Signed-off-by: Viresh Kumar --- drivers/cpufreq/vexpress-spc-cpufreq.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c index 3259498d7eaa..093ef8d3a8d4 100644 --- a/drivers/cpufreq/vexpress-spc-cpufreq.c +++ b/drivers/cpufreq/vexpress-spc-cpufreq.c @@ -380,8 +380,9 @@ static int get_cluster_clk_and_freq_table(struct device *cpu_dev, goto put_clusters; /* Assuming 2 cluster, set clk_big_min and clk_little_max */ - clk_big_min = get_table_min(freq_table[0]); - clk_little_max = VIRT_FREQ(1, get_table_max(freq_table[1])); + clk_big_min = get_table_min(freq_table[A15_CLUSTER]); + clk_little_max = VIRT_FREQ(A7_CLUSTER, + get_table_max(freq_table[A7_CLUSTER])); return 0; From e32beb064105e13d4db0b03b4da7142065db1b34 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 23 Oct 2019 13:18:51 +0100 Subject: [PATCH 24/83] cpufreq: vexpress-spc: find and skip duplicates when merging frequencies Currently the cpufreq core aborts the validation and returns an error immediately when it encounters duplicate frequency table entries. This behaviour was introduced long ago by commit da0c6dc00c69 ("cpufreq: Handle sorted frequency tables more efficiently"). However, it went untested with modified firmware for a long time. In order to make it work with default settings, we need to ensure the merged table for bL switcher contains no duplicates. Find the duplicates and skip them when merging the frequency tables of A15 and A7 clusters. Cc: Viresh Kumar Cc: "Rafael J.
Wysocki" Signed-off-by: Sudeep Holla Signed-off-by: Viresh Kumar --- drivers/cpufreq/vexpress-spc-cpufreq.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c index 093ef8d3a8d4..506e3f2bf53a 100644 --- a/drivers/cpufreq/vexpress-spc-cpufreq.c +++ b/drivers/cpufreq/vexpress-spc-cpufreq.c @@ -242,6 +242,19 @@ static inline u32 get_table_max(struct cpufreq_frequency_table *table) return max_freq; } +static bool search_frequency(struct cpufreq_frequency_table *table, int size, + unsigned int freq) +{ + int count; + + for (count = 0; count < size; count++) { + if (table[count].frequency == freq) + return true; + } + + return false; +} + static int merge_cluster_tables(void) { int i, j, k = 0, count = 1; @@ -257,10 +270,13 @@ static int merge_cluster_tables(void) freq_table[MAX_CLUSTERS] = table; /* Add in reverse order to get freqs in increasing order */ - for (i = MAX_CLUSTERS - 1; i >= 0; i--) { + for (i = MAX_CLUSTERS - 1; i >= 0; i--, count = k) { for (j = 0; freq_table[i][j].frequency != CPUFREQ_TABLE_END; - j++, k++) { - table[k].frequency = + j++) { + if (i == A15_CLUSTER && + search_frequency(table, count, freq_table[i][j].frequency)) + continue; /* skip duplicates */ + table[k++].frequency = VIRT_FREQ(i, freq_table[i][j].frequency); } } From 918c1fe9fbbe46fcf56837ff21f0ef96424e8b29 Mon Sep 17 00:00:00 2001 From: Zhenzhong Duan Date: Wed, 23 Oct 2019 09:57:14 +0800 Subject: [PATCH 25/83] cpuidle: Do not unset the driver if it is there already Fix __cpuidle_set_driver() to check if any of the CPUs in the mask has a driver different from drv already and, if so, return -EBUSY before updating any cpuidle_drivers per-CPU pointers. Fixes: 82467a5a885d ("cpuidle: simplify multiple driver support") Cc: 3.11+ # 3.11+ Signed-off-by: Zhenzhong Duan [ rjw: Subject & changelog ] Signed-off-by: Rafael J. 
Wysocki --- drivers/cpuidle/driver.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c index 80c1a830d991..9db154224999 100644 --- a/drivers/cpuidle/driver.c +++ b/drivers/cpuidle/driver.c @@ -62,25 +62,24 @@ static inline void __cpuidle_unset_driver(struct cpuidle_driver *drv) * __cpuidle_set_driver - set per CPU driver variables for the given driver. * @drv: a valid pointer to a struct cpuidle_driver * - * For each CPU in the driver's cpumask, unset the registered driver per CPU - * to @drv. - * - * Returns 0 on success, -EBUSY if the CPUs have driver(s) already. + * Returns 0 on success, -EBUSY if any CPU in the cpumask have a driver + * different from drv already. */ static inline int __cpuidle_set_driver(struct cpuidle_driver *drv) { int cpu; for_each_cpu(cpu, drv->cpumask) { + struct cpuidle_driver *old_drv; - if (__cpuidle_get_cpu_driver(cpu)) { - __cpuidle_unset_driver(drv); + old_drv = __cpuidle_get_cpu_driver(cpu); + if (old_drv && old_drv != drv) return -EBUSY; - } - - per_cpu(cpuidle_drivers, cpu) = drv; } + for_each_cpu(cpu, drv->cpumask) + per_cpu(cpuidle_drivers, cpu) = drv; + return 0; } From fa583f71a99c85e52781ed877c82c8757437b680 Mon Sep 17 00:00:00 2001 From: Yin Fengwei Date: Thu, 24 Oct 2019 15:04:20 +0800 Subject: [PATCH 26/83] ACPI: processor_idle: Skip dummy wait if kernel is in guest In function acpi_idle_do_entry(), an ioport access is used for dummy wait to guarantee hardware behavior. But it could trigger unnecessary VMexit if kernel is running as guest in virtualization environment. If it's in virtualization environment, the deeper C state enter operation (inb()) will trap to hypervisor. It's not needed to do dummy wait after the inb() call. So we could just remove the dummy io port access to avoid unnecessary VMexit. And keep dummy io port access to maintain timing for native environment. Signed-off-by: Yin Fengwei Signed-off-by: Rafael J. 
Wysocki --- drivers/acpi/processor_idle.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c index ed56c6d20b08..2ae95df2e74f 100644 --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -642,6 +642,19 @@ static int acpi_idle_bm_check(void) return bm_status; } +static void wait_for_freeze(void) +{ +#ifdef CONFIG_X86 + /* No delay is needed if we are in guest */ + if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) + return; +#endif + /* Dummy wait op - must do something useless after P_LVL2 read + because chipsets cannot guarantee that STPCLK# signal + gets asserted in time to freeze execution properly. */ + inl(acpi_gbl_FADT.xpm_timer_block.address); +} + /** * acpi_idle_do_entry - enter idle state using the appropriate method * @cx: cstate data @@ -658,10 +671,7 @@ static void __cpuidle acpi_idle_do_entry(struct acpi_processor_cx *cx) } else { /* IO port based C-state */ inb(cx->address); - /* Dummy wait op - must do something useless after P_LVL2 read - because chipsets cannot guarantee that STPCLK# signal - gets asserted in time to freeze execution properly. */ - inl(acpi_gbl_FADT.xpm_timer_block.address); + wait_for_freeze(); } } @@ -682,8 +692,7 @@ static int acpi_idle_play_dead(struct cpuidle_device *dev, int index) safe_halt(); else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) { inb(cx->address); - /* See comment in acpi_idle_do_entry() */ - inl(acpi_gbl_FADT.xpm_timer_block.address); + wait_for_freeze(); } else return -ENODEV; } From 737ffb27f2f1a8fe6644cac535486f7f25bbf6cb Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Tue, 22 Oct 2019 15:47:57 +0530 Subject: [PATCH 27/83] cpufreq: Clarify the comment in cpufreq_set_policy() One of the responsibilities of the ->verify() callback is to make sure that the policy's min frequency is <= max frequency, as this isn't guaranteed by the QoS framework which gave us those values.
Update the comment in cpufreq_set_policy() to clarify that. Suggested-by: Rafael J. Wysocki Signed-off-by: Viresh Kumar [ rjw: Minor changes of the new comment ] Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 48a224a6b178..dd1628192310 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2385,7 +2385,10 @@ int cpufreq_set_policy(struct cpufreq_policy *policy, new_policy->min = freq_qos_read_value(&policy->constraints, FREQ_QOS_MIN); new_policy->max = freq_qos_read_value(&policy->constraints, FREQ_QOS_MAX); - /* verify the cpu speed can be set within this limit */ + /* + * Verify that the CPU speed can be set within these limits and make sure + * that min <= max. + */ ret = cpufreq_driver->verify(new_policy); if (ret) return ret; From db0d32d84031188443e25edbd50a71a6e7ac5d1d Mon Sep 17 00:00:00 2001 From: John Hubbard Date: Wed, 30 Oct 2019 22:21:59 -0700 Subject: [PATCH 28/83] cpufreq: powernv: fix stack bloat and hard limit on number of CPUs The following build warning occurred on powerpc 64-bit builds: drivers/cpufreq/powernv-cpufreq.c: In function 'init_chip_info': drivers/cpufreq/powernv-cpufreq.c:1070:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] This is with a cross-compiler based on gcc 8.1.0, which I got from: https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/8.1.0/ The warning is due to putting 1024 bytes on the stack: unsigned int chip[256]; ...and it's also undesirable to have a hard limit on the number of CPUs here. Fix both problems by dynamically allocating based on num_possible_cpus, as recommended by Michael Ellerman. Fixes: 053819e0bf840 ("cpufreq: powernv: Handle throttling due to Pmax capping at chip level") Signed-off-by: John Hubbard Acked-by: Viresh Kumar Cc: 4.10+ # 4.10+ Signed-off-by: Rafael J. 
Wysocki --- drivers/cpufreq/powernv-cpufreq.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c index 6061850e59c9..56f4bc0d209e 100644 --- a/drivers/cpufreq/powernv-cpufreq.c +++ b/drivers/cpufreq/powernv-cpufreq.c @@ -1041,9 +1041,14 @@ static struct cpufreq_driver powernv_cpufreq_driver = { static int init_chip_info(void) { - unsigned int chip[256]; + unsigned int *chip; unsigned int cpu, i; unsigned int prev_chip_id = UINT_MAX; + int ret = 0; + + chip = kcalloc(num_possible_cpus(), sizeof(*chip), GFP_KERNEL); + if (!chip) + return -ENOMEM; for_each_possible_cpu(cpu) { unsigned int id = cpu_to_chip_id(cpu); @@ -1055,8 +1060,10 @@ static int init_chip_info(void) } chips = kcalloc(nr_chips, sizeof(struct chip), GFP_KERNEL); - if (!chips) - return -ENOMEM; + if (!chips) { + ret = -ENOMEM; + goto free_and_return; + } for (i = 0; i < nr_chips; i++) { chips[i].id = chip[i]; @@ -1066,7 +1073,9 @@ static int init_chip_info(void) per_cpu(chip_info, cpu) = &chips[i]; } - return 0; +free_and_return: + kfree(chip); + return ret; } static inline void clean_chip_info(void) From cae478114fbe2e6f4cb9194360cf0789a923be13 Mon Sep 17 00:00:00 2001 From: Zhang Rui Date: Thu, 31 Oct 2019 21:18:11 +0800 Subject: [PATCH 29/83] powercap/intel_rapl: add support for CometLake Mobile Add CometLake Mobile support in intel_rapl driver Signed-off-by: Zhang Rui Signed-off-by: Rafael J. 
Wysocki --- drivers/powercap/intel_rapl_common.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 94ddd7d659c8..cc1e82d513a9 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -978,6 +978,7 @@ static const struct x86_cpu_id rapl_ids[] __initconst = { INTEL_CPU_FAM6(ICELAKE_NNPI, rapl_defaults_core), INTEL_CPU_FAM6(ICELAKE_X, rapl_defaults_hsw_server), INTEL_CPU_FAM6(ICELAKE_D, rapl_defaults_hsw_server), + INTEL_CPU_FAM6(COMETLAKE_L, rapl_defaults_core), INTEL_CPU_FAM6(ATOM_SILVERMONT, rapl_defaults_byt), INTEL_CPU_FAM6(ATOM_AIRMONT, rapl_defaults_cht), From f84fdcbc8ec02ea34bbc641359c2a69d0d1242d4 Mon Sep 17 00:00:00 2001 From: Zhang Rui Date: Thu, 31 Oct 2019 21:18:12 +0800 Subject: [PATCH 30/83] powercap/intel_rapl: add support for Cometlake desktop Add CometLake desktop support in intel_rapl driver Signed-off-by: Zhang Rui Signed-off-by: Rafael J. Wysocki --- drivers/powercap/intel_rapl_common.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index cc1e82d513a9..a67701ed93e8 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -979,6 +979,7 @@ static const struct x86_cpu_id rapl_ids[] __initconst = { INTEL_CPU_FAM6(ICELAKE_X, rapl_defaults_hsw_server), INTEL_CPU_FAM6(ICELAKE_D, rapl_defaults_hsw_server), INTEL_CPU_FAM6(COMETLAKE_L, rapl_defaults_core), + INTEL_CPU_FAM6(COMETLAKE, rapl_defaults_core), INTEL_CPU_FAM6(ATOM_SILVERMONT, rapl_defaults_byt), INTEL_CPU_FAM6(ATOM_AIRMONT, rapl_defaults_cht), From d80a4ac20800035c46a3868ad9e11ebda0049c7d Mon Sep 17 00:00:00 2001 From: Abhishek Goel Date: Thu, 17 Oct 2019 00:56:39 -0500 Subject: [PATCH 31/83] cpupower : Handle set and info subcommands correctly Cpupower tool has set and info options which are being used only by x86 machines. 
This patch removes support for these two subcommands from the cpupower utility on POWER. Thus, these two subcommands will now be available only for Intel. This removes the ambiguous error message shown when using the set option on non-Intel systems. Without this patch on a POWER system: root@ubuntu:~# cpupower info System does not support Intel's performance bias setting root@ubuntu:~# cpupower set -b 10 Error setting perf-bias value on CPU With this patch on a POWER box: root@ubuntu:~# cpupower info Subcommand not supported on POWER Same result for set subcommand. This patch does not affect results on an Intel box. Signed-off-by: Abhishek Goel Acked-by: Thomas Renninger Reviewed-by: Shuah Khan Signed-off-by: Shuah Khan --- tools/power/cpupower/utils/cpupower-info.c | 9 +++++++++ tools/power/cpupower/utils/cpupower-set.c | 9 +++++++++ 2 files changed, 18 insertions(+) diff --git a/tools/power/cpupower/utils/cpupower-info.c b/tools/power/cpupower/utils/cpupower-info.c index 4c9d342b70ff..d3755ea70d4d 100644 --- a/tools/power/cpupower/utils/cpupower-info.c +++ b/tools/power/cpupower/utils/cpupower-info.c @@ -10,6 +10,7 @@ #include #include #include +#include #include "helpers/helpers.h" #include "helpers/sysfs.h" @@ -30,6 +31,7 @@ int cmd_info(int argc, char **argv) extern char *optarg; extern int optind, opterr, optopt; unsigned int cpu; + struct utsname uts; union { struct { @@ -39,6 +41,13 @@ int cmd_info(int argc, char **argv) } params = {}; int ret = 0; + ret = uname(&uts); + if (!ret && (!strcmp(uts.machine, "ppc64le") || + !strcmp(uts.machine, "ppc64"))) { + fprintf(stderr, _("Subcommand not supported on POWER.\n")); + return ret; + } + setlocale(LC_ALL, ""); textdomain(PACKAGE); diff --git a/tools/power/cpupower/utils/cpupower-set.c b/tools/power/cpupower/utils/cpupower-set.c index 3cd95c6cb974..3cca6f715dd9 100644 --- a/tools/power/cpupower/utils/cpupower-set.c +++ b/tools/power/cpupower/utils/cpupower-set.c @@ -10,6 +10,7 @@ #include #include #include
+#include #include "helpers/helpers.h" #include "helpers/sysfs.h" @@ -31,6 +32,7 @@ int cmd_set(int argc, char **argv) extern char *optarg; extern int optind, opterr, optopt; unsigned int cpu; + struct utsname uts; union { struct { @@ -41,6 +43,13 @@ int cmd_set(int argc, char **argv) int perf_bias = 0; int ret = 0; + ret = uname(&uts); + if (!ret && (!strcmp(uts.machine, "ppc64le") || + !strcmp(uts.machine, "ppc64"))) { + fprintf(stderr, _("Subcommand not supported on POWER.\n")); + return ret; + } + setlocale(LC_ALL, ""); textdomain(PACKAGE); From c23734487fb44ee16c1b007ba72d793c085e4ec4 Mon Sep 17 00:00:00 2001 From: Ondrej Jirman Date: Fri, 1 Nov 2019 17:41:51 +0100 Subject: [PATCH 32/83] cpufreq: sun50i: Fix CPU speed bin detection I have observed failures to boot on Orange Pi 3, because this driver determined that my SoC is from the normal bin, but my SoC only works reliably with the OPP values for the slowest bin. By querying H6 owners, it was found that e-fuse values found in the wild are in the range of 1-3, value of 7 was not reported, yet. From this and from unused defines in BSP code, it can be assumed that meaning of efuse values on H6 actually is: - 1 = slowest bin - 2 = normal bin - 3 = fastest bin Vendor code actually treats 0 and 2 as invalid efuse values, but later treats all invalid values as a normal bin. This looks like a mistake in bin detection code, that was plastered over by a hack in cpufreq code, so let's not repeat it here. It probably only works because there are no SoCs in the wild with efuse value of 0, and fast bin SoCs are made to use normal bin OPP tables, which is also safe. Let's play it safe and interpret 0 as the slowest bin, but fix detection of other bins to match this research. More research will be done before actual OPP tables are merged. 
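The mapping described above can be sketched as a small pure function. This is only an illustration of the rule stated in this message (efuse values 1-3 map to bins 0-2, anything else falls back to the slowest bin); the helper name is hypothetical and does not exist in the driver:

```c
/*
 * Hypothetical helper mirroring the bin selection described above:
 * efuse values 1..3 are the expected ones (slowest to fastest) and
 * map to bins 0..2; every other value is treated as the slowest bin.
 */
static unsigned int example_efuse_to_bin(unsigned int efuse_value)
{
	if (efuse_value >= 1 && efuse_value <= 3)
		return efuse_value - 1;

	return 0;
}
```

With this rule an efuse value of 0 and an unreported value such as 7 both select bin 0, which is the conservative choice argued for above.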
Fixes: f328584f7bff ("cpufreq: Add sun50i nvmem based CPU scaling driver") Acked-by: Maxime Ripard Signed-off-by: Ondrej Jirman Signed-off-by: Viresh Kumar --- drivers/cpufreq/sun50i-cpufreq-nvmem.c | 25 ++++++++++--------------- 1 file changed, 10 insertions(+), 15 deletions(-) diff --git a/drivers/cpufreq/sun50i-cpufreq-nvmem.c b/drivers/cpufreq/sun50i-cpufreq-nvmem.c index eca32e443716..9907a165135b 100644 --- a/drivers/cpufreq/sun50i-cpufreq-nvmem.c +++ b/drivers/cpufreq/sun50i-cpufreq-nvmem.c @@ -25,7 +25,7 @@ static struct platform_device *cpufreq_dt_pdev, *sun50i_cpufreq_pdev; /** - * sun50i_cpufreq_get_efuse() - Parse and return efuse value present on SoC + * sun50i_cpufreq_get_efuse() - Determine speed grade from efuse value * @versions: Set to the value parsed from efuse * * Returns 0 if success. @@ -69,21 +69,16 @@ static int sun50i_cpufreq_get_efuse(u32 *versions) return PTR_ERR(speedbin); efuse_value = (*speedbin >> NVMEM_SHIFT) & NVMEM_MASK; - switch (efuse_value) { - case 0b0001: - *versions = 1; - break; - case 0b0011: - *versions = 2; - break; - default: - /* - * For other situations, we treat it as bin0. - * This vf table can be run for any good cpu. - */ + + /* + * We treat unexpected efuse values as if the SoC was from + * the slowest bin. Expected efuse values are 1-3, slowest + * to fastest. + */ + if (efuse_value >= 1 && efuse_value <= 3) + *versions = efuse_value - 1; + else *versions = 0; - break; - } kfree(speedbin); return 0; From c389ec67b7f8e028438643c4af4bbff550192187 Mon Sep 17 00:00:00 2001 From: Chanwoo Choi Date: Tue, 5 Nov 2019 14:41:19 +0900 Subject: [PATCH 33/83] MAINTAINERS: Update myself as maintainer for DEVFREQ subsystem support Update myself to the DEVFREQ entry as maintainer from reviewer and the git repository information to manage the devfreq patches. I've been reviewing and testing the devfreq support for a couple of years as reviewer. From now on, I'll help and review the devfreq as maintainer.
Suggested-by: MyungJoo Ham Signed-off-by: Chanwoo Choi Acked-by: MyungJoo Ham Signed-off-by: Rafael J. Wysocki --- MAINTAINERS | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index cba1095547fd..ebc1078c1ecb 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3532,7 +3532,7 @@ BUS FREQUENCY DRIVER FOR SAMSUNG EXYNOS M: Chanwoo Choi L: linux-pm@vger.kernel.org L: linux-samsung-soc@vger.kernel.org -T: git git://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq.git +T: git git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git S: Maintained F: drivers/devfreq/exynos-bus.c F: Documentation/devicetree/bindings/devfreq/exynos-bus.txt @@ -4762,9 +4762,9 @@ F: include/linux/devcoredump.h DEVICE FREQUENCY (DEVFREQ) M: MyungJoo Ham M: Kyungmin Park -R: Chanwoo Choi +M: Chanwoo Choi L: linux-pm@vger.kernel.org -T: git git://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq.git +T: git git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git S: Maintained F: drivers/devfreq/ F: include/linux/devfreq.h @@ -4774,7 +4774,7 @@ F: include/trace/events/devfreq.h DEVICE FREQUENCY EVENT (DEVFREQ-EVENT) M: Chanwoo Choi L: linux-pm@vger.kernel.org -T: git git://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq.git +T: git git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux.git S: Supported F: drivers/devfreq/event/ F: drivers/devfreq/devfreq-event.c From d3f5d2a192a299f56579ae6e6283f9011b00208f Mon Sep 17 00:00:00 2001 From: Janakarajan Natarajan Date: Tue, 5 Nov 2019 17:16:52 +0000 Subject: [PATCH 34/83] cpupower: Move needs_root variable into a sub-struct Move the needs_root variable into a sub-struct. This is in preparation for adding a new flag for cpuidle_monitor. Update all uses of the needs_root variable to reflect this change. 
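A minimal sketch of the resulting layout, assuming a cut-down monitor struct (the example_ names are hypothetical; the real struct cpuidle_monitor has more members). It shows the one-bit flag inside a nested struct, a nested designated initializer, and a check in the style of the one in cmd_monitor():

```c
/* Hypothetical, cut-down stand-in for struct cpuidle_monitor. */
struct example_monitor {
	const char *name;
	unsigned int overflow_s;
	struct {
		unsigned int needs_root:1;	/* further one-bit flags can follow */
	} flags;
};

/* Nested designator, as used by the monitors after this change. */
static struct example_monitor example_privileged_monitor = {
	.name = "example",
	.flags.needs_root = 1,
	.overflow_s = 60,
};

/* Returns non-zero if the monitor may be used by the current user. */
static int example_may_register(const struct example_monitor *mon,
				int run_as_root)
{
	return !(mon->flags.needs_root && !run_as_root);
}
```

Grouping the flags in a sub-struct of bit-fields means a new flag costs no extra storage and does not change the initializers of monitors that do not set it.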
Signed-off-by: Janakarajan Natarajan Acked-by: Thomas Renninger Signed-off-by: Shuah Khan --- tools/power/cpupower/utils/idle_monitor/amd_fam14h_idle.c | 2 +- tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c | 2 +- tools/power/cpupower/utils/idle_monitor/cpupower-monitor.c | 2 +- tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h | 4 +++- tools/power/cpupower/utils/idle_monitor/hsw_ext_idle.c | 2 +- tools/power/cpupower/utils/idle_monitor/mperf_monitor.c | 2 +- tools/power/cpupower/utils/idle_monitor/nhm_idle.c | 2 +- tools/power/cpupower/utils/idle_monitor/snb_idle.c | 2 +- 8 files changed, 10 insertions(+), 8 deletions(-) diff --git a/tools/power/cpupower/utils/idle_monitor/amd_fam14h_idle.c b/tools/power/cpupower/utils/idle_monitor/amd_fam14h_idle.c index 3f893b99b337..33dc34db4f3c 100644 --- a/tools/power/cpupower/utils/idle_monitor/amd_fam14h_idle.c +++ b/tools/power/cpupower/utils/idle_monitor/amd_fam14h_idle.c @@ -328,7 +328,7 @@ struct cpuidle_monitor amd_fam14h_monitor = { .stop = amd_fam14h_stop, .do_register = amd_fam14h_register, .unregister = amd_fam14h_unregister, - .needs_root = 1, + .flags.needs_root = 1, .overflow_s = OVERFLOW_MS / 1000, }; #endif /* #if defined(__i386__) || defined(__x86_64__) */ diff --git a/tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c b/tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c index f634aeb65c5f..3c4cee160b0e 100644 --- a/tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c +++ b/tools/power/cpupower/utils/idle_monitor/cpuidle_sysfs.c @@ -207,6 +207,6 @@ struct cpuidle_monitor cpuidle_sysfs_monitor = { .stop = cpuidle_stop, .do_register = cpuidle_register, .unregister = cpuidle_unregister, - .needs_root = 0, + .flags.needs_root = 0, .overflow_s = UINT_MAX, }; diff --git a/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.c b/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.c index d3c3e6e7aa26..6d44fec55ad5 100644 --- 
a/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.c +++ b/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.c @@ -408,7 +408,7 @@ int cmd_monitor(int argc, char **argv) dprint("Try to register: %s\n", all_monitors[num]->name); test_mon = all_monitors[num]->do_register(); if (test_mon) { - if (test_mon->needs_root && !run_as_root) { + if (test_mon->flags.needs_root && !run_as_root) { fprintf(stderr, _("Available monitor %s needs " "root access\n"), test_mon->name); continue; diff --git a/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h b/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h index a2d901d3bfaf..9b612d999660 100644 --- a/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h +++ b/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h @@ -60,7 +60,9 @@ struct cpuidle_monitor { struct cpuidle_monitor* (*do_register) (void); void (*unregister)(void); unsigned int overflow_s; - int needs_root; + struct { + unsigned int needs_root:1; + } flags; }; extern long long timespec_diff_us(struct timespec start, struct timespec end); diff --git a/tools/power/cpupower/utils/idle_monitor/hsw_ext_idle.c b/tools/power/cpupower/utils/idle_monitor/hsw_ext_idle.c index 58dbdfd4fa13..97ad3233a521 100644 --- a/tools/power/cpupower/utils/idle_monitor/hsw_ext_idle.c +++ b/tools/power/cpupower/utils/idle_monitor/hsw_ext_idle.c @@ -187,7 +187,7 @@ struct cpuidle_monitor intel_hsw_ext_monitor = { .stop = hsw_ext_stop, .do_register = hsw_ext_register, .unregister = hsw_ext_unregister, - .needs_root = 1, + .flags.needs_root = 1, .overflow_s = 922000000 /* 922337203 seconds TSC overflow at 20GHz */ }; diff --git a/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c b/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c index 44806a6dae11..7cae74202a4d 100644 --- a/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c +++ b/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c @@ -333,7 +333,7 @@ struct cpuidle_monitor mperf_monitor 
= { .stop = mperf_stop, .do_register = mperf_register, .unregister = mperf_unregister, - .needs_root = 1, + .flags.needs_root = 1, .overflow_s = 922000000 /* 922337203 seconds TSC overflow at 20GHz */ }; diff --git a/tools/power/cpupower/utils/idle_monitor/nhm_idle.c b/tools/power/cpupower/utils/idle_monitor/nhm_idle.c index be7256696a37..114271165182 100644 --- a/tools/power/cpupower/utils/idle_monitor/nhm_idle.c +++ b/tools/power/cpupower/utils/idle_monitor/nhm_idle.c @@ -208,7 +208,7 @@ struct cpuidle_monitor intel_nhm_monitor = { .stop = nhm_stop, .do_register = intel_nhm_register, .unregister = intel_nhm_unregister, - .needs_root = 1, + .flags.needs_root = 1, .overflow_s = 922000000 /* 922337203 seconds TSC overflow at 20GHz */ }; diff --git a/tools/power/cpupower/utils/idle_monitor/snb_idle.c b/tools/power/cpupower/utils/idle_monitor/snb_idle.c index 968333571cad..df8b223cc096 100644 --- a/tools/power/cpupower/utils/idle_monitor/snb_idle.c +++ b/tools/power/cpupower/utils/idle_monitor/snb_idle.c @@ -192,7 +192,7 @@ struct cpuidle_monitor intel_snb_monitor = { .stop = snb_stop, .do_register = snb_register, .unregister = snb_unregister, - .needs_root = 1, + .flags.needs_root = 1, .overflow_s = 922000000 /* 922337203 seconds TSC overflow at 20GHz */ }; From 7adafe541fe5e015261a92d39db8b163db477337 Mon Sep 17 00:00:00 2001 From: Janakarajan Natarajan Date: Tue, 5 Nov 2019 17:16:54 +0000 Subject: [PATCH 35/83] cpupower: mperf_monitor: Introduce per_cpu_schedule flag The per_cpu_schedule flag is used to move the cpupower process to the cpu on which we are looking to read the APERF/MPERF registers. This prevents IPIs from being generated by read_msr()s as we are already on the cpu of interest. Ex: If cpupower is running on CPU 0 and we execute read_msr(20, MSR_APERF, val) then, read_msr(20, MSR_MPERF, val) the msr module will generate an IPI from CPU 0 to CPU 20 to query for the MSR_APERF and then the MSR_MPERF in separate IPIs. 
This delay between reading the APERF and MPERF registers, caused by IPI latency, may cause them to go out of sync. The use of the per_cpu_schedule flag reduces the probability of this happening. It comes at the cost of a negligible increase in CPU consumption caused by migrating cpupower across each of the CPUs of the system. Signed-off-by: Janakarajan Natarajan Acked-by: Thomas Renninger Signed-off-by: Shuah Khan --- .../utils/idle_monitor/cpupower-monitor.h | 1 + .../utils/idle_monitor/mperf_monitor.c | 44 ++++++++++++++----- 2 files changed, 34 insertions(+), 11 deletions(-) diff --git a/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h b/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h index 9b612d999660..5b5eb1da0cce 100644 --- a/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h +++ b/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h @@ -62,6 +62,7 @@ struct cpuidle_monitor { unsigned int overflow_s; struct { unsigned int needs_root:1; + unsigned int per_cpu_schedule:1; } flags; }; diff --git a/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c b/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c index 7cae74202a4d..afb2e6f8edd3 100644 --- a/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c +++ b/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c @@ -86,15 +86,35 @@ static int mperf_get_tsc(unsigned long long *tsc) return ret; } -static int mperf_init_stats(unsigned int cpu) +static int get_aperf_mperf(int cpu, unsigned long long *aval, + unsigned long long *mval) { - unsigned long long val; int ret; - ret = read_msr(cpu, MSR_APERF, &val); - aperf_previous_count[cpu] = val; - ret |= read_msr(cpu, MSR_MPERF, &val); - mperf_previous_count[cpu] = val; + /* + * Running on the cpu from which we read the registers will + * prevent APERF/MPERF from going out of sync because of IPI + * latency introduced by read_msr()s.
+ */ + if (mperf_monitor.flags.per_cpu_schedule) { + if (bind_cpu(cpu)) + return 1; + } + + ret = read_msr(cpu, MSR_APERF, aval); + ret |= read_msr(cpu, MSR_MPERF, mval); + + return ret; +} + +static int mperf_init_stats(unsigned int cpu) +{ + unsigned long long aval, mval; + int ret; + + ret = get_aperf_mperf(cpu, &aval, &mval); + aperf_previous_count[cpu] = aval; + mperf_previous_count[cpu] = mval; is_valid[cpu] = !ret; return 0; @@ -102,13 +122,12 @@ static int mperf_init_stats(unsigned int cpu) static int mperf_measure_stats(unsigned int cpu) { - unsigned long long val; + unsigned long long aval, mval; int ret; - ret = read_msr(cpu, MSR_APERF, &val); - aperf_current_count[cpu] = val; - ret |= read_msr(cpu, MSR_MPERF, &val); - mperf_current_count[cpu] = val; + ret = get_aperf_mperf(cpu, &aval, &mval); + aperf_current_count[cpu] = aval; + mperf_current_count[cpu] = mval; is_valid[cpu] = !ret; return 0; @@ -305,6 +324,9 @@ struct cpuidle_monitor *mperf_register(void) if (init_maxfreq_mode()) return NULL; + if (cpupower_cpu_info.vendor == X86_VENDOR_AMD) + mperf_monitor.flags.per_cpu_schedule = 1; + /* Free this at program termination */ is_valid = calloc(cpu_count, sizeof(int)); mperf_previous_count = calloc(cpu_count, sizeof(unsigned long long)); From 6af2ed53f0402c09b36d2b38698e18a25ca732a7 Mon Sep 17 00:00:00 2001 From: Janakarajan Natarajan Date: Tue, 5 Nov 2019 17:16:55 +0000 Subject: [PATCH 36/83] cpupower: mperf_monitor: Update cpupower to use the RDPRU instruction AMD Zen 2 introduces the RDPRU instruction which can be used to access some processor registers which are typically only accessible in privilege level 0. ECX specifies the register to read and EDX:EAX will contain the value read. ECX: 0 - Register MPERF 1 - Register APERF This has the added advantage of not having to use the msr module, since the userspace to kernel transitions which occur during each read_msr() might cause APERF and MPERF to go out of sync. 
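The two pieces of arithmetic involved can be sketched without executing RDPRU itself, which only works on CPUs that support it. The helper names below are hypothetical; the CPUID bit position (leaf 0x80000008, EBX bit 4) is the one tested in the cpuid.c hunk of this patch, and the EDX:EAX layout is the one described above:

```c
#include <stdint.h>

/* RDPRU support is reported in CPUID leaf 0x80000008, EBX bit 4. */
#define EXAMPLE_RDPRU_EBX_BIT 4

/* Returns 1 if the given EBX value of leaf 0x80000008 advertises RDPRU. */
static int example_has_rdpru(uint32_t cpuid_80000008_ebx)
{
	return (cpuid_80000008_ebx >> EXAMPLE_RDPRU_EBX_BIT) & 1;
}

/*
 * RDPRU returns the register value split across EDX (high half) and
 * EAX (low half). Widen EDX to 64 bits before shifting: shifting a
 * 32-bit value by 32 is undefined behaviour in C.
 */
static uint64_t example_combine_edx_eax(uint32_t eax, uint32_t edx)
{
	return ((uint64_t)edx << 32) | eax;
}
```

A caller would pass the EAX and EDX outputs of one RDPRU invocation per register (ECX = 0 for MPERF, ECX = 1 for APERF) to the combining helper.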
Signed-off-by: Janakarajan Natarajan Acked-by: Thomas Renninger Signed-off-by: Shuah Khan --- tools/power/cpupower/utils/helpers/cpuid.c | 4 ++++ tools/power/cpupower/utils/helpers/helpers.h | 1 + .../utils/idle_monitor/mperf_monitor.c | 20 +++++++++++++++++++ 3 files changed, 25 insertions(+) diff --git a/tools/power/cpupower/utils/helpers/cpuid.c b/tools/power/cpupower/utils/helpers/cpuid.c index 5cc39d4e23ed..73bfafc60e9b 100644 --- a/tools/power/cpupower/utils/helpers/cpuid.c +++ b/tools/power/cpupower/utils/helpers/cpuid.c @@ -131,6 +131,10 @@ int get_cpu_info(struct cpupower_cpu_info *cpu_info) if (ext_cpuid_level >= 0x80000007 && (cpuid_edx(0x80000007) & (1 << 9))) cpu_info->caps |= CPUPOWER_CAP_AMD_CBP; + + if (ext_cpuid_level >= 0x80000008 && + cpuid_ebx(0x80000008) & (1 << 4)) + cpu_info->caps |= CPUPOWER_CAP_AMD_RDPRU; } if (cpu_info->vendor == X86_VENDOR_INTEL) { diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h index 357b19bb136e..c258eeccd05f 100644 --- a/tools/power/cpupower/utils/helpers/helpers.h +++ b/tools/power/cpupower/utils/helpers/helpers.h @@ -69,6 +69,7 @@ enum cpupower_cpu_vendor {X86_VENDOR_UNKNOWN = 0, X86_VENDOR_INTEL, #define CPUPOWER_CAP_HAS_TURBO_RATIO 0x00000010 #define CPUPOWER_CAP_IS_SNB 0x00000020 #define CPUPOWER_CAP_INTEL_IDA 0x00000040 +#define CPUPOWER_CAP_AMD_RDPRU 0x00000080 #define CPUPOWER_AMD_CPBDIS 0x02000000 diff --git a/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c b/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c index afb2e6f8edd3..e7d48cb563c0 100644 --- a/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c +++ b/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c @@ -19,6 +19,10 @@ #define MSR_APERF 0xE8 #define MSR_MPERF 0xE7 +#define RDPRU ".byte 0x0f, 0x01, 0xfd" +#define RDPRU_ECX_MPERF 0 +#define RDPRU_ECX_APERF 1 + #define MSR_TSC 0x10 #define MSR_AMD_HWCR 0xc0010015 @@ -89,6 +93,8 @@ static int mperf_get_tsc(unsigned long long 
*tsc) static int get_aperf_mperf(int cpu, unsigned long long *aval, unsigned long long *mval) { + unsigned long low_a, high_a; + unsigned long low_m, high_m; int ret; /* @@ -101,6 +107,20 @@ static int get_aperf_mperf(int cpu, unsigned long long *aval, return 1; } + if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_RDPRU) { + asm volatile(RDPRU + : "=a" (low_a), "=d" (high_a) + : "c" (RDPRU_ECX_APERF)); + asm volatile(RDPRU + : "=a" (low_m), "=d" (high_m) + : "c" (RDPRU_ECX_MPERF)); + + *aval = ((low_a) | (high_a) << 32); + *mval = ((low_m) | (high_m) << 32); + + return 0; + } + ret = read_msr(cpu, MSR_APERF, aval); ret |= read_msr(cpu, MSR_MPERF, mval); From 4611a4fb0cce3973dce8c9d74e5d6261ffa4210f Mon Sep 17 00:00:00 2001 From: Janakarajan Natarajan Date: Tue, 5 Nov 2019 17:16:57 +0000 Subject: [PATCH 37/83] cpupower: ToDo: Update ToDo with ideas for per_cpu_schedule handling Based on Thomas Renninger's feedback/ideas. Re-structure the code to better handle the per_cpu_schedule mechanism which was introduced when adding support for AMD Zen based processors. Signed-off-by: Janakarajan Natarajan Acked-by: Thomas Renninger Signed-off-by: Shuah Khan --- tools/power/cpupower/ToDo | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/tools/power/cpupower/ToDo b/tools/power/cpupower/ToDo index 6e8b89f282e6..b196a139a3e4 100644 --- a/tools/power/cpupower/ToDo +++ b/tools/power/cpupower/ToDo @@ -8,3 +8,17 @@ ToDos sorted by priority: - Add another c1e debug idle monitor -> Is by design racy with BIOS, but could be added with a --force option and some "be careful" messages +- Add cpu_start()/cpu_stop() callbacks for monitor + -> This is to move the per_cpu logic from inside the + monitor to outside it. This can be given higher + priority in fork_it. +- Fork as many processes as there are CPUs in case the + per_cpu_schedule flag is set. + -> Bind forked process to each cpu. + -> Execute start measures via the forked processes on + each cpu. 
+ -> Run test executable in a forked process. + -> Execute stop measures via the forked processes on + each cpu. + This would be ideal as it will not introduce noise in the + tested executable. From fcbd8037f7df694aa7bfb7ce82c0c7f5e53e7b7b Mon Sep 17 00:00:00 2001 From: Lukasz Luba Date: Wed, 5 Jun 2019 11:12:32 +0200 Subject: [PATCH 38/83] include: dt-bindings: add Performance Monitoring Unit for Exynos This patch add support of a new feature which can be used in DT: Performance Monitoring Unit with defined event data type. In this patch the event data types are defined for Exynos PPMU. The patch also updates the MAINTAINERS file accordingly and adds the header file to devfreq event subsystem. Acked-by: Chanwoo Choi Reviewed-by: Rob Herring Signed-off-by: Lukasz Luba Signed-off-by: Chanwoo Choi --- MAINTAINERS | 1 + include/dt-bindings/pmu/exynos_ppmu.h | 25 +++++++++++++++++++++++++ 2 files changed, 26 insertions(+) create mode 100644 include/dt-bindings/pmu/exynos_ppmu.h diff --git a/MAINTAINERS b/MAINTAINERS index cba1095547fd..2bf811b5dfd4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4778,6 +4778,7 @@ T: git git://git.kernel.org/pub/scm/linux/kernel/git/mzx/devfreq.git S: Supported F: drivers/devfreq/event/ F: drivers/devfreq/devfreq-event.c +F: include/dt-bindings/pmu/exynos_ppmu.h F: include/linux/devfreq-event.h F: Documentation/devicetree/bindings/devfreq/event/ diff --git a/include/dt-bindings/pmu/exynos_ppmu.h b/include/dt-bindings/pmu/exynos_ppmu.h new file mode 100644 index 000000000000..8724abe130f3 --- /dev/null +++ b/include/dt-bindings/pmu/exynos_ppmu.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Samsung Exynos PPMU event types for counting in regs + * + * Copyright (c) 2019, Samsung Electronics + * Author: Lukasz Luba + */ + +#ifndef __DT_BINDINGS_PMU_EXYNOS_PPMU_H +#define __DT_BINDINGS_PMU_EXYNOS_PPMU_H + +#define PPMU_RO_BUSY_CYCLE_CNT 0x0 +#define PPMU_WO_BUSY_CYCLE_CNT 0x1 +#define PPMU_RW_BUSY_CYCLE_CNT 0x2 +#define 
PPMU_RO_REQUEST_CNT 0x3 +#define PPMU_WO_REQUEST_CNT 0x4 +#define PPMU_RO_DATA_CNT 0x5 +#define PPMU_WO_DATA_CNT 0x6 +#define PPMU_RO_LATENCY 0x12 +#define PPMU_WO_LATENCY 0x16 +#define PPMU_V2_RO_DATA_CNT 0x4 +#define PPMU_V2_WO_DATA_CNT 0x5 +#define PPMU_V2_EVT3_RW_DATA_CNT 0x22 + +#endif From 3b7b37846ba69232343a2f2e7eb5f66aaa81c071 Mon Sep 17 00:00:00 2001 From: Lukasz Luba Date: Wed, 5 Jun 2019 11:12:35 +0200 Subject: [PATCH 39/83] Documentation: devicetree: add PPMU events description Extend the documentation with a description of events, including the new 'event-data-type' field. Add an example of how an event might be defined in DT. Signed-off-by: Lukasz Luba Acked-by: Chanwoo Choi Signed-off-by: Chanwoo Choi --- .../bindings/devfreq/event/exynos-ppmu.txt | 26 +++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt b/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt index 3e36c1d11386..fb46b491791c 100644 --- a/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt +++ b/Documentation/devicetree/bindings/devfreq/event/exynos-ppmu.txt @@ -10,14 +10,23 @@ The Exynos PPMU driver uses the devfreq-event class to provide event data to various devfreq devices. The devfreq devices would use the event data when derterming the current state of each IP. -Required properties: +Required properties for PPMU device: - compatible: Should be "samsung,exynos-ppmu" or "samsung,exynos-ppmu-v2. - reg: physical base address of each PPMU and length of memory mapped region.
-Optional properties: +Optional properties for PPMU device: - clock-names : the name of clock used by the PPMU, "ppmu" - clocks : phandles for clock specified in "clock-names" property +Required properties for 'events' child node of PPMU device: +- event-name : the unique event name among PPMU device +Optional properties for 'events' child node of PPMU device: +- event-data-type : Define the type of data which shell be counted +by the counter. You can check include/dt-bindings/pmu/exynos_ppmu.h for +all possible type, i.e. count read requests, count write data in bytes, +etc. This field is optional and when it is missing, the driver code +will use default data type. + Example1 : PPMUv1 nodes in exynos3250.dtsi are listed below. ppmu_dmc0: ppmu_dmc0@106a0000 { @@ -145,3 +154,16 @@ Example3 : PPMUv2 nodes in exynos5433.dtsi are listed below. reg = <0x104d0000 0x2000>; status = "disabled"; }; + +Example4 : 'event-data-type' in exynos4412-ppmu-common.dtsi are listed below. + + &ppmu_dmc0 { + status = "okay"; + events { + ppmu_dmc0_3: ppmu-event3-dmc0 { + event-name = "ppmu-event3-dmc0"; + event-data-type = <(PPMU_RO_DATA_CNT | + PPMU_WO_DATA_CNT)>; + }; + }; + }; From df4d7b1451bf51e75406b6339e964d816f8e947e Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Wed, 18 Sep 2019 17:09:46 -0700 Subject: [PATCH 40/83] PM / devfreq: Make log message more explicit when devfreq device already exists Before creating a new devfreq device devfreq_add_device() checks if there is already a devfreq dev associated with the requesting device (parent). If that's the case the function rejects to create another devfreq dev for that parent and logs an error. The error message is very unspecific, make it a bit more explicit. 
Reviewed-by: Chanwoo Choi Signed-off-by: Matthias Kaehlcke Signed-off-by: MyungJoo Ham Signed-off-by: Chanwoo Choi --- drivers/devfreq/devfreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index 446490c9d635..b905963cea7d 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -625,7 +625,7 @@ struct devfreq *devfreq_add_device(struct device *dev, devfreq = find_device_devfreq(dev); mutex_unlock(&devfreq_list_lock); if (!IS_ERR(devfreq)) { - dev_err(dev, "%s: Unable to create devfreq for the device.\n", + dev_err(dev, "%s: devfreq device already exists!\n", __func__); err = -EINVAL; goto err_out; From 36eba5deffac3793372ebad4899591103aa3a947 Mon Sep 17 00:00:00 2001 From: Kamil Konieczny Date: Wed, 7 Aug 2019 15:38:38 +0200 Subject: [PATCH 41/83] dt-bindings: devfreq: exynos-bus: Remove unused property Remove unused DT property "exynos,voltage-tolerance". Signed-off-by: Kamil Konieczny Acked-by: Chanwoo Choi Acked-by: Rob Herring Signed-off-by: Chanwoo Choi --- Documentation/devicetree/bindings/devfreq/exynos-bus.txt | 2 -- 1 file changed, 2 deletions(-) diff --git a/Documentation/devicetree/bindings/devfreq/exynos-bus.txt b/Documentation/devicetree/bindings/devfreq/exynos-bus.txt index f8e946471a58..e71f752cc18f 100644 --- a/Documentation/devicetree/bindings/devfreq/exynos-bus.txt +++ b/Documentation/devicetree/bindings/devfreq/exynos-bus.txt @@ -50,8 +50,6 @@ Required properties only for passive bus device: Optional properties only for parent bus device: - exynos,saturation-ratio: the percentage value which is used to calibrate the performance count against total cycle count. -- exynos,voltage-tolerance: the percentage value for bus voltage tolerance - which is used to calculate the max voltage. 
Detailed correlation between sub-blocks and power line according to Exynos SoC: - In case of Exynos3250, there are two power line as following: From d68adc8f85cd757bd33c8d7b2660ad6f16f7f3dc Mon Sep 17 00:00:00 2001 From: Leonard Crestez Date: Tue, 24 Sep 2019 10:26:53 +0300 Subject: [PATCH 42/83] PM / devfreq: Check NULL governor in available_governors_show The governor is initialized after sysfs attributes become visible so in theory the governor field can be NULL here. Fixes: bcf23c79c4e46 ("PM / devfreq: Fix available_governor sysfs") Signed-off-by: Leonard Crestez Reviewed-by: Matthias Kaehlcke Reviewed-by: Chanwoo Choi Signed-off-by: Chanwoo Choi --- drivers/devfreq/devfreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index b905963cea7d..60859a2400bc 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -1195,7 +1195,7 @@ static ssize_t available_governors_show(struct device *d, * The devfreq with immutable governor (e.g., passive) shows * only own governor. */ - if (df->governor->immutable) { + if (df->governor && df->governor->immutable) { count = scnprintf(&buf[count], DEVFREQ_NAME_LEN, "%s ", df->governor_name); /* From 2abb0d5268ae7b5ddf82099b1f8d5aa8414637d4 Mon Sep 17 00:00:00 2001 From: Leonard Crestez Date: Tue, 24 Sep 2019 10:52:23 +0300 Subject: [PATCH 43/83] PM / devfreq: Lock devfreq in trans_stat_show There is no locking in this sysfs show function so stats printing can race with a devfreq_update_status called as part of freq switching or with initialization. Also add an assert in devfreq_update_status to make it clear that lock must be held by caller. 
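The locking rule stated above (the updater must only run with the lock held, and the show path must take that lock) can be sketched in userspace as follows. This is an analogue under stated assumptions, not the devfreq code: struct stats, stats_update() and stats_show() are hypothetical names, a pthread mutex stands in for devfreq->lock, and a plain flag with assert() stands in for lockdep_assert_held().

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the statistics guarded by devfreq->lock. */
struct stats {
	pthread_mutex_t lock;
	unsigned long total_trans;
	int lock_held;	/* bookkeeping so the updater can check its caller */
};

/* Must be called with st->lock held; the assert documents that contract. */
static void stats_update(struct stats *st)
{
	assert(st->lock_held);
	st->total_trans++;
}

/* Reader path: take the lock around the update, then release it. */
static unsigned long stats_show(struct stats *st)
{
	unsigned long snapshot;

	pthread_mutex_lock(&st->lock);
	st->lock_held = 1;
	stats_update(st);
	snapshot = st->total_trans;	/* copy out while still protected */
	st->lock_held = 0;
	pthread_mutex_unlock(&st->lock);

	return snapshot;
}
```

Unlike the patch, this sketch also copies the value out while the lock is still held; that is a choice made for the illustration, not a statement about what the driver does.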
Fixes: 39688ce6facd ("PM / devfreq: account suspend/resume for stats") Cc: stable@vger.kernel.org Signed-off-by: Leonard Crestez Reviewed-by: Matthias Kaehlcke Reviewed-by: Chanwoo Choi Signed-off-by: Chanwoo Choi --- drivers/devfreq/devfreq.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index 60859a2400bc..d6c3dce9e9d5 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -160,6 +160,7 @@ int devfreq_update_status(struct devfreq *devfreq, unsigned long freq) int lev, prev_lev, ret = 0; unsigned long cur_time; + lockdep_assert_held(&devfreq->lock); cur_time = jiffies; /* Immediately exit if previous_freq is not initialized yet. */ @@ -1397,12 +1398,17 @@ static ssize_t trans_stat_show(struct device *dev, int i, j; unsigned int max_state = devfreq->profile->max_state; - if (!devfreq->stop_polling && - devfreq_update_status(devfreq, devfreq->previous_freq)) - return 0; if (max_state == 0) return sprintf(buf, "Not Supported.\n"); + mutex_lock(&devfreq->lock); + if (!devfreq->stop_polling && + devfreq_update_status(devfreq, devfreq->previous_freq)) { + mutex_unlock(&devfreq->lock); + return 0; + } + mutex_unlock(&devfreq->lock); + len = sprintf(buf, " From : To\n"); len += sprintf(buf + len, " :"); for (i = 0; i < max_state; i++) From 1f125dee4feda21bca39ed7f1165198d96fca233 Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Tue, 1 Oct 2019 14:46:41 +0200 Subject: [PATCH 44/83] PM / devfreq: exynos-ppmu: remove useless assignment The error code is propagated to the caller, so there is no need to keep it additionally in the unused variable. 
Signed-off-by: Marek Szyprowski Acked-by: Chanwoo Choi Signed-off-by: Chanwoo Choi --- drivers/devfreq/event/exynos-ppmu.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/devfreq/event/exynos-ppmu.c b/drivers/devfreq/event/exynos-ppmu.c index 87b42055e6bc..85c7a77bf3f0 100644 --- a/drivers/devfreq/event/exynos-ppmu.c +++ b/drivers/devfreq/event/exynos-ppmu.c @@ -673,7 +673,6 @@ static int exynos_ppmu_probe(struct platform_device *pdev) for (i = 0; i < info->num_events; i++) { edev[i] = devm_devfreq_event_add_edev(&pdev->dev, &desc[i]); if (IS_ERR(edev[i])) { - ret = PTR_ERR(edev[i]); dev_err(&pdev->dev, "failed to add devfreq-event device\n"); return PTR_ERR(edev[i]); From dccdea01adf3b1413fbbb342128702741069cb6c Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:55:59 +0300 Subject: [PATCH 45/83] PM / devfreq: tegra30: Change irq type to unsigned int IRQ numbers are always positive, hence the corresponding variable should be unsigned to keep types consistent. This is a minor change that cleans up code a tad more. 
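Once the stored IRQ number is unsigned, the error check has to happen on a signed temporary before the assignment, because an unsigned field can never compare below zero. A minimal sketch of that ordering, where fake_get_irq(), struct dev_state and dev_state_init_irq() are hypothetical names used only for illustration and the values 42 and -6 are arbitrary:

```c
#include <assert.h>

/* Hypothetical lookup: returns an IRQ number, or a negative error code. */
static int fake_get_irq(int simulate_failure)
{
	return simulate_failure ? -6 : 42;
}

struct dev_state {
	unsigned int irq;
};

/* Returns 0 on success or the negative error code from the lookup. */
static int dev_state_init_irq(struct dev_state *st, int simulate_failure)
{
	int err;

	assert(st);
	err = fake_get_irq(simulate_failure);

	/* Test the signed value; the unsigned field could never be < 0. */
	if (err < 0)
		return err;

	st->irq = (unsigned int)err;
	return 0;
}
```

On the failure path the field is left untouched, so a previously stored IRQ number is not overwritten by an error code.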
Suggested-by: Thierry Reding Acked-by: MyungJoo Ham Reviewed-by: Chanwoo Choi Tested-by: Peter Geis Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index a6ba75f4106d..a27300f40b0b 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -160,7 +160,7 @@ struct tegra_devfreq { struct tegra_devfreq_device devices[ARRAY_SIZE(actmon_device_configs)]; - int irq; + unsigned int irq; }; struct tegra_actmon_emc_ratio { @@ -618,12 +618,12 @@ static int tegra_devfreq_probe(struct platform_device *pdev) return PTR_ERR(tegra->emc_clock); } - tegra->irq = platform_get_irq(pdev, 0); - if (tegra->irq < 0) { - err = tegra->irq; + err = platform_get_irq(pdev, 0); + if (err < 0) { dev_err(&pdev->dev, "Failed to get IRQ: %d\n", err); return err; } + tegra->irq = err; reset_control_assert(tegra->reset); From d49eeb1e838594b8fea9ac29736d7b4e949a530f Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:00 +0300 Subject: [PATCH 46/83] PM / devfreq: tegra30: Keep interrupt disabled while governor is stopped There is no real need to keep the interrupt always enabled; it is nicer to keep it disabled while the governor is inactive.
Suggested-by: Thierry Reding Reviewed-by: Chanwoo Choi Tested-by: Peter Geis Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 47 ++++++++++++++++--------------- 1 file changed, 24 insertions(+), 23 deletions(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index a27300f40b0b..a0a5f3f7b789 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -416,8 +417,6 @@ static void tegra_actmon_start(struct tegra_devfreq *tegra) { unsigned int i; - disable_irq(tegra->irq); - actmon_writel(tegra, ACTMON_SAMPLING_PERIOD - 1, ACTMON_GLB_PERIOD_CTRL); @@ -442,8 +441,6 @@ static void tegra_actmon_stop(struct tegra_devfreq *tegra) } actmon_write_barrier(tegra); - - enable_irq(tegra->irq); } static int tegra_devfreq_target(struct device *dev, unsigned long *freq, @@ -552,6 +549,12 @@ static int tegra_governor_event_handler(struct devfreq *devfreq, { struct tegra_devfreq *tegra = dev_get_drvdata(devfreq->dev.parent); + /* + * Couple devfreq-device with the governor early because it is + * needed at the moment of governor's start (used by ISR). 
+ */ + tegra->devfreq = devfreq; + switch (event) { case DEVFREQ_GOV_START: devfreq_monitor_start(devfreq); @@ -586,10 +589,11 @@ static struct devfreq_governor tegra_devfreq_governor = { static int tegra_devfreq_probe(struct platform_device *pdev) { - struct tegra_devfreq *tegra; struct tegra_devfreq_device *dev; - unsigned int i; + struct tegra_devfreq *tegra; + struct devfreq *devfreq; unsigned long rate; + unsigned int i; int err; tegra = devm_kzalloc(&pdev->dev, sizeof(*tegra), GFP_KERNEL); @@ -625,6 +629,16 @@ static int tegra_devfreq_probe(struct platform_device *pdev) } tegra->irq = err; + irq_set_status_flags(tegra->irq, IRQ_NOAUTOEN); + + err = devm_request_threaded_irq(&pdev->dev, tegra->irq, NULL, + actmon_thread_isr, IRQF_ONESHOT, + "tegra-devfreq", tegra); + if (err) { + dev_err(&pdev->dev, "Interrupt request failed: %d\n", err); + return err; + } + reset_control_assert(tegra->reset); err = clk_prepare_enable(tegra->clock); @@ -672,28 +686,15 @@ static int tegra_devfreq_probe(struct platform_device *pdev) } tegra_devfreq_profile.initial_freq = clk_get_rate(tegra->emc_clock); - tegra->devfreq = devfreq_add_device(&pdev->dev, - &tegra_devfreq_profile, - "tegra_actmon", - NULL); - if (IS_ERR(tegra->devfreq)) { - err = PTR_ERR(tegra->devfreq); + devfreq = devfreq_add_device(&pdev->dev, &tegra_devfreq_profile, + "tegra_actmon", NULL); + if (IS_ERR(devfreq)) { + err = PTR_ERR(devfreq); goto remove_governor; } - err = devm_request_threaded_irq(&pdev->dev, tegra->irq, NULL, - actmon_thread_isr, IRQF_ONESHOT, - "tegra-devfreq", tegra); - if (err) { - dev_err(&pdev->dev, "Interrupt request failed: %d\n", err); - goto remove_devfreq; - } - return 0; -remove_devfreq: - devfreq_remove_device(tegra->devfreq); - remove_governor: devfreq_remove_governor(&tegra_devfreq_governor); From 7296443b900e6a432da3d6b1a7862beb83946d19 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:01 +0300 Subject: [PATCH 47/83] PM / devfreq: tegra30: Handle possible 
round-rate error The EMC clock rate rounding technically could fail, hence let's handle the error cases properly. Reviewed-by: Chanwoo Choi Tested-by: Peter Geis Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index a0a5f3f7b789..66dfa98d8c6b 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -592,8 +592,8 @@ static int tegra_devfreq_probe(struct platform_device *pdev) struct tegra_devfreq_device *dev; struct tegra_devfreq *tegra; struct devfreq *devfreq; - unsigned long rate; unsigned int i; + long rate; int err; tegra = devm_kzalloc(&pdev->dev, sizeof(*tegra), GFP_KERNEL); @@ -650,8 +650,14 @@ static int tegra_devfreq_probe(struct platform_device *pdev) reset_control_deassert(tegra->reset); - tegra->max_freq = clk_round_rate(tegra->emc_clock, ULONG_MAX) / KHZ; + rate = clk_round_rate(tegra->emc_clock, ULONG_MAX); + if (rate < 0) { + dev_err(&pdev->dev, "Failed to round clock rate: %ld\n", rate); + return rate; + } + tegra->cur_freq = clk_get_rate(tegra->emc_clock) / KHZ; + tegra->max_freq = rate / KHZ; for (i = 0; i < ARRAY_SIZE(actmon_device_configs); i++) { dev = tegra->devices + i; @@ -662,6 +668,13 @@ static int tegra_devfreq_probe(struct platform_device *pdev) for (rate = 0; rate <= tegra->max_freq * KHZ; rate++) { rate = clk_round_rate(tegra->emc_clock, rate); + if (rate < 0) { + dev_err(&pdev->dev, + "Failed to round clock rate: %ld\n", rate); + err = rate; + goto remove_opps; + } + err = dev_pm_opp_add(&pdev->dev, rate, 0); if (err) { dev_err(&pdev->dev, "Failed to add OPP: %d\n", err); From e7955a34a2344b5a237aabd4d8a43e30c3183b79 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:02 +0300 Subject: [PATCH 48/83] PM / devfreq: tegra30: Drop write-barrier There is no need in a write-barrier 
now, given that interrupt masking is handled by CPU's GIC now. Hence we know exactly that interrupt won't fire after stopping the devfreq's governor. In other cases we don't care about potential buffering of the writes to hardware and thus there is no need to stall CPU. Reviewed-by: Chanwoo Choi Tested-by: Peter Geis Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 14 -------------- 1 file changed, 14 deletions(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index 66dfa98d8c6b..b50bd1615010 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -230,12 +230,6 @@ static void tegra_devfreq_update_wmark(struct tegra_devfreq *tegra, ACTMON_DEV_LOWER_WMARK); } -static void actmon_write_barrier(struct tegra_devfreq *tegra) -{ - /* ensure the update has reached the ACTMON */ - readl(tegra->regs + ACTMON_GLB_STATUS); -} - static void actmon_isr_device(struct tegra_devfreq *tegra, struct tegra_devfreq_device *dev) { @@ -287,8 +281,6 @@ static void actmon_isr_device(struct tegra_devfreq *tegra, device_writel(dev, dev_ctrl, ACTMON_DEV_CTRL); device_writel(dev, ACTMON_INTR_STATUS_CLEAR, ACTMON_DEV_INTR_STATUS); - - actmon_write_barrier(tegra); } static unsigned long actmon_cpu_to_emc_rate(struct tegra_devfreq *tegra, @@ -376,8 +368,6 @@ static int tegra_actmon_rate_notify_cb(struct notifier_block *nb, tegra_devfreq_update_wmark(tegra, dev); } - actmon_write_barrier(tegra); - return NOTIFY_OK; } @@ -423,8 +413,6 @@ static void tegra_actmon_start(struct tegra_devfreq *tegra) for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) tegra_actmon_configure_device(tegra, &tegra->devices[i]); - actmon_write_barrier(tegra); - enable_irq(tegra->irq); } @@ -439,8 +427,6 @@ static void tegra_actmon_stop(struct tegra_devfreq *tegra) device_writel(&tegra->devices[i], ACTMON_INTR_STATUS_CLEAR, ACTMON_DEV_INTR_STATUS); } - - actmon_write_barrier(tegra); } static int 
tegra_devfreq_target(struct device *dev, unsigned long *freq, From 53b4b2aeee26f42cde5ff2a16dd0d8590c51a55a Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:03 +0300 Subject: [PATCH 49/83] PM / devfreq: tegra30: Fix integer overflow on CPU's freq max out There is another kHz-conversion bug in the code, resulting in integer overflow. Although, this time the resulting value is 4294966296 and it's close to ULONG_MAX, which is okay in this case. Reviewed-by: Chanwoo Choi Tested-by: Peter Geis Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index b50bd1615010..7d7b7eecc19c 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -69,6 +69,8 @@ #define KHZ 1000 +#define KHZ_MAX (ULONG_MAX / KHZ) + /* Assume that the bus is saturated if the utilization is 25% */ #define BUS_SATURATION_RATIO 25 @@ -170,7 +172,7 @@ struct tegra_actmon_emc_ratio { }; static struct tegra_actmon_emc_ratio actmon_emc_ratios[] = { - { 1400000, ULONG_MAX }, + { 1400000, KHZ_MAX }, { 1200000, 750000 }, { 1100000, 600000 }, { 1000000, 500000 }, From 0ce3884654d186842eed858b6d41925539696454 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:04 +0300 Subject: [PATCH 50/83] PM / devfreq: tegra30: Use kHz units uniformly in the code Part of the code uses Hz units and the other kHz, let's switch to kHz everywhere for consistency. A small benefit from this change (besides code's cleanup) is that now powertop utility correctly displays devfreq's stats, for some reason it expects them to be in kHz. 
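The unit handling in patches 49 and 50 comes down to one piece of arithmetic: multiplying a kHz value by 1000 only stays in range if the value is at most the type's maximum divided by 1000. The sketch below shows this with uint32_t, assuming a 32-bit unsigned long (which is what the 4294966296 figure in patch 49 corresponds to). KHZ_MAX_32, khz_to_hz() and khz_to_hz_wrapping() are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stdint.h>

#define KHZ 1000u
/* Largest kHz value whose Hz equivalent still fits in 32 bits. */
#define KHZ_MAX_32 (UINT32_MAX / KHZ)

/* Checked conversion: the caller must stay within KHZ_MAX_32. */
static uint32_t khz_to_hz(uint32_t khz)
{
	assert(khz <= KHZ_MAX_32);
	return khz * KHZ;
}

/* Unchecked conversion: wraps modulo 2^32 for large inputs. */
static uint32_t khz_to_hz_wrapping(uint32_t khz)
{
	return khz * KHZ;
}
```

Passing the type's maximum to the unchecked variant wraps to 2^32 - 1000, while the capped maximum converts exactly.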
Tested-by: Peter Geis Reviewed-by: Chanwoo Choi Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index 7d7b7eecc19c..9ccde64be0a0 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -448,7 +448,7 @@ static int tegra_devfreq_target(struct device *dev, unsigned long *freq, rate = dev_pm_opp_get_freq(opp); dev_pm_opp_put(opp); - err = clk_set_min_rate(tegra->emc_clock, rate); + err = clk_set_min_rate(tegra->emc_clock, rate * KHZ); if (err) return err; @@ -477,7 +477,7 @@ static int tegra_devfreq_get_dev_status(struct device *dev, stat->private_data = tegra; /* The below are to be used by the other governors */ - stat->current_frequency = cur_freq * KHZ; + stat->current_frequency = cur_freq; actmon_dev = &tegra->devices[MCALL]; @@ -527,7 +527,7 @@ static int tegra_governor_get_target(struct devfreq *devfreq, target_freq = max(target_freq, dev->target_freq); } - *freq = target_freq * KHZ; + *freq = target_freq; return 0; } @@ -663,7 +663,7 @@ static int tegra_devfreq_probe(struct platform_device *pdev) goto remove_opps; } - err = dev_pm_opp_add(&pdev->dev, rate, 0); + err = dev_pm_opp_add(&pdev->dev, rate / KHZ, 0); if (err) { dev_err(&pdev->dev, "Failed to add OPP: %d\n", err); goto remove_opps; @@ -686,7 +686,8 @@ static int tegra_devfreq_probe(struct platform_device *pdev) goto unreg_notifier; } - tegra_devfreq_profile.initial_freq = clk_get_rate(tegra->emc_clock); + tegra_devfreq_profile.initial_freq = tegra->cur_freq; + devfreq = devfreq_add_device(&pdev->dev, &tegra_devfreq_profile, "tegra_actmon", NULL); if (IS_ERR(devfreq)) { From 11eb6ec5c0d4ad513c62e33009bcb15d84dfff56 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:05 +0300 Subject: [PATCH 51/83] PM / devfreq: tegra30: Use CPUFreq notifier The CPU's 
client need to take into account that CPUFreq may change while memory activity not, staying high. Thus an appropriate frequency notifier should be used in addition to the clk-notifier. Reviewed-by: Chanwoo Choi Tested-by: Peter Geis Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 180 +++++++++++++++++++++++++----- 1 file changed, 155 insertions(+), 25 deletions(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index 9ccde64be0a0..2d720e7e2236 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "governor.h" @@ -34,6 +35,8 @@ #define ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN BIT(30) #define ACTMON_DEV_CTRL_ENB BIT(31) +#define ACTMON_DEV_CTRL_STOP 0x00000000 + #define ACTMON_DEV_UPPER_WMARK 0x4 #define ACTMON_DEV_LOWER_WMARK 0x8 #define ACTMON_DEV_INIT_AVG 0xc @@ -159,7 +162,10 @@ struct tegra_devfreq { struct clk *emc_clock; unsigned long max_freq; unsigned long cur_freq; - struct notifier_block rate_change_nb; + struct notifier_block clk_rate_change_nb; + + struct delayed_work cpufreq_update_work; + struct notifier_block cpu_rate_change_nb; struct tegra_devfreq_device devices[ARRAY_SIZE(actmon_device_configs)]; @@ -303,22 +309,32 @@ static unsigned long actmon_cpu_to_emc_rate(struct tegra_devfreq *tegra, return 0; } +static unsigned long actmon_device_target_freq(struct tegra_devfreq *tegra, + struct tegra_devfreq_device *dev) +{ + unsigned int avg_sustain_coef; + unsigned long target_freq; + + target_freq = dev->avg_count / ACTMON_SAMPLING_PERIOD; + avg_sustain_coef = 100 * 100 / dev->config->boost_up_threshold; + target_freq = do_percent(target_freq, avg_sustain_coef); + target_freq += dev->boost_freq; + + return target_freq; +} + static void actmon_update_target(struct tegra_devfreq *tegra, struct tegra_devfreq_device *dev) { unsigned long cpu_freq = 0; unsigned long 
static_cpu_emc_freq = 0; - unsigned int avg_sustain_coef; if (dev->config->avg_dependency_threshold) { - cpu_freq = cpufreq_get(0); + cpu_freq = cpufreq_quick_get(0); static_cpu_emc_freq = actmon_cpu_to_emc_rate(tegra, cpu_freq); } - dev->target_freq = dev->avg_count / ACTMON_SAMPLING_PERIOD; - avg_sustain_coef = 100 * 100 / dev->config->boost_up_threshold; - dev->target_freq = do_percent(dev->target_freq, avg_sustain_coef); - dev->target_freq += dev->boost_freq; + dev->target_freq = actmon_device_target_freq(tegra, dev); if (dev->avg_count >= dev->config->avg_dependency_threshold) dev->target_freq = max(dev->target_freq, static_cpu_emc_freq); @@ -349,8 +365,8 @@ static irqreturn_t actmon_thread_isr(int irq, void *data) return handled ? IRQ_HANDLED : IRQ_NONE; } -static int tegra_actmon_rate_notify_cb(struct notifier_block *nb, - unsigned long action, void *ptr) +static int tegra_actmon_clk_notify_cb(struct notifier_block *nb, + unsigned long action, void *ptr) { struct clk_notifier_data *data = ptr; struct tegra_devfreq *tegra; @@ -360,7 +376,7 @@ static int tegra_actmon_rate_notify_cb(struct notifier_block *nb, if (action != POST_RATE_CHANGE) return NOTIFY_OK; - tegra = container_of(nb, struct tegra_devfreq, rate_change_nb); + tegra = container_of(nb, struct tegra_devfreq, clk_rate_change_nb); tegra->cur_freq = data->new_rate / KHZ; @@ -373,6 +389,79 @@ static int tegra_actmon_rate_notify_cb(struct notifier_block *nb, return NOTIFY_OK; } +static void tegra_actmon_delayed_update(struct work_struct *work) +{ + struct tegra_devfreq *tegra = container_of(work, struct tegra_devfreq, + cpufreq_update_work.work); + + mutex_lock(&tegra->devfreq->lock); + update_devfreq(tegra->devfreq); + mutex_unlock(&tegra->devfreq->lock); +} + +static unsigned long +tegra_actmon_cpufreq_contribution(struct tegra_devfreq *tegra, + unsigned int cpu_freq) +{ + unsigned long static_cpu_emc_freq, dev_freq; + + /* check whether CPU's freq is taken into account at all */ + if 
(tegra->devices[MCCPU].avg_count < + tegra->devices[MCCPU].config->avg_dependency_threshold) + return 0; + + static_cpu_emc_freq = actmon_cpu_to_emc_rate(tegra, cpu_freq); + dev_freq = actmon_device_target_freq(tegra, &tegra->devices[MCCPU]); + + if (dev_freq >= static_cpu_emc_freq) + return 0; + + return static_cpu_emc_freq; +} + +static int tegra_actmon_cpu_notify_cb(struct notifier_block *nb, + unsigned long action, void *ptr) +{ + struct cpufreq_freqs *freqs = ptr; + struct tegra_devfreq *tegra; + unsigned long old, new, delay; + + if (action != CPUFREQ_POSTCHANGE) + return NOTIFY_OK; + + tegra = container_of(nb, struct tegra_devfreq, cpu_rate_change_nb); + + /* + * Quickly check whether CPU frequency should be taken into account + * at all, without blocking CPUFreq's core. + */ + if (mutex_trylock(&tegra->devfreq->lock)) { + old = tegra_actmon_cpufreq_contribution(tegra, freqs->old); + new = tegra_actmon_cpufreq_contribution(tegra, freqs->new); + mutex_unlock(&tegra->devfreq->lock); + + /* + * If CPU's frequency shouldn't be taken into account at + * the moment, then there is no need to update the devfreq's + * state because ISR will re-check CPU's frequency on the + * next interrupt. + */ + if (old == new) + return NOTIFY_OK; + } + + /* + * CPUFreq driver should support CPUFREQ_ASYNC_NOTIFICATION in order + * to allow asynchronous notifications. This means we can't block + * here for too long, otherwise CPUFreq's core will complain with a + * warning splat. 
+ */ + delay = msecs_to_jiffies(ACTMON_SAMPLING_PERIOD); + schedule_delayed_work(&tegra->cpufreq_update_work, delay); + + return NOTIFY_OK; +} + static void tegra_actmon_configure_device(struct tegra_devfreq *tegra, struct tegra_devfreq_device *dev) { @@ -405,9 +494,22 @@ static void tegra_actmon_configure_device(struct tegra_devfreq *tegra, device_writel(dev, val, ACTMON_DEV_CTRL); } -static void tegra_actmon_start(struct tegra_devfreq *tegra) +static void tegra_actmon_stop_devices(struct tegra_devfreq *tegra) +{ + struct tegra_devfreq_device *dev = tegra->devices; + unsigned int i; + + for (i = 0; i < ARRAY_SIZE(tegra->devices); i++, dev++) { + device_writel(dev, ACTMON_DEV_CTRL_STOP, ACTMON_DEV_CTRL); + device_writel(dev, ACTMON_INTR_STATUS_CLEAR, + ACTMON_DEV_INTR_STATUS); + } +} + +static int tegra_actmon_start(struct tegra_devfreq *tegra) { unsigned int i; + int err; actmon_writel(tegra, ACTMON_SAMPLING_PERIOD - 1, ACTMON_GLB_PERIOD_CTRL); @@ -415,20 +517,41 @@ static void tegra_actmon_start(struct tegra_devfreq *tegra) for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) tegra_actmon_configure_device(tegra, &tegra->devices[i]); + /* + * We are estimating CPU's memory bandwidth requirement based on + * amount of memory accesses and system's load, judging by CPU's + * frequency. We also don't want to receive events about CPU's + * frequency transaction when governor is stopped, hence notifier + * is registered dynamically. 
+ */ + err = cpufreq_register_notifier(&tegra->cpu_rate_change_nb, + CPUFREQ_TRANSITION_NOTIFIER); + if (err) { + dev_err(tegra->devfreq->dev.parent, + "Failed to register rate change notifier: %d\n", err); + goto err_stop; + } + enable_irq(tegra->irq); + + return 0; + +err_stop: + tegra_actmon_stop_devices(tegra); + + return err; } static void tegra_actmon_stop(struct tegra_devfreq *tegra) { - unsigned int i; - disable_irq(tegra->irq); - for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) { - device_writel(&tegra->devices[i], 0x00000000, ACTMON_DEV_CTRL); - device_writel(&tegra->devices[i], ACTMON_INTR_STATUS_CLEAR, - ACTMON_DEV_INTR_STATUS); - } + cpufreq_unregister_notifier(&tegra->cpu_rate_change_nb, + CPUFREQ_TRANSITION_NOTIFIER); + + cancel_delayed_work_sync(&tegra->cpufreq_update_work); + + tegra_actmon_stop_devices(tegra); } static int tegra_devfreq_target(struct device *dev, unsigned long *freq, @@ -536,6 +659,7 @@ static int tegra_governor_event_handler(struct devfreq *devfreq, unsigned int event, void *data) { struct tegra_devfreq *tegra = dev_get_drvdata(devfreq->dev.parent); + int ret = 0; /* * Couple devfreq-device with the governor early because it is @@ -546,7 +670,7 @@ static int tegra_governor_event_handler(struct devfreq *devfreq, switch (event) { case DEVFREQ_GOV_START: devfreq_monitor_start(devfreq); - tegra_actmon_start(tegra); + ret = tegra_actmon_start(tegra); break; case DEVFREQ_GOV_STOP: @@ -561,11 +685,11 @@ static int tegra_governor_event_handler(struct devfreq *devfreq, case DEVFREQ_GOV_RESUME: devfreq_monitor_resume(devfreq); - tegra_actmon_start(tegra); + ret = tegra_actmon_start(tegra); break; } - return 0; + return ret; } static struct devfreq_governor tegra_devfreq_governor = { @@ -672,8 +796,14 @@ static int tegra_devfreq_probe(struct platform_device *pdev) platform_set_drvdata(pdev, tegra); - tegra->rate_change_nb.notifier_call = tegra_actmon_rate_notify_cb; - err = clk_notifier_register(tegra->emc_clock, &tegra->rate_change_nb); + 
tegra->cpu_rate_change_nb.notifier_call = tegra_actmon_cpu_notify_cb; + + INIT_DELAYED_WORK(&tegra->cpufreq_update_work, + tegra_actmon_delayed_update); + + tegra->clk_rate_change_nb.notifier_call = tegra_actmon_clk_notify_cb; + err = clk_notifier_register(tegra->emc_clock, + &tegra->clk_rate_change_nb); if (err) { dev_err(&pdev->dev, "Failed to register rate change notifier\n"); @@ -701,7 +831,7 @@ static int tegra_devfreq_probe(struct platform_device *pdev) devfreq_remove_governor(&tegra_devfreq_governor); unreg_notifier: - clk_notifier_unregister(tegra->emc_clock, &tegra->rate_change_nb); + clk_notifier_unregister(tegra->emc_clock, &tegra->clk_rate_change_nb); remove_opps: dev_pm_opp_remove_all_dynamic(&pdev->dev); @@ -719,7 +849,7 @@ static int tegra_devfreq_remove(struct platform_device *pdev) devfreq_remove_device(tegra->devfreq); devfreq_remove_governor(&tegra_devfreq_governor); - clk_notifier_unregister(tegra->emc_clock, &tegra->rate_change_nb); + clk_notifier_unregister(tegra->emc_clock, &tegra->clk_rate_change_nb); dev_pm_opp_remove_all_dynamic(&pdev->dev); reset_control_reset(tegra->reset); From 6f2a35d65b3c4bbc579a115589edcc7ed6239dce Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:06 +0300 Subject: [PATCH 52/83] PM / devfreq: tegra30: Move clk-notifier's registration to governor's start There is no point in receiving of the notifications while governor is stopped, let's keep them disabled like we do for the CPU freq-change notifications. This also fixes a potential use-after-free bug if notification happens after device's removal. 
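As a rough illustration of the start/stop pairing this change relies on, the following user-space sketch registers its notifiers when the monitor starts and drops them again when it stops, unwinding in reverse order if a later step fails. All toy_* names are hypothetical and only model the ordering; they are not the driver's functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the clk and cpufreq notifier registrations. */
struct toy_monitor {
	bool clk_nb_registered;
	bool cpu_nb_registered;
	bool irq_enabled;
};

static int toy_register_clk(struct toy_monitor *m)
{
	m->clk_nb_registered = true;
	return 0;
}

/* fail_cpu lets a caller inject a registration error. */
static int toy_register_cpu(struct toy_monitor *m, bool fail_cpu)
{
	if (fail_cpu)
		return -1;
	m->cpu_nb_registered = true;
	return 0;
}

/* Start: acquire in order, unwind in reverse order on failure. */
static int toy_start(struct toy_monitor *m, bool fail_cpu)
{
	int err;

	assert(!m->irq_enabled); /* start must not be called twice */

	err = toy_register_clk(m);
	if (err)
		return err;

	err = toy_register_cpu(m, fail_cpu);
	if (err)
		goto err_unreg_clk;

	m->irq_enabled = true;
	return 0;

err_unreg_clk:
	m->clk_nb_registered = false;
	return err;
}

/* Stop: release everything start acquired, so no callback outlives it. */
static void toy_stop(struct toy_monitor *m)
{
	m->irq_enabled = false;
	m->cpu_nb_registered = false;
	m->clk_nb_registered = false;
}
```

Because nothing stays registered between a stop and the next start, a notification cannot arrive while the monitor is stopped or after its state has been torn down.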
Reviewed-by: Chanwoo Choi Tested-by: Peter Geis Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 39 ++++++++++++++++++------------- 1 file changed, 23 insertions(+), 16 deletions(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index 2d720e7e2236..6960d8ba0577 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -514,6 +514,21 @@ static int tegra_actmon_start(struct tegra_devfreq *tegra) actmon_writel(tegra, ACTMON_SAMPLING_PERIOD - 1, ACTMON_GLB_PERIOD_CTRL); + /* + * CLK notifications are needed in order to reconfigure the upper + * consecutive watermark in accordance to the actual clock rate + * to avoid unnecessary upper interrupts. + */ + err = clk_notifier_register(tegra->emc_clock, + &tegra->clk_rate_change_nb); + if (err) { + dev_err(tegra->devfreq->dev.parent, + "Failed to register rate change notifier\n"); + return err; + } + + tegra->cur_freq = clk_get_rate(tegra->emc_clock) / KHZ; + for (i = 0; i < ARRAY_SIZE(tegra->devices); i++) tegra_actmon_configure_device(tegra, &tegra->devices[i]); @@ -539,6 +554,8 @@ static int tegra_actmon_start(struct tegra_devfreq *tegra) err_stop: tegra_actmon_stop_devices(tegra); + clk_notifier_unregister(tegra->emc_clock, &tegra->clk_rate_change_nb); + return err; } @@ -552,6 +569,8 @@ static void tegra_actmon_stop(struct tegra_devfreq *tegra) cancel_delayed_work_sync(&tegra->cpufreq_update_work); tegra_actmon_stop_devices(tegra); + + clk_notifier_unregister(tegra->emc_clock, &tegra->clk_rate_change_nb); } static int tegra_devfreq_target(struct device *dev, unsigned long *freq, @@ -768,7 +787,6 @@ static int tegra_devfreq_probe(struct platform_device *pdev) return rate; } - tegra->cur_freq = clk_get_rate(tegra->emc_clock) / KHZ; tegra->max_freq = rate / KHZ; for (i = 0; i < ARRAY_SIZE(actmon_device_configs); i++) { @@ -796,27 +814,20 @@ static int tegra_devfreq_probe(struct platform_device *pdev) 
platform_set_drvdata(pdev, tegra); + tegra->clk_rate_change_nb.notifier_call = tegra_actmon_clk_notify_cb; tegra->cpu_rate_change_nb.notifier_call = tegra_actmon_cpu_notify_cb; INIT_DELAYED_WORK(&tegra->cpufreq_update_work, tegra_actmon_delayed_update); - tegra->clk_rate_change_nb.notifier_call = tegra_actmon_clk_notify_cb; - err = clk_notifier_register(tegra->emc_clock, - &tegra->clk_rate_change_nb); - if (err) { - dev_err(&pdev->dev, - "Failed to register rate change notifier\n"); - goto remove_opps; - } - err = devfreq_add_governor(&tegra_devfreq_governor); if (err) { dev_err(&pdev->dev, "Failed to add governor: %d\n", err); - goto unreg_notifier; + goto remove_opps; } - tegra_devfreq_profile.initial_freq = tegra->cur_freq; + tegra_devfreq_profile.initial_freq = clk_get_rate(tegra->emc_clock); + tegra_devfreq_profile.initial_freq /= KHZ; devfreq = devfreq_add_device(&pdev->dev, &tegra_devfreq_profile, "tegra_actmon", NULL); @@ -830,9 +841,6 @@ static int tegra_devfreq_probe(struct platform_device *pdev) remove_governor: devfreq_remove_governor(&tegra_devfreq_governor); -unreg_notifier: - clk_notifier_unregister(tegra->emc_clock, &tegra->clk_rate_change_nb); - remove_opps: dev_pm_opp_remove_all_dynamic(&pdev->dev); @@ -849,7 +857,6 @@ static int tegra_devfreq_remove(struct platform_device *pdev) devfreq_remove_device(tegra->devfreq); devfreq_remove_governor(&tegra_devfreq_governor); - clk_notifier_unregister(tegra->emc_clock, &tegra->clk_rate_change_nb); dev_pm_opp_remove_all_dynamic(&pdev->dev); reset_control_reset(tegra->reset); From 142665582736f7f0a5c11a606271705020ba2f4e Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:07 +0300 Subject: [PATCH 53/83] PM / devfreq: tegra30: Reset boosting on startup Governor could be stopped while boosting is active. We have assumption that everything is reset on governor's restart, including the boosting value, which was missed. 
Reviewed-by: Chanwoo Choi Tested-by: Peter Geis Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index 6960d8ba0577..9cb2d6468175 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -467,6 +467,9 @@ static void tegra_actmon_configure_device(struct tegra_devfreq *tegra, { u32 val = 0; + /* reset boosting on governor's restart */ + dev->boost_freq = 0; + dev->target_freq = tegra->cur_freq; dev->avg_count = tegra->cur_freq * ACTMON_SAMPLING_PERIOD; From 61d932084174ceb4876a2b1a792dceb4fe5526eb Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:08 +0300 Subject: [PATCH 54/83] PM / devfreq: tegra30: Don't enable consecutive-down interrupt on startup The consecutive-down event signals that frequency de-boosting should be performed, but boosting is in its reset state on start, so the event does nothing useful and only results in a dummy interrupt request.
Reviewed-by: Chanwoo Choi Tested-by: Peter Geis Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index 9cb2d6468175..bc46af155b99 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -490,7 +490,6 @@ static void tegra_actmon_configure_device(struct tegra_devfreq *tegra, << ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_NUM_SHIFT; val |= ACTMON_DEV_CTRL_AVG_ABOVE_WMARK_EN; val |= ACTMON_DEV_CTRL_AVG_BELOW_WMARK_EN; - val |= ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN; val |= ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN; val |= ACTMON_DEV_CTRL_ENB; From b87dea3bbab276fc34f5cc65749f1f39f213afd0 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:09 +0300 Subject: [PATCH 55/83] PM / devfreq: tegra30: Constify structs Constify unmodifiable structs, for consistency. Reviewed-by: Chanwoo Choi Tested-by: Peter Geis Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index bc46af155b99..9bd4dd982927 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -108,7 +108,7 @@ enum tegra_actmon_device { MCCPU, }; -static struct tegra_devfreq_device_config actmon_device_configs[] = { +static const struct tegra_devfreq_device_config actmon_device_configs[] = { { /* MCALL: All memory accesses (including from the CPUs) */ .offset = 0x1c0, @@ -177,7 +177,7 @@ struct tegra_actmon_emc_ratio { unsigned long emc_freq; }; -static struct tegra_actmon_emc_ratio actmon_emc_ratios[] = { +static const struct tegra_actmon_emc_ratio actmon_emc_ratios[] = { { 1400000, KHZ_MAX }, { 1200000, 750000 }, { 1100000, 600000 }, @@ -295,7 +295,7 @@ static unsigned long 
actmon_cpu_to_emc_rate(struct tegra_devfreq *tegra, unsigned long cpu_freq) { unsigned int i; - struct tegra_actmon_emc_ratio *ratio = actmon_emc_ratios; + const struct tegra_actmon_emc_ratio *ratio = actmon_emc_ratios; for (i = 0; i < ARRAY_SIZE(actmon_emc_ratios); i++, ratio++) { if (cpu_freq >= ratio->cpu_freq) { From 9cff2177789f2cf87432be76620c13ba9c632b68 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:10 +0300 Subject: [PATCH 56/83] PM / devfreq: tegra30: Include appropriate header mod_devicetable.h is not the right header to include for OF device drivers; of_device.h should be included instead. Reviewed-by: Chanwoo Choi Tested-by: Peter Geis Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index 9bd4dd982927..7c8126e74750 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -13,7 +13,7 @@ #include #include #include -#include <linux/mod_devicetable.h> +#include <linux/of_device.h> #include #include #include From 333abefb281218ab965e34932feecb1be064d535 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:11 +0300 Subject: [PATCH 57/83] PM / devfreq: tegra30: Don't enable already enabled consecutive interrupts The consecutive up/down interrupt bit is set in the interrupt status register only if that interrupt was previously enabled. Thus re-enabling an already enabled interrupt achieves nothing.
Reviewed-by: Chanwoo Choi Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index 7c8126e74750..4a5d513904a2 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -261,8 +261,6 @@ static void actmon_isr_device(struct tegra_devfreq *tegra, if (dev->boost_freq >= tegra->max_freq) dev->boost_freq = tegra->max_freq; - else - dev_ctrl |= ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN; } else if (intr_status & ACTMON_DEV_INTR_CONSECUTIVE_LOWER) { /* * new_boost = old_boost * down_coef @@ -275,8 +273,6 @@ static void actmon_isr_device(struct tegra_devfreq *tegra, if (dev->boost_freq < (ACTMON_BOOST_FREQ_STEP >> 1)) dev->boost_freq = 0; - else - dev_ctrl |= ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN; } if (dev->config->avg_dependency_threshold) { From 88ec816446fab2012c73c7096fb8b1924fe6837c Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:12 +0300 Subject: [PATCH 58/83] PM / devfreq: tegra30: Disable consecutive interrupts when appropriate Consecutive interrupts should be disabled once boosting has completed. Currently the "lower" interrupt is disabled only for the MCCPU monitor, which uses the dependency threshold, and even for MCCPU the interrupt stays enabled if the CPU's activity is above the threshold. This results in a lot of dummy interrupt requests. Boosting is used by both MCCPU and MCALL, so for both monitors the "lower" interrupt should be disabled once the boost reaches 0, regardless of the activity level. The boost stops growing once the maximum limit is hit, so the "upper" interrupt needs to be disabled when that limit is reached.
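The rule described above can be sketched as a small user-space model. This is only an illustration: the toy_* names, the step size and the maximum frequency are made-up values, and the boost-up formula (scale by the coefficient, then add one step) is an assumption about the surrounding code rather than something shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values; the real limits live in the driver. */
#define TOY_BOOST_STEP	16000UL		/* kHz, assumed step size */
#define TOY_MAX_FREQ	800000UL	/* kHz, assumed maximum rate */

struct toy_dev {
	unsigned long boost_freq;	/* kHz */
	bool upper_irq_en;		/* consecutive-above interrupt */
	bool lower_irq_en;		/* consecutive-below interrupt */
};

static unsigned long toy_percent(unsigned long val, unsigned int pct)
{
	return val * pct / 100;
}

/* Consecutive-upper event: grow the boost, stop once the cap is hit. */
static void toy_on_upper(struct toy_dev *d, unsigned int up_coef)
{
	assert(up_coef >= 100);

	d->boost_freq = toy_percent(d->boost_freq, up_coef) + TOY_BOOST_STEP;
	d->lower_irq_en = true;	/* there is now something to de-boost */

	if (d->boost_freq >= TOY_MAX_FREQ) {
		d->upper_irq_en = false;	/* cannot grow any further */
		d->boost_freq = TOY_MAX_FREQ;
	}
}

/* Consecutive-lower event: shrink the boost, stop once it reaches 0. */
static void toy_on_lower(struct toy_dev *d, unsigned int down_coef)
{
	assert(down_coef < 100);

	d->boost_freq = toy_percent(d->boost_freq, down_coef);
	d->upper_irq_en = true;	/* boosting up is possible again */

	if (d->boost_freq < (TOY_BOOST_STEP >> 1)) {
		d->lower_irq_en = false;	/* nothing left to de-boost */
		d->boost_freq = 0;
	}
}
```

In this model each interrupt is enabled only while its event can still change the boost value, which is what removes the dummy interrupt requests.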
Reviewed-by: Chanwoo Choi Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index 4a5d513904a2..852bde6249c7 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -259,8 +259,10 @@ static void actmon_isr_device(struct tegra_devfreq *tegra, dev_ctrl |= ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN; - if (dev->boost_freq >= tegra->max_freq) + if (dev->boost_freq >= tegra->max_freq) { + dev_ctrl &= ~ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN; dev->boost_freq = tegra->max_freq; + } } else if (intr_status & ACTMON_DEV_INTR_CONSECUTIVE_LOWER) { /* * new_boost = old_boost * down_coef @@ -271,15 +273,10 @@ static void actmon_isr_device(struct tegra_devfreq *tegra, dev_ctrl |= ACTMON_DEV_CTRL_CONSECUTIVE_ABOVE_WMARK_EN; - if (dev->boost_freq < (ACTMON_BOOST_FREQ_STEP >> 1)) - dev->boost_freq = 0; - } - - if (dev->config->avg_dependency_threshold) { - if (dev->avg_count >= dev->config->avg_dependency_threshold) - dev_ctrl |= ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN; - else if (dev->boost_freq == 0) + if (dev->boost_freq < (ACTMON_BOOST_FREQ_STEP >> 1)) { dev_ctrl &= ~ACTMON_DEV_CTRL_CONSECUTIVE_BELOW_WMARK_EN; + dev->boost_freq = 0; + } } device_writel(dev, dev_ctrl, ACTMON_DEV_CTRL); From 28615e37be96877e5bb3559f566e50a291cf7a05 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:13 +0300 Subject: [PATCH 59/83] PM / devfreq: tegra30: Use kHz units for dependency threshold The dependency threshold designates a memory activity level below which CPU's frequency isn't accounted. Currently the threshold is given in "memory cycle" units and that value depends on the polling interval which is fixed to 12ms in the driver. 
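To make the dependence on the polling interval concrete, here is a minimal sketch of the unit relation, assuming activity is counted over one sampling period so that kHz multiplied by milliseconds gives cycles. The helper names are hypothetical and do not exist in the driver.

```c
#include <assert.h>

/* Activity counted over one sampling period: kHz * ms = cycles. */
static unsigned long toy_khz_to_cycles(unsigned long khz,
				       unsigned int period_ms)
{
	assert(period_ms > 0);
	return khz * period_ms;
}

/* Inverse conversion: cycles seen in one period back to a rate in kHz. */
static unsigned long toy_cycles_to_khz(unsigned long cycles,
				       unsigned int period_ms)
{
	assert(period_ms > 0);
	return cycles / period_ms;
}
```

For example, an activity level of 10000 kHz corresponds to 120000 cycles with a 12ms period, but the same 120000 cycles measured over a 24ms period correspond to only 5000 kHz, so a threshold fixed in cycles changes its meaning whenever the period changes.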
Later on we'd want to add support for a variable polling interval, so the threshold value either needs to be scaled in accordance with the polling interval or needs to be represented in units that do not depend on it. It is nicer to define the threshold independently of the polling interval, thus this patch converts the dependency threshold units from "cycle" to "kHz". Having this change as a separate preparatory patch will make the further patches easier to follow. Signed-off-by: Dmitry Osipenko Reviewed-by: Chanwoo Choi Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 32 +++++++++++++++++-------------- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index 852bde6249c7..3bd920829bfd 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -96,9 +96,10 @@ struct tegra_devfreq_device_config { unsigned int boost_down_threshold; /* - * Threshold of activity (cycles) below which the CPU frequency isn't - * to be taken into account. This is to avoid increasing the EMC - * frequency when the CPU is very busy but not accessing the bus often. + * Threshold of activity (cycles translated to kHz) below which the + * CPU frequency isn't to be taken into account. This is to avoid + * increasing the EMC frequency when the CPU is very busy but not + * accessing the bus often.
*/ u32 avg_dependency_threshold; }; @@ -126,7 +127,7 @@ static const struct tegra_devfreq_device_config actmon_device_configs[] = { .boost_down_coeff = 90, .boost_up_threshold = 27, .boost_down_threshold = 10, - .avg_dependency_threshold = 50000, + .avg_dependency_threshold = 16000, /* 16MHz in kHz units */ }, }; @@ -311,7 +312,6 @@ static unsigned long actmon_device_target_freq(struct tegra_devfreq *tegra, target_freq = dev->avg_count / ACTMON_SAMPLING_PERIOD; avg_sustain_coef = 100 * 100 / dev->config->boost_up_threshold; target_freq = do_percent(target_freq, avg_sustain_coef); - target_freq += dev->boost_freq; return target_freq; } @@ -322,15 +322,18 @@ static void actmon_update_target(struct tegra_devfreq *tegra, unsigned long cpu_freq = 0; unsigned long static_cpu_emc_freq = 0; - if (dev->config->avg_dependency_threshold) { - cpu_freq = cpufreq_quick_get(0); - static_cpu_emc_freq = actmon_cpu_to_emc_rate(tegra, cpu_freq); - } - dev->target_freq = actmon_device_target_freq(tegra, dev); - if (dev->avg_count >= dev->config->avg_dependency_threshold) + if (dev->config->avg_dependency_threshold && + dev->config->avg_dependency_threshold <= dev->target_freq) { + cpu_freq = cpufreq_quick_get(0); + static_cpu_emc_freq = actmon_cpu_to_emc_rate(tegra, cpu_freq); + + dev->target_freq += dev->boost_freq; dev->target_freq = max(dev->target_freq, static_cpu_emc_freq); + } else { + dev->target_freq += dev->boost_freq; + } } static irqreturn_t actmon_thread_isr(int irq, void *data) @@ -396,15 +399,16 @@ static unsigned long tegra_actmon_cpufreq_contribution(struct tegra_devfreq *tegra, unsigned int cpu_freq) { + struct tegra_devfreq_device *actmon_dev = &tegra->devices[MCCPU]; unsigned long static_cpu_emc_freq, dev_freq; + dev_freq = actmon_device_target_freq(tegra, actmon_dev); + /* check whether CPU's freq is taken into account at all */ - if (tegra->devices[MCCPU].avg_count < - tegra->devices[MCCPU].config->avg_dependency_threshold) + if (dev_freq < 
actmon_dev->config->avg_dependency_threshold) return 0; static_cpu_emc_freq = actmon_cpu_to_emc_rate(tegra, cpu_freq); - dev_freq = actmon_device_target_freq(tegra, &tegra->devices[MCCPU]); if (dev_freq >= static_cpu_emc_freq) return 0; From 5c0f6c79595760c9e366c3517314051af530e3e6 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:14 +0300 Subject: [PATCH 60/83] PM / devfreq: Add new interrupt_driven flag for governors Currently interrupt-driven governors (like NVIDIA Tegra30 ACTMON governor) are used to set polling_ms=0 in order to avoid periodic polling of device status by devfreq core. This means that polling interval can't be changed by userspace for such governors. The new governor flag allows interrupt-driven governors to convey that devfreq core shouldn't perform polling of device status and thus generic devfreq polling interval could be supported by these governors now. Signed-off-by: Dmitry Osipenko Reviewed-by: Chanwoo Choi Signed-off-by: Chanwoo Choi --- drivers/devfreq/devfreq.c | 17 +++++++++++++++++ drivers/devfreq/governor.h | 3 +++ 2 files changed, 20 insertions(+) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index d6c3dce9e9d5..f840e61e5a27 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -410,6 +410,9 @@ static void devfreq_monitor(struct work_struct *work) */ void devfreq_monitor_start(struct devfreq *devfreq) { + if (devfreq->governor->interrupt_driven) + return; + INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor); if (devfreq->profile->polling_ms) queue_delayed_work(devfreq_wq, &devfreq->work, @@ -427,6 +430,9 @@ EXPORT_SYMBOL(devfreq_monitor_start); */ void devfreq_monitor_stop(struct devfreq *devfreq) { + if (devfreq->governor->interrupt_driven) + return; + cancel_delayed_work_sync(&devfreq->work); } EXPORT_SYMBOL(devfreq_monitor_stop); @@ -454,6 +460,10 @@ void devfreq_monitor_suspend(struct devfreq *devfreq) devfreq_update_status(devfreq, devfreq->previous_freq); 
devfreq->stop_polling = true; mutex_unlock(&devfreq->lock); + + if (devfreq->governor->interrupt_driven) + return; + cancel_delayed_work_sync(&devfreq->work); } EXPORT_SYMBOL(devfreq_monitor_suspend); @@ -474,11 +484,15 @@ void devfreq_monitor_resume(struct devfreq *devfreq) if (!devfreq->stop_polling) goto out; + if (devfreq->governor->interrupt_driven) + goto out_update; + if (!delayed_work_pending(&devfreq->work) && devfreq->profile->polling_ms) queue_delayed_work(devfreq_wq, &devfreq->work, msecs_to_jiffies(devfreq->profile->polling_ms)); +out_update: devfreq->last_stat_updated = jiffies; devfreq->stop_polling = false; @@ -510,6 +524,9 @@ void devfreq_interval_update(struct devfreq *devfreq, unsigned int *delay) if (devfreq->stop_polling) goto out; + if (devfreq->governor->interrupt_driven) + goto out; + /* if new delay is zero, stop polling */ if (!new_delay) { mutex_unlock(&devfreq->lock); diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h index bbe5ff9fcecf..dc7533ccc3db 100644 --- a/drivers/devfreq/governor.h +++ b/drivers/devfreq/governor.h @@ -31,6 +31,8 @@ * @name: Governor's name * @immutable: Immutable flag for governor. If the value is 1, * this govenror is never changeable to other governor. + * @interrupt_driven: Devfreq core won't schedule polling work for this + * governor if value is set to 1. * @get_target_freq: Returns desired operating frequency for the device. 
* Basically, get_target_freq will run * devfreq_dev_profile.get_dev_status() to get the @@ -49,6 +51,7 @@ struct devfreq_governor { const char name[DEVFREQ_NAME_LEN]; const unsigned int immutable; + const unsigned int interrupt_driven; int (*get_target_freq)(struct devfreq *this, unsigned long *freq); int (*event_handler)(struct devfreq *devfreq, unsigned int event, void *data); From f61ee201068ac844cc8ed7f9112e47f3452f8262 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:15 +0300 Subject: [PATCH 61/83] PM / devfreq: tegra30: Support variable polling interval The ACTMON governor is interrupt-driven and currently hardware's polling interval is fixed to 16ms in the driver. Devfreq supports variable polling interval by the generic governors, let's re-use the generic interface for changing of the polling interval. Now the polling interval can be changed dynamically via /sys/class/devfreq/devfreq0/polling_interval. Signed-off-by: Dmitry Osipenko Reviewed-by: Chanwoo Choi Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 78 ++++++++++++++++++++++++++----- 1 file changed, 66 insertions(+), 12 deletions(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index 3bd920829bfd..e44f1a48f838 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -171,6 +171,8 @@ struct tegra_devfreq { struct tegra_devfreq_device devices[ARRAY_SIZE(actmon_device_configs)]; unsigned int irq; + + bool started; }; struct tegra_actmon_emc_ratio { @@ -209,18 +211,26 @@ static void device_writel(struct tegra_devfreq_device *dev, u32 val, writel_relaxed(val, dev->regs + offset); } -static unsigned long do_percent(unsigned long val, unsigned int pct) +static unsigned long do_percent(unsigned long long val, unsigned int pct) { - return val * pct / 100; + val = val * pct; + do_div(val, 100); + + /* + * High freq + high boosting percent + large polling interval are + * resulting in integer 
overflow when watermarks are calculated. + */ + return min_t(u64, val, U32_MAX); } static void tegra_devfreq_update_avg_wmark(struct tegra_devfreq *tegra, struct tegra_devfreq_device *dev) { - u32 avg = dev->avg_count; u32 avg_band_freq = tegra->max_freq * ACTMON_DEFAULT_AVG_BAND / KHZ; - u32 band = avg_band_freq * ACTMON_SAMPLING_PERIOD; + u32 band = avg_band_freq * tegra->devfreq->profile->polling_ms; + u32 avg; + avg = min(dev->avg_count, U32_MAX - band); device_writel(dev, avg + band, ACTMON_DEV_AVG_UPPER_WMARK); avg = max(dev->avg_count, band); @@ -230,7 +240,7 @@ static void tegra_devfreq_update_avg_wmark(struct tegra_devfreq *tegra, static void tegra_devfreq_update_wmark(struct tegra_devfreq *tegra, struct tegra_devfreq_device *dev) { - u32 val = tegra->cur_freq * ACTMON_SAMPLING_PERIOD; + u32 val = tegra->cur_freq * tegra->devfreq->profile->polling_ms; device_writel(dev, do_percent(val, dev->config->boost_up_threshold), ACTMON_DEV_UPPER_WMARK); @@ -309,7 +319,7 @@ static unsigned long actmon_device_target_freq(struct tegra_devfreq *tegra, unsigned int avg_sustain_coef; unsigned long target_freq; - target_freq = dev->avg_count / ACTMON_SAMPLING_PERIOD; + target_freq = dev->avg_count / tegra->devfreq->profile->polling_ms; avg_sustain_coef = 100 * 100 / dev->config->boost_up_threshold; target_freq = do_percent(target_freq, avg_sustain_coef); @@ -469,7 +479,7 @@ static void tegra_actmon_configure_device(struct tegra_devfreq *tegra, dev->target_freq = tegra->cur_freq; - dev->avg_count = tegra->cur_freq * ACTMON_SAMPLING_PERIOD; + dev->avg_count = tegra->cur_freq * tegra->devfreq->profile->polling_ms; device_writel(dev, dev->avg_count, ACTMON_DEV_INIT_AVG); tegra_devfreq_update_avg_wmark(tegra, dev); @@ -505,12 +515,15 @@ static void tegra_actmon_stop_devices(struct tegra_devfreq *tegra) } } -static int tegra_actmon_start(struct tegra_devfreq *tegra) +static int tegra_actmon_resume(struct tegra_devfreq *tegra) { unsigned int i; int err; - actmon_writel(tegra, 
ACTMON_SAMPLING_PERIOD - 1, + if (!tegra->devfreq->profile->polling_ms || !tegra->started) + return 0; + + actmon_writel(tegra, tegra->devfreq->profile->polling_ms - 1, ACTMON_GLB_PERIOD_CTRL); /* @@ -558,8 +571,26 @@ static int tegra_actmon_start(struct tegra_devfreq *tegra) return err; } -static void tegra_actmon_stop(struct tegra_devfreq *tegra) +static int tegra_actmon_start(struct tegra_devfreq *tegra) { + int ret = 0; + + if (!tegra->started) { + tegra->started = true; + + ret = tegra_actmon_resume(tegra); + if (ret) + tegra->started = false; + } + + return ret; +} + +static void tegra_actmon_pause(struct tegra_devfreq *tegra) +{ + if (!tegra->devfreq->profile->polling_ms || !tegra->started) + return; + disable_irq(tegra->irq); cpufreq_unregister_notifier(&tegra->cpu_rate_change_nb, @@ -572,6 +603,12 @@ static void tegra_actmon_stop(struct tegra_devfreq *tegra) clk_notifier_unregister(tegra->emc_clock, &tegra->clk_rate_change_nb); } +static void tegra_actmon_stop(struct tegra_devfreq *tegra) +{ + tegra_actmon_pause(tegra); + tegra->started = false; +} + static int tegra_devfreq_target(struct device *dev, unsigned long *freq, u32 flags) { @@ -629,7 +666,7 @@ static int tegra_devfreq_get_dev_status(struct device *dev, stat->busy_time *= 100 / BUS_SATURATION_RATIO; /* Number of cycles in a sampling period */ - stat->total_time = ACTMON_SAMPLING_PERIOD * cur_freq; + stat->total_time = tegra->devfreq->profile->polling_ms * cur_freq; stat->busy_time = min(stat->busy_time, stat->total_time); @@ -637,7 +674,7 @@ static int tegra_devfreq_get_dev_status(struct device *dev, } static struct devfreq_dev_profile tegra_devfreq_profile = { - .polling_ms = 0, + .polling_ms = ACTMON_SAMPLING_PERIOD, .target = tegra_devfreq_target, .get_dev_status = tegra_devfreq_get_dev_status, }; @@ -677,6 +714,7 @@ static int tegra_governor_event_handler(struct devfreq *devfreq, unsigned int event, void *data) { struct tegra_devfreq *tegra = dev_get_drvdata(devfreq->dev.parent); + unsigned 
int *new_delay = data; int ret = 0; /* @@ -696,6 +734,21 @@ static int tegra_governor_event_handler(struct devfreq *devfreq, devfreq_monitor_stop(devfreq); break; + case DEVFREQ_GOV_INTERVAL: + /* + * ACTMON hardware supports up to 256 milliseconds for the + * sampling period. + */ + if (*new_delay > 256) { + ret = -EINVAL; + break; + } + + tegra_actmon_pause(tegra); + devfreq_interval_update(devfreq, new_delay); + ret = tegra_actmon_resume(tegra); + break; + case DEVFREQ_GOV_SUSPEND: tegra_actmon_stop(tegra); devfreq_monitor_suspend(devfreq); @@ -715,6 +768,7 @@ static struct devfreq_governor tegra_devfreq_governor = { .get_target_freq = tegra_governor_get_target, .event_handler = tegra_governor_event_handler, .immutable = true, + .interrupt_driven = true, }; static int tegra_devfreq_probe(struct platform_device *pdev) From fee22854c0273569836de2039d9c432ea4df2cfc Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 5 Nov 2019 00:56:16 +0300 Subject: [PATCH 62/83] PM / devfreq: tegra30: Tune up MCCPU boost-down coefficient MCCPU boosts up very aggressively, by 800%, and boosts down very mildly, by 10%. This doesn't work well when the system is idling because the very slow de-boosting results in lots of consecutive-down interrupts. As a result, memory stays clocked high and the CPU doesn't enter its deepest idle state, instead of memory being kept at the lowest frequency and the CPU cluster being turned off. Faster de-boosting fixes the case of an idling system and doesn't affect the case of an active system.
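The effect of the coefficient can be estimated with a small user-space sketch that counts how many consecutive-down events are needed before the boost drops to zero. The helper name, the step size and the starting boost used below are hypothetical; only the rule itself (scale the boost to the given percentage, reset it to zero below half a step) follows the driver code touched by this series.

```c
#include <assert.h>

/* Hypothetical step size in kHz; a boost below half a step snaps to zero. */
#define TOY_BOOST_STEP	16000UL

/*
 * Count the consecutive-down events needed to bring boost_khz to zero
 * when every event scales the boost to down_coef percent of its value.
 */
static unsigned int toy_deboost_events(unsigned long boost_khz,
				       unsigned int down_coef)
{
	unsigned int events = 0;

	assert(down_coef < 100);	/* otherwise the loop never ends */

	while (boost_khz) {
		boost_khz = boost_khz * down_coef / 100;
		if (boost_khz < (TOY_BOOST_STEP >> 1))
			boost_khz = 0;
		events++;
	}

	return events;
}
```

Under these assumptions, a boost of 400000 kHz needs 38 events to decay with a coefficient of 90, but only 5 with a coefficient of 40.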
Reviewed-by: Chanwoo Choi Signed-off-by: Dmitry Osipenko Signed-off-by: Chanwoo Choi --- drivers/devfreq/tegra30-devfreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/devfreq/tegra30-devfreq.c b/drivers/devfreq/tegra30-devfreq.c index e44f1a48f838..0b65f89d74d5 100644 --- a/drivers/devfreq/tegra30-devfreq.c +++ b/drivers/devfreq/tegra30-devfreq.c @@ -124,7 +124,7 @@ static const struct tegra_devfreq_device_config actmon_device_configs[] = { .offset = 0x200, .irq_mask = 1 << 25, .boost_up_coeff = 800, - .boost_down_coeff = 90, + .boost_down_coeff = 40, .boost_up_threshold = 27, .boost_down_threshold = 10, .avg_dependency_threshold = 16000, /* 16MHz in kHz units */ From 99e98d3fb1008ef7416e16a1fd355cb73a253502 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 4 Nov 2019 12:16:17 +0100 Subject: [PATCH 63/83] cpuidle: Consolidate disabled state checks There are two reasons why CPU idle states may be disabled: either because the driver has disabled them or because they have been disabled by user space via sysfs. In the former case, the state's "disabled" flag is set once during the initialization of the driver and it is never cleared later (it is read-only effectively). In the latter case, the "disable" field of the given state's cpuidle_state_usage struct is set and it may be changed via sysfs. Thus checking whether or not an idle state has been disabled involves reading these two flags every time. In order to avoid the additional check of the state's "disabled" flag (which is effectively read-only anyway), use the value of it at the init time to set a (new) flag in the "disable" field of that state's cpuidle_state_usage structure and use the sysfs interface to manipulate another (new) flag in it. This way the state is disabled whenever the "disable" field of its cpuidle_state_usage structure is nonzero, whatever the reason, and it is the only place to look into to check whether or not the state has been disabled. 
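The scheme described above can be sketched in user space as follows. This is a simplified model, not the kernel code: the toy_* names and flag values are illustrative, and the struct contains only the one field that matters here.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag names; each reason for disabling gets its own bit. */
#define TOY_DISABLED_BY_USER	(1U << 0)
#define TOY_DISABLED_BY_DRIVER	(1U << 1)

struct toy_state_usage {
	unsigned int disable;	/* nonzero means the state must be skipped */
};

/* Init time: copy the driver's read-only "disabled" flag into the field. */
static void toy_init_usage(struct toy_state_usage *su, bool driver_disabled)
{
	su->disable = driver_disabled ? TOY_DISABLED_BY_DRIVER : 0;
}

/* sysfs store path: only the user bit is ever toggled. */
static void toy_user_set(struct toy_state_usage *su, bool disable)
{
	if (disable)
		su->disable |= TOY_DISABLED_BY_USER;
	else
		su->disable &= ~TOY_DISABLED_BY_USER;
}

/* Hot path: a single test covers both reasons. */
static bool toy_state_usable(const struct toy_state_usage *su)
{
	assert(su);
	return !su->disable;
}
```

Keeping the two reasons in separate bits means that clearing the user's setting can never re-enable a state the driver has disabled, while the governors still need to read only one field.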
Signed-off-by: Rafael J. Wysocki Acked-by: Daniel Lezcano Acked-by: Peter Zijlstra (Intel) --- drivers/cpuidle/cpuidle-powernv.c | 7 ++-- drivers/cpuidle/cpuidle.c | 24 +++++++------- drivers/cpuidle/governors/ladder.c | 4 +-- drivers/cpuidle/governors/menu.c | 8 ++--- drivers/cpuidle/governors/teo.c | 5 ++- drivers/cpuidle/sysfs.c | 51 ++++++++++++++++++------------ include/linux/cpuidle.h | 3 ++ 7 files changed, 54 insertions(+), 48 deletions(-) diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c index 84b1ebe212b3..1b299e801f74 100644 --- a/drivers/cpuidle/cpuidle-powernv.c +++ b/drivers/cpuidle/cpuidle-powernv.c @@ -56,13 +56,10 @@ static u64 get_snooze_timeout(struct cpuidle_device *dev, return default_snooze_timeout; for (i = index + 1; i < drv->state_count; i++) { - struct cpuidle_state *s = &drv->states[i]; - struct cpuidle_state_usage *su = &dev->states_usage[i]; - - if (s->disabled || su->disable) + if (dev->states_usage[i].disable) continue; - return s->target_residency * tb_ticks_per_usec; + return drv->states[i].target_residency * tb_ticks_per_usec; } return default_snooze_timeout; diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 0895b988fa92..44ae39f2b47a 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -84,12 +84,12 @@ static int find_deepest_state(struct cpuidle_driver *drv, for (i = 1; i < drv->state_count; i++) { struct cpuidle_state *s = &drv->states[i]; - struct cpuidle_state_usage *su = &dev->states_usage[i]; - if (s->disabled || su->disable || s->exit_latency <= latency_req - || s->exit_latency > max_latency - || (s->flags & forbidden_flags) - || (s2idle && !s->enter_s2idle)) + if (dev->states_usage[i].disable || + s->exit_latency <= latency_req || + s->exit_latency > max_latency || + (s->flags & forbidden_flags) || + (s2idle && !s->enter_s2idle)) continue; latency_req = s->exit_latency; @@ -265,8 +265,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct 
cpuidle_driver *drv, if (diff < drv->states[entered_state].target_residency) { for (i = entered_state - 1; i >= 0; i--) { - if (drv->states[i].disabled || - dev->states_usage[i].disable) + if (dev->states_usage[i].disable) continue; /* Shallower states are enabled, so update. */ @@ -275,8 +274,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, } } else if (diff > delay) { for (i = entered_state + 1; i < drv->state_count; i++) { - if (drv->states[i].disabled || - dev->states_usage[i].disable) + if (dev->states_usage[i].disable) continue; /* @@ -380,7 +378,7 @@ u64 cpuidle_poll_time(struct cpuidle_driver *drv, limit_ns = TICK_NSEC; for (i = 1; i < drv->state_count; i++) { - if (drv->states[i].disabled || dev->states_usage[i].disable) + if (dev->states_usage[i].disable) continue; limit_ns = (u64)drv->states[i].target_residency * NSEC_PER_USEC; @@ -567,12 +565,16 @@ static void __cpuidle_device_init(struct cpuidle_device *dev) */ static int __cpuidle_register_device(struct cpuidle_device *dev) { - int ret; struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev); + int i, ret; if (!try_module_get(drv->owner)) return -EINVAL; + for (i = 0; i < drv->state_count; i++) + if (drv->states[i].disabled) + dev->states_usage[i].disable |= CPUIDLE_STATE_DISABLED_BY_DRIVER; + per_cpu(cpuidle_devices, dev->cpu) = dev; list_add(&dev->device_list, &cpuidle_detected_devices); diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c index 428eeb832fe7..b0126b8c32fe 100644 --- a/drivers/cpuidle/governors/ladder.c +++ b/drivers/cpuidle/governors/ladder.c @@ -84,7 +84,6 @@ static int ladder_select_state(struct cpuidle_driver *drv, /* consider promotion */ if (last_idx < drv->state_count - 1 && - !drv->states[last_idx + 1].disabled && !dev->states_usage[last_idx + 1].disable && last_residency > last_state->threshold.promotion_time && drv->states[last_idx + 1].exit_latency <= latency_req) { @@ -98,8 +97,7 @@ static int 
ladder_select_state(struct cpuidle_driver *drv, /* consider demotion */ if (last_idx > first_idx && - (drv->states[last_idx].disabled || - dev->states_usage[last_idx].disable || + (dev->states_usage[last_idx].disable || drv->states[last_idx].exit_latency > latency_req)) { int i; diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index e5a5d0c8d66b..38b2b72102a8 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -298,7 +298,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, if (unlikely(drv->state_count <= 1 || latency_req == 0) || ((data->next_timer_us < drv->states[1].target_residency || latency_req < drv->states[1].exit_latency) && - !drv->states[0].disabled && !dev->states_usage[0].disable)) { + !dev->states_usage[0].disable)) { /* * In this case state[0] will be used no matter what, so return * it right away and keep the tick running if state[0] is a @@ -349,9 +349,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, idx = -1; for (i = 0; i < drv->state_count; i++) { struct cpuidle_state *s = &drv->states[i]; - struct cpuidle_state_usage *su = &dev->states_usage[i]; - if (s->disabled || su->disable) + if (dev->states_usage[i].disable) continue; if (idx == -1) @@ -422,8 +421,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * tick, so try to correct that. 
*/ for (i = idx - 1; i >= 0; i--) { - if (drv->states[i].disabled || - dev->states_usage[i].disable) + if (dev->states_usage[i].disable) continue; idx = i; diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index b9b9156618e6..702d560eb347 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -212,7 +212,7 @@ static int teo_find_shallower_state(struct cpuidle_driver *drv, int i; for (i = state_idx - 1; i >= 0; i--) { - if (drv->states[i].disabled || dev->states_usage[i].disable) + if (dev->states_usage[i].disable) continue; state_idx = i; @@ -256,9 +256,8 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, for (i = 0; i < drv->state_count; i++) { struct cpuidle_state *s = &drv->states[i]; - struct cpuidle_state_usage *su = &dev->states_usage[i]; - if (s->disabled || su->disable) { + if (dev->states_usage[i].disable) { /* * Ignore disabled states with target residencies beyond * the anticipated idle duration. 
diff --git a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c index 2bb2683b493c..9f3755ac8f87 100644 --- a/drivers/cpuidle/sysfs.c +++ b/drivers/cpuidle/sysfs.c @@ -255,25 +255,6 @@ static ssize_t show_state_##_name(struct cpuidle_state *state, \ return sprintf(buf, "%u\n", state->_name);\ } -#define define_store_state_ull_function(_name) \ -static ssize_t store_state_##_name(struct cpuidle_state *state, \ - struct cpuidle_state_usage *state_usage, \ - const char *buf, size_t size) \ -{ \ - unsigned long long value; \ - int err; \ - if (!capable(CAP_SYS_ADMIN)) \ - return -EPERM; \ - err = kstrtoull(buf, 0, &value); \ - if (err) \ - return err; \ - if (value) \ - state_usage->_name = 1; \ - else \ - state_usage->_name = 0; \ - return size; \ -} - #define define_show_state_ull_function(_name) \ static ssize_t show_state_##_name(struct cpuidle_state *state, \ struct cpuidle_state_usage *state_usage, \ @@ -299,11 +280,39 @@ define_show_state_ull_function(usage) define_show_state_ull_function(time) define_show_state_str_function(name) define_show_state_str_function(desc) -define_show_state_ull_function(disable) -define_store_state_ull_function(disable) define_show_state_ull_function(above) define_show_state_ull_function(below) +static ssize_t show_state_disable(struct cpuidle_state *state, + struct cpuidle_state_usage *state_usage, + char *buf) +{ + return sprintf(buf, "%llu\n", + state_usage->disable & CPUIDLE_STATE_DISABLED_BY_USER); +} + +static ssize_t store_state_disable(struct cpuidle_state *state, + struct cpuidle_state_usage *state_usage, + const char *buf, size_t size) +{ + unsigned int value; + int err; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + err = kstrtouint(buf, 0, &value); + if (err) + return err; + + if (value) + state_usage->disable |= CPUIDLE_STATE_DISABLED_BY_USER; + else + state_usage->disable &= ~CPUIDLE_STATE_DISABLED_BY_USER; + + return size; +} + define_one_state_ro(name, show_state_name); define_one_state_ro(desc, 
show_state_desc); define_one_state_ro(latency, show_state_exit_latency); diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index 4b6b5bea8f79..d23a3b1ddcf6 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -29,6 +29,9 @@ struct cpuidle_driver; * CPUIDLE DEVICE INTERFACE * ****************************/ +#define CPUIDLE_STATE_DISABLED_BY_USER BIT(0) +#define CPUIDLE_STATE_DISABLED_BY_DRIVER BIT(1) + struct cpuidle_state_usage { unsigned long long disable; unsigned long long usage; From 8d2eecead5bf23865cb73062a4e7139a9dbce5a1 Mon Sep 17 00:00:00 2001 From: Jamal Shareef Date: Mon, 4 Nov 2019 21:54:27 -0800 Subject: [PATCH 64/83] cpufreq: intel_pstate: Fix plain int as pointer warning from sparse Fix sparse warning: Using plain integer as NULL pointer. Replace assignment of 0 to pointers with NULL assignment. Signed-off-by: Jamal Shareef Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/intel_pstate.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 53a51c169451..cfcf34e04c3d 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -2664,21 +2664,21 @@ enum { /* Hardware vendor-specific info that has its own power management modes */ static struct acpi_platform_list plat_info[] __initdata = { - {"HP ", "ProLiant", 0, ACPI_SIG_FADT, all_versions, 0, PSS}, - {"ORACLE", "X4-2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "X4-2L ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "X4-2B ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "X3-2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "X3-2L ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "X3-2B ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "X4470M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "X4270M3 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "X4270M2 ", 0, 
ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "X4170M2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "X4170 M3", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "X4275 M3", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "X6-2 ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, - {"ORACLE", "Sudbury ", 0, ACPI_SIG_FADT, all_versions, 0, PPC}, + {"HP ", "ProLiant", 0, ACPI_SIG_FADT, all_versions, NULL, PSS}, + {"ORACLE", "X4-2 ", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "X4-2L ", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "X4-2B ", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "X3-2 ", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "X3-2L ", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "X3-2B ", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "X4470M2 ", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "X4270M3 ", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "X4270M2 ", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "X4170M2 ", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "X4170 M3", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "X4275 M3", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "X6-2 ", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, + {"ORACLE", "Sudbury ", 0, ACPI_SIG_FADT, all_versions, NULL, PPC}, { } /* End */ }; From e6e8df07268c1f75dd9215536e2ce4587b70f977 Mon Sep 17 00:00:00 2001 From: Kai Shen Date: Thu, 7 Nov 2019 05:08:17 +0000 Subject: [PATCH 65/83] cpufreq: Add NULL checks to show() and store() methods of cpufreq Add NULL checks to show() and store() in cpufreq.c to avoid attempts to invoke a NULL callback. 
Though some interfaces of cpufreq are set as read-only, users can still get write permission using chmod which can lead to a kernel crash, as follows: chmod +w /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq echo 1 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq This bug was found in linux 4.19. Signed-off-by: Kai Shen Reported-by: Feilong Lin Reviewed-by: Feilong Lin Acked-by: Viresh Kumar [ rjw: Subject & changelog ] Cc: All applicable Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index dd1628192310..0a10cf9d0b1a 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -933,6 +933,9 @@ static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf) struct freq_attr *fattr = to_attr(attr); ssize_t ret; + if (!fattr->show) + return -EIO; + down_read(&policy->rwsem); ret = fattr->show(policy, buf); up_read(&policy->rwsem); @@ -947,6 +950,9 @@ static ssize_t store(struct kobject *kobj, struct attribute *attr, struct freq_attr *fattr = to_attr(attr); ssize_t ret = -EINVAL; + if (!fattr->store) + return -EIO; + /* * cpus_read_trylock() is used here to work around a circular lock * dependency problem with respect to the cpufreq_register_driver(). From ea0d11c9dd95d685fe94299847446e6ad9594c39 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Wed, 16 Oct 2019 16:16:27 +0200 Subject: [PATCH 66/83] PM / core: Clean up some function headers in power.h The power.h is a bit messy due to the various existing CONFIG_PM* Kconfig combinations. However the final section for wakeup_source_sysfs*() can be moved inside one of the existing sections rather than adding yet another one, so let's do that to clean up the code a little bit. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. 
Wysocki --- drivers/base/power/power.h | 30 ++++++++++++------------------ 1 file changed, 12 insertions(+), 18 deletions(-) diff --git a/drivers/base/power/power.h b/drivers/base/power/power.h index 39a06a0cfdaa..444f5c169a0b 100644 --- a/drivers/base/power/power.h +++ b/drivers/base/power/power.h @@ -117,6 +117,13 @@ static inline bool device_pm_initialized(struct device *dev) return dev->power.in_dpm_list; } +/* drivers/base/power/wakeup_stats.c */ +extern int wakeup_source_sysfs_add(struct device *parent, + struct wakeup_source *ws); +extern void wakeup_source_sysfs_remove(struct wakeup_source *ws); + +extern int pm_wakeup_source_sysfs_add(struct device *parent); + #else /* !CONFIG_PM_SLEEP */ static inline void device_pm_sleep_init(struct device *dev) {} @@ -141,6 +148,11 @@ static inline bool device_pm_initialized(struct device *dev) return device_is_registered(dev); } +static inline int pm_wakeup_source_sysfs_add(struct device *parent) +{ + return 0; +} + #endif /* !CONFIG_PM_SLEEP */ static inline void device_pm_init(struct device *dev) @@ -149,21 +161,3 @@ static inline void device_pm_init(struct device *dev) device_pm_sleep_init(dev); pm_runtime_init(dev); } - -#ifdef CONFIG_PM_SLEEP - -/* drivers/base/power/wakeup_stats.c */ -extern int wakeup_source_sysfs_add(struct device *parent, - struct wakeup_source *ws); -extern void wakeup_source_sysfs_remove(struct wakeup_source *ws); - -extern int pm_wakeup_source_sysfs_add(struct device *parent); - -#else /* !CONFIG_PM_SLEEP */ - -static inline int pm_wakeup_source_sysfs_add(struct device *parent) -{ - return 0; -} - -#endif /* CONFIG_PM_SLEEP */ From 25cb20a212a1f989385dfe23230817e69c62bee5 Mon Sep 17 00:00:00 2001 From: Stephen Boyd Date: Wed, 16 Oct 2019 16:57:53 +0200 Subject: [PATCH 67/83] PM / OPP: Support adjusting OPP voltages at runtime On some SoCs the Adaptive Voltage Scaling (AVS) technique is employed to optimize the operating voltage of a device. 
At a given frequency, the hardware monitors dynamic factors and either makes a suggestion for how much to adjust a voltage for the current frequency, or it automatically adjusts the voltage without software intervention. Add an API to the OPP library for the former case, so that AVS type devices can update the voltages for an OPP when the hardware determines the voltage should change. The assumption is that drivers like CPUfreq or devfreq will register for the OPP notifiers and adjust the voltage according to suggestions that AVS makes. This patch is derived from [1] submitted by Stephen. [1] https://lore.kernel.org/patchwork/patch/599279/ Signed-off-by: Stephen Boyd [Roger Lu: Changed to rcu less implementation] Signed-off-by: Roger Lu [s.nawrocki@samsung.com: added handling of OPP min/max voltage] Signed-off-by: Sylwester Nawrocki Signed-off-by: Viresh Kumar --- drivers/opp/core.c | 69 ++++++++++++++++++++++++++++++++++++++++++ include/linux/pm_opp.h | 13 ++++++++ 2 files changed, 82 insertions(+) diff --git a/drivers/opp/core.c b/drivers/opp/core.c index 9ff0538ee83a..be7a7d332332 100644 --- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -2102,6 +2102,75 @@ static int _opp_set_availability(struct device *dev, unsigned long freq, return r; } +/** + * dev_pm_opp_adjust_voltage() - helper to change the voltage of an OPP + * @dev: device for which we do this operation + * @freq: OPP frequency to adjust voltage of + * @u_volt: new OPP target voltage + * @u_volt_min: new OPP min voltage + * @u_volt_max: new OPP max voltage + * + * Return: -EINVAL for bad pointers, -ENOMEM if no memory available for the + * copy operation, returns 0 if no modification was done OR modification was + * successful.
+ */ +int dev_pm_opp_adjust_voltage(struct device *dev, unsigned long freq, + unsigned long u_volt, unsigned long u_volt_min, + unsigned long u_volt_max) + +{ + struct opp_table *opp_table; + struct dev_pm_opp *tmp_opp, *opp = ERR_PTR(-ENODEV); + int r = 0; + + /* Find the opp_table */ + opp_table = _find_opp_table(dev); + if (IS_ERR(opp_table)) { + r = PTR_ERR(opp_table); + dev_warn(dev, "%s: Device OPP not found (%d)\n", __func__, r); + return r; + } + + mutex_lock(&opp_table->lock); + + /* Do we have the frequency? */ + list_for_each_entry(tmp_opp, &opp_table->opp_list, node) { + if (tmp_opp->rate == freq) { + opp = tmp_opp; + break; + } + } + + if (IS_ERR(opp)) { + r = PTR_ERR(opp); + goto adjust_unlock; + } + + /* Is update really needed? */ + if (opp->supplies->u_volt == u_volt) + goto adjust_unlock; + + opp->supplies->u_volt = u_volt; + opp->supplies->u_volt_min = u_volt_min; + opp->supplies->u_volt_max = u_volt_max; + + dev_pm_opp_get(opp); + mutex_unlock(&opp_table->lock); + + /* Notify the voltage change of the OPP */ + blocking_notifier_call_chain(&opp_table->head, OPP_EVENT_ADJUST_VOLTAGE, + opp); + + dev_pm_opp_put(opp); + goto adjust_put_table; + +adjust_unlock: + mutex_unlock(&opp_table->lock); +adjust_put_table: + dev_pm_opp_put_opp_table(opp_table); + return r; +} + /** * dev_pm_opp_enable() - Enable a specific OPP * @dev: device for which we do this operation diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index b8197ab014f2..747861816f4f 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -22,6 +22,7 @@ struct opp_table; enum dev_pm_opp_event { OPP_EVENT_ADD, OPP_EVENT_REMOVE, OPP_EVENT_ENABLE, OPP_EVENT_DISABLE, + OPP_EVENT_ADJUST_VOLTAGE, }; /** @@ -113,6 +114,10 @@ int dev_pm_opp_add(struct device *dev, unsigned long freq, void dev_pm_opp_remove(struct device *dev, unsigned long freq); void dev_pm_opp_remove_all_dynamic(struct device *dev); +int dev_pm_opp_adjust_voltage(struct device *dev, unsigned long freq, + 
unsigned long u_volt, unsigned long u_volt_min, + unsigned long u_volt_max); + int dev_pm_opp_enable(struct device *dev, unsigned long freq); int dev_pm_opp_disable(struct device *dev, unsigned long freq); @@ -242,6 +247,14 @@ static inline void dev_pm_opp_remove_all_dynamic(struct device *dev) { } +static inline int +dev_pm_opp_adjust_voltage(struct device *dev, unsigned long freq, + unsigned long u_volt, unsigned long u_volt_min, + unsigned long u_volt_max) +{ + return 0; +} + static inline int dev_pm_opp_enable(struct device *dev, unsigned long freq) { return 0; From c1d51f684c72b5eb2aecbbd47be3a2977a2dc903 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 7 Nov 2019 15:25:12 +0100 Subject: [PATCH 68/83] cpuidle: Use nanoseconds as the unit of time Currently, the cpuidle subsystem uses microseconds as the unit of time, which (among other things) causes the idle loop to incur some integer division overhead for no clear benefit. In order to allow cpuidle to measure time in nanoseconds, add two new fields, exit_latency_ns and target_residency_ns, to represent the exit latency and target residency of an idle state in nanoseconds, respectively, to struct cpuidle_state and initialize them with the help of the corresponding values in microseconds provided by drivers. Additionally, change cpuidle_governor_latency_req() to return the idle state exit latency constraint in nanoseconds. Also measure idle state residency (last_residency_ns in struct cpuidle_device and time_ns in struct cpuidle_state_usage) in nanoseconds and update the cpuidle core and governors accordingly. However, the menu governor still computes typical intervals in microseconds to avoid integer overflows. Signed-off-by: Rafael J.
Wysocki Acked-by: Peter Zijlstra (Intel) Acked-by: Doug Smythies Tested-by: Doug Smythies --- drivers/cpuidle/cpuidle.c | 36 ++++---- drivers/cpuidle/driver.c | 29 +++++-- drivers/cpuidle/governor.c | 7 +- drivers/cpuidle/governors/haltpoll.c | 7 +- drivers/cpuidle/governors/ladder.c | 25 +++--- drivers/cpuidle/governors/menu.c | 123 ++++++++++++--------------- drivers/cpuidle/governors/teo.c | 76 ++++++++--------- drivers/cpuidle/poll_state.c | 2 + drivers/cpuidle/sysfs.c | 20 ++++- include/linux/cpuidle.h | 8 +- kernel/sched/idle.c | 2 +- 11 files changed, 174 insertions(+), 161 deletions(-) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 44ae39f2b47a..bf9b030cd7e1 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -75,24 +75,24 @@ int cpuidle_play_dead(void) static int find_deepest_state(struct cpuidle_driver *drv, struct cpuidle_device *dev, - unsigned int max_latency, + u64 max_latency_ns, unsigned int forbidden_flags, bool s2idle) { - unsigned int latency_req = 0; + u64 latency_req = 0; int i, ret = 0; for (i = 1; i < drv->state_count; i++) { struct cpuidle_state *s = &drv->states[i]; if (dev->states_usage[i].disable || - s->exit_latency <= latency_req || - s->exit_latency > max_latency || + s->exit_latency_ns <= latency_req || + s->exit_latency_ns > max_latency_ns || (s->flags & forbidden_flags) || (s2idle && !s->enter_s2idle)) continue; - latency_req = s->exit_latency; + latency_req = s->exit_latency_ns; ret = i; } return ret; @@ -124,7 +124,7 @@ void cpuidle_use_deepest_state(bool enable) int cpuidle_find_deepest_state(struct cpuidle_driver *drv, struct cpuidle_device *dev) { - return find_deepest_state(drv, dev, UINT_MAX, 0, false); + return find_deepest_state(drv, dev, U64_MAX, 0, false); } #ifdef CONFIG_SUSPEND @@ -180,7 +180,7 @@ int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev) * that interrupts won't be enabled when it exits and allows the tick to * be frozen safely. 
*/ - index = find_deepest_state(drv, dev, UINT_MAX, 0, true); + index = find_deepest_state(drv, dev, U64_MAX, 0, true); if (index > 0) enter_s2idle_proper(drv, dev, index); @@ -209,7 +209,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, * CPU as a broadcast timer, this call may fail if it is not available. */ if (broadcast && tick_broadcast_enter()) { - index = find_deepest_state(drv, dev, target_state->exit_latency, + index = find_deepest_state(drv, dev, target_state->exit_latency_ns, CPUIDLE_FLAG_TIMER_STOP, false); if (index < 0) { default_idle_call(); @@ -247,7 +247,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, local_irq_enable(); if (entered_state >= 0) { - s64 diff, delay = drv->states[entered_state].exit_latency; + s64 diff, delay = drv->states[entered_state].exit_latency_ns; int i; /* @@ -255,15 +255,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, * This can be moved to within driver enter routine, * but that results in multiple copies of same code. */ - diff = ktime_us_delta(time_end, time_start); - if (diff > INT_MAX) - diff = INT_MAX; + diff = ktime_sub(time_end, time_start); - dev->last_residency = (int)diff; - dev->states_usage[entered_state].time += dev->last_residency; + dev->last_residency_ns = diff; + dev->states_usage[entered_state].time_ns += diff; dev->states_usage[entered_state].usage++; - if (diff < drv->states[entered_state].target_residency) { + if (diff < drv->states[entered_state].target_residency_ns) { for (i = entered_state - 1; i >= 0; i--) { if (dev->states_usage[i].disable) continue; @@ -281,14 +279,14 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, * Update if a deeper state would have been a * better match for the observed idle duration. 
*/ - if (diff - delay >= drv->states[i].target_residency) + if (diff - delay >= drv->states[i].target_residency_ns) dev->states_usage[entered_state].below++; break; } } } else { - dev->last_residency = 0; + dev->last_residency_ns = 0; } return entered_state; @@ -381,7 +379,7 @@ u64 cpuidle_poll_time(struct cpuidle_driver *drv, if (dev->states_usage[i].disable) continue; - limit_ns = (u64)drv->states[i].target_residency * NSEC_PER_USEC; + limit_ns = (u64)drv->states[i].target_residency_ns; } dev->poll_limit_ns = limit_ns; @@ -552,7 +550,7 @@ static void __cpuidle_unregister_device(struct cpuidle_device *dev) static void __cpuidle_device_init(struct cpuidle_device *dev) { memset(dev->states_usage, 0, sizeof(dev->states_usage)); - dev->last_residency = 0; + dev->last_residency_ns = 0; dev->next_hrtimer = 0; } diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c index 9db154224999..fcaf8b2bab96 100644 --- a/drivers/cpuidle/driver.c +++ b/drivers/cpuidle/driver.c @@ -165,16 +165,27 @@ static void __cpuidle_driver_init(struct cpuidle_driver *drv) if (!drv->cpumask) drv->cpumask = (struct cpumask *)cpu_possible_mask; - /* - * Look for the timer stop flag in the different states, so that we know - * if the broadcast timer has to be set up. The loop is in the reverse - * order, because usually one of the deeper states have this flag set. - */ - for (i = drv->state_count - 1; i >= 0 ; i--) { - if (drv->states[i].flags & CPUIDLE_FLAG_TIMER_STOP) { + for (i = 0; i < drv->state_count; i++) { + struct cpuidle_state *s = &drv->states[i]; + + /* + * Look for the timer stop flag in the different states and if + * it is found, indicate that the broadcast timer has to be set + * up. + */ + if (s->flags & CPUIDLE_FLAG_TIMER_STOP) drv->bctimer = 1; - break; - } + + /* + * The core will use the target residency and exit latency + * values in nanoseconds, but allow drivers to provide them in + * microseconds too. 
+ */ + if (s->target_residency > 0) + s->target_residency_ns = s->target_residency * NSEC_PER_USEC; + + if (s->exit_latency > 0) + s->exit_latency_ns = s->exit_latency * NSEC_PER_USEC; } } diff --git a/drivers/cpuidle/governor.c b/drivers/cpuidle/governor.c index e9801f26c732..e48271e117a3 100644 --- a/drivers/cpuidle/governor.c +++ b/drivers/cpuidle/governor.c @@ -107,11 +107,14 @@ int cpuidle_register_governor(struct cpuidle_governor *gov) * cpuidle_governor_latency_req - Compute a latency constraint for CPU * @cpu: Target CPU */ -int cpuidle_governor_latency_req(unsigned int cpu) +s64 cpuidle_governor_latency_req(unsigned int cpu) { int global_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY); struct device *device = get_cpu_device(cpu); int device_req = dev_pm_qos_raw_resume_latency(device); - return device_req < global_req ? device_req : global_req; + if (device_req > global_req) + device_req = global_req; + + return (s64)device_req * NSEC_PER_USEC; } diff --git a/drivers/cpuidle/governors/haltpoll.c b/drivers/cpuidle/governors/haltpoll.c index 7a703d2e0064..cb2a96eafc02 100644 --- a/drivers/cpuidle/governors/haltpoll.c +++ b/drivers/cpuidle/governors/haltpoll.c @@ -49,7 +49,7 @@ static int haltpoll_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, bool *stop_tick) { - int latency_req = cpuidle_governor_latency_req(dev->cpu); + s64 latency_req = cpuidle_governor_latency_req(dev->cpu); if (!drv->state_count || latency_req == 0) { *stop_tick = false; @@ -75,10 +75,9 @@ static int haltpoll_select(struct cpuidle_driver *drv, return 0; } -static void adjust_poll_limit(struct cpuidle_device *dev, unsigned int block_us) +static void adjust_poll_limit(struct cpuidle_device *dev, u64 block_ns) { unsigned int val; - u64 block_ns = block_us*NSEC_PER_USEC; /* Grow cpu_halt_poll_us if * cpu_halt_poll_us < block_ns < guest_halt_poll_us @@ -115,7 +114,7 @@ static void haltpoll_reflect(struct cpuidle_device *dev, int index) dev->last_state_idx = index; if (index != 0) 
- adjust_poll_limit(dev, dev->last_residency); + adjust_poll_limit(dev, dev->last_residency_ns); } /** diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c index b0126b8c32fe..8e9058c4ea63 100644 --- a/drivers/cpuidle/governors/ladder.c +++ b/drivers/cpuidle/governors/ladder.c @@ -27,8 +27,8 @@ struct ladder_device_state { struct { u32 promotion_count; u32 demotion_count; - u32 promotion_time; - u32 demotion_time; + u64 promotion_time_ns; + u64 demotion_time_ns; } threshold; struct { int promotion_count; @@ -68,9 +68,10 @@ static int ladder_select_state(struct cpuidle_driver *drv, { struct ladder_device *ldev = this_cpu_ptr(&ladder_devices); struct ladder_device_state *last_state; - int last_residency, last_idx = dev->last_state_idx; + int last_idx = dev->last_state_idx; int first_idx = drv->states[0].flags & CPUIDLE_FLAG_POLLING ? 1 : 0; - int latency_req = cpuidle_governor_latency_req(dev->cpu); + s64 latency_req = cpuidle_governor_latency_req(dev->cpu); + s64 last_residency; /* Special case when user has set very strict latency requirement */ if (unlikely(latency_req == 0)) { @@ -80,13 +81,13 @@ static int ladder_select_state(struct cpuidle_driver *drv, last_state = &ldev->states[last_idx]; - last_residency = dev->last_residency - drv->states[last_idx].exit_latency; + last_residency = dev->last_residency_ns - drv->states[last_idx].exit_latency_ns; /* consider promotion */ if (last_idx < drv->state_count - 1 && !dev->states_usage[last_idx + 1].disable && - last_residency > last_state->threshold.promotion_time && - drv->states[last_idx + 1].exit_latency <= latency_req) { + last_residency > last_state->threshold.promotion_time_ns && + drv->states[last_idx + 1].exit_latency_ns <= latency_req) { last_state->stats.promotion_count++; last_state->stats.demotion_count = 0; if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) { @@ -98,11 +99,11 @@ static int ladder_select_state(struct cpuidle_driver *drv, /* 
consider demotion */ if (last_idx > first_idx && (dev->states_usage[last_idx].disable || - drv->states[last_idx].exit_latency > latency_req)) { + drv->states[last_idx].exit_latency_ns > latency_req)) { int i; for (i = last_idx - 1; i > first_idx; i--) { - if (drv->states[i].exit_latency <= latency_req) + if (drv->states[i].exit_latency_ns <= latency_req) break; } ladder_do_selection(dev, ldev, last_idx, i); @@ -110,7 +111,7 @@ static int ladder_select_state(struct cpuidle_driver *drv, } if (last_idx > first_idx && - last_residency < last_state->threshold.demotion_time) { + last_residency < last_state->threshold.demotion_time_ns) { last_state->stats.demotion_count++; last_state->stats.promotion_count = 0; if (last_state->stats.demotion_count >= last_state->threshold.demotion_count) { @@ -150,9 +151,9 @@ static int ladder_enable_device(struct cpuidle_driver *drv, lstate->threshold.demotion_count = DEMOTION_COUNT; if (i < drv->state_count - 1) - lstate->threshold.promotion_time = state->exit_latency; + lstate->threshold.promotion_time_ns = state->exit_latency_ns; if (i > first_idx) - lstate->threshold.demotion_time = state->exit_latency; + lstate->threshold.demotion_time_ns = state->exit_latency_ns; } return 0; diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index 38b2b72102a8..b0a7ad566081 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -19,22 +19,12 @@ #include #include -/* - * Please note when changing the tuning values: - * If (MAX_INTERESTING-1) * RESOLUTION > UINT_MAX, the result of - * a scaling operation multiplication may overflow on 32 bit platforms. - * In that case, #define RESOLUTION as ULL to get 64 bit result: - * #define RESOLUTION 1024ULL - * - * The default values do not overflow. 
- */ #define BUCKETS 12 #define INTERVAL_SHIFT 3 #define INTERVALS (1UL << INTERVAL_SHIFT) #define RESOLUTION 1024 #define DECAY 8 -#define MAX_INTERESTING 50000 - +#define MAX_INTERESTING (50000 * NSEC_PER_USEC) /* * Concepts and ideas behind the menu governor @@ -120,14 +110,14 @@ struct menu_device { int needs_update; int tick_wakeup; - unsigned int next_timer_us; + u64 next_timer_ns; unsigned int bucket; unsigned int correction_factor[BUCKETS]; unsigned int intervals[INTERVALS]; int interval_ptr; }; -static inline int which_bucket(unsigned int duration, unsigned long nr_iowaiters) +static inline int which_bucket(u64 duration_ns, unsigned long nr_iowaiters) { int bucket = 0; @@ -140,15 +130,15 @@ static inline int which_bucket(unsigned int duration, unsigned long nr_iowaiters if (nr_iowaiters) bucket = BUCKETS/2; - if (duration < 10) + if (duration_ns < 10ULL * NSEC_PER_USEC) return bucket; - if (duration < 100) + if (duration_ns < 100ULL * NSEC_PER_USEC) return bucket + 1; - if (duration < 1000) + if (duration_ns < 1000ULL * NSEC_PER_USEC) return bucket + 2; - if (duration < 10000) + if (duration_ns < 10000ULL * NSEC_PER_USEC) return bucket + 3; - if (duration < 100000) + if (duration_ns < 100000ULL * NSEC_PER_USEC) return bucket + 4; return bucket + 5; } @@ -276,13 +266,13 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, bool *stop_tick) { struct menu_device *data = this_cpu_ptr(&menu_devices); - int latency_req = cpuidle_governor_latency_req(dev->cpu); - int i; - int idx; - unsigned int interactivity_req; + s64 latency_req = cpuidle_governor_latency_req(dev->cpu); unsigned int predicted_us; + u64 predicted_ns; + u64 interactivity_req; unsigned long nr_iowaiters; ktime_t delta_next; + int i, idx; if (data->needs_update) { menu_update(drv, dev); @@ -290,14 +280,14 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, } /* determine the expected residency time, round up */ - data->next_timer_us = 
ktime_to_us(tick_nohz_get_sleep_length(&delta_next)); + data->next_timer_ns = tick_nohz_get_sleep_length(&delta_next); nr_iowaiters = nr_iowait_cpu(dev->cpu); - data->bucket = which_bucket(data->next_timer_us, nr_iowaiters); + data->bucket = which_bucket(data->next_timer_ns, nr_iowaiters); if (unlikely(drv->state_count <= 1 || latency_req == 0) || - ((data->next_timer_us < drv->states[1].target_residency || - latency_req < drv->states[1].exit_latency) && + ((data->next_timer_ns < drv->states[1].target_residency_ns || + latency_req < drv->states[1].exit_latency_ns) && !dev->states_usage[0].disable)) { /* * In this case state[0] will be used no matter what, so return @@ -308,18 +298,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, return 0; } - /* - * Force the result of multiplication to be 64 bits even if both - * operands are 32 bits. - * Make sure to round up for half microseconds. - */ - predicted_us = DIV_ROUND_CLOSEST_ULL((uint64_t)data->next_timer_us * - data->correction_factor[data->bucket], - RESOLUTION * DECAY); - /* - * Use the lowest expected idle interval to pick the idle state. - */ - predicted_us = min(predicted_us, get_typical_interval(data, predicted_us)); + /* Round up the result for half microseconds. */ + predicted_us = div_u64(data->next_timer_ns * + data->correction_factor[data->bucket] + + (RESOLUTION * DECAY * NSEC_PER_USEC) / 2, + RESOLUTION * DECAY * NSEC_PER_USEC); + /* Use the lowest expected idle interval to pick the idle state. */ + predicted_ns = (u64)min(predicted_us, + get_typical_interval(data, predicted_us)) * + NSEC_PER_USEC; if (tick_nohz_tick_stopped()) { /* @@ -330,14 +317,15 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * the known time till the closest timer event for the idle * state selection. 
*/ - if (predicted_us < TICK_USEC) - predicted_us = ktime_to_us(delta_next); + if (predicted_ns < TICK_NSEC) + predicted_ns = delta_next; } else { /* * Use the performance multiplier and the user-configurable * latency_req to determine the maximum exit latency. */ - interactivity_req = predicted_us / performance_multiplier(nr_iowaiters); + interactivity_req = div64_u64(predicted_ns, + performance_multiplier(nr_iowaiters)); if (latency_req > interactivity_req) latency_req = interactivity_req; } @@ -356,19 +344,19 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, if (idx == -1) idx = i; /* first enabled state */ - if (s->target_residency > predicted_us) { + if (s->target_residency_ns > predicted_ns) { /* * Use a physical idle state, not busy polling, unless * a timer is going to trigger soon enough. */ if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) && - s->exit_latency <= latency_req && - s->target_residency <= data->next_timer_us) { - predicted_us = s->target_residency; + s->exit_latency_ns <= latency_req && + s->target_residency_ns <= data->next_timer_ns) { + predicted_ns = s->target_residency_ns; idx = i; break; } - if (predicted_us < TICK_USEC) + if (predicted_ns < TICK_NSEC) break; if (!tick_nohz_tick_stopped()) { @@ -378,7 +366,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * tick in that case and let the governor run * again in the next iteration of the loop. */ - predicted_us = drv->states[idx].target_residency; + predicted_ns = drv->states[idx].target_residency_ns; break; } @@ -388,13 +376,13 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * closest timer event, select this one to avoid getting * stuck in the shallow one for too long. 
*/ - if (drv->states[idx].target_residency < TICK_USEC && - s->target_residency <= ktime_to_us(delta_next)) + if (drv->states[idx].target_residency_ns < TICK_NSEC && + s->target_residency_ns <= delta_next) idx = i; return idx; } - if (s->exit_latency > latency_req) + if (s->exit_latency_ns > latency_req) break; idx = i; @@ -408,12 +396,10 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * expected idle duration is shorter than the tick period length. */ if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || - predicted_us < TICK_USEC) && !tick_nohz_tick_stopped()) { - unsigned int delta_next_us = ktime_to_us(delta_next); - + predicted_ns < TICK_NSEC) && !tick_nohz_tick_stopped()) { *stop_tick = false; - if (idx > 0 && drv->states[idx].target_residency > delta_next_us) { + if (idx > 0 && drv->states[idx].target_residency_ns > delta_next) { /* * The tick is not going to be stopped and the target * residency of the state to be returned is not within @@ -425,7 +411,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, continue; idx = i; - if (drv->states[i].target_residency <= delta_next_us) + if (drv->states[i].target_residency_ns <= delta_next) break; } } @@ -461,7 +447,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) struct menu_device *data = this_cpu_ptr(&menu_devices); int last_idx = dev->last_state_idx; struct cpuidle_state *target = &drv->states[last_idx]; - unsigned int measured_us; + u64 measured_ns; unsigned int new_factor; /* @@ -479,7 +465,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * assume the state was never reached and the exit latency is 0. 
*/ - if (data->tick_wakeup && data->next_timer_us > TICK_USEC) { + if (data->tick_wakeup && data->next_timer_ns > TICK_NSEC) { /* * The nohz code said that there wouldn't be any events within * the tick boundary (if the tick was stopped), but the idle @@ -489,7 +475,7 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * have been idle long (but not forever) to help the idle * duration predictor do a better job next time. */ - measured_us = 9 * MAX_INTERESTING / 10; + measured_ns = 9 * MAX_INTERESTING / 10; } else if ((drv->states[last_idx].flags & CPUIDLE_FLAG_POLLING) && dev->poll_time_limit) { /* @@ -499,28 +485,29 @@ static void menu_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * the CPU might have been woken up from idle by the next timer. * Assume that to be the case. */ - measured_us = data->next_timer_us; + measured_ns = data->next_timer_ns; } else { /* measured value */ - measured_us = dev->last_residency; + measured_ns = dev->last_residency_ns; /* Deduct exit latency */ - if (measured_us > 2 * target->exit_latency) - measured_us -= target->exit_latency; + if (measured_ns > 2 * target->exit_latency_ns) + measured_ns -= target->exit_latency_ns; else - measured_us /= 2; + measured_ns /= 2; } /* Make sure our coefficients do not exceed unity */ - if (measured_us > data->next_timer_us) - measured_us = data->next_timer_us; + if (measured_ns > data->next_timer_ns) + measured_ns = data->next_timer_ns; /* Update our correction ratio */ new_factor = data->correction_factor[data->bucket]; new_factor -= new_factor / DECAY; - if (data->next_timer_us > 0 && measured_us < MAX_INTERESTING) - new_factor += RESOLUTION * measured_us / data->next_timer_us; + if (data->next_timer_ns > 0 && measured_ns < MAX_INTERESTING) + new_factor += div64_u64(RESOLUTION * measured_ns, + data->next_timer_ns); else /* * we were idle so long that we count it as a perfect @@ -540,7 +527,7 @@ static void menu_update(struct cpuidle_driver *drv, 
struct cpuidle_device *dev) data->correction_factor[data->bucket] = new_factor; /* update the repeating-pattern data */ - data->intervals[data->interval_ptr++] = measured_us; + data->intervals[data->interval_ptr++] = ktime_to_us(measured_ns); if (data->interval_ptr >= INTERVALS) data->interval_ptr = 0; } diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index 702d560eb347..ecbcfaefb0cd 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -104,7 +104,7 @@ struct teo_cpu { u64 sleep_length_ns; struct teo_idle_state states[CPUIDLE_STATE_MAX]; int interval_idx; - unsigned int intervals[INTERVALS]; + u64 intervals[INTERVALS]; }; static DEFINE_PER_CPU(struct teo_cpu, teo_cpus); @@ -117,9 +117,8 @@ static DEFINE_PER_CPU(struct teo_cpu, teo_cpus); static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) { struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); - unsigned int sleep_length_us = ktime_to_us(cpu_data->sleep_length_ns); int i, idx_hit = -1, idx_timer = -1; - unsigned int measured_us; + u64 measured_ns; if (cpu_data->time_span_ns >= cpu_data->sleep_length_ns) { /* @@ -127,23 +126,21 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * enough to the closest timer event expected at the idle state * selection time to be discarded. */ - measured_us = UINT_MAX; + measured_ns = U64_MAX; } else { - unsigned int lat; + u64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns; - lat = drv->states[dev->last_state_idx].exit_latency; - - measured_us = ktime_to_us(cpu_data->time_span_ns); + measured_ns = cpu_data->time_span_ns; /* * The delay between the wakeup and the first instruction * executed by the CPU is not likely to be worst-case every * time, so take 1/2 of the exit latency as a very rough * approximation of the average of it. 
*/ - if (measured_us >= lat) - measured_us -= lat / 2; + if (measured_ns >= lat_ns) + measured_ns -= lat_ns / 2; else - measured_us /= 2; + measured_ns /= 2; } /* @@ -155,9 +152,9 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) cpu_data->states[i].early_hits -= early_hits >> DECAY_SHIFT; - if (drv->states[i].target_residency <= sleep_length_us) { + if (drv->states[i].target_residency_ns <= cpu_data->sleep_length_ns) { idx_timer = i; - if (drv->states[i].target_residency <= measured_us) + if (drv->states[i].target_residency_ns <= measured_ns) idx_hit = i; } } @@ -193,7 +190,7 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * Save idle duration values corresponding to non-timer wakeups for * pattern detection. */ - cpu_data->intervals[cpu_data->interval_idx++] = measured_us; + cpu_data->intervals[cpu_data->interval_idx++] = measured_ns; if (cpu_data->interval_idx > INTERVALS) cpu_data->interval_idx = 0; } @@ -203,11 +200,11 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) * @drv: cpuidle driver containing state data. * @dev: Target CPU. * @state_idx: Index of the capping idle state. - * @duration_us: Idle duration value to match. + * @duration_ns: Idle duration value to match. 
*/ static int teo_find_shallower_state(struct cpuidle_driver *drv, struct cpuidle_device *dev, int state_idx, - unsigned int duration_us) + u64 duration_ns) { int i; @@ -216,7 +213,7 @@ static int teo_find_shallower_state(struct cpuidle_driver *drv, continue; state_idx = i; - if (drv->states[i].target_residency <= duration_us) + if (drv->states[i].target_residency_ns <= duration_ns) break; } return state_idx; @@ -232,8 +229,9 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, bool *stop_tick) { struct teo_cpu *cpu_data = per_cpu_ptr(&teo_cpus, dev->cpu); - int latency_req = cpuidle_governor_latency_req(dev->cpu); - unsigned int duration_us, hits, misses, early_hits; + s64 latency_req = cpuidle_governor_latency_req(dev->cpu); + u64 duration_ns; + unsigned int hits, misses, early_hits; int max_early_idx, constraint_idx, idx, i; ktime_t delta_tick; @@ -244,8 +242,8 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, cpu_data->time_span_ns = local_clock(); - cpu_data->sleep_length_ns = tick_nohz_get_sleep_length(&delta_tick); - duration_us = ktime_to_us(cpu_data->sleep_length_ns); + duration_ns = tick_nohz_get_sleep_length(&delta_tick); + cpu_data->sleep_length_ns = duration_ns; hits = 0; misses = 0; @@ -262,7 +260,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * Ignore disabled states with target residencies beyond * the anticipated idle duration. */ - if (s->target_residency > duration_us) + if (s->target_residency_ns > duration_ns) continue; /* @@ -301,7 +299,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * shallow for that role. 
*/ if (!(tick_nohz_tick_stopped() && - drv->states[idx].target_residency < TICK_USEC)) { + drv->states[idx].target_residency_ns < TICK_NSEC)) { early_hits = cpu_data->states[i].early_hits; max_early_idx = idx; } @@ -315,10 +313,10 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, misses = cpu_data->states[i].misses; } - if (s->target_residency > duration_us) + if (s->target_residency_ns > duration_ns) break; - if (s->exit_latency > latency_req && constraint_idx > i) + if (s->exit_latency_ns > latency_req && constraint_idx > i) constraint_idx = i; idx = i; @@ -327,7 +325,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, if (early_hits < cpu_data->states[i].early_hits && !(tick_nohz_tick_stopped() && - drv->states[i].target_residency < TICK_USEC)) { + drv->states[i].target_residency_ns < TICK_NSEC)) { early_hits = cpu_data->states[i].early_hits; max_early_idx = i; } @@ -343,7 +341,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, */ if (hits <= misses && max_early_idx >= 0) { idx = max_early_idx; - duration_us = drv->states[idx].target_residency; + duration_ns = drv->states[idx].target_residency_ns; } /* @@ -364,9 +362,9 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * the current expected idle duration value. */ for (i = 0; i < INTERVALS; i++) { - unsigned int val = cpu_data->intervals[i]; + u64 val = cpu_data->intervals[i]; - if (val >= duration_us) + if (val >= duration_ns) continue; count++; @@ -378,17 +376,17 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * values are in the interesting range. */ if (count > INTERVALS / 2) { - unsigned int avg_us = div64_u64(sum, count); + u64 avg_ns = div64_u64(sum, count); /* * Avoid spending too much time in an idle state that * would be too shallow. 
*/ - if (!(tick_nohz_tick_stopped() && avg_us < TICK_USEC)) { - duration_us = avg_us; - if (drv->states[idx].target_residency > avg_us) + if (!(tick_nohz_tick_stopped() && avg_ns < TICK_NSEC)) { + duration_ns = avg_ns; + if (drv->states[idx].target_residency_ns > avg_ns) idx = teo_find_shallower_state(drv, dev, - idx, avg_us); + idx, avg_ns); } } } @@ -398,9 +396,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * expected idle duration is shorter than the tick period length. */ if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || - duration_us < TICK_USEC) && !tick_nohz_tick_stopped()) { - unsigned int delta_tick_us = ktime_to_us(delta_tick); - + duration_ns < TICK_NSEC) && !tick_nohz_tick_stopped()) { *stop_tick = false; /* @@ -409,8 +405,8 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * till the closest timer including the tick, try to correct * that. */ - if (idx > 0 && drv->states[idx].target_residency > delta_tick_us) - idx = teo_find_shallower_state(drv, dev, idx, delta_tick_us); + if (idx > 0 && drv->states[idx].target_residency_ns > delta_tick) + idx = teo_find_shallower_state(drv, dev, idx, delta_tick); } return idx; @@ -454,7 +450,7 @@ static int teo_enable_device(struct cpuidle_driver *drv, memset(cpu_data, 0, sizeof(*cpu_data)); for (i = 0; i < INTERVALS; i++) - cpu_data->intervals[i] = UINT_MAX; + cpu_data->intervals[i] = U64_MAX; return 0; } diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c index c8fa5f41dfc4..9f1ace9c53da 100644 --- a/drivers/cpuidle/poll_state.c +++ b/drivers/cpuidle/poll_state.c @@ -49,6 +49,8 @@ void cpuidle_poll_state_init(struct cpuidle_driver *drv) snprintf(state->desc, CPUIDLE_DESC_LEN, "CPUIDLE CORE POLL IDLE"); state->exit_latency = 0; state->target_residency = 0; + state->exit_latency_ns = 0; + state->target_residency_ns = 0; state->power_usage = -1; state->enter = poll_idle; state->disabled = false; diff --git 
a/drivers/cpuidle/sysfs.c b/drivers/cpuidle/sysfs.c index 9f3755ac8f87..38ef770be90d 100644 --- a/drivers/cpuidle/sysfs.c +++ b/drivers/cpuidle/sysfs.c @@ -273,16 +273,30 @@ static ssize_t show_state_##_name(struct cpuidle_state *state, \ return sprintf(buf, "%s\n", state->_name);\ } -define_show_state_function(exit_latency) -define_show_state_function(target_residency) +#define define_show_state_time_function(_name) \ +static ssize_t show_state_##_name(struct cpuidle_state *state, \ + struct cpuidle_state_usage *state_usage, \ + char *buf) \ +{ \ + return sprintf(buf, "%llu\n", ktime_to_us(state->_name##_ns)); \ +} + +define_show_state_time_function(exit_latency) +define_show_state_time_function(target_residency) define_show_state_function(power_usage) define_show_state_ull_function(usage) -define_show_state_ull_function(time) define_show_state_str_function(name) define_show_state_str_function(desc) define_show_state_ull_function(above) define_show_state_ull_function(below) +static ssize_t show_state_time(struct cpuidle_state *state, + struct cpuidle_state_usage *state_usage, + char *buf) +{ + return sprintf(buf, "%llu\n", ktime_to_us(state_usage->time_ns)); +} + static ssize_t show_state_disable(struct cpuidle_state *state, struct cpuidle_state_usage *state_usage, char *buf) diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index d23a3b1ddcf6..22602747f468 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -35,7 +35,7 @@ struct cpuidle_driver; struct cpuidle_state_usage { unsigned long long disable; unsigned long long usage; - unsigned long long time; /* in US */ + u64 time_ns; unsigned long long above; /* Number of times it's been too deep */ unsigned long long below; /* Number of times it's been too shallow */ #ifdef CONFIG_SUSPEND @@ -48,6 +48,8 @@ struct cpuidle_state { char name[CPUIDLE_NAME_LEN]; char desc[CPUIDLE_DESC_LEN]; + u64 exit_latency_ns; + u64 target_residency_ns; unsigned int flags; unsigned int exit_latency; /* in 
US */ int power_usage; /* in mW */ @@ -89,7 +91,7 @@ struct cpuidle_device { ktime_t next_hrtimer; int last_state_idx; - int last_residency; + u64 last_residency_ns; u64 poll_limit_ns; struct cpuidle_state_usage states_usage[CPUIDLE_STATE_MAX]; struct cpuidle_state_kobj *kobjs[CPUIDLE_STATE_MAX]; @@ -263,7 +265,7 @@ struct cpuidle_governor { #ifdef CONFIG_CPU_IDLE extern int cpuidle_register_governor(struct cpuidle_governor *gov); -extern int cpuidle_governor_latency_req(unsigned int cpu); +extern s64 cpuidle_governor_latency_req(unsigned int cpu); #else static inline int cpuidle_register_governor(struct cpuidle_governor *gov) {return 0;} diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 8dad5aa600ea..1aa260702b38 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -104,7 +104,7 @@ static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev, * update no idle residency and return. */ if (current_clr_polling_and_test()) { - dev->last_residency = 0; + dev->last_residency_ns = 0; local_irq_enable(); return -EBUSY; } From aca32d7bccf961dc4c6ac6dff99ed363af1a6987 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Mon, 21 Oct 2019 16:51:48 +0200 Subject: [PATCH 69/83] power: avs: smartreflex: Remove superfluous cast in debugfs_create_file() call There is no need to cast a typed pointer to a void pointer when calling a function that accepts the latter. Remove it, as the cast prevents further compiler checks. Signed-off-by: Geert Uytterhoeven Signed-off-by: Rafael J. 
Wysocki --- drivers/power/avs/smartreflex.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/power/avs/smartreflex.c b/drivers/power/avs/smartreflex.c index 4684e7df833a..5376f3d22f31 100644 --- a/drivers/power/avs/smartreflex.c +++ b/drivers/power/avs/smartreflex.c @@ -905,7 +905,7 @@ static int omap_sr_probe(struct platform_device *pdev) sr_info->dbg_dir = debugfs_create_dir(sr_info->name, sr_dbg_dir); debugfs_create_file("autocomp", S_IRUGO | S_IWUSR, sr_info->dbg_dir, - (void *)sr_info, &pm_sr_fops); + sr_info, &pm_sr_fops); debugfs_create_x32("errweight", S_IRUGO, sr_info->dbg_dir, &sr_info->err_weight); debugfs_create_x32("errmaxlimit", S_IRUGO, sr_info->dbg_dir, From 01ca4827a7481ee0f92faec05db1e7d6a5097282 Mon Sep 17 00:00:00 2001 From: Xiaofei Tan Date: Tue, 15 Oct 2019 16:31:30 +0800 Subject: [PATCH 70/83] PM / wakeirq: remove unnecessary parentheses Remove unnecessary parentheses found by code review. Signed-off-by: Xiaofei Tan Signed-off-by: Rafael J. 
Wysocki --- drivers/base/power/wakeirq.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/base/power/wakeirq.c b/drivers/base/power/wakeirq.c index 5ce77d1ef9fc..8e021082dba8 100644 --- a/drivers/base/power/wakeirq.c +++ b/drivers/base/power/wakeirq.c @@ -272,7 +272,7 @@ void dev_pm_enable_wake_irq_check(struct device *dev, { struct wake_irq *wirq = dev->power.wakeirq; - if (!wirq || !((wirq->status & WAKE_IRQ_DEDICATED_MASK))) + if (!wirq || !(wirq->status & WAKE_IRQ_DEDICATED_MASK)) return; if (likely(wirq->status & WAKE_IRQ_DEDICATED_MANAGED)) { @@ -299,7 +299,7 @@ void dev_pm_disable_wake_irq_check(struct device *dev) { struct wake_irq *wirq = dev->power.wakeirq; - if (!wirq || !((wirq->status & WAKE_IRQ_DEDICATED_MASK))) + if (!wirq || !(wirq->status & WAKE_IRQ_DEDICATED_MASK)) return; if (wirq->status & WAKE_IRQ_DEDICATED_MANAGED) From 2079fe6ea8cbd2fb2fbadba911f1eca6c362eb9b Mon Sep 17 00:00:00 2001 From: Ben Dooks Date: Tue, 15 Oct 2019 14:12:38 +0100 Subject: [PATCH 71/83] ARM: OMAP2+: SmartReflex: add omap_sr_pdata definition The omap_sr_pdata is not declared but is exported, so add a declaration for it to fix the following warning: arch/arm/mach-omap2/pdata-quirks.c:609:36: warning: symbol 'omap_sr_pdata' was not declared. Should it be static? Signed-off-by: Ben Dooks Signed-off-by: Rafael J. 
Wysocki --- include/linux/power/smartreflex.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/linux/power/smartreflex.h b/include/linux/power/smartreflex.h index d0b37e937037..971c9264179e 100644 --- a/include/linux/power/smartreflex.h +++ b/include/linux/power/smartreflex.h @@ -293,6 +293,9 @@ struct omap_sr_data { struct voltagedomain *voltdm; }; + +extern struct omap_sr_data omap_sr_pdata[OMAP_SR_NR]; + #ifdef CONFIG_POWER_AVS_OMAP /* Smartreflex module enable/disable interface */ From ca765a8cfe0c78bfa47b9d67121f4e342d4b4512 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Wed, 16 Oct 2019 15:16:03 +0200 Subject: [PATCH 72/83] PM / Domains: Introduce dev_pm_domain_start() A subsystem/driver that either doesn't support runtime PM or makes use of pm_runtime_set_active() during ->probe() may try to access its device when probing, even if the device may not be fully powered on from the PM domain's point of view. This may be the case when the PM domain in use is a genpd provider that implements genpd's ->start|stop() device callbacks. There are cases where the subsystem/driver manages to avoid the above problem simply by calling pm_runtime_enable() and pm_runtime_get_sync() during ->probe(). However, this approach comes with a drawback, especially if the subsystem/driver implements a ->runtime_resume() callback. More precisely, the subsystem/driver then needs to use a device flag, which is checked in its ->runtime_resume() callback, so as to avoid powering on its resources the first time the callback is invoked. This is needed because the subsystem/driver has already powered on the resources for the device during ->probe(), before it called pm_runtime_get_sync(). To avoid this boilerplate code and the inefficient check for "if (first_time_suspend)" in the ->runtime_resume() callback for these subsystems/drivers, let's introduce and export a dev_pm_domain_start() function that may be called during ->probe() instead. 
Moreover, let dev_pm_domain_start() invoke an optional ->start() callback, added to struct dev_pm_domain, so as to allow a PM domain specific implementation. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/common.c | 20 ++++++++++++++++++++ include/linux/pm.h | 2 ++ include/linux/pm_domain.h | 5 +++++ 3 files changed, 27 insertions(+) diff --git a/drivers/base/power/common.c b/drivers/base/power/common.c index 8db98a1f83dc..bbddb267c2e6 100644 --- a/drivers/base/power/common.c +++ b/drivers/base/power/common.c @@ -187,6 +187,26 @@ void dev_pm_domain_detach(struct device *dev, bool power_off) } EXPORT_SYMBOL_GPL(dev_pm_domain_detach); +/** + * dev_pm_domain_start - Start the device through its PM domain. + * @dev: Device to start. + * + * This function should typically be called during probe by a subsystem/driver, + * when it needs to start its device from the PM domain's perspective. Note + * that, it's assumed that the PM domain is already powered on when this + * function is called. + * + * Returns 0 on success and negative error values on failures. + */ +int dev_pm_domain_start(struct device *dev) +{ + if (dev->pm_domain && dev->pm_domain->start) + return dev->pm_domain->start(dev); + + return 0; +} +EXPORT_SYMBOL_GPL(dev_pm_domain_start); + /** * dev_pm_domain_set - Set PM domain of a device. * @dev: Device whose PM domain is to be set. diff --git a/include/linux/pm.h b/include/linux/pm.h index 4c441be03079..e057d1fa2469 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -637,6 +637,7 @@ extern void dev_pm_put_subsys_data(struct device *dev); * struct dev_pm_domain - power management domain representation. * * @ops: Power management operations associated with this domain. + * @start: Called when a user needs to start the device via the domain. * @detach: Called when removing a device from the domain. * @activate: Called before executing probe routines for bus types and drivers. 
* @sync: Called after successful driver probe. @@ -648,6 +649,7 @@ extern void dev_pm_put_subsys_data(struct device *dev); */ struct dev_pm_domain { struct dev_pm_ops ops; + int (*start)(struct device *dev); void (*detach)(struct device *dev, bool power_off); int (*activate)(struct device *dev); void (*sync)(struct device *dev); diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index baf02ff91a31..5a31c711b896 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -366,6 +366,7 @@ struct device *dev_pm_domain_attach_by_id(struct device *dev, struct device *dev_pm_domain_attach_by_name(struct device *dev, const char *name); void dev_pm_domain_detach(struct device *dev, bool power_off); +int dev_pm_domain_start(struct device *dev); void dev_pm_domain_set(struct device *dev, struct dev_pm_domain *pd); #else static inline int dev_pm_domain_attach(struct device *dev, bool power_on) @@ -383,6 +384,10 @@ static inline struct device *dev_pm_domain_attach_by_name(struct device *dev, return NULL; } static inline void dev_pm_domain_detach(struct device *dev, bool power_off) {} +static inline int dev_pm_domain_start(struct device *dev) +{ + return 0; +} static inline void dev_pm_domain_set(struct device *dev, struct dev_pm_domain *pd) {} #endif From ea71c59669f17d032f11b13ea8a025cea365584f Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Wed, 16 Oct 2019 15:16:24 +0200 Subject: [PATCH 73/83] PM / Domains: Implement the ->start() callback for genpd To allow a subsystem/driver to explicitly start its device from genpd's point of view, let's implement the ->start() callback in the struct dev_pm_domain that corresponds to the genpd. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. 
Wysocki --- drivers/base/power/domain.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index cc85e87eaf05..2adf0661fa3e 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -634,6 +634,13 @@ static int genpd_power_on(struct generic_pm_domain *genpd, unsigned int depth) return ret; } +static int genpd_dev_pm_start(struct device *dev) +{ + struct generic_pm_domain *genpd = dev_to_genpd(dev); + + return genpd_start_dev(genpd, dev); +} + static int genpd_dev_pm_qos_notifier(struct notifier_block *nb, unsigned long val, void *ptr) { @@ -1805,6 +1812,7 @@ int pm_genpd_init(struct generic_pm_domain *genpd, genpd->domain.ops.poweroff_noirq = genpd_poweroff_noirq; genpd->domain.ops.restore_noirq = genpd_restore_noirq; genpd->domain.ops.complete = genpd_complete; + genpd->domain.start = genpd_dev_pm_start; if (genpd->flags & GENPD_FLAG_PM_CLK) { genpd->dev_ops.stop = pm_clk_suspend; From 1b32999e205bb5804400aaa61441ecb356381402 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Wed, 16 Oct 2019 15:16:34 +0200 Subject: [PATCH 74/83] mmc: tmio: Avoid boilerplate code in ->runtime_suspend() Rather than checking the 'runtime_synced' flag each time the ->runtime_suspend() callback is invoked, let's convert into using dev_pm_domain_start() during ->probe() and drop the corresponding boilerplate code. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. 
Wysocki --- drivers/mmc/host/tmio_mmc.h | 1 - drivers/mmc/host/tmio_mmc_core.c | 10 ++++------ 2 files changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/mmc/host/tmio_mmc.h b/drivers/mmc/host/tmio_mmc.h index 2f0b092d6dcc..c5ba13fae399 100644 --- a/drivers/mmc/host/tmio_mmc.h +++ b/drivers/mmc/host/tmio_mmc.h @@ -163,7 +163,6 @@ struct tmio_mmc_host { unsigned long last_req_ts; struct mutex ios_lock; /* protect set_ios() context */ bool native_hotplug; - bool runtime_synced; bool sdio_irq_enabled; /* Mandatory callback */ diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c index 9b6e1001e77c..86b591100f16 100644 --- a/drivers/mmc/host/tmio_mmc_core.c +++ b/drivers/mmc/host/tmio_mmc_core.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include #include @@ -1248,10 +1249,12 @@ int tmio_mmc_host_probe(struct tmio_mmc_host *_host) /* See if we also get DMA */ tmio_mmc_request_dma(_host, pdata); + dev_pm_domain_start(&pdev->dev); + pm_runtime_get_noresume(&pdev->dev); + pm_runtime_set_active(&pdev->dev); pm_runtime_set_autosuspend_delay(&pdev->dev, 50); pm_runtime_use_autosuspend(&pdev->dev); pm_runtime_enable(&pdev->dev); - pm_runtime_get_sync(&pdev->dev); ret = mmc_add_host(mmc); if (ret) @@ -1333,11 +1336,6 @@ int tmio_mmc_host_runtime_resume(struct device *dev) { struct tmio_mmc_host *host = dev_get_drvdata(dev); - if (!host->runtime_synced) { - host->runtime_synced = true; - return 0; - } - tmio_mmc_clk_enable(host); tmio_mmc_hw_reset(host->mmc); From fe0c2baae0bd47958991f13fb7551bf1328b4ea7 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Wed, 16 Oct 2019 16:16:49 +0200 Subject: [PATCH 75/83] PM / Domains: Convert to dev_to_genpd_safe() in genpd_syscore_switch() The intent with walking the gpd_list via calling genpd_present() from genpd_syscore_switch(), is to make sure the dev->pm_domain pointer belongs to a registered genpd. 
However, as a genpd can't be removed if there is a device attached to it, let's convert to use the quicker dev_to_genpd_safe() instead. Due to the above change, this allows us to clean up genpd_present() and move it inside CONFIG_PM_GENERIC_DOMAINS_OF, so let's do that as well. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- drivers/base/power/domain.c | 32 ++++++++++++-------------------- 1 file changed, 12 insertions(+), 20 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index 2adf0661fa3e..8e5725b11ee8 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -929,24 +929,6 @@ static int __init genpd_power_off_unused(void) } late_initcall(genpd_power_off_unused); -#if defined(CONFIG_PM_SLEEP) || defined(CONFIG_PM_GENERIC_DOMAINS_OF) - -static bool genpd_present(const struct generic_pm_domain *genpd) -{ - const struct generic_pm_domain *gpd; - - if (IS_ERR_OR_NULL(genpd)) - return false; - - list_for_each_entry(gpd, &gpd_list, gpd_list_node) - if (gpd == genpd) - return true; - - return false; -} - -#endif - #ifdef CONFIG_PM_SLEEP /** @@ -1361,8 +1343,8 @@ static void genpd_syscore_switch(struct device *dev, bool suspend) { struct generic_pm_domain *genpd; - genpd = dev_to_genpd(dev); - if (!genpd_present(genpd)) + genpd = dev_to_genpd_safe(dev); + if (!genpd) return; if (suspend) { @@ -2028,6 +2010,16 @@ static int genpd_add_provider(struct device_node *np, genpd_xlate_t xlate, return 0; } +static bool genpd_present(const struct generic_pm_domain *genpd) +{ + const struct generic_pm_domain *gpd; + + list_for_each_entry(gpd, &gpd_list, gpd_list_node) + if (gpd == genpd) + return true; + return false; +} + /** * of_genpd_add_provider_simple() - Register a simple PM domain provider * @np: Device node pointer associated with the PM domain provider. From b6495b7f004d01b9ecf9ed5fd31368241d3c5589 Mon Sep 17 00:00:00 2001 From: "Rafael J. 
Wysocki" Date: Tue, 12 Nov 2019 10:51:16 +0100 Subject: [PATCH 76/83] cpuidle: teo: Exclude cpuidle overhead from computations One purpose of the computations in teo_update() is to determine whether or not the (saved) time till the next timer event and the measured idle duration fall into the same "bin", so avoid using values that include the cpuidle overhead to obtain the latter. Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/governors/teo.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index ecbcfaefb0cd..b33418f5df70 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -130,7 +130,14 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) } else { u64 lat_ns = drv->states[dev->last_state_idx].exit_latency_ns; - measured_ns = cpu_data->time_span_ns; + /* + * The computations below are to determine whether or not the + * (saved) time till the next timer event and the measured idle + * duration fall into the same "bin", so use last_residency_ns + * for that instead of time_span_ns which includes the cpuidle + * overhead. + */ + measured_ns = dev->last_residency_ns; /* * The delay between the wakeup and the first instruction * executed by the CPU is not likely to be worst-case every From 63f202e5edf161c2ccffa286a9a701e995427b15 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 13 Nov 2019 01:03:24 +0100 Subject: [PATCH 77/83] cpuidle: teo: Avoid using "early hits" incorrectly If the current state with the maximum "early hits" metric in teo_select() is also the one "matching" the expected idle duration, it will be used as the candidate one for selection even if its "misses" metric is greater than its "hits" metric, which is not correct. 
In that case, the candidate state should be shallower than the current one and its "early hits" metric should be the maximum among the idle states shallower than the current one. To make that happen, modify teo_select() to save the index of the state whose "early hits" metric is the maximum for the range of states below the current one and go back to that state if it turns out that the current one should be rejected. Fixes: 159e48560f51 ("cpuidle: teo: Fix "early hits" handling for disabled idle states") Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/governors/teo.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index b33418f5df70..f5dfeed77f0a 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -239,7 +239,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, s64 latency_req = cpuidle_governor_latency_req(dev->cpu); u64 duration_ns; unsigned int hits, misses, early_hits; - int max_early_idx, constraint_idx, idx, i; + int max_early_idx, prev_max_early_idx, constraint_idx, idx, i; ktime_t delta_tick; if (dev->last_state_idx >= 0) { @@ -256,6 +256,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, misses = 0; early_hits = 0; max_early_idx = -1; + prev_max_early_idx = -1; constraint_idx = drv->state_count; idx = -1; @@ -307,6 +308,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, */ if (!(tick_nohz_tick_stopped() && drv->states[idx].target_residency_ns < TICK_NSEC)) { + prev_max_early_idx = max_early_idx; early_hits = cpu_data->states[i].early_hits; max_early_idx = idx; } @@ -333,6 +335,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, if (early_hits < cpu_data->states[i].early_hits && !(tick_nohz_tick_stopped() && drv->states[i].target_residency_ns < TICK_NSEC)) { + prev_max_early_idx = 
max_early_idx; early_hits = cpu_data->states[i].early_hits; max_early_idx = i; } @@ -346,9 +349,19 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * "early hits" metric, but if that cannot be determined, just use the * state selected so far. */ - if (hits <= misses && max_early_idx >= 0) { - idx = max_early_idx; - duration_ns = drv->states[idx].target_residency_ns; + if (hits <= misses) { + /* + * The current candidate state is not suitable, so take the one + * whose "early hits" metric is the maximum for the range of + * shallower states. + */ + if (idx == max_early_idx) + max_early_idx = prev_max_early_idx; + + if (max_early_idx >= 0) { + idx = max_early_idx; + duration_ns = drv->states[idx].target_residency_ns; + } } /* From 46770be0cf94149ca48be87719bda1d951066644 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 14 Nov 2019 09:06:17 +0530 Subject: [PATCH 78/83] cpufreq: Register drivers only after CPU devices have been registered The cpufreq core heavily depends on the availability of the struct device for CPUs and if they aren't available at the time cpufreq driver is registered, we will never succeed in making cpufreq work. This happens due to the following sequence of events: - cpufreq_register_driver() - subsys_interface_register() - return 0; //successful registration of driver ... at a later point of time - register_cpu(); - device_register(); - bus_probe_device(); - sif->add_dev(); - cpufreq_add_dev(); - get_cpu_device(); //FAILS - per_cpu(cpu_sys_devices, num) = &cpu->dev; //used by get_cpu_device() - return 0; //CPU registered successfully Because the per-cpu variable cpu_sys_devices is set only after the CPU device is registered, cpufreq will never be able to get it when cpufreq_add_dev() is called. This patch avoids this failure by making sure device structure of at least CPU0 is available when the cpufreq driver is registered, else return -EPROBE_DEFER.
Reported-by: Bjorn Andersson Co-developed-by: Amit Kucheria Signed-off-by: Viresh Kumar Tested-by: Amit Kucheria Signed-off-by: Rafael J. Wysocki --- drivers/cpufreq/cpufreq.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 0a10cf9d0b1a..7fc1a686f2f6 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2637,6 +2637,13 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data) if (cpufreq_disabled()) return -ENODEV; + /* + * The cpufreq core depends heavily on the availability of device + * structure, make sure they are available before proceeding further. + */ + if (!get_cpu_device(0)) + return -EPROBE_DEFER; + if (!driver_data || !driver_data->verify || !driver_data->init || !(driver_data->setpolicy || driver_data->target_index || driver_data->target) || From 85f6a17f24f9f7faa4aaecf98e12acdd312aa4c9 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 13 Nov 2019 01:10:13 +0100 Subject: [PATCH 79/83] cpuidle: teo: Avoid code duplication in conditionals There are three places in teo_select() where a given amount of time is compared with TICK_NSEC if tick_nohz_tick_stopped() returns true, which is a bit of duplicated code. Avoid that code duplication by defining a helper function to do the check and using it in all of the places in question. No intentional functional impact. Signed-off-by: Rafael J. 
Wysocki --- drivers/cpuidle/governors/teo.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c index f5dfeed77f0a..de7e706efd46 100644 --- a/drivers/cpuidle/governors/teo.c +++ b/drivers/cpuidle/governors/teo.c @@ -202,6 +202,11 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev) cpu_data->interval_idx = 0; } +static bool teo_time_ok(u64 interval_ns) +{ + return !tick_nohz_tick_stopped() || interval_ns >= TICK_NSEC; +} + /** * teo_find_shallower_state - Find shallower idle state matching given duration. * @drv: cpuidle driver containing state data. @@ -306,8 +311,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * check if the current candidate state is not too * shallow for that role. */ - if (!(tick_nohz_tick_stopped() && - drv->states[idx].target_residency_ns < TICK_NSEC)) { + if (teo_time_ok(drv->states[idx].target_residency_ns)) { prev_max_early_idx = max_early_idx; early_hits = cpu_data->states[i].early_hits; max_early_idx = idx; @@ -333,8 +337,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, misses = cpu_data->states[i].misses; if (early_hits < cpu_data->states[i].early_hits && - !(tick_nohz_tick_stopped() && - drv->states[i].target_residency_ns < TICK_NSEC)) { + teo_time_ok(drv->states[i].target_residency_ns)) { prev_max_early_idx = max_early_idx; early_hits = cpu_data->states[i].early_hits; max_early_idx = i; @@ -402,7 +405,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, * Avoid spending too much time in an idle state that * would be too shallow. */ - if (!(tick_nohz_tick_stopped() && avg_ns < TICK_NSEC)) { + if (teo_time_ok(avg_ns)) { duration_ns = avg_ns; if (drv->states[idx].target_residency_ns > avg_ns) idx = teo_find_shallower_state(drv, dev, From cbda56d5fefcebc01448982a55836c88a825b34c Mon Sep 17 00:00:00 2001 From: "Rafael J. 
Wysocki" Date: Mon, 18 Nov 2019 12:11:24 +0100 Subject: [PATCH 80/83] cpuidle: Introduce cpuidle_driver_state_disabled() for driver quirks Commit 99e98d3fb100 ("cpuidle: Consolidate disabled state checks") overlooked the fact that the imx6q and tegra20 cpuidle drivers use the "disabled" field in struct cpuidle_state for quirks which trigger after the initialization of cpuidle, so reading the initial value of that field is not sufficient for those drivers. In order to allow them to implement the quirks without using the "disabled" field in struct cpuidle_state, introduce a new helper function and modify them to use it. Fixes: 99e98d3fb100 ("cpuidle: Consolidate disabled state checks") Reported-by: Len Brown Signed-off-by: Rafael J. Wysocki --- arch/arm/mach-imx/cpuidle-imx6q.c | 4 ++-- arch/arm/mach-tegra/cpuidle-tegra20.c | 2 +- drivers/cpuidle/driver.c | 28 +++++++++++++++++++++++++++ include/linux/cpuidle.h | 4 ++++ 4 files changed, 35 insertions(+), 3 deletions(-) diff --git a/arch/arm/mach-imx/cpuidle-imx6q.c b/arch/arm/mach-imx/cpuidle-imx6q.c index 39a7d9393641..24dd5bbe60e4 100644 --- a/arch/arm/mach-imx/cpuidle-imx6q.c +++ b/arch/arm/mach-imx/cpuidle-imx6q.c @@ -62,13 +62,13 @@ static struct cpuidle_driver imx6q_cpuidle_driver = { */ void imx6q_cpuidle_fec_irqs_used(void) { - imx6q_cpuidle_driver.states[1].disabled = true; + cpuidle_driver_state_disabled(&imx6q_cpuidle_driver, 1, true); } EXPORT_SYMBOL_GPL(imx6q_cpuidle_fec_irqs_used); void imx6q_cpuidle_fec_irqs_unused(void) { - imx6q_cpuidle_driver.states[1].disabled = false; + cpuidle_driver_state_disabled(&imx6q_cpuidle_driver, 1, false); } EXPORT_SYMBOL_GPL(imx6q_cpuidle_fec_irqs_unused); diff --git a/arch/arm/mach-tegra/cpuidle-tegra20.c b/arch/arm/mach-tegra/cpuidle-tegra20.c index 2447427cb4a8..69f3fa270fbe 100644 --- a/arch/arm/mach-tegra/cpuidle-tegra20.c +++ b/arch/arm/mach-tegra/cpuidle-tegra20.c @@ -203,7 +203,7 @@ void tegra20_cpuidle_pcie_irqs_in_use(void) { pr_info_once( "Disabling cpuidle 
LP2 state, since PCIe IRQs are in use\n"); - tegra_idle_driver.states[1].disabled = true; + cpuidle_driver_state_disabled(&tegra_idle_driver, 1, true); } int __init tegra20_cpuidle_init(void) diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c index fcaf8b2bab96..c76423aaef4d 100644 --- a/drivers/cpuidle/driver.c +++ b/drivers/cpuidle/driver.c @@ -389,3 +389,31 @@ void cpuidle_driver_unref(void) spin_unlock(&cpuidle_driver_lock); } + +/** + * cpuidle_driver_state_disabled - Disable or enable an idle state + * @drv: cpuidle driver owning the state + * @idx: State index + * @disable: Whether or not to disable the state + */ +void cpuidle_driver_state_disabled(struct cpuidle_driver *drv, int idx, + bool disable) +{ + unsigned int cpu; + + mutex_lock(&cpuidle_lock); + + for_each_cpu(cpu, drv->cpumask) { + struct cpuidle_device *dev = per_cpu(cpuidle_devices, cpu); + + if (!dev) + continue; + + if (disable) + dev->states_usage[idx].disable |= CPUIDLE_STATE_DISABLED_BY_DRIVER; + else + dev->states_usage[idx].disable &= ~CPUIDLE_STATE_DISABLED_BY_DRIVER; + } + + mutex_unlock(&cpuidle_lock); +} diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index 22602747f468..afb6a573b46d 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -149,6 +149,8 @@ extern int cpuidle_register_driver(struct cpuidle_driver *drv); extern struct cpuidle_driver *cpuidle_get_driver(void); extern struct cpuidle_driver *cpuidle_driver_ref(void); extern void cpuidle_driver_unref(void); +extern void cpuidle_driver_state_disabled(struct cpuidle_driver *drv, int idx, + bool disable); extern void cpuidle_unregister_driver(struct cpuidle_driver *drv); extern int cpuidle_register_device(struct cpuidle_device *dev); extern void cpuidle_unregister_device(struct cpuidle_device *dev); @@ -186,6 +188,8 @@ static inline int cpuidle_register_driver(struct cpuidle_driver *drv) static inline struct cpuidle_driver *cpuidle_get_driver(void) {return NULL; } static inline struct 
cpuidle_driver *cpuidle_driver_ref(void) {return NULL; } static inline void cpuidle_driver_unref(void) {} +static inline void cpuidle_driver_state_disabled(struct cpuidle_driver *drv, + int idx, bool disable) { } static inline void cpuidle_unregister_driver(struct cpuidle_driver *drv) { } static inline int cpuidle_register_device(struct cpuidle_device *dev) {return -ENODEV; } From c55b51a06b01d67a99457bb82a8c31081c7faa23 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Sat, 16 Nov 2019 14:16:12 +0100 Subject: [PATCH 81/83] cpuidle: Allow idle injection to apply exit latency limit In some cases it may be useful to specify an exit latency limit for the idle state to be used during CPU idle time injection. Instead of duplicating the information in struct cpuidle_device or propagating the latency limit in the call stack, replace the use_deepest_state field with forced_idle_latency_limit_ns to represent that limit, so that the deepest idle state with exit latency within that limit is forced (i.e. no governors) when it is set. A zero exit latency limit for forced idle means to use governors in the usual way (analogous to use_deepest_state equal to "false" before this change). Additionally, add play_idle_precise() taking two arguments, the duration of forced idle and the idle state exit latency limit, both in nanoseconds, and redefine play_idle() as a wrapper around that new function. This change is preparatory, no functional impact is expected. Suggested-by: Rafael J. Wysocki Signed-off-by: Daniel Lezcano [ rjw: Subject, changelog, cpuidle_use_deepest_state() kerneldoc, whitespace ] Signed-off-by: Rafael J.
Wysocki --- drivers/cpuidle/cpuidle.c | 13 +++++++------ include/linux/cpu.h | 7 ++++++- include/linux/cpuidle.h | 6 +++--- kernel/sched/idle.c | 14 +++++++------- 4 files changed, 23 insertions(+), 17 deletions(-) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index bf9b030cd7e1..12077db1158e 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -99,20 +99,21 @@ static int find_deepest_state(struct cpuidle_driver *drv, } /** - * cpuidle_use_deepest_state - Set/clear governor override flag. - * @enable: New value of the flag. + * cpuidle_use_deepest_state - Set/unset governor override mode. + * @latency_limit_ns: Idle state exit latency limit (or no override if 0). * - * Set/unset the current CPU to use the deepest idle state (override governors - * going forward if set). + * If @latency_limit_ns is nonzero, set the current CPU to use the deepest idle + * state with exit latency within @latency_limit_ns (override governors going + * forward), or do not override governors if it is zero. 
*/ -void cpuidle_use_deepest_state(bool enable) +void cpuidle_use_deepest_state(u64 latency_limit_ns) { struct cpuidle_device *dev; preempt_disable(); dev = cpuidle_get_device(); if (dev) - dev->use_deepest_state = enable; + dev->forced_idle_latency_limit_ns = latency_limit_ns; preempt_enable(); } diff --git a/include/linux/cpu.h b/include/linux/cpu.h index d0633ebdaa9c..cc03a7848b63 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -179,7 +179,12 @@ void arch_cpu_idle_dead(void); int cpu_report_state(int cpu); int cpu_check_up_prepare(int cpu); void cpu_set_state_online(int cpu); -void play_idle(unsigned long duration_us); +void play_idle_precise(u64 duration_ns, u64 latency_ns); + +static inline void play_idle(unsigned long duration_us) +{ + play_idle_precise(duration_us * NSEC_PER_USEC, U64_MAX); +} #ifdef CONFIG_HOTPLUG_CPU bool cpu_wait_death(unsigned int cpu, int seconds); diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index afb6a573b46d..72b26ff1de4b 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -85,7 +85,6 @@ struct cpuidle_driver_kobj; struct cpuidle_device { unsigned int registered:1; unsigned int enabled:1; - unsigned int use_deepest_state:1; unsigned int poll_time_limit:1; unsigned int cpu; ktime_t next_hrtimer; @@ -93,6 +92,7 @@ struct cpuidle_device { int last_state_idx; u64 last_residency_ns; u64 poll_limit_ns; + u64 forced_idle_latency_limit_ns; struct cpuidle_state_usage states_usage[CPUIDLE_STATE_MAX]; struct cpuidle_state_kobj *kobjs[CPUIDLE_STATE_MAX]; struct cpuidle_driver_kobj *kobj_driver; @@ -216,7 +216,7 @@ extern int cpuidle_find_deepest_state(struct cpuidle_driver *drv, struct cpuidle_device *dev); extern int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev); -extern void cpuidle_use_deepest_state(bool enable); +extern void cpuidle_use_deepest_state(u64 latency_limit_ns); #else static inline int cpuidle_find_deepest_state(struct cpuidle_driver *drv, struct 
cpuidle_device *dev) @@ -224,7 +224,7 @@ static inline int cpuidle_find_deepest_state(struct cpuidle_driver *drv, static inline int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev) {return -ENODEV; } -static inline void cpuidle_use_deepest_state(bool enable) +static inline void cpuidle_use_deepest_state(u64 latency_limit_ns) { } #endif diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index 1aa260702b38..cd05ffa0abfe 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -165,7 +165,7 @@ static void cpuidle_idle_call(void) * until a proper wakeup interrupt happens. */ - if (idle_should_enter_s2idle() || dev->use_deepest_state) { + if (idle_should_enter_s2idle() || dev->forced_idle_latency_limit_ns) { if (idle_should_enter_s2idle()) { rcu_idle_enter(); @@ -311,7 +311,7 @@ static enum hrtimer_restart idle_inject_timer_fn(struct hrtimer *timer) return HRTIMER_NORESTART; } -void play_idle(unsigned long duration_us) +void play_idle_precise(u64 duration_ns, u64 latency_ns) { struct idle_timer it; @@ -323,29 +323,29 @@ void play_idle(unsigned long duration_us) WARN_ON_ONCE(current->nr_cpus_allowed != 1); WARN_ON_ONCE(!(current->flags & PF_KTHREAD)); WARN_ON_ONCE(!(current->flags & PF_NO_SETAFFINITY)); - WARN_ON_ONCE(!duration_us); + WARN_ON_ONCE(!duration_ns); rcu_sleep_check(); preempt_disable(); current->flags |= PF_IDLE; - cpuidle_use_deepest_state(true); + cpuidle_use_deepest_state(latency_ns); it.done = 0; hrtimer_init_on_stack(&it.timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); it.timer.function = idle_inject_timer_fn; - hrtimer_start(&it.timer, ns_to_ktime(duration_us * NSEC_PER_USEC), + hrtimer_start(&it.timer, ns_to_ktime(duration_ns), HRTIMER_MODE_REL_PINNED); while (!READ_ONCE(it.done)) do_idle(); - cpuidle_use_deepest_state(false); + cpuidle_use_deepest_state(0); current->flags &= ~PF_IDLE; preempt_fold_need_resched(); preempt_enable(); } -EXPORT_SYMBOL_GPL(play_idle); +EXPORT_SYMBOL_GPL(play_idle_precise); void 
cpu_startup_entry(enum cpuhp_state state) { From 5aa9ba6312e36c18626e73506b92d1513d815435 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Sat, 16 Nov 2019 14:16:13 +0100 Subject: [PATCH 82/83] cpuidle: Pass exit latency limit to cpuidle_use_deepest_state() Modify cpuidle_find_deepest_state() to take an additional exit latency limit argument to be passed to find_deepest_state() and make cpuidle_idle_call() pass dev->forced_idle_latency_limit_ns to it for forced idle. Suggested-by: Rafael J. Wysocki Signed-off-by: Daniel Lezcano [ rjw: Rebase and rearrange code, subject & changelog ] Signed-off-by: Rafael J. Wysocki --- drivers/cpuidle/cpuidle.c | 5 +++-- include/linux/cpuidle.h | 6 ++++-- kernel/sched/idle.c | 8 +++++++- 3 files changed, 14 insertions(+), 5 deletions(-) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 12077db1158e..569dbac443bd 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -123,9 +123,10 @@ void cpuidle_use_deepest_state(u64 latency_limit_ns) * @dev: cpuidle device for the given CPU.
*/ int cpuidle_find_deepest_state(struct cpuidle_driver *drv, - struct cpuidle_device *dev) + struct cpuidle_device *dev, + u64 latency_limit_ns) { - return find_deepest_state(drv, dev, U64_MAX, 0, false); + return find_deepest_state(drv, dev, latency_limit_ns, 0, false); } #ifdef CONFIG_SUSPEND diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h index 72b26ff1de4b..2dbe46b7c213 100644 --- a/include/linux/cpuidle.h +++ b/include/linux/cpuidle.h @@ -213,13 +213,15 @@ static inline struct cpuidle_device *cpuidle_get_device(void) {return NULL; } #ifdef CONFIG_CPU_IDLE extern int cpuidle_find_deepest_state(struct cpuidle_driver *drv, - struct cpuidle_device *dev); + struct cpuidle_device *dev, + u64 latency_limit_ns); extern int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev); extern void cpuidle_use_deepest_state(u64 latency_limit_ns); #else static inline int cpuidle_find_deepest_state(struct cpuidle_driver *drv, - struct cpuidle_device *dev) + struct cpuidle_device *dev, + u64 latency_limit_ns) {return -ENODEV; } static inline int cpuidle_enter_s2idle(struct cpuidle_driver *drv, struct cpuidle_device *dev) diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index cd05ffa0abfe..fc9604ddd802 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -166,6 +166,8 @@ static void cpuidle_idle_call(void) */ if (idle_should_enter_s2idle() || dev->forced_idle_latency_limit_ns) { + u64 max_latency_ns; + if (idle_should_enter_s2idle()) { rcu_idle_enter(); @@ -176,12 +178,16 @@ static void cpuidle_idle_call(void) } rcu_idle_exit(); + + max_latency_ns = U64_MAX; + } else { + max_latency_ns = dev->forced_idle_latency_limit_ns; } tick_nohz_idle_stop_tick(); rcu_idle_enter(); - next_state = cpuidle_find_deepest_state(drv, dev); + next_state = cpuidle_find_deepest_state(drv, dev, max_latency_ns); call_cpuidle(drv, dev, next_state); } else { bool stop_tick = true; From 1992b66d2f55cf36a14072cfd977fdf4f0d2f2c2 Mon Sep 17 00:00:00 2001 
From: Bjorn Helgaas Date: Tue, 19 Nov 2019 08:09:23 -0600 Subject: [PATCH 83/83] PM: Wrap documentation to fit in 80 columns Wrap to 80 columns. No textual change except to correct some "it's" that should be "its". Signed-off-by: Bjorn Helgaas Signed-off-by: Rafael J. Wysocki --- Documentation/power/drivers-testing.rst | 7 ++-- Documentation/power/freezing-of-tasks.rst | 35 ++++++++++--------- Documentation/power/opp.rst | 32 +++++++++-------- Documentation/power/pci.rst | 28 +++++++-------- Documentation/power/pm_qos_interface.rst | 26 +++++++------- Documentation/power/runtime_pm.rst | 4 +-- .../power/suspend-and-cpuhotplug.rst | 7 ++-- Documentation/power/swsusp.rst | 14 ++++---- 8 files changed, 80 insertions(+), 73 deletions(-) diff --git a/Documentation/power/drivers-testing.rst b/Documentation/power/drivers-testing.rst index e53f1999fc39..d77d2894f9fe 100644 --- a/Documentation/power/drivers-testing.rst +++ b/Documentation/power/drivers-testing.rst @@ -39,9 +39,10 @@ c) Compile the driver directly into the kernel and try the test modes of d) Attempt to hibernate with the driver compiled directly into the kernel in the "reboot", "shutdown" and "platform" modes. -e) Try the test modes of suspend (see: Documentation/power/basic-pm-debugging.rst, - 2). [As far as the STR tests are concerned, it should not matter whether or - not the driver is built as a module.] +e) Try the test modes of suspend (see: + Documentation/power/basic-pm-debugging.rst, 2). [As far as the STR tests are + concerned, it should not matter whether or not the driver is built as a + module.] f) Attempt to suspend to RAM using the s2ram tool with the driver loaded (see: Documentation/power/basic-pm-debugging.rst, 2). diff --git a/Documentation/power/freezing-of-tasks.rst b/Documentation/power/freezing-of-tasks.rst index ef110fe55e82..8bd693399834 100644 --- a/Documentation/power/freezing-of-tasks.rst +++ b/Documentation/power/freezing-of-tasks.rst @@ -215,30 +215,31 @@ VI. 
Are there any precautions to be taken to prevent freezing failures? Yes, there are. -First of all, grabbing the 'system_transition_mutex' lock to mutually exclude a piece of code -from system-wide sleep such as suspend/hibernation is not encouraged. -If possible, that piece of code must instead hook onto the suspend/hibernation -notifiers to achieve mutual exclusion. Look at the CPU-Hotplug code -(kernel/cpu.c) for an example. +First of all, grabbing the 'system_transition_mutex' lock to mutually exclude a +piece of code from system-wide sleep such as suspend/hibernation is not +encouraged. If possible, that piece of code must instead hook onto the +suspend/hibernation notifiers to achieve mutual exclusion. Look at the +CPU-Hotplug code (kernel/cpu.c) for an example. -However, if that is not feasible, and grabbing 'system_transition_mutex' is deemed necessary, -it is strongly discouraged to directly call mutex_[un]lock(&system_transition_mutex) since -that could lead to freezing failures, because if the suspend/hibernate code -successfully acquired the 'system_transition_mutex' lock, and hence that other entity failed -to acquire the lock, then that task would get blocked in TASK_UNINTERRUPTIBLE -state. As a consequence, the freezer would not be able to freeze that task, -leading to freezing failure. +However, if that is not feasible, and grabbing 'system_transition_mutex' is +deemed necessary, it is strongly discouraged to directly call +mutex_[un]lock(&system_transition_mutex) since that could lead to freezing +failures, because if the suspend/hibernate code successfully acquired the +'system_transition_mutex' lock, and hence that other entity failed to acquire +the lock, then that task would get blocked in TASK_UNINTERRUPTIBLE state. As a +consequence, the freezer would not be able to freeze that task, leading to +freezing failure. 
However, the [un]lock_system_sleep() APIs are safe to use in this scenario, since they ask the freezer to skip freezing this task, since it is anyway -"frozen enough" as it is blocked on 'system_transition_mutex', which will be released -only after the entire suspend/hibernation sequence is complete. -So, to summarize, use [un]lock_system_sleep() instead of directly using +"frozen enough" as it is blocked on 'system_transition_mutex', which will be +released only after the entire suspend/hibernation sequence is complete. So, to +summarize, use [un]lock_system_sleep() instead of directly using mutex_[un]lock(&system_transition_mutex). That would prevent freezing failures. V. Miscellaneous ================ /sys/power/pm_freeze_timeout controls how long it will cost at most to freeze -all user space processes or all freezable kernel threads, in unit of millisecond. -The default value is 20000, with range of unsigned integer. +all user space processes or all freezable kernel threads, in unit of +millisecond. The default value is 20000, with range of unsigned integer. diff --git a/Documentation/power/opp.rst b/Documentation/power/opp.rst index 209c7613f5a4..e3cc4f349ea8 100644 --- a/Documentation/power/opp.rst +++ b/Documentation/power/opp.rst @@ -73,19 +73,21 @@ factors. Example usage: Thermal management or other exceptional situations where SoC framework might choose to disable a higher frequency OPP to safely continue operations until that OPP could be re-enabled if possible. -OPP library facilitates this concept in it's implementation. The following +OPP library facilitates this concept in its implementation. 
The following operational functions operate only on available opps: -opp_find_freq_{ceil, floor}, dev_pm_opp_get_voltage, dev_pm_opp_get_freq, dev_pm_opp_get_opp_count +opp_find_freq_{ceil, floor}, dev_pm_opp_get_voltage, dev_pm_opp_get_freq, +dev_pm_opp_get_opp_count -dev_pm_opp_find_freq_exact is meant to be used to find the opp pointer which can then -be used for dev_pm_opp_enable/disable functions to make an opp available as required. +dev_pm_opp_find_freq_exact is meant to be used to find the opp pointer +which can then be used for dev_pm_opp_enable/disable functions to make an +opp available as required. WARNING: Users of OPP library should refresh their availability count using -get_opp_count if dev_pm_opp_enable/disable functions are invoked for a device, the -exact mechanism to trigger these or the notification mechanism to other -dependent subsystems such as cpufreq are left to the discretion of the SoC -specific framework which uses the OPP library. Similar care needs to be taken -care to refresh the cpufreq table in cases of these operations. +get_opp_count if dev_pm_opp_enable/disable functions are invoked for a +device, the exact mechanism to trigger these or the notification mechanism +to other dependent subsystems such as cpufreq are left to the discretion of +the SoC specific framework which uses the OPP library. Similar care needs +to be taken care to refresh the cpufreq table in cases of these operations. 2. Initial OPP List Registration ================================ @@ -99,11 +101,11 @@ OPPs dynamically using the dev_pm_opp_enable / disable functions. dev_pm_opp_add Add a new OPP for a specific domain represented by the device pointer. The OPP is defined using the frequency and voltage. Once added, the OPP - is assumed to be available and control of it's availability can be done - with the dev_pm_opp_enable/disable functions. OPP library internally stores - and manages this information in the opp struct. 
This function may be
-	used by SoC framework to define a optimal list as per the demands of
-	SoC usage environment.
+	is assumed to be available and control of its availability can be done
+	with the dev_pm_opp_enable/disable functions. OPP library
+	internally stores and manages this information in the opp struct.
+	This function may be used by SoC framework to define a optimal list
+	as per the demands of SoC usage environment.

 	WARNING: Do not use this function in interrupt context.

@@ -354,7 +356,7 @@ struct dev_pm_opp

 struct device
 	This is used to identify a domain to the OPP layer. The
-	nature of the device and it's implementation is left to the user of
+	nature of the device and its implementation is left to the user of
 	OPP library such as the SoC framework.

 Overall, in a simplistic view, the data structure operations is represented as
diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst
index 0e2ef7429304..51e0a493d284 100644
--- a/Documentation/power/pci.rst
+++ b/Documentation/power/pci.rst
@@ -426,12 +426,12 @@ pm->runtime_idle() callback.
 2.4. System-Wide Power Transitions
 ----------------------------------
 There are a few different types of system-wide power transitions, described in
-Documentation/driver-api/pm/devices.rst. Each of them requires devices to be handled
-in a specific way and the PM core executes subsystem-level power management
-callbacks for this purpose. They are executed in phases such that each phase
-involves executing the same subsystem-level callback for every device belonging
-to the given subsystem before the next phase begins. These phases always run
-after tasks have been frozen.
+Documentation/driver-api/pm/devices.rst. Each of them requires devices to be
+handled in a specific way and the PM core executes subsystem-level power
+management callbacks for this purpose.
They are executed in phases such that
+each phase involves executing the same subsystem-level callback for every device
+belonging to the given subsystem before the next phase begins. These phases
+always run after tasks have been frozen.

 2.4.1. System Suspend
 ^^^^^^^^^^^^^^^^^^^^^
@@ -636,12 +636,12 @@ System restore requires a hibernation image to be loaded into memory and the
 pre-hibernation memory contents to be restored before the pre-hibernation system
 activity can be resumed.

-As described in Documentation/driver-api/pm/devices.rst, the hibernation image is loaded
-into memory by a fresh instance of the kernel, called the boot kernel, which in
-turn is loaded and run by a boot loader in the usual way. After the boot kernel
-has loaded the image, it needs to replace its own code and data with the code
-and data of the "hibernated" kernel stored within the image, called the image
-kernel. For this purpose all devices are frozen just like before creating
+As described in Documentation/driver-api/pm/devices.rst, the hibernation image
+is loaded into memory by a fresh instance of the kernel, called the boot kernel,
+which in turn is loaded and run by a boot loader in the usual way. After the
+boot kernel has loaded the image, it needs to replace its own code and data with
+the code and data of the "hibernated" kernel stored within the image, called the
+image kernel. For this purpose all devices are frozen just like before creating
 the image during hibernation, in the

 	prepare, freeze, freeze_noirq
@@ -691,8 +691,8 @@ controlling the runtime power management of their devices.
 At the time of this writing there are two ways to define power management
 callbacks for a PCI device driver, the recommended one, based on using a
-dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and the
-"legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and
+dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and
+the "legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and
 .resume() callbacks from struct pci_driver are used. The legacy approach,
 however, doesn't allow one to define runtime power management callbacks and is
 not really suitable for any new drivers. Therefore it is not covered by this
diff --git a/Documentation/power/pm_qos_interface.rst b/Documentation/power/pm_qos_interface.rst
index 3097694fba69..0d62d506caf0 100644
--- a/Documentation/power/pm_qos_interface.rst
+++ b/Documentation/power/pm_qos_interface.rst
@@ -8,8 +8,8 @@ one of the parameters.

 Two different PM QoS frameworks are available:
 1. PM QoS classes for cpu_dma_latency
-2. the per-device PM QoS framework provides the API to manage the per-device latency
-constraints and PM QoS flags.
+2. The per-device PM QoS framework provides the API to manage the
+   per-device latency constraints and PM QoS flags.

 Each parameters have defined units:

@@ -47,14 +47,14 @@ void pm_qos_add_request(handle, param_class, target_value):
   pm_qos API functions.

 void pm_qos_update_request(handle, new_target_value):
-  Will update the list element pointed to by the handle with the new target value
-  and recompute the new aggregated target, calling the notification tree if the
-  target is changed.
+  Will update the list element pointed to by the handle with the new target
+  value and recompute the new aggregated target, calling the notification tree
+  if the target is changed.

 void pm_qos_remove_request(handle):
-  Will remove the element.
After removal it will update the aggregate target and
-  call the notification tree if the target was changed as a result of removing
-  the request.
+  Will remove the element. After removal it will update the aggregate target
+  and call the notification tree if the target was changed as a result of
+  removing the request.

 int pm_qos_request(param_class):
   Returns the aggregated value for a given PM QoS class.
@@ -167,9 +167,9 @@ int dev_pm_qos_expose_flags(device, value)
   change the value of the PM_QOS_FLAG_NO_POWER_OFF flag.

 void dev_pm_qos_hide_flags(device)
-  Drop the request added by dev_pm_qos_expose_flags() from the device's PM QoS list
-  of flags and remove sysfs attribute pm_qos_no_power_off from the device's power
-  directory.
+  Drop the request added by dev_pm_qos_expose_flags() from the device's PM QoS
+  list of flags and remove sysfs attribute pm_qos_no_power_off from the device's
+  power directory.

 Notification mechanisms:

@@ -179,8 +179,8 @@ int dev_pm_qos_add_notifier(device, notifier, type):
   Adds a notification callback function for the device for a particular request
   type.

-  The callback is called when the aggregated value of the device constraints list
-  is changed.
+  The callback is called when the aggregated value of the device constraints
+  list is changed.

 int dev_pm_qos_remove_notifier(device, notifier, type):
   Removes the notification callback function for the device.
diff --git a/Documentation/power/runtime_pm.rst b/Documentation/power/runtime_pm.rst
index 2c2ec99b5088..ab8406c84254 100644
--- a/Documentation/power/runtime_pm.rst
+++ b/Documentation/power/runtime_pm.rst
@@ -268,8 +268,8 @@ defined in include/linux/pm.h:
   `unsigned int runtime_auto;`
     - if set, indicates that the user space has allowed the device driver to
       power manage the device at run time via the /sys/devices/.../power/control
-      `interface;` it may only be modified with the help of the pm_runtime_allow()
-      and pm_runtime_forbid() helper functions
+      `interface;` it may only be modified with the help of the
+      pm_runtime_allow() and pm_runtime_forbid() helper functions

   `unsigned int no_callbacks;`
     - indicates that the device does not use the runtime PM callbacks (see
diff --git a/Documentation/power/suspend-and-cpuhotplug.rst b/Documentation/power/suspend-and-cpuhotplug.rst
index 7ac8e1f549f4..572d968c5375 100644
--- a/Documentation/power/suspend-and-cpuhotplug.rst
+++ b/Documentation/power/suspend-and-cpuhotplug.rst
@@ -106,8 +106,8 @@ execution during resume):
 * Release system_transition_mutex lock.


-It is to be noted here that the system_transition_mutex lock is acquired at the very
-beginning, when we are just starting out to suspend, and then released only
+It is to be noted here that the system_transition_mutex lock is acquired at the
+very beginning, when we are just starting out to suspend, and then released only
 after the entire cycle is complete (i.e., suspend + resume).
 ::
@@ -165,7 +165,8 @@ Important files and functions/entry points:

 - kernel/power/process.c : freeze_processes(), thaw_processes()
 - kernel/power/suspend.c : suspend_prepare(), suspend_enter(), suspend_finish()
-- kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](), [disable|enable]_nonboot_cpus()
+- kernel/cpu.c: cpu_[up|down](), _cpu_[up|down](),
+  [disable|enable]_nonboot_cpus()



diff --git a/Documentation/power/swsusp.rst b/Documentation/power/swsusp.rst
index d000312f6965..8524f079e05c 100644
--- a/Documentation/power/swsusp.rst
+++ b/Documentation/power/swsusp.rst
@@ -118,7 +118,8 @@ In a really perfect world::

   echo 1 > /proc/acpi/sleep # for standby
   echo 2 > /proc/acpi/sleep # for suspend to ram
-  echo 3 > /proc/acpi/sleep # for suspend to ram, but with more power conservative
+  echo 3 > /proc/acpi/sleep # for suspend to ram, but with more power
+                            # conservative
   echo 4 > /proc/acpi/sleep # for suspend to disk
   echo 5 > /proc/acpi/sleep # for shutdown unfriendly the system

@@ -192,8 +193,8 @@ Q:

 A:
   The freezing of tasks is a mechanism by which user space processes and some
-  kernel threads are controlled during hibernation or system-wide suspend (on some
-  architectures). See freezing-of-tasks.txt for details.
+  kernel threads are controlled during hibernation or system-wide suspend (on
+  some architectures). See freezing-of-tasks.txt for details.

 Q:
   What is the difference between "platform" and "shutdown"?
@@ -282,7 +283,8 @@ A:
     suspend(PMSG_FREEZE): devices are frozen so that they don't interfere
     with state snapshot

-    state snapshot: copy of whole used memory is taken with interrupts disabled
+    state snapshot: copy of whole used memory is taken with interrupts
+    disabled

     resume(): devices are woken up so that we can write image to swap

@@ -353,8 +355,8 @@ Q:

 A:
   Generally, yes, you can.
However, it requires you to use the "resume=" and
-  "resume_offset=" kernel command line parameters, so the resume from a swap file
-  cannot be initiated from an initrd or initramfs image. See
+  "resume_offset=" kernel command line parameters, so the resume from a swap
+  file cannot be initiated from an initrd or initramfs image. See
   swsusp-and-swap-files.txt for details.

 Q: