When both the CPU_IDLE and ARCH_KIRKWOOD options are set, a separate
CPU_IDLE_KIRKWOOD option is redundant.
The drivers Makefile already compiles the cpuidle directory conditionally:
obj-$(CONFIG_CPU_IDLE) += cpuidle/
Hence, if CPU_IDLE is not set, this directory is not entered at all.
This patch removes the superfluous Kconfig option and replaces the
condition in the cpuidle Makefile with CONFIG_ARCH_KIRKWOOD.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When a cpu enters a deep idle state, the local timers are stopped and
the time framework falls back to the timer device used as a broadcast
timer.
The different cpuidle drivers call clockevents_notify() with ENTER/EXIT
when an idle state stops the local timer.
Add a new flag CPUIDLE_FLAG_TIMER_STOP which can be set by the cpuidle
drivers. If the flag is set, the cpuidle core code takes care of the
notification on behalf of the driver to avoid pointless code duplication.
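For illustration, a minimal driver-side sketch (the enter callback and the
latency/residency numbers are assumptions; only CPUIDLE_FLAG_TIMER_STOP and
the notification names come from this change): a state that stops the local
timer now only carries the flag, and the core performs the
CLOCK_EVT_NOTIFY_BROADCAST_ENTER/EXIT notifications around ->enter():

#include <linux/cpuidle.h>

static int example_enter_deep(struct cpuidle_device *dev,
			      struct cpuidle_driver *drv, int index)
{
	/* platform-specific deep-idle entry; the local timer stops here,
	 * the broadcast timer has already been armed by the core */
	return index;
}

static struct cpuidle_state example_deep_state = {
	.name			= "DEEP",
	.desc			= "deep idle, local timer stopped",
	.enter			= example_enter_deep,
	.exit_latency		= 300,		/* us, illustrative */
	.target_residency	= 1000,		/* us, illustrative */
	.flags			= CPUIDLE_FLAG_TIMER_STOP,
};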
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC-specific updates from Arnd Bergmann:
"This is a larger set of new functionality for the existing SoC
families, including:
- vt8500 gains support for new CPU cores, notably the Cortex-A9 based
wm8850
- prima2 gains support for the "marco" SoC family, its SMP based
cousin
- tegra gains support for the new Tegra4 (Tegra114) family
- socfpga now supports a newer version of the hardware including SMP
- i.mx31 and bcm2835 are now using DT probing for their clocks
- lots of updates for sh-mobile
- OMAP updates for clocks, power management and USB
- i.mx6q and tegra now support cpuidle
- kirkwood now supports PCIe hot plugging
- tegra clock support is updated
- tegra USB PHY probing gets implemented diffently"
* tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (148 commits)
ARM: prima2: remove duplicate v7_invalidate_l1
ARM: shmobile: r8a7779: Correct TMU clock support again
ARM: prima2: fix __init section for cpu hotplug
ARM: OMAP: Consolidate OMAP USB-HS platform data (part 3/3)
ARM: OMAP: Consolidate OMAP USB-HS platform data (part 1/3)
arm: socfpga: Add SMP support for actual socfpga harware
arm: Add v7_invalidate_l1 to cache-v7.S
arm: socfpga: Add entries to enable make dtbs socfpga
arm: socfpga: Add new device tree source for actual socfpga HW
ARM: tegra: sort Kconfig selects for Tegra114
ARM: tegra: enable ARCH_REQUIRE_GPIOLIB for Tegra114
ARM: tegra: Fix build error w/ ARCH_TEGRA_114_SOC w/o ARCH_TEGRA_3x_SOC
ARM: tegra: Fix build error for gic update
ARM: tegra: remove empty tegra_smp_init_cpus()
ARM: shmobile: Register ARM architected timer
ARM: MARCO: fix the build issue due to gic-vic-to-irqchip move
ARM: shmobile: r8a7779: Correct TMU clock support
ARM: mxs_defconfig: Select CONFIG_DEVTMPFS_MOUNT
ARM: mxs: decrease mxs_clockevent_device.min_delta_ns to 2 clock cycles
ARM: mxs: use apbx bus clock to drive the timers on timrotv2
...
Move the Kirkwood cpuidle driver out of arch/arm/mach-kirkwood and
into drivers/cpuidle. Convert the driver into a platform driver.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
The text in Documentation said it would be removed in 2.6.41;
the text in the Kconfig said removal in the 3.1 release. Either
way you look at it, we are well past both, so push it off a cliff.
Note that the POWER_CSTATE and the POWER_PSTATE are part of the
legacy tracing API. Remove all tracepoints which use these flags.
As can be seen from context, most already have a trace entry via
trace_cpu_idle anyways.
Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as
compared to the CSTATE ones which all have a clear start/stop.
As part of this, the trace_power_frequency also becomes orphaned,
so it too is deleted.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We realized that the power usage field is never filled, and in the one
case where it is filled (tegra) the power_specified flag is not set,
causing all of these values to be reset when the driver is initialized
by set_power_states().
However, the power_specified flag can simply be removed under the
assumption that the states are always backward sorted, which is the
case with the current code.
This change allows the menu governor select function and
cpuidle_play_dead() to be simplified. Moreover, the
set_power_states() function can be removed as it does not make sense
any more.
Drop the power_specified flag from struct cpuidle_driver and make
the related changes as described above.
As a consequence, this also fixes the bug where, on systems with
dynamic C-states, the power fields are not initialized.
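For illustration, a sketch of the simplified deepest-state selection (close
to, but not necessarily identical with, the resulting cpuidle_play_dead()):
with backward sorted states, the deepest usable state is simply the highest
index that provides an ->enter_dead() callback, so no power comparison is
needed:

#include <linux/cpuidle.h>

static int example_play_dead(struct cpuidle_device *dev,
			     struct cpuidle_driver *drv)
{
	int i;

	if (!drv)
		return -ENODEV;

	/* backward sorted: a higher index means lower power */
	for (i = drv->state_count - 1; i >= 0; i--)
		if (drv->states[i].enter_dead)
			return drv->states[i].enter_dead(dev, i);

	return -ENODEV;
}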
[rjw: Changelog]
References: https://bugzilla.kernel.org/show_bug.cgi?id=42870
References: https://bugzilla.kernel.org/show_bug.cgi?id=43349
References: https://lkml.org/lkml/2012/10/16/518
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit bf4d1b5ddb (cpuidle: support
multiple drivers) changed the number of initialized state kobjects
in cpuidle_add_state_sysfs() from device->state_count to
drv->state_count, but left device->state_count in
cpuidle_remove_state_sysfs(). The values of these two fields may be
different, in which case a NULL pointer dereference may happen in
cpuidle_remove_state_sysfs(), for example. Fix this problem by making
cpuidle_add_state_sysfs() use device->state_count too (which restores
its original behavior).
[rjw: Changelog]
Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit bf4d1b5 (cpuidle: support multiple drivers) introduced
locking in cpuidle_get_cpu_driver(), which is used in the
idle_call() function.
This leads to a contention problem with a large number of CPUs,
because they all try to run the idle routine at the same time.
The lock can be safely removed because of the way the cpuidle
API is used. Namely, cpuidle_register_driver() is called first, but the
cpuidle idle function is not entered before cpuidle_register_device()
is called, because the cpuidle device is not enabled then. Moreover,
cpuidle_unregister_driver(), which would reset the driver value to
NULL, is not called before cpuidle_unregister_device().
All of the cpuidle drivers use the API in the same way.
In general, a cleanup around the lock is necessary and a proper
refcounting mechanism should be used to ensure the consistency in the
API (for example, cpuidle_unregister_driver() should fail if the
driver's refcount is not 0). However, these modifications will require
some code reorganization and rewrite which will be too intrusive for
a fix.
For this reason, fix the contention problem introduced by commit
bf4d1b5 by simply removing the locking from cpuidle_get_cpu_driver(),
which restores the original behavior of that routine.
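A sketch of the accessor without the lock (the per-cpu variable name is an
assumption; the real code lives in drivers/cpuidle/driver.c):

#include <linux/cpuidle.h>
#include <linux/percpu.h>

/* The per-cpu driver pointer is set before the device is enabled and
 * cleared only after the device is unregistered, so the idle path can
 * read it without taking the cpuidle lock. */
static DEFINE_PER_CPU(struct cpuidle_driver *, example_cpu_drivers);

static struct cpuidle_driver *example_get_cpu_driver(struct cpuidle_device *dev)
{
	if (!dev)
		return NULL;

	return per_cpu(example_cpu_drivers, dev->cpu);
}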
[rjw: Changelog.]
Reported-and-tested-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The ready_waiting_counts atomic variable is compared against the wrong
online cpu count. The latter is computed incorrectly using logical-OR
instead of bit-OR. This patch fixes that.
Signed-off-by: Sivaram Nair <sivaramn@nvidia.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Colin Cross <ccross@android.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since cpuidle_state.power_usage is a signed value, use INT_MAX (instead
of -1) to initialize the local copies, so that functions that try to find
the cpuidle state with minimum power usage work correctly even if they use
non-negative values.
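A small sketch of the pattern in question (the helper name is made up):
with -1 as the initial minimum, no non-negative power_usage value would
ever replace it, so the search would always return state 0; INT_MAX makes
the comparison work for any legitimate value:

#include <linux/cpuidle.h>
#include <linux/kernel.h>	/* INT_MAX */

static int example_lowest_power_state(struct cpuidle_driver *drv)
{
	int i, min_power = INT_MAX, min_idx = 0;

	for (i = 0; i < drv->state_count; i++) {
		if (drv->states[i].power_usage < min_power) {
			min_power = drv->states[i].power_usage;
			min_idx = i;
		}
	}
	return min_idx;
}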
Signed-off-by: Sivaram Nair <sivaramn@nvidia.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC updates from Olof Johansson:
"This contains the bulk of new SoC development for this merge window.
Two new platforms have been added, the sunxi platforms (Allwinner A1x
SoCs) by Maxime Ripard, and a generic Broadcom platform for a new
series of ARMv7 platforms from them, where the hope is that we can
keep the platform code generic enough to have them all share one mach
directory. The new Broadcom platform is contributed by Christian
Daudt.
Highbank has grown support for Calxeda's next generation of hardware,
ECX-2000.
clps711x has seen a lot of cleanup from Alexander Shiyan, and he's
also taken on maintainership of the platform.
Beyond this there has been a bunch of work from a number of people on
converting more platforms to IRQ domains, pinctrl conversion, cleanup
and general feature enablement across most of the active platforms."
Fix up trivial conflicts as per Olof.
* tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (174 commits)
mfd: vexpress-sysreg: Remove LEDs code
irqchip: irq-sunxi: Add terminating entry for sunxi_irq_dt_ids
clocksource: sunxi_timer: Add terminating entry for sunxi_timer_dt_ids
irq: versatile: delete dangling variable
ARM: sunxi: add missing include for mdelay()
ARM: EXYNOS: Avoid early use of of_machine_is_compatible()
ARM: dts: add node for PL330 MDMA1 controller for exynos4
ARM: EXYNOS: Add support for secondary CPU bring-up on Exynos4412
ARM: EXYNOS: add UART3 to DEBUG_LL ports
ARM: S3C24XX: Add clkdev entry for camif-upll clock
ARM: SAMSUNG: Add s3c24xx/s3c64xx CAMIF GPIO setup helpers
ARM: sunxi: Add missing sun4i.dtsi file
pinctrl: samsung: Do not initialise statics to 0
ARM i.MX6: remove gate_mask from pllv3
ARM i.MX6: Fix ethernet PLL clocks
ARM i.MX6: rename PLLs according to datasheet
ARM i.MX6: Add pwm support
ARM i.MX51: Add pwm support
ARM i.MX53: Add pwm support
ARM: mx5: Replace clk_register_clkdev with clock DT lookup
...
Many cpuidle drivers measure their time spent in an idle state by
reading the wallclock time before and after idling and calculating the
difference. This leads to erroneous results when the wallclock time gets
updated by another processor in the meantime, adding that clock
adjustment to the idle state's time counter.
If the clock adjustment was negative, the result is even worse due to an
erroneous cast from int to unsigned long long of the last_residency
variable. The negative 32 bit integer will zero-extend and result in a
forward time jump of roughly four billion milliseconds or 1.3 hours on
the idle state residency counter.
This patch changes all affected cpuidle drivers to either use the
monotonic clock for their measurements or make use of the generic time
measurement wrapper in cpuidle.c, which was already working correctly.
Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
should always already be disabled before entering the idle function, and
not get reenabled until the generic wrapper has performed its second
measurement). It also removes the erroneous cast, making sure that
negative residency values are applied correctly even though they should
not appear anymore.
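The generic pattern used by the wrapper in cpuidle.c looks roughly like this
(a simplified sketch, not the exact mainline function):

#include <linux/cpuidle.h>
#include <linux/kernel.h>
#include <linux/ktime.h>

static int example_enter_timed(struct cpuidle_device *dev,
			       struct cpuidle_driver *drv, int index)
{
	ktime_t start, end;
	s64 diff;
	int entered_state;

	start = ktime_get();		/* monotonic clock */
	entered_state = drv->states[index].enter(dev, drv, index);
	end = ktime_get();

	diff = ktime_to_us(ktime_sub(end, start));
	if (diff > INT_MAX)
		diff = INT_MAX;
	/* signed all the way: a negative diff stays negative */
	dev->last_residency = (int)diff;

	return entered_state;
}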
Signed-off-by: Julius Werner <jwerner@chromium.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
With the new tegra3 and big.LITTLE [1] architectures, several cpus with
different characteristics (latencies and states) can co-exist in the same
system.
The cpuidle framework has the limitation of handling only identical cpus.
This patch removes that limitation by introducing multiple driver support
for cpuidle.
This option is configurable at compile time and should be enabled for the
architectures mentioned above, so there is no impact on other platforms
if the option is disabled. The option defaults to 'n'. Note that multiple
driver support is also compatible with the existing drivers: even if just
one driver is needed, all the cpus will be tied to that driver, at the cost
of a small extra chunk of per-cpu memory.
The multiple driver support uses a per-cpu driver pointer instead of a
global variable, and the accessors of this variable are called from a cpu
context.
In order to keep compatibility with the existing drivers, the functions
'cpuidle_register_driver' and 'cpuidle_unregister_driver' register or
unregister the specified driver for all the cpus.
The semantics of the /sys/devices/system/cpu/cpuidle/current_driver output
remain the same, except that the reported driver name is the one for the
current cpu.
The /sys/devices/system/cpu/cpu[0-9]/cpuidle/driver/name files are added,
allowing the per-cpu driver name to be read.
[1] http://lwn.net/Articles/481055/
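A sketch (with assumed names) of the compatibility path: the legacy entry
point simply installs the same driver in every cpu's per-cpu slot, so
single-driver users are unaffected:

#include <linux/cpuidle.h>
#include <linux/cpumask.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(struct cpuidle_driver *, example_drivers);

/* The real code additionally takes the cpuidle lock and refuses to
 * overwrite an already registered driver. */
static void example_register_driver_for_all_cpus(struct cpuidle_driver *drv)
{
	int cpu;

	for_each_possible_cpu(cpu)
		per_cpu(example_drivers, cpu) = drv;
}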
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch is a preparation for multiple cpuidle driver support.
As the next patch will introduce multiple drivers together with a Kconfig
option, and we want to keep the code clean and understandable, this patch
defines a set of functions encapsulating some common parts and splits
what should be done under a lock from the rest.
[rjw: Modified the subject and changelog slightly.]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The code is racy: the check against cpuidle_curr_driver should be
done under the lock.
I could not find a path in the different drivers where this could actually
happen, because the arch-specific drivers are written in such a way that it
is not possible to register a driver while it is being unregistered, except
maybe in a very improbable case where "intel_idle" and "processor_idle" are
competing: one could unregister a driver while the other one is registering.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We want to support different cpuidle drivers co-existing together.
In that case we should move the refcount into the cpuidle_driver
structure so that several drivers can be handled at a time.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The "struct device" is only used in sysfs.c.
The other .c files including the private header "cpuidle.h"
do not need to pull the entire headers tree from there as they
don't manipulate the "struct device".
This patch fixes this by moving the header inclusion to sysfs.c
and adding a forward declaration for the struct device.
The number of lines generated by the preprocesor:
Without this patch : 17269 loc
With this patch : 16446 loc
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The structure cpuidle_state_kobj is not used anywhere except
in the sysfs.c file. The definition of this structure is not
needed in the cpuidle header file. This patch moves it to the
sysfs.c file in order to encapsulate the code a bit more.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The function detect_repeating_patterns was not very useful for
workloads with alternating long and short pauses, for example
virtual machines handling network requests for each other (say
a web and database server).
Instead, try to find a recent sleep interval that is somewhere
between the median and the mode sleep time, by discarding outliers
to the up side and recalculating the average and standard deviation
until that is no longer required.
This should do something sane with a sleep interval series like:
200 180 210 10000 30 1000 170 200
The current code would simply discard such a series, while the
new code will guess a typical sleep interval just shy of 200.
The original patch came from Rik van Riel <riel@redhat.com>.
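The idea can be illustrated with a small user-space sketch; the acceptance
threshold and the stop condition below are made up for the example, the menu
governor's actual constants differ. With the series above it ends up
averaging only the short intervals once the 10000 and 1000 outliers have
been discarded:

#include <math.h>
#include <stdio.h>

/* Repeatedly compute average and standard deviation of the intervals below
 * the current threshold; accept the average once the spread is small,
 * otherwise drop the largest remaining sample(s) and retry. */
static double typical_interval(const unsigned int *v, int n)
{
	unsigned int thresh = ~0u;

	for (;;) {
		double sum = 0.0, sumsq = 0.0, avg, var;
		unsigned int max = 0;
		int i, count = 0;

		for (i = 0; i < n; i++) {
			if (v[i] > thresh)
				continue;
			sum += v[i];
			sumsq += (double)v[i] * v[i];
			if (v[i] > max)
				max = v[i];
			count++;
		}
		if (count <= n / 2)
			return -1.0;	/* too many outliers: no stable pattern */

		avg = sum / count;
		var = sumsq / count - avg * avg;
		if (var < 0.0)
			var = 0.0;

		if (sqrt(var) <= avg / 2)	/* spread small enough: accept */
			return avg;

		thresh = max - 1;		/* discard the upper outlier(s) */
	}
}

int main(void)
{
	unsigned int s[] = { 200, 180, 210, 10000, 30, 1000, 170, 200 };

	printf("predicted interval: %.0f us\n",
	       typical_interval(s, (int)(sizeof(s) / sizeof(s[0]))));
	return 0;
}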
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When the cpuidle governor has chosen a C-state to enter for an idle CPU but
then notices that there are tasks waiting to be executed, the idle CPU will
not really enter the target C-state and goes off to run the tasks instead.
In this situation, the statistics would be updated with the residency of the
previously entered target C-state, which is obviously not reasonable.
So, fix this by setting the target C-state residency to 0.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Predicting the future is difficult, and when the cpuidle governor's
prediction fails it may choose a shallower C-state than it should. Being
able to quickly notice and detect such failures is important for power
saving.
This patch extends the previous change to the general case in which the
prediction logic produces a small predicted residency, so that a shallow
C-state is chosen even though the expected residency is large. Once the
prediction fails, the CPU would keep staying in the shallow C-state for a
long time even though it could have entered a deep C-state.
So, when the expected residency is long enough but the governor chooses a
shallow C-state, a timer is added in order to detect a prediction failure.
If the CPU is woken up before the added timer fires, the timer is cancelled.
If the timer fires, the menu governor quickly notices the prediction failure
and re-evaluates the possibility of deeper C-states.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Predicting the future is difficult, and when the cpuidle governor's
prediction fails it may choose a shallower C-state than it should. Being
able to quickly notice and detect such failures is important for power
saving.
The cpuidle menu governor has a method to predict a repeat pattern: if the
last 8 C-state residencies are consecutive and the same or very close, it
predicts that the next residency will be the same.
A real case is the turbostat utility (tools/power/x86/turbostat) in kernel
3.3 or earlier. On Sandybridge, turbostat reads 10 registers one by one, so
it generates 10 IPIs waking up the idle CPUs. The cpuidle menu governor
therefore predicts repeat mode, i.e. that another IPI will wake the idle CPU
soon, and keeps the idle CPU in C1 even though the CPU is totally idle.
However, after those 10 register reads turbostat sleeps 5 seconds by
default, so the idle CPU stays in C1 for a long time, although it is idle,
until a break event occurs.
On an idle Sandybridge system, running "./turbostat -v" shows the deep
C-state residency dangling between 70% and 99%. With the patched kernel,
the deep C-state residency stays above 99.98%.
In this patch, a timer is added when the menu governor detects repeat mode
and chooses a shallow C-state. The timer is set to a timeout value greater
than the predicted time, and we conclude that the repeat-mode prediction
failed if the timer fires. When repeat mode happens as expected, the timer
is not triggered: the CPU wakes up from the C-state and cancels the timer
itself. When repeat mode does not happen, the timer fires and the menu
governor quickly notices that the repeat-mode prediction failed, and then
re-evaluates the possibility of deeper C-states.
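The watchdog pattern looks roughly like this (a sketch with assumed names,
not the exact menu governor code):

#include <linux/hrtimer.h>
#include <linux/ktime.h>

static struct hrtimer example_watchdog;
static bool example_prediction_failed;

static enum hrtimer_restart example_watchdog_fn(struct hrtimer *timer)
{
	/* we slept well past the prediction: have the next governor
	 * invocation re-evaluate deeper C-states */
	example_prediction_failed = true;
	return HRTIMER_NORESTART;
}

static void example_watchdog_init(void)
{
	hrtimer_init(&example_watchdog, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	example_watchdog.function = example_watchdog_fn;
}

static void example_watchdog_arm(u64 predicted_us)
{
	/* timeout larger than the prediction; the factor is illustrative */
	hrtimer_start(&example_watchdog,
		      ns_to_ktime(predicted_us * 2 * 1000ULL),
		      HRTIMER_MODE_REL);
}

static void example_watchdog_cancel(void)	/* called on a normal wakeup */
{
	hrtimer_cancel(&example_watchdog);
}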
Below is another case which clearly shows the benefit of the patch:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/time.h>
#include <time.h>
#include <pthread.h>

/* Worker threads alternate short sleeps (shallow C-state) with an
 * occasional 1 second sleep (deep C-state opportunity). */
volatile int *shutdown_flag;
volatile long *count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		"  --help   -h  Print this help\n"
		"  --thread -n  Thread number\n"
		"  --loop   -l  Loop times in shallow Cstate\n"
		"  --delay  -t  Sleep time (us) in shallow Cstate\n");
}

void *simple_loop(void *arg)
{
	int idle_num = 1;

	while (!(*shutdown_flag)) {
		*count = *count + 1;
		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			usleep(1000000);
			idle_num = 0;
		}
		idle_num++;
	}
	return NULL;
}

static void sighand(int sig)
{
	*shutdown_flag = 1;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, thread_num = 8;
	pthread_t pt[1024];
	static char optstr[] = "n:l:t:h";

	while ((c = getopt(argc, argv, optstr)) != -1)
		switch (c) {
		case 'n':
			thread_num = atoi(optarg);
			break;
		case 'l':
			loop = atoi(optarg);
			break;
		case 't':
			delay = atoi(optarg);
			break;
		case 'h':
		default:
			usage();
			exit(1);
		}

	printf("thread=%d,loop=%d,delay=%d\n", thread_num, loop, delay);
	count = malloc(sizeof(long));
	shutdown_flag = malloc(sizeof(int));
	*count = 0;
	*shutdown_flag = 0;
	sigemptyset(&sigset);
	sigaddset(&sigset, signum);
	sigprocmask(SIG_BLOCK, &sigset, NULL);
	signal(SIGINT, sighand);
	signal(SIGTERM, sighand);

	for (i = 0; i < thread_num; i++)
		pthread_create(&pt[i], NULL, simple_loop, NULL);
	for (i = 0; i < thread_num; i++)
		pthread_join(pt[i], NULL);
	exit(0);
}
Get powertop v2 from git://github.com/fenrus75/powertop and build it.
Then build the test application above and run it.
The test platform can be Intel Sandybridge or another recent platform.
#./idle_predict -l 10 &
#./powertop
Without the patch, we will find that the deep C-state residency dangles
between 40% and 100% and that much time is spent in the C1 state. This is
because the menu governor wrongly predicts that repeat mode continues, so it
keeps choosing the shallow C1 state even though the CPU has a chance to sleep
for 1 second in a deep C-state.
With the patched kernel, the deep C-state residency stays above 99.6%.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move the kobj initialization and completion into sysfs.c
to encapsulate the code a bit more.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The function needs the cpuidle_device, which is initially available in the
caller.
The current code gets the struct device from the struct cpuidle_device and
passes it to the cpuidle_add_sysfs() function, which then calls
per_cpu(cpuidle_devices, cpu) to get the cpuidle_device back.
This patch passes the cpuidle_device instead and simplifies the code.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add support for core powergating on Calxeda platforms. Initially, this
supports ECX-1000 (highbank), but support will be added for ECX-2000
later.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
The name of the function __cpuidle_register_driver() is confusing because,
following the kernel's naming conventions, it suggests that the driver is
registered without taking a lock. Actually, it just fills the states'
power fields with a decreasing value if the power has not been specified.
Clarify the purpose of the function by changing its name and
moving the condition out of it.
This patch fixes nothing and does not change the behavior of the
function. It is just for the sake of clarity.
IMHO, reading in the code:
+ if (!drv->power_specified)
+ set_power_states(drv);
is much more explicit than:
- __cpuidle_register_driver(drv);
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
This mindless patch is just about removing some trailing
carriage returns.
[rjw: Changed the subject.]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
For the mechanism introduced by commit cbc9ef0 (PM / Domains: Add
preliminary support for cpuidle, v2) to work with the ladder
governor, that governor should respect the "disabled" state flag
added by that commit. Change the ladder governor accordingly.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
There are two cpuidle governors, ladder and menu. While the ladder
governor is always available, if CONFIG_CPU_IDLE is selected, the
menu governor additionally requires CONFIG_NO_HZ.
A particular C state can be disabled by writing to the sysfs file
/sys/devices/system/cpu/cpuN/cpuidle/stateN/disable, but this mechanism
is only implemented in the menu governor. Thus, in a system where
CONFIG_NO_HZ is not selected, the ladder governor becomes default and
always will walk through all sleep states - irrespective of whether the
C state was disabled via sysfs or not. The only way to select a specific
C state was to write the related latency to /dev/cpu_dma_latency and
keep the file open as long as this setting was required - not very
practical and not suitable for setting a single core in an SMP system.
With this patch, the ladder governor will only promote to the next
C state if it has not been disabled, and it will demote if the
current C state has been disabled.
Note that the patch does not make the setting of the sysfs variable
"disable" coherent, i.e. if one is disabling a light state, then all
deeper states are disabled as well, but the "disable" variable does not
reflect it. Likewise, if one enables a deep state but a lighter state
still is disabled, then this has no effect. A related section has been
added to the documentation.
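A sketch of the resulting promotion/demotion checks (the counter and latency
logic of the real ladder code is hidden behind the want_promote parameter;
the two flags checked are the per-driver "disabled" field from the previous
changelog entry and the per-cpu sysfs "disable" knob):

#include <linux/cpuidle.h>

static int example_ladder_pick(struct cpuidle_driver *drv,
			       struct cpuidle_device *dev,
			       int last_idx, bool want_promote)
{
	int next = last_idx + 1;

	/* never promote into a disabled state */
	if (want_promote && next < drv->state_count &&
	    !drv->states[next].disabled &&
	    !dev->states_usage[next].disable)
		return next;

	/* demote out of the current state if it has been disabled */
	if (last_idx > 0 &&
	    (drv->states[last_idx].disabled ||
	     dev->states_usage[last_idx].disable))
		return last_idx - 1;

	return last_idx;
}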
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
When a kernel is built to support multiple hardware types it's possible
that CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is set but the hardware the
kernel is run on doesn't support cpuidle and therefore doesn't load a
driver for it. In this case, when the system is shut down,
cpuidle_coupled_cpu_notify() gets called with cpuidle_devices set to
NULL. There are quite possibly other circumstances where this
situation can also occur and we should check for it.
Signed-off-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The cpu hotplug notifier gets called in both atomic and non-atomic
contexts, it is not always safe to lock a mutex. Filter out all events
except the six necessary ones, which are all sleepable, before taking
the mutex.
Signed-off-by: Colin Cross <ccross@android.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Pull ACPI & power management update from Len Brown:
"Re-write of the turbostat tool.
lower overhead was necessary for measuring very large systems when
they are very idle.
IVB support in intel_idle
It's what I run on my IVB, others should be able to also:-)
ACPICA core update
We have found some bugs due to divergence between Linux and the
upstream ACPICA base. Most of these patches are to reduce that
divergence to reduce the risk of future bugs.
Some cpuidle updates, mostly for non-Intel
More will be coming, as they depend on this part.
Some thermal management changes needed by non-ACPI systems.
Some _OST (OS Status Indication) updates for hot ACPI hot-plug."
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (51 commits)
Thermal: Documentation update
Thermal: Add Hysteresis attributes
Thermal: Make Thermal trip points writeable
ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check
tools/power: turbostat: fix large c1% issue
tools/power: turbostat v2 - re-write for efficiency
ACPICA: Update to version 20120711
ACPICA: AcpiSrc: Fix some translation issues for Linux conversion
ACPICA: Update header files copyrights to 2012
ACPICA: Add new ACPI table load/unload external interfaces
ACPICA: Split file: tbxface.c -> tbxfload.c
ACPICA: Add PCC address space to space ID decode function
ACPICA: Fix some comment fields
ACPICA: Table manager: deploy new firmware error/warning interfaces
ACPICA: Add new interfaces for BIOS(firmware) errors and warnings
ACPICA: Split exception code utilities to a new file, utexcep.c
ACPI: acpi_pad: tune round_robin_time
ACPICA: Update to version 20120620
ACPICA: Add support for implicit notify on multiple devices
ACPICA: Update comments; no functional change
...
* pm-domains:
PM / Domains: Fix build warning for CONFIG_PM_RUNTIME unset
PM / Domains: Replace plain integer with NULL pointer in domain.c file
PM / Domains: Add missing static storage class specifier in domain.c file
PM / Domains: Allow device callbacks to be added at any time
PM / Domains: Add device domain data reference counter
PM / Domains: Add preliminary support for cpuidle, v2
PM / Domains: Do not stop devices after restoring their states
PM / Domains: Use subsystem runtime suspend/resume callbacks by default
On certain BIOSes, resume hangs if cpus are allowed to enter idle states
during suspend [1].
This was fixed in the acpi idle driver [2], but the intel_idle driver does
not have this fix. Thus, instead of replicating the fix in both idle
drivers, or in more platform-specific idle drivers if needed, the
more general cpuidle infrastructure could handle this.
A suspend callback in cpuidle_driver could handle this fix. But
a cpuidle_driver provides only basic functionalities like platform idle
state detection capability and mechanisms to support entry and exit
into CPU idle states. All other cpuidle functions are found in the
cpuidle generic infrastructure, for the good reason that all cpuidle
drivers, irrespective of their platforms, will support these functions.
One option therefore would be to register a suspend callback in cpuidle
which handles this fix. This could be called through a PM_SUSPEND_PREPARE
notifier. But this is too generic a notifier for a driver to handle.
Also, ideally the job of cpuidle is not to handle side effects of suspend.
It should expose the interfaces which "handle cpuidle 'during' suspend"
or any other operation, which the subsystems call during that respective
operation.
The fix demands that during suspend, no cpus should be allowed to enter
deep C-states. The interface cpuidle_uninstall_idle_handler() in cpuidle
ensures that. Not just that, it also kicks all the cpus which are already
in idle out of their idle states, as was being done during cpu hotplug
through the CPU_DYING_FROZEN callbacks.
Now the question arises about when during suspend should
cpuidle_uninstall_idle_handler() be called. Since we are dealing with
drivers it seems best to call this function during dpm_suspend().
Delaying the call till dpm_suspend_noirq() does no harm, as long as it is
before cpu_hotplug_begin() to avoid race conditions with cpu hotplug
operations. In dpm_suspend_noirq(), it would be wise to place this call
before suspend_device_irqs() to avoid ugly interactions with the same.
Analogously, the same applies during resume.
References:
[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/674075.
[2] http://marc.info/?l=linux-pm&m=133958534231884&w=2
Reported-and-tested-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
On some systems there are CPU cores located in the same power
domains as I/O devices. Then, power can only be removed from the
domain if all I/O devices in it are not in use and the CPU core
is idle. Add preliminary support for that to the generic PM domains
framework.
First, the platform is expected to provide a cpuidle driver with one
extra state designated for use with the generic PM domains code.
This state should be initially disabled and its exit_latency value
should be set to whatever time is needed to bring up the CPU core
itself after restoring power to it, not including the domain's
power on latency. Its .enter() callback should point to a procedure
that will remove power from the domain containing the CPU core at
the end of the CPU power transition.
The remaining characteristics of the extra cpuidle state, referred to
as the "domain" cpuidle state below, (e.g. power usage, target
residency) should be populated in accordance with the properties of
the hardware.
Next, the platform should execute genpd_attach_cpuidle() on the PM
domain containing the CPU core. That will cause the generic PM
domains framework to treat that domain in a special way such that:
* When all devices in the domain have been suspended and it is about
to be turned off, the states of the devices will be saved, but
power will not be removed from the domain. Instead, the "domain"
cpuidle state will be enabled so that power can be removed from
the domain when the CPU core is idle and the state has been chosen
as the target by the cpuidle governor.
* When the first I/O device in the domain is resumed and
__pm_genpd_poweron() is called for the first time after
power has been removed from the domain, the "domain" cpuidle
state will be disabled to avoid subsequent surprise power removals
via cpuidle.
The effective exit_latency value of the "domain" cpuidle state
depends on the time needed to bring up the CPU core itself after
restoring power to it as well as on the power on latency of the
domain containing the CPU core. Thus the "domain" cpuidle state's
exit_latency has to be recomputed every time the domain's power on
latency is updated, which may happen every time power is restored
to the domain, if the measured power on latency is greater than
the latency stored in the corresponding generic_pm_domain structure.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Andrew J. Schorr raises a question: when he changes the disable setting on
a single CPU, it affects all the other CPUs. Basically, the disable field is
currently per-driver instead of per-cpu, and all the C-states of the same
driver are shared by all CPUs in the same machine.
The patch changes the `disable' field to per-cpu, so we can set it
separately for each cpu.
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Reported-by: Andrew J.Schorr <aschorr@telemetry-investments.com>
Reviewed-by: Yanmin Zhang <yanmin_zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Adds cpuidle_coupled_parallel_barrier, which can be used by coupled
cpuidle state enter functions to handle resynchronization after
determining if any cpu needs to abort. The normal use case will
be:
static bool abort_flag;
static atomic_t abort_barrier;

int arch_cpuidle_enter(struct cpuidle_device *dev, ...)
{
	if (arch_turn_off_irq_controller()) {
		/* returns an error if an irq is pending and would be lost
		   if idle continued and turned off power */
		abort_flag = true;
	}
	cpuidle_coupled_parallel_barrier(dev, &abort_barrier);
	if (abort_flag) {
		/* One of the cpus didn't turn off its irq controller */
		arch_turn_on_irq_controller();
		return -EINTR;
	}
	/* continue with idle */
	...
}
This will cause all cpus to abort idle together if one of them needs
to abort.
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Colin Cross <ccross@android.com>
Signed-off-by: Len Brown <len.brown@intel.com>
On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the
cpus cannot be independently powered down, either due to
sequencing restrictions (on Tegra 2, cpu 0 must be the last to
power down), or due to HW bugs (on OMAP4460, a cpu powering up
will corrupt the gic state unless the other cpu runs a work
around). Each cpu has a power state that it can enter without
coordinating with the other cpu (usually Wait For Interrupt, or
WFI), and one or more "coupled" power states that affect blocks
shared between the cpus (L2 cache, interrupt controller, and
sometimes the whole SoC). Entering a coupled power state must
be tightly controlled on both cpus.
The easiest solution to implementing coupled cpu power states is
to hotplug all but one cpu whenever possible, usually using a
cpufreq governor that looks at cpu load to determine when to
enable the secondary cpus. This causes problems, as hotplug is an
expensive operation, so the number of hotplug transitions must be
minimized, leading to very slow response to loads, often on the
order of seconds.
This file implements an alternative solution, where each cpu will
wait in the WFI state until all cpus are ready to enter a coupled
state, at which point the coupled state function will be called
on all cpus at approximately the same time.
Once all cpus are ready to enter idle, they are woken by an smp
cross call. At this point, there is a chance that one of the
cpus will find work to do, and choose not to enter idle. A
final pass is needed to guarantee that all cpus will call the
power state enter function at the same time. During this pass,
each cpu will increment the ready counter, and continue once the
ready counter matches the number of online coupled cpus. If any
cpu exits idle, the other cpus will decrement their counter and
retry.
To use coupled cpuidle states, a cpuidle driver must:
Set struct cpuidle_device.coupled_cpus to the mask of all
coupled cpus, usually the same as cpu_possible_mask if all cpus
are part of the same cluster. The coupled_cpus mask must be
set in the struct cpuidle_device for each cpu.
Set struct cpuidle_device.safe_state to a state that is not a
coupled state. This is usually WFI.
Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each
state that affects multiple cpus.
Provide a struct cpuidle_state.enter function for each state
that affects multiple cpus. This function is guaranteed to be
called on all cpus at approximately the same time. The driver
should ensure that the cpus all abort together if any cpu tries
to abort once the function is called.
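Putting the list above together, a driver-side sketch could look as follows
(assuming CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED; callback bodies, names and
latency numbers are illustrative, and the safe state is expressed here as
the driver's safe_state_index):

#include <linux/cpuidle.h>
#include <linux/cpumask.h>
#include <linux/module.h>
#include <linux/percpu.h>

static int example_enter_wfi(struct cpuidle_device *dev,
			     struct cpuidle_driver *drv, int index)
{
	/* plain per-cpu WFI, safe to enter without coordination */
	return index;
}

static int example_enter_cluster_off(struct cpuidle_device *dev,
				     struct cpuidle_driver *drv, int index)
{
	/* called on all coupled cpus at approximately the same time;
	 * the platform-specific cluster power-down would go here */
	return index;
}

static struct cpuidle_driver example_coupled_driver = {
	.name			= "example_coupled",
	.owner			= THIS_MODULE,
	.safe_state_index	= 0,
	.states[0] = {
		.enter			= example_enter_wfi,
		.exit_latency		= 1,
		.target_residency	= 1,
		.name			= "WFI",
		.desc			= "per-cpu wait for interrupt",
	},
	.states[1] = {
		.enter			= example_enter_cluster_off,
		.exit_latency		= 1000,
		.target_residency	= 5000,
		.flags			= CPUIDLE_FLAG_COUPLED,
		.name			= "C-cluster",
		.desc			= "cluster off, entered by all cpus together",
	},
	.state_count = 2,
};

static DEFINE_PER_CPU(struct cpuidle_device, example_idle_dev);

static int example_coupled_device_init(unsigned int cpu)
{
	struct cpuidle_device *dev = &per_cpu(example_idle_dev, cpu);

	dev->cpu = cpu;
	/* every cpu in the cluster, including this one */
	cpumask_copy(&dev->coupled_cpus, cpu_possible_mask);
	return cpuidle_register_device(dev);
}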
update1:
cpuidle: coupled: fix count of online cpus
online_count was never incremented on boot, and was also counting
cpus that were not part of the coupled set. Fix both issues by
introducting a new function that counts online coupled cpus, and
call it from register as well as the hotplug notifier.
update2:
cpuidle: coupled: fix decrementing ready count
cpuidle_coupled_set_not_ready sometimes refuses to decrement the
ready count in order to prevent a race condition. This makes it
unsuitable for use when finished with idle. Add a new function
cpuidle_coupled_set_done that decrements both the ready count and
waiting count, and call it after idle is complete.
Cc: Amit Kucheria <amit.kucheria@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Trinabh Gupta <g.trinabh@gmail.com>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Fix the error handling in __cpuidle_register_device to include
the missing list_del. Move it to a label, which will simplify
the error handling when coupled states are added.
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Colin Cross <ccross@android.com>
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
Split the code to enter a state and update the stats into a helper
function, cpuidle_enter_state, and export it. This function will
be called by the coupled state code to handle entering the safe
state and the final coupled state.
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Colin Cross <ccross@android.com>
Reviewed-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
The existing check for dev == NULL in __cpuidle_register_device() is
rendered useless because dev is dereferenced before the check itself.
Moreover, correctly speaking, it is the job of the callers of this
function, i.e., cpuidle_register_device() & cpuidle_enable_device() (which
also happen to be exported functions) to ensure that
__cpuidle_register_device() is called with a non-NULL dev.
So add the necessary dev == NULL checks in the two callers and remove the
(useless) check from __cpuidle_register_device().
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
commit 9a6558371b
Author: Arjan van de Ven <arjan@linux.intel.com>
Date: Sun Nov 9 12:45:10 2008 -0800
regression: disable timer peek-ahead for 2.6.28
It's showing up as regressions; disabling it very likely just papers
over an underlying issue, but time is running out for 2.6.28, lets get
back to this for 2.6.29
Many years have passed since 2008, so it seems OK to remove the whole `#if 0' block.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Trinabh Gupta <g.trinabh@gmail.com>
Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
kick_all_cpus_sync() is the core implementation of cpu_idle_wait()
which is copied all over the arch code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120507175652.119842173@linutronix.de
Fix a NULL pointer dereference panic in cpuidle_play_dead() during
CPU off-lining when no cpuidle driver is registered. A cpuidle
driver may be registered at boot-time based on CPU type. This patch
allows an off-lined CPU to enter HLT-based idle in this condition.
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: Boris Ostrovsky <boris.ostrovsky@amd.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Pull ACPI & Power Management changes from Len Brown:
- ACPI 5.0 after-ripples, ACPICA/Linux divergence cleanup
- cpuidle evolving, more ARM use
- thermal sub-system evolving, ditto
- assorted other PM bits
Fix up conflicts in various cpuidle implementations due to ARM cpuidle
cleanups (ARM at91 self-refresh and cpu idle code rewritten into
"standby" in asm conflicting with the consolidation of cpuidle time
keeping), trivial SH include file context conflict and RCU tracing fixes
in generic code.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (77 commits)
ACPI throttling: fix endian bug in acpi_read_throttling_status()
Disable MCP limit exceeded messages from Intel IPS driver
ACPI video: Don't start video device until its associated input device has been allocated
ACPI video: Harden video bus adding.
ACPI: Add support for exposing BGRT data
ACPI: export acpi_kobj
ACPI: Fix logic for removing mappings in 'acpi_unmap'
CPER failed to handle generic error records with multiple sections
ACPI: Clean redundant codes in scan.c
ACPI: Fix unprotected smp_processor_id() in acpi_processor_cst_has_changed()
ACPI: consistently use should_use_kmap()
PNPACPI: Fix device ref leaking in acpi_pnp_match
ACPI: Fix use-after-free in acpi_map_lsapic
ACPI: processor_driver: add missing kfree
ACPI, APEI: Fix incorrect APEI register bit width check and usage
Update documentation for parameter *notrigger* in einj.txt
ACPI, APEI, EINJ, new parameter to control trigger action
ACPI, APEI, EINJ, limit the range of einj_param
ACPI, APEI, Fix ERST header length check
cpuidle: power_usage should be declared signed integer
...
power_usage is always assigned a negative value and should be declared
a signed integer
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Currently when a CPU is off-lined it enters either MWAIT-based idle or,
if MWAIT is not desired or supported, HLT-based idle (which places the
processor in C1 state). This patch allows processors without MWAIT
support to stay in states deeper than C1.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
If the state_count is not initialized for the device, use
the driver's state count as the default. That avoids having to set it
manually in each cpuidle driver's initialization routine and saves us
from duplicated lines of code.
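In code form the fallback is essentially a one-liner (sketch, assumed helper
name):

#include <linux/cpuidle.h>

static void example_default_state_count(struct cpuidle_device *dev,
					struct cpuidle_driver *drv)
{
	/* inherit the driver's count when the driver init code set none */
	if (!dev->state_count)
		dev->state_count = drv->state_count;
}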
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Some C-states of a new CPU might not be good. One reason is that the BIOS
might configure them incorrectly. To help developers root-cause this quickly,
the patch adds a new sysfs entry, so developers can disable a specific
C-state manually.
In addition, C-states can have a big impact on performance tuning, as it
takes much time to enter/exit C-states, which might delay interrupt
processing. With the new debug option, developers can check whether a deep
C-state impacts performance and how much impact it causes.
Also add this option in Documentation/cpuidle/sysfs.txt.
[akpm@linux-foundation.org: check kstrtol return value]
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Reviewed-by: Yanmin Zhang <yanmin_zhang@intel.com>
Reviewed-and-Tested-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Make necessary changes to implement time keeping and irq enabling
in the core cpuidle code. This will allow the removal of these
functionalities from various platform cpuidle implementations whose
timekeeping and irq enabling follow the form used in this common code.
Signed-off-by: Robert Lee <rob.lee@linaro.org>
Tested-by: Jean Pihet <j-pihet@ti.com>
Tested-by: Amit Daniel <amit.kachhap@linaro.org>
Tested-by: Robert Lee <rob.lee@linaro.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
We moved all our pSeries idle loops to the cpu idle framework
so we really want it to come up by default.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As the tracepoints in the cpuidle code are called when rcu_idle_exit() is in
effect, the _rcuidle() version must be used, otherwise the rcu_read_lock()s
that protect the tracepoint will not be honored.
Cc: Len Brown <len.brown@intel.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
and converts the devices to regular devices. The sysdev drivers are
implemented as subsystem interfaces now.
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.
Userspace relies on events and generic sysfs subsystem infrastructure
from sysdev devices, which are made available with this conversion.
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@amd64.org>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Cc: Len Brown <lenb@kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
cpuidle: Single/Global registration of idle states
cpuidle: Split cpuidle_state structure and move per-cpu statistics fields
cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare()
cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state
ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning
ACPI: Export FADT pm_profile integer value to userspace
thermal: Prevent polling from happening during system suspend
ACPI: Drop ACPI_NO_HARDWARE_INIT
ACPI atomicio: Convert width in bits to bytes in __acpi_ioremap_fast()
PNPACPI: Simplify disabled resource registration
ACPI: Fix possible recursive locking in hwregs.c
ACPI: use kstrdup()
mrst pmu: update comment
tools/power turbostat: less verbose debugging
This patch makes the cpuidle_states structure global (a single copy)
instead of per-cpu. The statistics needed on a per-cpu basis
by the governor are kept per-cpu. This simplifies the cpuidle
subsystem, as state registration is done by a single cpu only.
Having a single copy of cpuidle_states saves memory. The rare case
of asymmetric C-states can be handled within the cpuidle driver,
and architectures such as POWER do not have asymmetric C-states.
With single/global registration of all the idle states,
dynamic C-state transitions on x86 are handled by
the boot cpu. Here, the boot cpu disables all the devices,
re-populates the states and later enables all the devices,
irrespective of which cpu receives the notification first.
Reference:
https://lkml.org/lkml/2011/4/25/83
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This is the first step towards global registration of cpuidle
states. The statistics used primarily by the governor are per-cpu
and have to be split from the rest of the fields inside cpuidle_state,
which will be made global, i.e. a single copy. The driver_data field
is also per-cpu and is moved as well.
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The cpuidle_device->prepare() mechanism causes updates to the
cpuidle_state[].flags, setting and clearing CPUIDLE_FLAG_IGNORE
to tell the governor not to choose a state on a per-cpu basis at
run-time. State demotion is now handled by the driver and it returns
the actual state entered. Hence, this mechanism is not required.
Also, this removes per-cpu flags from cpuidle_state, enabling
it to be made global.
Reference:
https://lkml.org/lkml/2011/3/25/52
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cpuidle governor only suggests the state to enter using the
governor->select() interface, but allows the low level driver to
override the recommended state. The actual entered state
may be different because of software or hardware demotion. Software
demotion is done by the back-end cpuidle driver and can be accounted
correctly. Current cpuidle code uses last_state field to capture the
actual state entered and based on that updates the statistics for the
state entered.
Ideally the driver enter routine should update the counters,
and it should return the state actually entered rather than the time
spent there. The generic cpuidle code should simply handle where
the counters live in the sysfs namespace, not updating the counters.
Reference:
https://lkml.org/lkml/2011/3/25/52
Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This file has module_init/exit and MODULE_LICENSE, and so it
needs the full module.h header.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The PM QoS implementation files are better named
kernel/power/qos.c and include/linux/pm_qos.h.
The PM QoS support is compiled under the CONFIG_PM option.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
cpuidle users should call cpuidle_call_idle() directly
rather than via the (pm_idle)() function pointer.
An architecture may choose to continue using (pm_idle)(),
but cpuidle need not depend on it:

    my_arch_cpu_idle()
            ...
            if (cpuidle_call_idle())
                    pm_idle();
cc: Kevin Hilman <khilman@deeprootsystems.com>
cc: Paul Mundt <lethal@linux-sh.org>
cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
When a Xen Dom0 kernel boots on a hypervisor, it gets access
to the raw-hardware ACPI tables. While it parses the idle tables
for the hypervisor's benefit, it uses HLT for its own idle.
Rather than have xen scribble on pm_idle and access default_idle,
have it simply call disable_cpuidle() so that acpi_idle will not load
and the architecture-default HLT will be used.
cc: xen-devel@lists.xensource.com
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Useful for disabling cpuidle in order to fall back
to the architecture-default idle loop.
With cpuidle disabled, cpuidle drivers and governors will fail to register.
On x86 they'll say so:
intel_idle: intel_idle yielding to (null)
ACPI: acpi_idle yielding to (null)
Signed-off-by: Len Brown <len.brown@intel.com>
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param
x86 idle: deprecate "no-hlt" cmdline param
x86 idle APM: deprecate CONFIG_APM_CPU_IDLE
x86 idle floppy: deprecate disable_hlt()
x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it
x86 idle: clarify AMD erratum 400 workaround
idle governor: Avoid lock acquisition to read pm_qos before entering idle
cpuidle: menu: fixed wrapping timers at 4.294 seconds
The cpuidle menu governor is using u32 as a temporary datatype for storing
nanosecond values, which wrap around at 4.294 seconds. This causes errors
in the predicted sleep times, resulting in deeper C-state selection than
appropriate and increased power consumption. It also breaks cpuidle
state residency statistics.
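The 4.294 s figure is simply where 2^32 nanoseconds ends; a small standalone
check (ordinary userspace C, not kernel code):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            uint64_t wrap_ns = 1ULL << 32;          /* a u32 of nanoseconds wraps here */
            printf("u32 ns wraps after %.9f s\n", wrap_ns / 1e9);   /* 4.294967296 */

            uint64_t ten_s_ns = 10ULL * 1000000000ULL;      /* fits easily in 64 bits */
            printf("10 s = %llu ns\n", (unsigned long long)ten_s_ns);
            return 0;
    }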
cc: stable@kernel.org # .32.x through .39.x
Signed-off-by: Tero Kristo <tero.kristo@nokia.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Fix a bunch of
warning: ‘inline’ is not at beginning of declaration
messages when building a 'make allyesconfig' kernel with -Wextra.
These warnings are trivial to kill, yet rather annoying when building with
-Wextra.
The more we can cut down on pointless crap like this the better (IMHO).
A previous patch to do this for an 'allnoconfig' build has already been
merged. This just takes the cleanup a little further.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Currently the intel_idle and acpi_idle drivers show double cpu_idle "exit idle"
events -> this patch fixes that and makes throwing cpu_idle events less complex.
It also introduces cpu_idle events for all architectures which use
the cpuidle subsystem, namely:
- arch/arm/mach-at91/cpuidle.c
- arch/arm/mach-davinci/cpuidle.c
- arch/arm/mach-kirkwood/cpuidle.c
- arch/arm/mach-omap2/cpuidle34xx.c
- drivers/acpi/processor_idle.c (for all cases, not only mwait)
- arch/x86/kernel/process.c (did throw events before, but was a mess)
- drivers/idle/intel_idle.c (did throw events before)
Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the callee tree) to keep things easy.
Current possible pm_idle functions on X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
-> this is really easy now.
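A sketch of that convention (the tracepoint shape is recalled from the power
trace events of that era, with PWR_EVENT_EXIT marking the exit; the my_*
names are hypothetical):

    #include <linux/smp.h>
    #include <trace/events/power.h>

    static void my_idle(void)                       /* hypothetical pm_idle-style function */
    {
            trace_cpu_idle(1, smp_processor_id());  /* enter: state index as seen in sysfs */
            my_do_hlt();                            /* hypothetical low-level idle */
            trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
    }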
This affects userspace:
The type field of the cpu_idle power event can now directly be
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait-specific values.
This change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
leaves C-states out of the BIOS.
Another (perf timechart) patch reads out cpuidle info of cpu_idle
events from:
/sys/.../cpuidle/stateX/*, so the cpuidle events are mapped
to the correct C-/cpuidle state again, even if e.g. vendors leave
C-states out of their BIOS and for example only export C1 and C3.
-> everything is fine.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Jean Pihet <j-pihet@ti.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: linux-pm@lists.linux-foundation.org
CC: linux-acpi@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-perf-users@vger.kernel.org
CC: linux-omap@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
C0 means, and is well known as, "not idle".
All documentation out there uses this term for the "running"/"not idle"
state. Linux userspace tools (e.g. cpufreq-aperf and turbostat) also
show C0 residency, which is correct there, but means something totally
different from the cpuidle "POLL" state.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
The following scenario is possible with the current cpuidle code and
the ACPI cpuidle driver:
(1) acpi_processor_cst_has_changed() is called,
(2) cpuidle_disable_device() is called,
(3) cpuidle_remove_state_sysfs() is called to remove the (presumably
outdated) states info from sysfs,
(4) acpi_processor_get_power_info() is called, the first entry in the
pr->power.states[] table is filled with zeros,
(5) acpi_processor_setup_cpuidle() is called and it doesn't fill the
first entry in pr->power.states[],
(6) cpuidle_enable_device() is called,
(7) __cpuidle_register_device() is _not_ called, since the device has
already been registered,
(8) Consequently, poll_idle_init() is _not_ called either,
(9) cpuidle_add_state_sysfs() is called to create the sysfs attributes
for the new states and it uses the bogus first table entry from
acpi_processor_get_power_info() for creating state0.
This problem is avoided if cpuidle_enable_device()
unconditionally calls poll_idle_init().
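The shape of the fix, heavily simplified (the real function does more
checking; treat this as a sketch only):

    int cpuidle_enable_device(struct cpuidle_device *dev)
    {
            int ret;

            if (dev->enabled)
                    return 0;

            poll_idle_init(dev);    /* always rebuild state0, so sysfs never
                                     * sees the zeroed first table entry */

            ret = cpuidle_add_state_sysfs(dev);
            if (ret)
                    return ret;

            dev->enabled = 1;
            return 0;
    }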
Reported-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
cc: stable@kernel.org
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
gameport: use this_cpu_read instead of lookup
x86: udelay: Use this_cpu_read to avoid address calculation
x86: Use this_cpu_inc_return for nmi counter
x86: Replace uses of current_cpu_data with this_cpu ops
x86: Use this_cpu_ops to optimize code
vmstat: User per cpu atomics to avoid interrupt disable / enable
irq_work: Use per cpu atomics instead of regular atomics
cpuops: Use cmpxchg for xchg to avoid lock semantics
x86: this_cpu_cmpxchg and this_cpu_xchg operations
percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
percpu,x86: relocate this_cpu_add_return() and friends
connector: Use this_cpu operations
xen: Use this_cpu_inc_return
taskstats: Use this_cpu_ops
random: Use this_cpu_inc_return
fs: Use this_cpu_inc_return in buffer.c
highmem: Use this_cpu_xx_return() operations
vmstat: Use this_cpu_inc_return for vm statistics
x86: Support for this_cpu_add, sub, dec, inc_return
percpu: Generic support for this_cpu_add, sub, dec, inc_return
...
Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
as per Tejun.
Add these new power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend
The old C-state/idle accounting events:
power:power_start
power:power_end
now have a replacement (but we are still keeping the old
tracepoints for compatibility):
power:cpu_idle
and
power:power_frequency
is replaced with:
power:cpu_frequency
power:machine_suspend is newly introduced.
Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.
The type= field got removed from both; it was never
used, and the type can be inferred from the event itself.
perf timechart userspace tool gets adjusted in a separate patch.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
__get_cpu_var() can be replaced with this_cpu_read() and will then use a single
read instruction with implied address calculation to access the correct per-cpu
instance.
However, the address of a per-cpu variable passed to __this_cpu_read() cannot be
determined (since it's an implied address conversion through segment prefixes).
Therefore apply this only to uses of __get_cpu_var where the address of the
variable is not used.
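A minimal sketch of the kind of conversion this enables (my_counter is a
hypothetical per-cpu variable):

    #include <linux/percpu.h>

    static DEFINE_PER_CPU(unsigned long, my_counter);

    static unsigned long read_my_counter(void)
    {
            /* __get_cpu_var(my_counter) would compute the per-cpu address and
             * then dereference it; __this_cpu_read() becomes a single
             * segment-prefixed read on x86. */
            return __this_cpu_read(my_counter);
    }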
V3->V4:
- Move one instance of this_cpu_inc_return to a later patch
so that this one can go in without percpu infrastructure
changes.
Sedat: fixed compile failure caused by an extra ')'.
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
On some SoC chips, HW resources may be in use during any particular idle
period. As a consequence, the cpuidle states that the SoC is safe to
enter can change from idle period to idle period. In addition, the
latency and threshold of each cpuidle state can vary, depending on the
operating conditions when the CPU becomes idle, e.g. the current cpu
frequency, the current state of the HW blocks, etc.
The cpuidle core and the menu governor, in their current form, are geared
towards cpuidle states that are static, i.e. the availability of the
states and their latencies and thresholds do not change during run
time. cpuidle does not provide any hook that cpuidle drivers can use to
adjust those values on the fly for the current idle period before the menu
governor selects the target cpuidle state.
This patch extends the cpuidle core and the menu governor to handle states
that are dynamic. There are three additions in the patch (a small usage
sketch follows this description) and the patch maintains
backwards-compatibility with existing cpuidle drivers.
1) add prepare() to struct cpuidle_device. A cpuidle driver can hook
into the callback and cpuidle will call prepare() before calling the
governor's select function. The callback gives the cpuidle driver a
chance to update the dynamic information of the cpuidle states for the
current idle period, e.g. state availability, latencies, thresholds,
power values, etc.
2) add CPUIDLE_FLAG_IGNORE as one of the state flags. In the prepare()
function, a cpuidle driver can set/clear the flag to indicate to the
menu governor whether a cpuidle state should be ignored, i.e. not
available, during the current idle period.
3) add power_specified bit to struct cpuidle_device. The menu governor
currently assumes that the cpuidle states are arranged in the order of
increasing latency, threshold, and power savings. This is true or can
be made true for static states. Once the state parameters are dynamic,
the latencies, thresholds, and power savings for the cpuidle states can
increase or decrease by different amounts from idle period to idle
period. So the assumption of increasing latency, threshold, and power
savings from Cn to C(n+1) can no longer be guaranteed.
It can be straightforward to calculate the power consumption of each
available state and to specify it in power_usage for the idle period.
Using the power_usage fields, the menu governor then selects the state
that has the lowest power consumption and that still satisfies all other
criteria. The power_specified bit defaults to 0. For existing cpuidle
drivers, cpuidle detects that power_specified is 0 and fills in a dummy
set of power_usage values.
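The usage sketch referred to above (the prepare() signature is simplified
and the my_soc_* helpers are hypothetical; only the callback and
CPUIDLE_FLAG_IGNORE come from this patch):

    static int my_soc_prepare(struct cpuidle_device *dev)
    {
            struct cpuidle_state *deep = &dev->states[2];   /* e.g. the deepest state */

            if (my_soc_dma_active())                        /* hypothetical HW check */
                    deep->flags |= CPUIDLE_FLAG_IGNORE;     /* hide it this idle period */
            else
                    deep->flags &= ~CPUIDLE_FLAG_IGNORE;

            return 0;
    }

At device registration the driver would then set dev->prepare = my_soc_prepare.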
Signed-off-by: Ai Li <aili@codeaurora.org>
Cc: Len Brown <len.brown@intel.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
and fix the broken case where a core's frequency depends on others.
trace_power_frequency was implemented in a rather ungeneric way, in the
acpi-cpufreq driver's target() function only.
-> Move the call to trace_power_frequency to
cpufreq.c:cpufreq_notify_transition(), where the CPUFREQ_POSTCHANGE
notifier is triggered.
This will support power frequency tracing by all cpufreq drivers.
trace_power_frequency did not trace frequency changes correctly when
the userspace governor was used or when CPU cores' frequencies depend
on each other.
-> Moving this into the CPUFREQ_POSTCHANGE notifier, which automatically
passes the cpu that gets switched, fixes this.
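Schematically, the call ends up in the POSTCHANGE leg of the transition
notifier, so every cpufreq driver is covered (the tracepoint arguments shown
are illustrative, not necessarily the exact ones merged):

    void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
    {
            switch (state) {
            case CPUFREQ_PRECHANGE:
                    /* ... notify pre-change ... */
                    break;
            case CPUFREQ_POSTCHANGE:
                    trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
                    /* ... notify post-change ... */
                    break;
            }
    }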
Robert Schoene provided some important fixes on top of my initial
quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: davej@redhat.com
CC: arjan@infradead.org
CC: linux-kernel@vger.kernel.org
CC: robert.schoene@tu-dresden.de
Tested-by: robert.schoene@tu-dresden.de
Signed-off-by: Dave Jones <davej@redhat.com>
Commit 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us())
broke things by not making sure preemption was indeed disabled
by the callers of nr_iowait_cpu() which took the iowait value of
the current cpu.
This resulted in a heap of preempt warnings. Cure this by making
nr_iowait_cpu() take a cpu number and fix up the callers to pass
in the right number.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: linux-pm@lists.linux-foundation.org
LKML-Reference: <1277968037.1868.120.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
intel_idle: native hardware cpuidle driver for latest Intel processors
ACPI: acpi_idle: touch TS_POLLING only in the non-MWAIT case
acpi_pad: uses MONITOR/MWAIT, so it doesn't need to clear TS_POLLING
sched: clarify commment for TS_POLLING
ACPI: allow a native cpuidle driver to displace ACPI
cpuidle: make cpuidle_curr_driver static
cpuidle: add cpuidle_unregister_driver() error check
cpuidle: fail to register if !CONFIG_CPU_IDLE
cpuidle_register_driver() sets cpuidle_curr_driver
cpuidle_unregister_driver() clears cpuidle_curr_driver
We shouldn't expose cpuidle_curr_driver to
potential modification except via these interfaces.
So make it static and create cpuidle_get_driver() to observe it.
Signed-off-by: Len Brown <len.brown@intel.com>
Assure that cpuidle_unregister_driver() will not clobber
the registered driver if unregistered by somebody else.
Signed-off-by: Len Brown <len.brown@intel.com>
Currently, the menu governor uses the (corrected) next timer as key item
for predicting the idle duration.
It turns out that there are specific cases where this breaks down: there
are cases where we have a very repetitive pattern of idle durations, where
the idle period is pretty much the same, for reasons completely unrelated
to the next timer event. Examples of such repeating patterns are network
loads with irq mitigation, the mouse moving, but in theory also wifi
beacons.
This patch adds a relatively simple detector for such repeating patterns,
where the standard deviation of the last 8 idle periods is compared to a
threshold.
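A standalone sketch of the idea (the threshold and the made-up sample data
are illustrative only, not the governor's actual tuning):

    #include <stdio.h>
    #include <math.h>

    #define INTERVALS 8

    static int repeating_pattern(const unsigned int us[INTERVALS], double *predicted)
    {
            double avg = 0.0, var = 0.0;
            int i;

            for (i = 0; i < INTERVALS; i++)
                    avg += us[i];
            avg /= INTERVALS;

            for (i = 0; i < INTERVALS; i++)
                    var += (us[i] - avg) * (us[i] - avg);
            var /= INTERVALS;

            if (sqrt(var) < avg / 4) {      /* tightly clustered -> trust the average */
                    *predicted = avg;
                    return 1;
            }
            return 0;                       /* no repeating pattern detected */
    }

    int main(void)
    {
            unsigned int mouse[INTERVALS] = { 950, 1010, 990, 1005, 970, 1000, 985, 995 };
            double p;

            if (repeating_pattern(mouse, &p))
                    printf("repeating pattern, predict %.0f us of idle\n", p);
            return 0;
    }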
With this extra predictor in place, measurements show that the DECAY
factor can now be increased (the decaying average will now decay slower)
to get an even more stable result.
[arjan@infradead.org: fix bug identified by Frank]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Corrado Zoccolo <czoccolo@gmail.com>
Cc: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch changes the string-based list management to a handle-based
implementation to help with the hot path use of pm_qos. It also renames
much of the API to use "request" as opposed to "requirement", which was
used in the initial implementation. I did this because "request" more
accurately represents what it actually does.
Also, I added a string-based ABI for users wanting to use a string
interface. So if the user writes 0xDDDDDDDD formatted hex it will be
accepted by the interface. (Someone asked me for it and I don't think
it hurts anything.)
This patch also incorporates some documentation input I got from Randy.
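A sketch of the handle-based calling convention (signatures and the header
name are recalled from this API revision and should be treated as an
assumption; the my_* names are hypothetical):

    #include <linux/pm_qos_params.h>

    static struct pm_qos_request_list *my_req;

    static void my_driver_start_streaming(void)
    {
            /* require no more than 50 usec of CPU/DMA wakeup latency */
            my_req = pm_qos_add_request(PM_QOS_CPU_DMA_LATENCY, 50);
    }

    static void my_driver_adjust(void)
    {
            pm_qos_update_request(my_req, 100);     /* relax the constraint */
    }

    static void my_driver_stop_streaming(void)
    {
            pm_qos_remove_request(my_req);          /* drop it entirely */
    }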
Signed-off-by: markgross <mgross@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
commit 672917dcc7 ("cpuidle: menu governor: reduce latency on exit")
added an optimization, where the analysis of the past idle period moved
from the end of idle to the beginning of the new idle period.
Unfortunately, this optimization had a bug where it zeroed one key
variable for new use that is still needed for the analysis. The fix is
simple: zero the variable after doing the work from the previous idle
period.
During the audit of the code that found this issue, another issue was
also found: the ->measured_us data structure member is never set; a
local variable is always used instead.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Corrado Zoccolo <czoccolo@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there, i.e. if only gfp is used,
gfp.h; if slab is used, slab.h (a small illustration follows this list).
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have a fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
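The small illustration promised above - the kind of edit the script produces
(file contents hypothetical):

    /* A file that only uses GFP_* flags gets: */
    #include <linux/gfp.h>

    /* A file that calls kmalloc()/kfree() gets: */
    #include <linux/slab.h>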
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition, while for others adding it to an
implementation .h or embedding .c file was more appropriate. This
step added inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on the arch to make things
build (like ipr on powerpc/64, which failed due to a missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from the build tests
in step 7, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers, which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker moves const data into .rodata
and therefore excludes it from false sharing
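To illustrate the constification itself, a minimal before/after sketch
(the my_* names are hypothetical):

    /* before: a writable function-pointer table */
    static struct sysfs_ops my_sysfs_ops = {
            .show   = my_attr_show,
            .store  = my_attr_store,
    };

    /* after: lives in .rodata and cannot be modified at runtime */
    static const struct sysfs_ops my_sysfs_ops = {
            .show   = my_attr_show,
            .store  = my_attr_store,
    };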
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Passing the attribute to the low level IO functions allows all kinds
of cleanups, by sharing low level IO code without requiring
its own function for every piece of data.
Drivers can also extend the attributes with their own data fields
and use those in the low level function.
Similar to sysdev_attributes and normal attributes.
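For illustration, the kind of sharing this enables, assuming the
class-attribute variant of the change (the my_* types and helpers are
hypothetical):

    struct my_ext_attribute {
            struct class_attribute  attr;
            int                     reg_offset;     /* extra per-attribute data */
    };

    static ssize_t my_shared_show(struct class *class,
                                  struct class_attribute *attr, char *buf)
    {
            struct my_ext_attribute *ea =
                    container_of(attr, struct my_ext_attribute, attr);

            /* one show() routine serves many attributes */
            return sprintf(buf, "%d\n", my_read_register(ea->reg_offset));
    }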
This is a tree-wide sweep, converting everything in one go.
No functional changes in this patch other than passing the new
argument everywhere.
Tested on x86, the non x86 parts are uncompiled.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Reorder struct menu_device to remove 8 bytes of padding on 64 bit builds.
Size drops from 136 to 128 bytes, so possibly needing one fewer cache
line.
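The effect is the generic 64-bit padding rule; a standalone illustration
(not the real menu_device layout):

    #include <stdio.h>

    struct before { int a; void *p; int b; void *q; };  /* 32 bytes on LP64 */
    struct after  { void *p; void *q; int a; int b; };  /* 24 bytes on LP64 */

    int main(void)
    {
            printf("before: %zu, after: %zu\n",
                   sizeof(struct before), sizeof(struct after));
            return 0;
    }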
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
menu: use proper 64 bit math
The new menu governor is incorrectly doing a 64 bit divide. Compile
tested only.
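The usual kernel idiom for dividing a u64 by a 32-bit value is do_div();
a minimal sketch (the helper below is hypothetical):

    #include <linux/math64.h>

    static u64 average_us(u64 total_us, u32 count)
    {
            /* do_div() divides the u64 in place by a 32-bit divisor and
             * returns the remainder; a plain '/' on a u64 would pull in
             * 64-bit libgcc helpers on 32-bit builds. */
            do_div(total_us, count);
            return total_us;
    }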
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It does not seem possible that ldev can be NULL, so drop the unnecessary
test. If ldev can somehow be NULL, then the initialization of last_idx
should be moved below the test.
A simplified version of the semantic match that detects this problem is as
follows (http://coccinelle.lip6.fr/):
// <smpl>
@match exists@
expression x, E;
identifier fld;
@@
* x->fld
... when != \(x = E\|&x\)
* x == NULL
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch was generated by
git grep -E -i -l '[Aa]quire' | xargs -r perl -p -i -e 's/([Aa])quire/$1cquire/'
and the result was verified by checking the diff for aquire.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>