2010-02-18 06:39:08 +08:00
/*
 * PCIe Native PME support
 *
 * Copyright (C) 2007 - 2009 Intel Corp
 * Copyright (C) 2007 - 2009 Shaohua Li <shaohua.li@intel.com>
 * Copyright (C) 2009 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.
 *
 * This file is subject to the terms and conditions of the GNU General Public
 * License V2. See the file "COPYING" in the main directory of this archive
 * for more details.
 */

#include <linux/module.h>
#include <linux/pci.h>
#include <linux/kernel.h>
#include <linux/errno.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
The percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 16:04:11 +08:00
#include <linux/slab.h>
2010-02-18 06:39:08 +08:00
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/device.h>
#include <linux/pcieport_if.h>
#include <linux/acpi.h>
#include <linux/pci-acpi.h>
#include <linux/pm_runtime.h>

2010-08-21 07:58:22 +08:00
#include "../pci.h"
#include "portdrv.h"
2010-02-18 06:39:08 +08:00
2010-02-18 06:40:07 +08:00
/*
 * If this switch is set, MSI will not be used for PCIe PME signaling. This
 * causes the PCIe port driver to use INTx interrupts only, but it turns out
 * that using MSI for PCIe PME signaling doesn't play well with PCIe PME-based
 * wake-up from system sleep states.
 */
bool pcie_pme_msi_disabled;

2010-02-18 06:39:08 +08:00
static int __init pcie_pme_setup(char *str)
{
PCI: PCIe: Ask BIOS for control of all native services at once
After commit 852972acff8f10f3a15679be2059bb94916cba5d (ACPI: Disable
ASPM if the platform won't provide _OSC control for PCIe) control of
the PCIe Capability Structure is unconditionally requested by
acpi_pci_root_add(), which in principle may cause problems to
happen in two ways. First, the BIOS may refuse to give control of
the PCIe Capability Structure if it is not asked for any of the
_OSC features depending on it at the same time. Second, the BIOS may
assume that control of the _OSC features depending on the PCIe
Capability Structure will be requested in the future and may behave
incorrectly if that doesn't happen. For this reason, control of
the PCIe Capability Structure should always be requested along with
control of any other _OSC features that may depend on it (ie. PCIe
native PME, PCIe native hot-plug, PCIe AER).
Rework the PCIe port driver so that (1) it checks which native PCIe
port services can be enabled, according to the BIOS, and (2) it
requests control of all these services simultaneously. In
particular, this causes pcie_portdrv_probe() to fail if the BIOS
refuses to grant control of the PCIe Capability Structure, which
means that no native PCIe port services can be enabled for the PCIe
Root Complex the given port belongs to. If that happens, ASPM is
disabled to avoid problems with mishandling it by the part of the
PCIe hierarchy for which control of the PCIe Capability Structure
has not been received.
Make it possible to override this behavior using 'pcie_ports=native'
(use the PCIe native services regardless of the BIOS response to the
control request), or 'pcie_ports=compat' (do not use the PCIe native
services at all).
Accordingly, rework the existing PCIe port service drivers so that
they don't request control of the services directly.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-08-22 04:02:38 +08:00
	if (!strncmp(str, "nomsi", 5))
		pcie_pme_msi_disabled = true;
2010-06-18 23:04:22 +08:00
2010-02-18 06:39:08 +08:00
	return 1;
}
__setup("pcie_pme=", pcie_pme_setup);
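As a rough illustration of how the `pcie_pme=` handler above treats its argument, the following user-space sketch mirrors the same `strncmp()` prefix test. `parse_pcie_pme_option()` and `msi_disabled` are hypothetical stand-ins invented for this example, not kernel symbols, and the `__setup()` registration itself is not reproduced.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's pcie_pme_msi_disabled flag. */
static bool msi_disabled;

/*
 * Hypothetical user-space mirror of pcie_pme_setup(): only the first
 * five characters of the option value are compared against "nomsi".
 * Like the original, it always returns 1.
 */
static int parse_pcie_pme_option(const char *str)
{
	if (!strncmp(str, "nomsi", 5))
		msi_disabled = true;

	return 1;
}
```

Because the comparison is limited to five characters, any value that merely begins with "nomsi" sets the flag as well, while any other value leaves it untouched.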

struct pcie_pme_service_data {
	spinlock_t lock;
	struct pcie_device *srv;
	struct work_struct work;
	bool noirq; /* Don't enable the PME interrupt used by this service. */
};

/**
 * pcie_pme_interrupt_enable - Enable/disable PCIe PME interrupt generation.
 * @dev: PCIe root port or event collector.
 * @enable: Enable or disable the interrupt.
 */
2010-08-22 04:02:38 +08:00
void pcie_pme_interrupt_enable(struct pci_dev *dev, bool enable)
2010-02-18 06:39:08 +08:00
{
	if (enable)
2012-07-24 17:20:10 +08:00
		pcie_capability_set_word(dev, PCI_EXP_RTCTL,
					 PCI_EXP_RTCTL_PMEIE);
2010-02-18 06:39:08 +08:00
	else
2012-07-24 17:20:10 +08:00
		pcie_capability_clear_word(dev, PCI_EXP_RTCTL,
					   PCI_EXP_RTCTL_PMEIE);
2010-02-18 06:39:08 +08:00
}

/**
 * pcie_pme_walk_bus - Scan a PCI bus for devices asserting PME#.
 * @bus: PCI bus to scan.
 *
 * Scan given PCI bus and all buses under it for devices asserting PME#.
 */
static bool pcie_pme_walk_bus(struct pci_bus *bus)
{
	struct pci_dev *dev;
	bool ret = false;

	list_for_each_entry(dev, &bus->devices, bus_list) {
		/* Skip PCIe devices in case we started from a root port. */
2010-02-22 13:12:24 +08:00
		if (!pci_is_pcie(dev) && pci_check_pme_status(dev)) {
PCI / PM: Extend PME polling to all PCI devices
The land of PCI power management is a land of sorrow and ugliness,
especially in the area of signaling events by devices. There are
devices that set their PME Status bits, but don't really bother
to send a PME message or assert PME#. There are hardware vendors
who don't connect PME# lines to the system core logic (they know
who they are). There are PCI Express Root Ports that don't bother
to trigger interrupts when they receive PME messages from the devices
below. There are ACPI BIOSes that forget to provide _PRW methods for
devices capable of signaling wakeup. Finally, there are BIOSes that
do provide _PRW methods for such devices, but then don't bother to
call Notify() for those devices from the corresponding _Lxx/_Exx
GPE-handling methods. In all of these cases the kernel doesn't have
a chance to receive a proper notification that it should wake up a
device, so devices stay in low-power states forever. Worse yet, in
some cases they continuously send PME Messages that are silently
ignored, because the kernel simply doesn't know that it should clear
the device's PME Status bit.
This problem was first observed for "parallel" (non-Express) PCI
devices on add-on cards and Matthew Garrett addressed it by adding
code that polls PME Status bits of such devices, if they are enabled
to signal PME, to the kernel. Recently, however, it has turned out
that PCI Express devices are also affected by this issue and that it
is not limited to add-on devices, so it seems necessary to extend
the PME polling to all PCI devices, including PCI Express and planar
ones. Still, it would be wasteful to poll the PME Status bits of
devices that are known to receive proper PME notifications, so make
the kernel (1) poll the PME Status bits of all PCI and PCIe devices
enabled to signal PME and (2) disable the PME Status polling for
devices for which correct PME notifications are received.
Tested-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2011-10-04 05:16:33 +08:00
			if (dev->pme_poll)
				dev->pme_poll = false;

PM: Make it possible to avoid races between wakeup and system sleep
One of the arguments during the suspend blockers discussion was that
the mainline kernel didn't contain any mechanisms making it possible
to avoid races between wakeup and system suspend.
Generally, there are two problems in that area. First, if a wakeup
event occurs exactly when /sys/power/state is being written to, it
may be delivered to user space right before the freezer kicks in, so
the user space consumer of the event may not be able to process it
before the system is suspended. Second, if a wakeup event occurs
after user space has been frozen, it is not generally guaranteed that
the ongoing transition of the system into a sleep state will be
aborted.
To address these issues introduce a new global sysfs attribute,
/sys/power/wakeup_count, associated with a running counter of wakeup
events and three helper functions, pm_stay_awake(), pm_relax(), and
pm_wakeup_event(), that may be used by kernel subsystems to control
the behavior of this attribute and to request the PM core to abort
system transitions into a sleep state already in progress.
The /sys/power/wakeup_count file may be read from or written to by
user space. Reads will always succeed (unless interrupted by a
signal) and return the current value of the wakeup events counter.
Writes, however, will only succeed if the written number is equal to
the current value of the wakeup events counter. If a write is
successful, it will cause the kernel to save the current value of the
wakeup events counter and to abort the subsequent system transition
into a sleep state if any wakeup events are reported after the write
has returned.
[The assumption is that before writing to /sys/power/state user space
will first read from /sys/power/wakeup_count. Next, user space
consumers of wakeup events will have a chance to acknowledge or
veto the upcoming system transition to a sleep state. Finally, if
the transition is allowed to proceed, /sys/power/wakeup_count will
be written to and if that succeeds, /sys/power/state will be written
to as well. Still, if any wakeup events are reported to the PM core
by kernel subsystems after that point, the transition will be
aborted.]
Additionally, put a wakeup events counter into struct dev_pm_info and
make these per-device wakeup event counters available via sysfs,
so that it's possible to check the activity of various wakeup event
sources within the kernel.
To illustrate how subsystems can use pm_wakeup_event(), make the
low-level PCI runtime PM wakeup-handling code use it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
2010-07-06 04:43:53 +08:00
			pci_wakeup_event(dev);
2010-12-29 20:22:08 +08:00
			pm_request_resume(&dev->dev);
2010-02-18 06:39:08 +08:00
			ret = true;
		}

		if (dev->subordinate && pcie_pme_walk_bus(dev->subordinate))
			ret = true;
	}

	return ret;
}
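The commit message "PM: Make it possible to avoid races between wakeup and system sleep" quoted above describes a user-space handshake: read `/sys/power/wakeup_count`, write the same value back, and only then write to `/sys/power/state`. The following is a minimal user-space sketch of that sequence, under stated assumptions. `suspend_if_no_wakeups()` is a hypothetical helper invented for this example, and the file paths are passed in as parameters so the logic can be exercised against ordinary files.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel or libc API): run the wakeup_count
 * handshake and, if it succeeds, request a sleep state.
 * Returns 0 on success, -1 if any step fails.
 */
static int suspend_if_no_wakeups(const char *count_path,
				 const char *state_path,
				 const char *state)
{
	char count[32];
	FILE *f;
	int ok;

	/* Step 1: read the current value of the wakeup events counter. */
	f = fopen(count_path, "r");
	if (!f)
		return -1;
	ok = fgets(count, sizeof(count), f) != NULL;
	fclose(f);
	if (!ok)
		return -1;

	/* (Consumers of wakeup events would acknowledge or veto here.) */

	/*
	 * Step 2: write the same value back. Per the commit message, the
	 * write only succeeds if the counter has not changed in between.
	 * With stdio the error may only surface when the buffer is flushed,
	 * so the fclose() result is checked as well.
	 */
	f = fopen(count_path, "w");
	if (!f)
		return -1;
	ok = fputs(count, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	if (!ok)
		return -1;

	/* Step 3: request the transition into the sleep state. */
	f = fopen(state_path, "w");
	if (!f)
		return -1;
	ok = fputs(state, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

On a real system the arguments would presumably be `/sys/power/wakeup_count`, `/sys/power/state` and a state name such as `mem`; calling it that way attempts an actual suspend, so the sketch is best tried against scratch files first.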

/**
 * pcie_pme_from_pci_bridge - Check if PCIe-PCI bridge generated a PME.
 * @bus: Secondary bus of the bridge.
 * @devfn: Device/function number to check.
 *
 * PME from PCI devices under a PCIe-PCI bridge may be converted to an in-band
 * PCIe PME message. In such a case the bridge should use the Requester ID
 * of device/function number 0 on its secondary bus.
 */
static bool pcie_pme_from_pci_bridge(struct pci_bus *bus, u8 devfn)
{
	struct pci_dev *dev;
	bool found = false;

	if (devfn)
		return false;

	dev = pci_dev_get(bus->self);
	if (!dev)
		return false;

2012-07-24 17:20:03 +08:00
	if (pci_is_pcie(dev) && pci_pcie_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE) {
2010-02-18 06:39:08 +08:00
		down_read(&pci_bus_sem);
		if (pcie_pme_walk_bus(bus))
			found = true;
		up_read(&pci_bus_sem);
	}

	pci_dev_put(dev);
	return found;
}

/**
 * pcie_pme_handle_request - Find device that generated PME and handle it.
 * @port: Root port or event collector that generated the PME interrupt.
 * @req_id: PCIe Requester ID of the device that generated the PME.
 */
static void pcie_pme_handle_request(struct pci_dev *port, u16 req_id)
{
	u8 busnr = req_id >> 8, devfn = req_id & 0xff;
	struct pci_bus *bus;
	struct pci_dev *dev;
	bool found = false;

	/* First, check if the PME is from the root port itself. */
	if (port->devfn == devfn && port->bus->number == busnr) {
2011-10-04 05:16:33 +08:00
		if (port->pme_poll)
			port->pme_poll = false;

2010-02-18 06:39:08 +08:00
		if (pci_check_pme_status(port)) {
			pm_request_resume(&port->dev);
			found = true;
		} else {
			/*
			 * Apparently, the root port generated the PME on behalf
			 * of a non-PCIe device downstream. If this is done by
			 * a root port, the Requester ID field in its status
			 * register may contain either the root port's, or the
			 * source device's information (PCI Express Base
			 * Specification, Rev. 2.0, Section 6.1.9).
			 */
			down_read(&pci_bus_sem);
			found = pcie_pme_walk_bus(port->subordinate);
			up_read(&pci_bus_sem);
		}
		goto out;
	}

	/* Second, find the bus the source device is on. */
	bus = pci_find_bus(pci_domain_nr(port->bus), busnr);
	if (!bus)
		goto out;

	/* Next, check if the PME is from a PCIe-PCI bridge. */
	found = pcie_pme_from_pci_bridge(bus, devfn);
	if (found)
		goto out;

	/* Finally, try to find the PME source on the bus. */
	down_read(&pci_bus_sem);
	list_for_each_entry(dev, &bus->devices, bus_list) {
		pci_dev_get(dev);
		if (dev->devfn == devfn) {
			found = true;
			break;
		}
		pci_dev_put(dev);
	}
	up_read(&pci_bus_sem);

	if (found) {
		/* The device is there, but we have to check its PME status. */
		found = pci_check_pme_status(dev);
2010-07-06 04:43:53 +08:00
		if (found) {
2011-10-04 05:16:33 +08:00
			if (dev->pme_poll)
				dev->pme_poll = false;

2010-07-06 04:43:53 +08:00
			pci_wakeup_event(dev);
2010-12-29 20:22:08 +08:00
			pm_request_resume(&dev->dev);
PM: Make it possible to avoid races between wakeup and system sleep
One of the arguments during the suspend blockers discussion was that
the mainline kernel didn't contain any mechanisms making it possible
to avoid races between wakeup and system suspend.
Generally, there are two problems in that area. First, if a wakeup
event occurs exactly when /sys/power/state is being written to, it
may be delivered to user space right before the freezer kicks in, so
the user space consumer of the event may not be able to process it
before the system is suspended. Second, if a wakeup event occurs
after user space has been frozen, it is not generally guaranteed that
the ongoing transition of the system into a sleep state will be
aborted.
To address these issues introduce a new global sysfs attribute,
/sys/power/wakeup_count, associated with a running counter of wakeup
events and three helper functions, pm_stay_awake(), pm_relax(), and
pm_wakeup_event(), that may be used by kernel subsystems to control
the behavior of this attribute and to request the PM core to abort
system transitions into a sleep state already in progress.
The /sys/power/wakeup_count file may be read from or written to by
user space. Reads will always succeed (unless interrupted by a
signal) and return the current value of the wakeup events counter.
Writes, however, will only succeed if the written number is equal to
the current value of the wakeup events counter. If a write is
successful, it will cause the kernel to save the current value of the
wakeup events counter and to abort the subsequent system transition
into a sleep state if any wakeup events are reported after the write
has returned.
[The assumption is that before writing to /sys/power/state user space
will first read from /sys/power/wakeup_count. Next, user space
consumers of wakeup events will have a chance to acknowledge or
veto the upcoming system transition to a sleep state. Finally, if
the transition is allowed to proceed, /sys/power/wakeup_count will
be written to and if that succeeds, /sys/power/state will be written
to as well. Still, if any wakeup events are reported to the PM core
by kernel subsystems after that point, the transition will be
aborted.]
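The sequence described in the bracketed note above can be sketched from user space roughly as follows. This is an illustrative sketch only, not part of the patch: the helper names (read_wakeup_count(), write_string(), try_suspend()), the buffer size, and the choice of "mem" as the target state are assumptions. The file paths are passed in as parameters so the logic can be exercised against ordinary files; on a real system they would be /sys/power/wakeup_count and /sys/power/state.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read the counter's textual value into buf. */
static int read_wakeup_count(const char *path, char *buf, size_t len)
{
	ssize_t n;
	int fd;

	assert(buf != NULL && len > 1);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, len - 1);
	close(fd);
	if (n <= 0)
		return -1;
	buf[n] = '\0';
	return 0;
}

/* Hypothetical helper: write a string with a single write() call. */
static int write_string(const char *path, const char *s)
{
	size_t len = strlen(s);
	ssize_t n;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	n = write(fd, s, len);
	close(fd);
	return (n == (ssize_t)len) ? 0 : -1;
}

/*
 * Hypothetical driver of the handshake: read the counter, write the same
 * value back, and request the sleep state only if that write succeeded.
 * Returns 0 if the state write went through, -1 otherwise.
 */
static int try_suspend(const char *count_path, const char *state_path)
{
	char count[32];

	if (read_wakeup_count(count_path, count, sizeof(count)))
		return -1;

	/* User-space consumers of wakeup events could veto at this point. */

	/* The kernel rejects this write if the counter changed since the read. */
	if (write_string(count_path, count))
		return -1;

	return write_string(state_path, "mem");
}
```

A caller would typically retry try_suspend() after handling whatever wakeup event caused the write-back to fail, rather than treating the failure as fatal.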
Additionally, put a wakeup events counter into struct dev_pm_info and
make these per-device wakeup event counters available via sysfs,
so that it's possible to check the activity of various wakeup event
sources within the kernel.
To illustrate how subsystems can use pm_wakeup_event(), make the
low-level PCI runtime PM wakeup-handling code use it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
2010-07-06 04:43:53 +08:00
|
|
|
}
|
2010-02-18 06:39:08 +08:00
|
|
|
pci_dev_put(dev);
|
|
|
|
} else if (devfn) {
|
|
|
|
/*
|
|
|
|
* The device is not there, but we can still try to recover by
|
|
|
|
* assuming that the PME was reported by a PCIe-PCI bridge that
|
|
|
|
* used devfn different from zero.
|
|
|
|
*/
|
|
|
|
dev_dbg(&port->dev, "PME interrupt generated for "
|
|
|
|
"non-existent device %02x:%02x.%d\n",
|
|
|
|
busnr, PCI_SLOT(devfn), PCI_FUNC(devfn));
|
|
|
|
found = pcie_pme_from_pci_bridge(bus, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
out:
|
|
|
|
if (!found)
|
|
|
|
dev_dbg(&port->dev, "Spurious native PME interrupt!\n");
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* pcie_pme_work_fn - Work handler for PCIe PME interrupt.
|
|
|
|
* @work: Work structure giving access to service data.
|
|
|
|
*/
|
|
|
|
static void pcie_pme_work_fn(struct work_struct *work)
|
|
|
|
{
|
|
|
|
struct pcie_pme_service_data *data =
|
|
|
|
container_of(work, struct pcie_pme_service_data, work);
|
|
|
|
struct pci_dev *port = data->srv->port;
|
|
|
|
u32 rtsta;
|
|
|
|
|
|
|
|
spin_lock_irq(&data->lock);
|
|
|
|
|
|
|
|
for (;;) {
|
|
|
|
if (data->noirq)
|
|
|
|
break;
|
|
|
|
|
2012-07-24 17:20:10 +08:00
|
|
|
pcie_capability_read_dword(port, PCI_EXP_RTSTA, &rtsta);
|
2010-02-18 06:39:08 +08:00
|
|
|
if (rtsta & PCI_EXP_RTSTA_PME) {
|
|
|
|
/*
|
|
|
|
* Clear PME status of the port. If there are other
|
|
|
|
* pending PMEs, the status will be set again.
|
|
|
|
*/
|
2010-12-19 22:57:16 +08:00
|
|
|
pcie_clear_root_pme_status(port);
|
2010-02-18 06:39:08 +08:00
|
|
|
|
|
|
|
spin_unlock_irq(&data->lock);
|
|
|
|
pcie_pme_handle_request(port, rtsta & 0xffff);
|
|
|
|
spin_lock_irq(&data->lock);
|
|
|
|
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* No need to loop if there are no more PMEs pending. */
|
|
|
|
if (!(rtsta & PCI_EXP_RTSTA_PENDING))
|
|
|
|
break;
|
|
|
|
|
|
|
|
spin_unlock_irq(&data->lock);
|
|
|
|
cpu_relax();
|
|
|
|
spin_lock_irq(&data->lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!data->noirq)
|
|
|
|
pcie_pme_interrupt_enable(port, true);
|
|
|
|
|
|
|
|
spin_unlock_irq(&data->lock);
|
|
|
|
}
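The `rtsta & 0xffff` expression in the work handler above keeps the low 16 bits of the Root Status register, which carry the PME Requester ID. As a stand-alone sketch of how such an ID splits into bus, slot, and function numbers, assuming the conventional bus/devfn encoding that PCI_SLOT() and PCI_FUNC() rely on (the struct and function names here are hypothetical and do not appear in the driver):

```c
#include <stdint.h>

/* Hypothetical container for a decoded PME Requester ID. */
struct pme_requester {
	uint8_t bus;	/* bits 15:8 of the requester ID */
	uint8_t slot;	/* bits 7:3, what PCI_SLOT(devfn) yields */
	uint8_t func;	/* bits 2:0, what PCI_FUNC(devfn) yields */
};

/* Decode the requester ID held in the low 16 bits of a Root Status value. */
static struct pme_requester decode_pme_requester(uint32_t rtsta)
{
	uint16_t req_id = rtsta & 0xffff;
	uint8_t devfn = req_id & 0xff;
	struct pme_requester r = {
		.bus = (uint8_t)(req_id >> 8),
		.slot = (devfn >> 3) & 0x1f,
		.func = devfn & 0x07,
	};

	return r;
}
```

The upper bits of the register (the status flags) are deliberately masked off, so the same decoded result is obtained whether or not a PME is currently flagged.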
|
|
|
|
|
|
|
|
/**
|
|
|
|
* pcie_pme_irq - Interrupt handler for PCIe root port PME interrupt.
|
|
|
|
* @irq: Interrupt vector.
|
|
|
|
* @context: Interrupt context pointer.
|
|
|
|
*/
|
|
|
|
static irqreturn_t pcie_pme_irq(int irq, void *context)
|
|
|
|
{
|
|
|
|
struct pci_dev *port;
|
|
|
|
struct pcie_pme_service_data *data;
|
|
|
|
u32 rtsta;
|
|
|
|
unsigned long flags;
|
|
|
|
|
|
|
|
port = ((struct pcie_device *)context)->port;
|
|
|
|
data = get_service_data((struct pcie_device *)context);
|
|
|
|
|
|
|
|
spin_lock_irqsave(&data->lock, flags);
|
2012-07-24 17:20:10 +08:00
|
|
|
pcie_capability_read_dword(port, PCI_EXP_RTSTA, &rtsta);
|
2010-02-18 06:39:08 +08:00
|
|
|
|
|
|
|
if (!(rtsta & PCI_EXP_RTSTA_PME)) {
|
|
|
|
spin_unlock_irqrestore(&data->lock, flags);
|
|
|
|
return IRQ_NONE;
|
|
|
|
}
|
|
|
|
|
|
|
|
pcie_pme_interrupt_enable(port, false);
|
|
|
|
spin_unlock_irqrestore(&data->lock, flags);
|
|
|
|
|
|
|
|
/* We don't use pm_wq, because it's freezable. */
|
|
|
|
schedule_work(&data->work);
|
|
|
|
|
|
|
|
return IRQ_HANDLED;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* pcie_pme_set_native - Set the PME interrupt flag for given device.
|
|
|
|
* @dev: PCI device to handle.
|
|
|
|
* @ign: Ignored.
|
|
|
|
*/
|
|
|
|
static int pcie_pme_set_native(struct pci_dev *dev, void *ign)
|
|
|
|
{
|
|
|
|
dev_info(&dev->dev, "Signaling PME through PCIe PME interrupt\n");
|
|
|
|
|
|
|
|
device_set_run_wake(&dev->dev, true);
|
|
|
|
dev->pme_interrupt = true;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* pcie_pme_mark_devices - Set the PME interrupt flag for devices below a port.
|
|
|
|
* @port: PCIe root port or event collector to handle.
|
|
|
|
*
|
|
|
|
* For each device below given root port, including the port itself (or for each
|
|
|
|
* root complex integrated endpoint if @port is a root complex event collector)
|
|
|
|
* set the flag indicating that it can signal run-time wake-up events via PCIe
|
|
|
|
* PME interrupts.
|
|
|
|
*/
|
|
|
|
static void pcie_pme_mark_devices(struct pci_dev *port)
|
|
|
|
{
|
|
|
|
pcie_pme_set_native(port, NULL);
|
|
|
|
if (port->subordinate) {
|
|
|
|
pci_walk_bus(port->subordinate, pcie_pme_set_native, NULL);
|
|
|
|
} else {
|
|
|
|
struct pci_bus *bus = port->bus;
|
|
|
|
struct pci_dev *dev;
|
|
|
|
|
|
|
|
/* Check if this is a root complex event collector. */
|
2012-07-24 17:20:03 +08:00
|
|
|
if (pci_pcie_type(port) != PCI_EXP_TYPE_RC_EC || !bus)
|
2010-02-18 06:39:08 +08:00
|
|
|
return;
|
|
|
|
|
|
|
|
down_read(&pci_bus_sem);
|
|
|
|
list_for_each_entry(dev, &bus->devices, bus_list)
|
2010-02-22 13:12:24 +08:00
|
|
|
if (pci_is_pcie(dev)
|
2012-07-24 17:20:03 +08:00
|
|
|
&& pci_pcie_type(dev) == PCI_EXP_TYPE_RC_END)
|
2010-02-18 06:39:08 +08:00
|
|
|
pcie_pme_set_native(dev, NULL);
|
|
|
|
up_read(&pci_bus_sem);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* pcie_pme_probe - Initialize PCIe PME service for given root port.
|
|
|
|
* @srv: PCIe service to initialize.
|
|
|
|
*/
|
|
|
|
static int pcie_pme_probe(struct pcie_device *srv)
|
|
|
|
{
|
|
|
|
struct pci_dev *port;
|
|
|
|
struct pcie_pme_service_data *data;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
data = kzalloc(sizeof(*data), GFP_KERNEL);
|
|
|
|
if (!data)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
spin_lock_init(&data->lock);
|
|
|
|
INIT_WORK(&data->work, pcie_pme_work_fn);
|
|
|
|
data->srv = srv;
|
|
|
|
set_service_data(srv, data);
|
|
|
|
|
|
|
|
port = srv->port;
|
|
|
|
pcie_pme_interrupt_enable(port, false);
|
2010-12-19 22:57:16 +08:00
|
|
|
pcie_clear_root_pme_status(port);
|
2010-02-18 06:39:08 +08:00
|
|
|
|
|
|
|
ret = request_irq(srv->irq, pcie_pme_irq, IRQF_SHARED, "PCIe PME", srv);
|
|
|
|
if (ret) {
|
|
|
|
kfree(data);
|
|
|
|
} else {
|
|
|
|
pcie_pme_mark_devices(port);
|
|
|
|
pcie_pme_interrupt_enable(port, true);
|
|
|
|
}
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* pcie_pme_suspend - Suspend PCIe PME service device.
|
|
|
|
* @srv: PCIe service device to suspend.
|
|
|
|
*/
|
|
|
|
static int pcie_pme_suspend(struct pcie_device *srv)
|
|
|
|
{
|
|
|
|
struct pcie_pme_service_data *data = get_service_data(srv);
|
|
|
|
struct pci_dev *port = srv->port;
|
|
|
|
|
|
|
|
spin_lock_irq(&data->lock);
|
|
|
|
pcie_pme_interrupt_enable(port, false);
|
2010-12-19 22:57:16 +08:00
|
|
|
pcie_clear_root_pme_status(port);
|
2010-02-18 06:39:08 +08:00
|
|
|
data->noirq = true;
|
|
|
|
spin_unlock_irq(&data->lock);
|
|
|
|
|
|
|
|
synchronize_irq(srv->irq);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* pcie_pme_resume - Resume PCIe PME service device.
|
|
|
|
* @srv: PCIe service device to resume.
|
|
|
|
*/
|
|
|
|
static int pcie_pme_resume(struct pcie_device *srv)
|
|
|
|
{
|
|
|
|
struct pcie_pme_service_data *data = get_service_data(srv);
|
|
|
|
struct pci_dev *port = srv->port;
|
|
|
|
|
|
|
|
spin_lock_irq(&data->lock);
|
|
|
|
data->noirq = false;
|
2010-12-19 22:57:16 +08:00
|
|
|
pcie_clear_root_pme_status(port);
|
2010-02-18 06:39:08 +08:00
|
|
|
pcie_pme_interrupt_enable(port, true);
|
|
|
|
spin_unlock_irq(&data->lock);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* pcie_pme_remove - Prepare PCIe PME service device for removal.
|
|
|
|
* @srv: PCIe service device to remove.
|
|
|
|
*/
|
|
|
|
static void pcie_pme_remove(struct pcie_device *srv)
|
|
|
|
{
|
|
|
|
pcie_pme_suspend(srv);
|
|
|
|
free_irq(srv->irq, srv);
|
|
|
|
kfree(get_service_data(srv));
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct pcie_port_service_driver pcie_pme_driver = {
|
|
|
|
.name = "pcie_pme",
|
|
|
|
.port_type = PCI_EXP_TYPE_ROOT_PORT,
|
|
|
|
.service = PCIE_PORT_SERVICE_PME,
|
|
|
|
|
|
|
|
.probe = pcie_pme_probe,
|
|
|
|
.suspend = pcie_pme_suspend,
|
|
|
|
.resume = pcie_pme_resume,
|
|
|
|
.remove = pcie_pme_remove,
|
|
|
|
};
|
|
|
|
|
|
|
|
/**
|
|
|
|
* pcie_pme_service_init - Register the PCIe PME service driver.
|
|
|
|
*/
|
|
|
|
static int __init pcie_pme_service_init(void)
|
|
|
|
{
|
PCI: PCIe: Ask BIOS for control of all native services at once
After commit 852972acff8f10f3a15679be2059bb94916cba5d (ACPI: Disable
ASPM if the platform won't provide _OSC control for PCIe) control of
the PCIe Capability Structure is unconditionally requested by
acpi_pci_root_add(), which in principle may cause problems to
happen in two ways. First, the BIOS may refuse to give control of
the PCIe Capability Structure if it is not asked for any of the
_OSC features depending on it at the same time. Second, the BIOS may
assume that control of the _OSC features depending on the PCIe
Capability Structure will be requested in the future and may behave
incorrectly if that doesn't happen. For this reason, control of
the PCIe Capability Structure should always be requested along with
control of any other _OSC features that may depend on it (ie. PCIe
native PME, PCIe native hot-plug, PCIe AER).
Rework the PCIe port driver so that (1) it checks which native PCIe
port services can be enabled, according to the BIOS, and (2) it
requests control of all these services simultaneously. In
particular, this causes pcie_portdrv_probe() to fail if the BIOS
refuses to grant control of the PCIe Capability Structure, which
means that no native PCIe port services can be enabled for the PCIe
Root Complex the given port belongs to. If that happens, ASPM is
disabled to avoid problems with mishandling it by the part of the
PCIe hierarchy for which control of the PCIe Capability Structure
has not been received.
Make it possible to override this behavior using 'pcie_ports=native'
(use the PCIe native services regardless of the BIOS response to the
control request), or 'pcie_ports=compat' (do not use the PCIe native
services at all).
Accordingly, rework the existing PCIe port service drivers so that
they don't request control of the services directly.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-08-22 04:02:38 +08:00
|
|
|
return pcie_port_service_register(&pcie_pme_driver);
|
2010-02-18 06:39:08 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
module_init(pcie_pme_service_init);
|