mirror of https://gitee.com/openkylin/linux.git
Documentation: PCI: convert pci-error-recovery.txt to reST
Convert plain text documentation to reStructuredText format and add it to Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
parent b66357f32f
commit 8a01fa6434
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -13,3 +13,4 @@ Linux PCI Bus Subsystem
 pci-iov-howto
 msi-howto
 acpi-info
+pci-error-recovery
--- a/Documentation/PCI/pci-error-recovery.txt
+++ b/Documentation/PCI/pci-error-recovery.rst
@@ -1,12 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0
 
-PCI Error Recovery
--------------------
-February 2, 2006
+==================
+PCI Error Recovery
+==================
 
-Current document maintainer:
-Linas Vepstas <linasvepstas@gmail.com>
-updated by Richard Lary <rlary@us.ibm.com>
-and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009
+
+:Authors: - Linas Vepstas <linasvepstas@gmail.com>
+- Richard Lary <rlary@us.ibm.com>
+- Mike Mason <mmlnx@us.ibm.com>
 
 
 Many PCI bus controllers are able to detect a variety of hardware
@@ -63,7 +64,8 @@ mechanisms for dealing with SCSI bus errors and SCSI bus resets.
 
 
 Detailed Design
----------------
+===============
+
 Design and implementation details below, based on a chain of
 public email discussions with Ben Herrenschmidt, circa 5 April 2005.
 
@@ -73,30 +75,33 @@ pci_driver. A driver that fails to provide the structure is "non-aware",
 and the actual recovery steps taken are platform dependent. The
 arch/powerpc implementation will simulate a PCI hotplug remove/add.
 
-This structure has the form:
-struct pci_error_handlers
-{
-int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
-int (*mmio_enabled)(struct pci_dev *dev);
-int (*slot_reset)(struct pci_dev *dev);
-void (*resume)(struct pci_dev *dev);
-};
+This structure has the form::
 
-The possible channel states are:
-enum pci_channel_state {
-pci_channel_io_normal, /* I/O channel is in normal state */
-pci_channel_io_frozen, /* I/O to channel is blocked */
-pci_channel_io_perm_failure, /* PCI card is dead */
-};
+struct pci_error_handlers
+{
+int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
+int (*mmio_enabled)(struct pci_dev *dev);
+int (*slot_reset)(struct pci_dev *dev);
+void (*resume)(struct pci_dev *dev);
+};
 
-Possible return values are:
-enum pci_ers_result {
-PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
-PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
-PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
-PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
-PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
-};
+The possible channel states are::
+
+enum pci_channel_state {
+pci_channel_io_normal, /* I/O channel is in normal state */
+pci_channel_io_frozen, /* I/O to channel is blocked */
+pci_channel_io_perm_failure, /* PCI card is dead */
+};
+
+Possible return values are::
+
+enum pci_ers_result {
+PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
+PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
+PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
+PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
+PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
+};
 
 A driver does not have to implement all of these callbacks; however,
 if it implements any, it must implement error_detected(). If a callback
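The rule in this hunk (callbacks are optional, but a driver that provides any of them must provide error_detected()) can be checked mechanically. What follows is a minimal userspace sketch, not part of this commit: it re-declares the enums and struct pci_error_handlers as quoted above with struct pci_dev left opaque, and handlers_follow_rule() and the demo_* callbacks are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace mock: struct pci_dev is opaque here, and the enums and the
 * handler table are re-declared as quoted in the hunk above. */
struct pci_dev;

enum pci_channel_state {
	pci_channel_io_normal,       /* I/O channel is in normal state */
	pci_channel_io_frozen,       /* I/O to channel is blocked */
	pci_channel_io_perm_failure, /* PCI card is dead */
};

enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

struct pci_error_handlers {
	int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
	int (*mmio_enabled)(struct pci_dev *dev);
	int (*slot_reset)(struct pci_dev *dev);
	void (*resume)(struct pci_dev *dev);
};

/* Hypothetical check of the documented rule: every callback is optional,
 * but a driver that provides any of them must provide error_detected().
 * A NULL table models a "non-aware" driver and is accepted. */
static bool handlers_follow_rule(const struct pci_error_handlers *h)
{
	bool any;

	if (h == NULL)
		return true;
	any = h->error_detected != NULL || h->mmio_enabled != NULL ||
	      h->slot_reset != NULL || h->resume != NULL;
	return !any || h->error_detected != NULL;
}

/* Hypothetical callbacks used to build example tables. */
static int demo_error_detected(struct pci_dev *dev,
			       enum pci_channel_state state)
{
	(void)dev;
	return state == pci_channel_io_perm_failure ?
	       PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET;
}

static void demo_resume(struct pci_dev *dev)
{
	(void)dev;
}
```

A table that sets only resume() fails the check, while a table with only error_detected() passes; a NULL table is treated as a non-aware driver, whose recovery the text leaves to the platform.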
@@ -134,16 +139,17 @@ shouldn't do any new IOs. Called in task context. This is sort of a
 
 All drivers participating in this system must implement this call.
 The driver must return one of the following result codes:
-- PCI_ERS_RESULT_CAN_RECOVER:
-Driver returns this if it thinks it might be able to recover
-the HW by just banging IOs or if it wants to be given
-a chance to extract some diagnostic information (see
-mmio_enable, below).
-- PCI_ERS_RESULT_NEED_RESET:
-Driver returns this if it can't recover without a
-slot reset.
-- PCI_ERS_RESULT_DISCONNECT:
-Driver returns this if it doesn't want to recover at all.
+
+- PCI_ERS_RESULT_CAN_RECOVER
+Driver returns this if it thinks it might be able to recover
+the HW by just banging IOs or if it wants to be given
+a chance to extract some diagnostic information (see
+mmio_enable, below).
+- PCI_ERS_RESULT_NEED_RESET
+Driver returns this if it can't recover without a
+slot reset.
+- PCI_ERS_RESULT_DISCONNECT
+Driver returns this if it doesn't want to recover at all.
 
 The next step taken will depend on the result codes returned by the
 drivers.
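As an illustration of how a driver might choose among the three error_detected() result codes listed above, here is a minimal userspace sketch, not part of this commit. The enums are mocked from the quoted definitions, struct demo_dev and its fields are hypothetical, and the policy itself is one plausible choice rather than anything the document prescribes: give up on pci_channel_io_perm_failure, ask for MMIO access when the channel is still normal or the driver wants diagnostics, and otherwise request a slot reset.

```c
#include <stdbool.h>

/* Userspace mock of the enums quoted in this document. */
enum pci_channel_state {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
};

enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical driver-private state. */
struct demo_dev {
	bool io_stopped;        /* driver has stopped issuing new I/O */
	bool wants_diagnostics; /* driver would like MMIO back to dump state */
};

/* Sketch of an error_detected() policy. It starts no new I/O: it only
 * marks the device as quiesced and picks one of the three codes listed
 * for this step. */
static enum pci_ers_result demo_error_detected(struct demo_dev *d,
					       enum pci_channel_state state)
{
	d->io_stopped = true;

	if (state == pci_channel_io_perm_failure)
		return PCI_ERS_RESULT_DISCONNECT;   /* card is dead */
	if (state == pci_channel_io_normal || d->wants_diagnostics)
		return PCI_ERS_RESULT_CAN_RECOVER;  /* try via mmio_enabled() */
	return PCI_ERS_RESULT_NEED_RESET;          /* frozen: ask for a reset */
}
```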
@@ -159,25 +165,27 @@ then recovery proceeds to STEP 4 (Slot Reset).
 If the platform is unable to recover the slot, the next step
 is STEP 6 (Permanent Failure).
 
->>> The current powerpc implementation assumes that a device driver will
->>> *not* schedule or semaphore in this routine; the current powerpc
->>> implementation uses one kernel thread to notify all devices;
->>> thus, if one device sleeps/schedules, all devices are affected.
->>> Doing better requires complex multi-threaded logic in the error
->>> recovery implementation (e.g. waiting for all notification threads
->>> to "join" before proceeding with recovery.) This seems excessively
->>> complex and not worth implementing.
+.. note::
 
->>> The current powerpc implementation doesn't much care if the device
->>> attempts I/O at this point, or not. I/O's will fail, returning
->>> a value of 0xff on read, and writes will be dropped. If more than
->>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
->>> assumes that the device driver has gone into an infinite loop
->>> and prints an error to syslog. A reboot is then required to
->>> get the device working again.
+The current powerpc implementation assumes that a device driver will
+*not* schedule or semaphore in this routine; the current powerpc
+implementation uses one kernel thread to notify all devices;
+thus, if one device sleeps/schedules, all devices are affected.
+Doing better requires complex multi-threaded logic in the error
+recovery implementation (e.g. waiting for all notification threads
+to "join" before proceeding with recovery.) This seems excessively
+complex and not worth implementing.
+
+The current powerpc implementation doesn't much care if the device
+attempts I/O at this point, or not. I/O's will fail, returning
+a value of 0xff on read, and writes will be dropped. If more than
+EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
+assumes that the device driver has gone into an infinite loop
+and prints an error to syslog. A reboot is then required to
+get the device working again.
 
 STEP 2: MMIO Enabled
---------------------
+--------------------
 The platform re-enables MMIO to the device (but typically not the
 DMA), and then calls the mmio_enabled() callback on all affected
 device drivers.
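The note in this hunk says that, while the adapter is frozen, reads return 0xff and writes are dropped, and that too many I/Os to a frozen adapter are treated as a driver stuck in a loop. A driver can use the read pattern to stop polling a dead device early. The userspace sketch below (not part of this commit) assumes the all-ones pattern extends to the full 32-bit access width; struct frozen_guard and its limit field are hypothetical and unrelated to the actual EEH_MAX_FAILS value. Because a healthy register can legitimately read as all ones, a real driver would confirm the state through the PCI core (for example pci_channel_offline()) rather than rely on this heuristic alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* On a frozen slot the text says reads return 0xff; this sketch assumes
 * a 32-bit MMIO read then comes back as all ones. */
static bool mmio_read_looks_frozen(uint32_t value)
{
	return value == UINT32_MAX;
}

/* Hypothetical guard for a polling loop: stop touching the device after
 * a bounded number of consecutive all-ones reads instead of spinning. */
struct frozen_guard {
	unsigned int consecutive; /* all-ones reads seen in a row */
	unsigned int limit;       /* this sketch's own threshold */
};

/* Returns true once the caller should stop issuing I/O. */
static bool frozen_guard_check(struct frozen_guard *g, uint32_t value)
{
	if (!mmio_read_looks_frozen(value)) {
		g->consecutive = 0;   /* a normal read resets the count */
		return false;
	}
	return ++g->consecutive >= g->limit;
}
```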
@@ -192,34 +200,36 @@ link reset was performed by the HW. If the platform can't just re-enable IOs
 without a slot reset or a link reset, it will not call this callback, and
 instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
 
->>> The following is proposed; no platform implements this yet:
->>> Proposal: All I/O's should be done _synchronously_ from within
->>> this callback, errors triggered by them will be returned via
->>> the normal pci_check_whatever() API, no new error_detected()
->>> callback will be issued due to an error happening here. However,
->>> such an error might cause IOs to be re-blocked for the whole
->>> segment, and thus invalidate the recovery that other devices
->>> on the same segment might have done, forcing the whole segment
->>> into one of the next states, that is, link reset or slot reset.
+.. note::
+
+The following is proposed; no platform implements this yet:
+Proposal: All I/O's should be done _synchronously_ from within
+this callback, errors triggered by them will be returned via
+the normal pci_check_whatever() API, no new error_detected()
+callback will be issued due to an error happening here. However,
+such an error might cause IOs to be re-blocked for the whole
+segment, and thus invalidate the recovery that other devices
+on the same segment might have done, forcing the whole segment
+into one of the next states, that is, link reset or slot reset.
 
 The driver should return one of the following result codes:
 - PCI_ERS_RESULT_RECOVERED
 Driver returns this if it thinks the device is fully
 functional and thinks it is ready to start
 normal driver operations again. There is no
 guarantee that the driver will actually be
 allowed to proceed, as another driver on the
 same segment might have failed and thus triggered a
 slot reset on platforms that support it.
 
 - PCI_ERS_RESULT_NEED_RESET
 Driver returns this if it thinks the device is not
 recoverable in its current state and it needs a slot
 reset to proceed.
 
 - PCI_ERS_RESULT_DISCONNECT
 Same as above. Total failure, no recovery even after
 reset driver dead. (To be defined more precisely)
 
 The next step taken depends on the results returned by the drivers.
 If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
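The next step depends on the combined answers of every driver on the segment, and the text notes that one failing driver can trigger a slot reset for all of them. The userspace sketch below (not part of this commit) shows one way to combine the mmio_enabled() results. The enum is mocked, enum next_step and decide_after_mmio_enabled() are hypothetical, and treating any result other than RECOVERED or NEED_RESET (for example DISCONNECT) as permanent failure is an assumption of the sketch. The hunk is cut off before it says where the platform goes when every driver recovered, so that outcome is only labeled; the kernel's own merging logic may order these codes differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace mock of the result codes quoted in this document. */
enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical labels for the platform's next move. */
enum next_step {
	NEXT_ALL_RECOVERED, /* every driver reported RECOVERED */
	NEXT_SLOT_RESET,    /* STEP 4 (Slot Reset) */
	NEXT_PERM_FAILURE,  /* STEP 6 (Permanent Failure) */
};

/* Combine the mmio_enabled() results of all drivers on a segment.
 * One driver asking for a reset is enough to reset the whole slot,
 * and in this sketch a reset request also wins over DISCONNECT. */
static enum next_step decide_after_mmio_enabled(const enum pci_ers_result *res,
						size_t n)
{
	bool all_recovered = true;
	bool any_reset = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (res[i] != PCI_ERS_RESULT_RECOVERED)
			all_recovered = false;
		if (res[i] == PCI_ERS_RESULT_NEED_RESET)
			any_reset = true;
	}
	if (all_recovered)
		return NEXT_ALL_RECOVERED;
	if (any_reset)
		return NEXT_SLOT_RESET;
	/* Anything else, e.g. DISCONNECT, is treated as unrecoverable. */
	return NEXT_PERM_FAILURE;
}
```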
@@ -293,31 +303,33 @@ device will be considered "dead" in this case.
 Drivers for multi-function cards will need to coordinate among
 themselves as to which driver instance will perform any "one-shot"
 or global device initialization. For example, the Symbios sym53cxx2
-driver performs device init only from PCI function 0:
+driver performs device init only from PCI function 0::
 
 + if (PCI_FUNC(pdev->devfn) == 0)
 + sym_reset_scsi_bus(np, 0);
 
 Result codes:
 - PCI_ERS_RESULT_DISCONNECT
 Same as above.
 
 Drivers for PCI Express cards that require a fundamental reset must
 set the needs_freset bit in the pci_dev structure in their probe function.
 For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
-PCI card types:
+PCI card types::
 
 + /* Set EEH reset type to fundamental if required by hba */
 + if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
 + pdev->needs_freset = 1;
 +
 
 Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
 Failure).
 
->>> The current powerpc implementation does not try a power-cycle
->>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
->>> However, it probably should.
+.. note::
+
+The current powerpc implementation does not try a power-cycle
+reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
+However, it probably should.
 
 
 STEP 5: Resume Operations
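The sym53c8xx_2 snippet in this hunk relies on PCI_FUNC() to pick a single function for card-wide "one-shot" initialization. A userspace sketch of that check follows (not part of this commit). PCI_SLOT() and PCI_FUNC() below use the same bit layout as the kernel macros, with a 5-bit device number and a 3-bit function number packed into devfn; struct demo_pci_dev and should_do_global_init() are hypothetical stand-ins for struct pci_dev and driver code.

```c
#include <stdbool.h>

/* Same bit layout as the kernel's PCI_SLOT()/PCI_FUNC(): devfn packs a
 * 5-bit device (slot) number above a 3-bit function number. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/* Hypothetical stand-in for struct pci_dev, carrying only devfn. */
struct demo_pci_dev {
	unsigned int devfn;
};

/* Only PCI function 0 performs the card-wide initialization, so the
 * other function drivers on the same card skip it. */
static bool should_do_global_init(const struct demo_pci_dev *pdev)
{
	return PCI_FUNC(pdev->devfn) == 0;
}
```

For example, devfn 0x08 decodes to slot 1, function 0 and performs the global init, while devfn 0x0b (slot 1, function 3) skips it.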
@@ -370,44 +382,43 @@ The current policy is to turn this into a platform policy.
 That is, the recovery API only requires that:
 
 - There is no guarantee that interrupt delivery can proceed from any
 device on the segment starting from the error detection and until the
 slot_reset callback is called, at which point interrupts are expected
 to be fully operational.
 
 - There is no guarantee that interrupt delivery is stopped, that is,
 a driver that gets an interrupt after detecting an error, or that detects
 an error within the interrupt handler such that it prevents proper
 ack'ing of the interrupt (and thus removal of the source) should just
 return IRQ_NOTHANDLED. It's up to the platform to deal with that
 condition, typically by masking the IRQ source during the duration of
 the error handling. It is expected that the platform "knows" which
 interrupts are routed to error-management capable slots and can deal
 with temporarily disabling that IRQ number during error processing (this
 isn't terribly complex). That means some IRQ latency for other devices
 sharing the interrupt, but there is simply no other way. High end
 platforms aren't supposed to share interrupts between many devices
 anyway :)
 
->>> Implementation details for the powerpc platform are discussed in
->>> the file Documentation/powerpc/eeh-pci-error-recovery.txt
+.. note::
 
->>> As of this writing, there is a growing list of device drivers with
->>> patches implementing error recovery. Not all of these patches are in
->>> mainline yet. These may be used as "examples":
->>>
->>> drivers/scsi/ipr
->>> drivers/scsi/sym53c8xx_2
->>> drivers/scsi/qla2xxx
->>> drivers/scsi/lpfc
->>> drivers/next/bnx2.c
->>> drivers/next/e100.c
->>> drivers/net/e1000
->>> drivers/net/e1000e
->>> drivers/net/ixgb
->>> drivers/net/ixgbe
->>> drivers/net/cxgb3
->>> drivers/net/s2io.c
->>> drivers/net/qlge
+Implementation details for the powerpc platform are discussed in
+the file Documentation/powerpc/eeh-pci-error-recovery.txt
 
-The End
---------
+As of this writing, there is a growing list of device drivers with
+patches implementing error recovery. Not all of these patches are in
+mainline yet. These may be used as "examples":
+
+- drivers/scsi/ipr
+- drivers/scsi/sym53c8xx_2
+- drivers/scsi/qla2xxx
+- drivers/scsi/lpfc
+- drivers/next/bnx2.c
+- drivers/next/e100.c
+- drivers/net/e1000
+- drivers/net/e1000e
+- drivers/net/ixgb
+- drivers/net/ixgbe
+- drivers/net/cxgb3
+- drivers/net/s2io.c
+- drivers/net/qlge
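The interrupt guidance in this hunk can be sketched as a handler that declines an interrupt it cannot acknowledge and leaves masking to the platform. This is a userspace mock, not part of this commit: the constant the text calls IRQ_NOTHANDLED is spelled IRQ_NONE in the kernel's <linux/irqreturn.h>, and only a subset of those values is re-declared here. struct demo_dev and demo_irq_handler() are hypothetical, and treating an all-ones status read as a frozen slot is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of the kernel's irqreturn values, re-declared for userspace. */
enum irqreturn {
	IRQ_NONE = 0,    /* interrupt was not handled */
	IRQ_HANDLED = 1, /* interrupt was handled by this device */
};
typedef enum irqreturn irqreturn_t;

/* Hypothetical device model. */
struct demo_dev {
	uint32_t status;    /* value read from the interrupt status register */
	bool error_pending; /* set when the handler suspects a frozen slot */
	unsigned int acked; /* number of interrupts acknowledged */
};

static irqreturn_t demo_irq_handler(struct demo_dev *d)
{
	/* All-ones status suggests the slot is frozen: the interrupt cannot
	 * be acknowledged, so report it as not handled and let the platform
	 * mask the source while recovery runs. */
	if (d->status == UINT32_MAX) {
		d->error_pending = true;
		return IRQ_NONE;
	}
	if (d->status == 0)
		return IRQ_NONE;    /* not ours, e.g. a shared line */
	d->acked++;                 /* normal path: acknowledge the source */
	d->status = 0;
	return IRQ_HANDLED;
}
```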
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12143,7 +12143,7 @@ M: Sam Bobroff <sbobroff@linux.ibm.com>
 M: Oliver O'Halloran <oohall@gmail.com>
 L: linuxppc-dev@lists.ozlabs.org
 S: Supported
-F: Documentation/PCI/pci-error-recovery.txt
+F: Documentation/PCI/pci-error-recovery.rst
 F: drivers/pci/pcie/aer.c
 F: drivers/pci/pcie/dpc.c
 F: drivers/pci/pcie/err.c
@@ -12156,7 +12156,7 @@ PCI ERROR RECOVERY
 M: Linas Vepstas <linasvepstas@gmail.com>
 L: linux-pci@vger.kernel.org
 S: Supported
-F: Documentation/PCI/pci-error-recovery.txt
+F: Documentation/PCI/pci-error-recovery.rst
 
 PCI MSI DRIVER FOR ALTERA MSI IP
 M: Ley Foon Tan <lftan@altera.com>