Commit Graph

63369 Commits

Author SHA1 Message Date
Kashyap, Desai 37c60f374a [SCSI] mpt fusion: rewrite of all internal generated functions
Rewrite of all internal generated functions that issue commands to firmware,
porting them to be single threaded using the generic MPT_MGMT
struct. Implemented using completion Queue.

Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 17:27:49 -05:00
Kashyap, Desai f0f09d3b3f [SCSI] mpt fusion: config path optimized, completion queue is used
1) 	Previously we had mutliple #defines to use same values.
	Now those #defines are optimized.
	MPT_IOCTL_STATUS_* is removed and  MPT_MGMT_STATUS_* are new
	#defines.
2.)	config path is optimized.
	Instead of wait Queue and timer, using completion Q.
3.)	mpt_timer_expired is not used.

[jejb: elide patch to eliminate mpt_timer_expired]
Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 17:26:20 -05:00
Kashyap, Desai fd76175a7d [SCSI] mpt fusion: Optimized SendEvent notification Using Doorbell instead FIFO
SendEventNotification was handled through FIFO, now it is using doorbell to
communicate with hardware. Added Sleep Flag as an extra argument to support
Can-Sleep feature.  Resending patch including compilation error fix reviewed
by Grant Grundler.

Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 17:22:08 -05:00
Kashyap, Desai 7b5a65b9e6 [SCSI] mpt fusion: Added support for MPT discovery completion check
sas_discovery_quiesce_io flag is used to control IO start/resume functionality.
IO will be stoped while doing discovery of topology. Once discovery is completed
It will resume IO. Resending patch including James review.

Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 17:21:49 -05:00
Kashyap, Desai 14d0f0b063 [SCSI] mpt fusion: Fixing 1078 data corruption issue for 36GB memory region
The reason for this change is there is a data corruption when four different
physical memory regions in the 36GB to 37GB region are
accessed. This is only affecting 1078.

The solution is we need to use different addressing when filling in
the scatter gather table for the effected memory regions.  So instead
of snooping on all four different memory holes, we treat any physical
addresses in the 36GB address with the same algorithm.

The fix is explained below
1) Ensure that the message frames are NOT located in the trouble
region. There is no remapping available for message frames, they must
be allocated outside the problem region.
2) Ensure that Sense buffers are NOT in the trouble region. There is
no remapping available.
3) Walk through the SGE entries and if any are inside the trouble region
   then they need to be remapped as discussed below.
	1) Set the Local Address bit in the SGE Flags field.
  	MPI_SGE_FLAGS_LOCAL_ADDRESS
  	2) Ensure we are using 64-bit SGEs
  	3) Set MSb (Bit 63) of the 64-bit address, this will indicate buffer
	location is Host Memory.

Signed-off-by: Kashyap Desai <kadesai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 17:21:31 -05:00
Alan Cox 238ddbb98c [SCSI] gdth: fix overlapping snprintf users
Closes-bug: http://bugzilla.kernel.org/show_bug.cgi?id=13438
Closes-bug: http://bugzilla.kernel.org/show_bug.cgi?id=13437
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 10:31:46 -05:00
Michael Chan cf4e636385 [SCSI] bnx2i: Add bnx2i iSCSI driver.
New iSCSI driver for Broadcom BNX2 devices.  The driver interfaces with
the CNIC driver to access the hardware.

Signed-off-by: Anil Veerabhadrappa <anilgv@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 10:22:45 -05:00
Michael Chan a463696039 [SCSI] cnic: Add new Broadcom CNIC driver.
The CNIC driver controls BNX2 hardware rings and resources used by
iSCSI.  Most hardware resources for iSCSI are separate from those
used for ethernet networking.

iSCSI uses a separate MAC address and IP address.  The CNIC driver
creates a UIO interface to handle the non-offloaded packets such as
ARP, etc in userspace.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 10:22:42 -05:00
Michael Chan 4edd473f20 [SCSI] bnx2: Add support for CNIC driver.
Add interface and functions to support a new CNIC driver to drive
the Broadcom bnx2 hardware for iSCSI offload.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 10:22:38 -05:00
Michael Chan 43514774ff [SCSI] iscsi class: Add new NETLINK_ISCSI messages for cnic/bnx2i driver.
Add ISCSI_NETLINK messages for iSCSI NICs to get information such as
path from userspace.  Original iscsid messages are now always sent as
multicast to group 1.  The new messages are sent to group 2.

The multicast changes were made by Mike Christie.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-09 10:22:35 -05:00
Brian King 126c5cc37e [SCSI] ibmvscsi: Add support for capabilities MAD
Add support to ibmvscsi for the capabilities MAD. This command gets sent
to the Virtual I/O server prior to login in order to communicate client
capabilities. Additionally it returns information regarding capabilities
that the server supports. The two main capabilities communicated in this
MAD are related to partition migration and client reserve. Client reserve
allows for SCSI-2 reservations to be sent to virtual disks which are backed
by physical LUNs and will result in the reservation being sent to the
physical LUN.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 18:05:21 -05:00
Robert Jennings c1988e3123 [SCSI] ibmvscsi: Enable fast fail feature
A new mode of error reporting, fast fail, has been added to the VIOS
which allows failover to happen more quickly.

If this new fast fail mode is enabled on the VIOS and the vSCSI client
supports the mode, the VIOS will not return MEDIUM error on path failures,
but rather return VIOSRP_ADAPTER_FAIL in the crq response, which
ibmvscsi will translate to DID_ERROR.

This new mode can be enabled for single path configurations as well,
so it is the new default error reporting mode. A module parameter is
provided to disable this new behavior on the off chance it causes a
problem on some old VIOS version.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 18:05:20 -05:00
Brian King 3507e13fcb [SCSI] ibmvscsi: Send adapter info before login
The ibmvscsi driver currently sends the SRP Login before sending the Adapter
Info MAD, which can result in commands getting sent to the virtual adapter
before we are ready for them. This results in a slight window where the target
devices may not behave as expected. Change the order and close the window.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 18:05:18 -05:00
Robert Jennings e1a5ce5b88 [SCSI] ibmvscsi: Add specific timeouts for operations
Previously we had one timeout that was used for all types of operations.
This adds specific timeout values for different operations (init, login,
adapter info MAD, abort task, and LUN reset).

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 18:05:16 -05:00
Brian King fbc56f0801 [SCSI] ibmvscsi: Add 16 byte CDB support
Adds support for 16 byte CDBs to the ibmvscsi driver.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 18:05:15 -05:00
Andrew Vasquez 714df9399b [SCSI] qla2xxx: Update version number to 8.03.01-k3.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:47:02 -05:00
Andrew Vasquez 18e7555a38 [SCSI] qla2xxx: Synchronize MPI settings after a PE Reset.
Ensure MPS remains in synchronization across all NIC/FCoE
functions after a reset.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:47:00 -05:00
Andrew Vasquez 656e89122a [SCSI] qla2xxx: Export additional firmware-states for application support.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:58 -05:00
Andrew Vasquez f999f4c196 [SCSI] qla2xxx: Reduce lock-contention during do-work processing.
Queued work processing will now be serialized with its own
lower-priority spinlock.  This also simplifies the work-queue
interface for future work-queue consumers.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:57 -05:00
Andrew Vasquez 6805c1504e [SCSI] qla2xxx: Avoid explicit LOGO during driver host tear-down.
As firmware will ultimately terminate (stop) and port
states-cleared.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:55 -05:00
Andrew Vasquez aed1088112 [SCSI] qla2xxx: Query supported RISC registers bits in determining a paused-state.
ISP24xx and above must query the host-status register, not HCCR.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:54 -05:00
Andrew Vasquez e8233ca40b [SCSI] qla2xxx: Avoid redundant RISC reset during (re)-initialization.
ISP24xx and above ISPs perform a RISC reset in
qla24xx_reset_chip(), which is called prior to
qla24xx_chip_diag().

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:53 -05:00
Andrew Vasquez eeebcc9223 [SCSI] qla2xxx: Fallback enode-mac should not be a multicast address.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:51 -05:00
Andrew Vasquez 81eb9b4985 [SCSI] qla2xxx: Add notification message when an NPIV fails to acquire a port-id.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:50 -05:00
Andrew Vasquez 9f8fddeef2 [SCSI] qla2xxx: Add 10Gb iiDMA support.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:49 -05:00
Andrew Vasquez f4658b6ccc [SCSI] qla2xxx: Mark a port's state as needing-rediscovery during link disruptions.
With RSCN states not being kept across qla2x00_configure_loop()
invocations, loop-resync distruptions during fabric-discovery may
cause ports to remain in a lost state.  Force state
renegotiation during a follow-on configure-loop iteration.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:47 -05:00
Andrew Vasquez ca9e9c3eb1 [SCSI] qla2xxx: Check status of qla2x00_get_fw_version() call.
Unlike earlier ISPs, recent ISPs (ISP81xx) can in fact fail this
mailbox command.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:46 -05:00
Anirban Chakraborty 59e0b8b088 [SCSI] qla2xxx: Correct NULL pointer bug in cpu affinity mode.
This patch fixes a NULL pointer bug that occurs when IO is being
carried out on a vport for which the cpu affinity mode is turned on.

Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:44 -05:00
Andrew Vasquez 94b3aa47ac [SCSI] qla2xxx: Use 'proper' DID_* status code for dropped-frame scenarios.
The SCSI-midlayer's fast-fail codes consider an DID_ERROR status
as a driver-error and the failed I/O would then be retried in the
midlayer without being fast-failed to dm-multipath.  DID_BUS_BUSY
status returns would induce unneeded path-failures events being
propagated to the DM/MD.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:43 -05:00
Andrew Vasquez cbc8eb67da [SCSI] qla2xxx: Fallback to 'golden-firmware' operation on supported ISPs.
In case the onboard firmware is unable to be read or loaded for
operation, attempt to fallback to a limited-operational firmware
image stored in a different flash region.  This will allow a user
to reflash and correct a board with proper operational firmware.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:41 -05:00
Anirban Chakraborty 40859ae5f1 [SCSI] qla2xxx: Correct queue-creation bug when driver loaded in QoS mode.
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:39 -05:00
Andrew Vasquez 1b91a2e671 [SCSI] qla2xxx: Correct logic-bug in set-model-info().
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:38 -05:00
Andrew Vasquez 11bbc1d896 [SCSI] qla2xxx: Export TLV data on supported ISPs.
Firmware currently provides PB and PGF TLVs.

Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:36 -05:00
Andrew Vasquez ce0423f4a2 [SCSI] qla2xxx: Export XGMAC statistics on supported ISPs.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:34 -05:00
Andrew Vasquez 7f77402517 [SCSI] qla2xxx: Export negotiated fabric-parameters for application support.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 14:46:32 -05:00
Vasu Dev 4e57e1cbbd [SCSI] fcoe: removes reserving memory for vlan_ethdr on tx path
This is not required as VLAN header is added by device
interface driver, this was causing bad FC_CRC in FCoE pkts when
using VLAN interface.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:29:16 -05:00
Vasu Dev 1047f22108 [SCSI] fcoe: removes fcoe_watchdog
Removes periodic fcoe_watchdog timer used across all fcoe interface
maintained in fcoe_hostlist instead added new fcoe_queue_timer
per fcoe interface.

Added timer is armed only when some pending skb need to be flushed
as oppose to periodic 1 second fcoe_watchdog, since now
fcoe_queue_timer is used on demand thus set this to 2 jiffies.

Now fcoe_queue_timer is much simple than fcoe_watchdog using lock to
process all fcoe interface from fcoe_hostlist.

I noticed +ve performance result with using 2 jiffies timer as
this helps flushing fcoe_pending_queue quickly.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:29:14 -05:00
Vasu Dev 4bb6b51533 [SCSI] fcoe: reduces lock cost when adding a new skb to fcoe_pending_queue
Currently fcoe_pending_queue.lock held twice for every new skb
adding to this queue when already least one pkt is pending in this
queue and that is not uncommon once skb pkts starts getting queued
here upon fcoe_start_io => dev_queue_xmit failure.

This patch moves most fcoe_pending_queue logic to fcoe_check_wait_queue
function, this new logic grabs fcoe_pending_queue.lock only once to
add a new skb instead twice as used to be.

I think after this patch call flow around fcoe_check_wait_queue
calling in fcoe_xmit is bit simplified with modified
fcoe_check_wait_queue function taking care of adding and
removing pending skb in one function.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:29:13 -05:00
Steve Ma 30121d14f5 [SCSI] libfc: Check if exchange is completed when receiving a sequence
When a sequence is received in response to an exchange we issued previously,
we should check to see if the exchange has completed. If yes, the sequence
should be discarded. Since the exchange might be still in the completion
process, it should be untouched.

Signed-off-by: Steve Ma <steve.ma@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:29:11 -05:00
Mike Christie d5e6054a0a [SCSI] libfc: use DID_ERROR when we have internall aborted command
If we aborted a command, because it timed out we should not use
DID_ABORT. It will fail the command right away back to the upper
layer. We want to use something that indicated that the problem
did not complete normally, but it was not a fatal problem.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:29:10 -05:00
Chris Leech 0f4915398a [SCSI] fcoe: use ETH_P_FIP for skb->protocol of FIP frames
FIP frames should leave the fcoe layer with skb->protocol set to
ETH_P_FIP, not ETH_P_802_3.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:29:08 -05:00
Joe Eykholt 5f48f70ece [SCSI] libfcoe: fip: fix non-FIP-mode FLOGI state after reset.
When a reset is sent using fcoeadm on a non-FIP mode NIC,
there's no link flap, so the fcoe_ctlr stays in non-FIP mode.

In that case, FIP wasn't setting the flogi_oxid or map_dest flag,
causing the FLOGI to be sent with the both wrong source MAC and
the wrong destination MAC address, causing it to fail.

This leads to a non-functioning HBA until a link flap or
instance delete/create.

Signed-off-by: Joe Eykholt <jeykholt@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:29:07 -05:00
Herbert Xu f00a3328bf [SCSI] cxgb3i: Include net/dst.h for struct dst_cache
This driver needs dst_cache->dev so it should include net/dst.h
to ensure that it builds.  While net/tcp.h probably includes it
already, we shouldn't rely on that since there is no guarantee
that this won't change in future.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:09:33 -05:00
Brian King cbbf58f2e2 [SCSI] ibmvfc: Driver version 1.0.6
Bump driver version

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:07:51 -05:00
Brian King 6d29cc56be [SCSI] ibmvfc: Improve LOGO/PRLO ELS handling
There are several scenarios where the ibmvfc driver needs to
try to log back into a target on the fabric. Today when these events
occur, we simply go through re-discovery for all attached targets,
assuming that either the query of the name server or an ADISC will
indicate we might need to log back into the target, which doesn't
work for all scenarios. Fix this by taking note of the affected target(s)
in these conditions and ensuring we try to PLOGI back into the target.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:07:49 -05:00
Brian King 5e47167b6b [SCSI] ibmvfc: Improve device rediscovery
For certain scenarios during device rediscovery, we detect we need
to log back into a target. Currently we do just that - PLOGI/PRLI
back into the target. Change the code to delete and add the target
from the FC transport layer as well, to ensure we handle any cases
where the target may have changed.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:07:47 -05:00
Brian King 497f9c504f [SCSI] ibmvfc: Add flush on halt support
The virtual I/O server controlling the NPIV adapter associated with
a virtual fibre channel adapter can send a HALT event to the client.
When this occurs, the client can no longer send commands until a RESUME
is received. By adding support for flush on halt, we will get all of
our outstanding commands flushed back before the Virtual I/O server
enters the halt state, eliminating potential command timeouts for
outstanding commands which might occur if we did not support this feature.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:07:45 -05:00
Brian King 79111d0899 [SCSI] ibmvfc: Add support for NPIV Logout
This patch adds support for a new command supported by the Virtual I/O
Server, NPIV Logout. The command will abort all outstanding commands
and log out of the fabric. Currently, the only way to do this is
by breaking the CRQ, which can take a fairly long time when lots of
commands are outstanding. The NPIV Logout commands provides a mechanism
to accomplish virtually the same function, but is much faster.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:07:44 -05:00
Brian King 43c8da907c [SCSI] ibmvfc: Fix deadlock in EH
Fixes the following deadlock scenario shown below. We currently allow
queuecommand to send commands when the ibmvfc workqueue is scanning for
new rports, so we should also allow EH to function at this time as well.

scsi_eh_3     D 0000000000000000 12304  1279      2
Call Trace:
[c0000002f7257730] [c0000002f72577e0] 0xc0000002f72577e0 (unreliable)
[c0000002f7257900] [c0000000000118f4] .__switch_to+0x158/0x1a0
[c0000002f72579a0] [c0000000004f8b40] .schedule+0x8d4/0x9dc
[c0000002f7257b60] [c0000000004f8f08] .schedule_timeout+0xa8/0xe8
[c0000002f7257c50] [d0000000001d23e0] .ibmvfc_wait_while_resetting+0xe4/0x140 [ibmvfc]
[c0000002f7257d20] [d0000000001d3984] .ibmvfc_eh_abort_handler+0x60/0xe4 [ibmvfc]
[c0000002f7257dc0] [d000000000366714] .scsi_error_handler+0x38c/0x674 [scsi_mod]
[c0000002f7257f00] [c0000000000a7470] .kthread+0x78/0xc4
[c0000002f7257f90] [c000000000029b8c] .kernel_thread+0x4c/0x68
ibmvfc_3      D 0000000000000000 12432  1280      2
Call Trace:
[c0000002f7253540] [c0000002f72535f0] 0xc0000002f72535f0 (unreliable)
[c0000002f7253710] [c0000000000118f4] .__switch_to+0x158/0x1a0
[c0000002f72537b0] [c0000000004f8b40] .schedule+0x8d4/0x9dc
[c0000002f7253970] [c0000000004f8e98] .schedule_timeout+0x38/0xe8
[c0000002f7253a60] [c0000000004f80cc] .wait_for_common+0x138/0x220
[c0000002f7253b40] [c0000000000a2784] .flush_cpu_workqueue+0xac/0xcc
[c0000002f7253c10] [c0000000000a2960] .flush_workqueue+0x58/0xa0
[c0000002f7253ca0] [d0000000000827fc] .fc_flush_work+0x4c/0x64 [scsi_transport_fc]
[c0000002f7253d20] [d000000000082db4] .fc_remote_port_add+0x48/0x6c4 [scsi_transport_fc]
[c0000002f7253dd0] [d0000000001d7d04] .ibmvfc_work+0x820/0xa7c [ibmvfc]
[c0000002f7253f00] [c0000000000a7470] .kthread+0x78/0xc4
[c0000002f7253f90] [c000000000029b8c] .kernel_thread+0x4c/0x68
fc_wq_3       D 0000000000000000 10720  1283      2
Call Trace:
[c0000002f559ac30] [c0000002f559ace0] 0xc0000002f559ace0 (unreliable)
[c0000002f559ae00] [c0000000000118f4] .__switch_to+0x158/0x1a0
[c0000002f559aea0] [c0000000004f8b40] .schedule+0x8d4/0x9dc
[c0000002f559b060] [c0000000004f8e98] .schedule_timeout+0x38/0xe8
[c0000002f559b150] [c0000000004f80cc] .wait_for_common+0x138/0x220
[c0000002f559b230] [c0000000002721c4] .blk_execute_rq+0xb4/0x100
[c0000002f559b360] [d00000000036a1f8] .scsi_execute+0x118/0x194 [scsi_mod]
[c0000002f559b420] [d00000000036a32c] .scsi_execute_req+0xb8/0x124 [scsi_mod]
[c0000002f559b500] [d0000000000c1330] .sd_sync_cache+0x8c/0x108 [sd_mod]
[c0000002f559b5e0] [d0000000000c15b4] .sd_shutdown+0x9c/0x158 [sd_mod]
[c0000002f559b660] [d0000000000c16d0] .sd_remove+0x60/0xb4 [sd_mod]
[c0000002f559b700] [c000000000392ecc] .__device_release_driver+0xd0/0x118
[c0000002f559b7a0] [c000000000393080] .device_release_driver+0x30/0x54
[c0000002f559b830] [c000000000392108] .bus_remove_device+0x128/0x16c
[c0000002f559b8d0] [c00000000038f94c] .device_del+0x158/0x234
[c0000002f559b960] [d00000000036f078] .__scsi_remove_device+0x5c/0xd4 [scsi_mod]
[c0000002f559b9f0] [d00000000036f124] .scsi_remove_device+0x34/0x58 [scsi_mod]
[c0000002f559ba80] [d00000000036f204] .__scsi_remove_target+0xb4/0x120 [scsi_mod]
[c0000002f559bb10] [d00000000036f338] .__remove_child+0x2c/0x44 [scsi_mod]
[c0000002f559bb90] [c00000000038f11c] .device_for_each_child+0x54/0xb4
[c0000002f559bc50] [d00000000036f2e0] .scsi_remove_target+0x70/0x9c [scsi_mod]
[c0000002f559bce0] [d000000000083454] .fc_starget_delete+0x24/0x3c [scsi_transport_fc]
[c0000002f559bd70] [c0000000000a2368] .run_workqueue+0x118/0x208
[c0000002f559be30] [c0000000000a2580] .worker_thread+0x128/0x154
[c0000002f559bf00] [c0000000000a7470] .kthread+0x78/0xc4
[c0000002f559bf90] [c000000000029b8c] .kernel_thread+0x4c/0x68

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:07:42 -05:00
Brian King 7d0e462247 [SCSI] ibmvfc: Reduce error logging noise
The ibmvfc driver currently logs errors during discovery for several
transient fabric errors, which generally get retried. If retries
do not work, we see multiple errors in the log. If retries do work,
we see errors in the log which may be confusing since the retry worked.
This patch enhances the discovery time error logging to only log errors
for command failures during discovery if all allowed retries have been
used up. The existing behavior of logging all failures can be restored
by setting the hosts log_level to a value of 3 or greater.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-08 13:07:34 -05:00