Commit Graph

288 Commits

Author SHA1 Message Date
Dan Williams f0bf750c2d [SCSI] libsas: trim sas_task of slow path infrastructure
The timer and the completion are only used for slow path tasks (smp, and
lldd tmfs), yet we incur the allocation space and cpu setup time for
every fast path task.

Cc: Xiangliang Yu <yuxiangl@marvell.com>
Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:54 +01:00
Dan Williams a494fd5bd9 [SCSI] libsas: drop sata port multiplier infrastructure
On the way to add a new sata_device field, noticed that libsas is
carrying port multiplier infrastructure that is explicitly disabled by
sas_discover_sata().  The aic94xx touches the unused port_no, so leave
that field in case there was some use for it.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:53 +01:00
Dan Williams b17caa174a [SCSI] libsas: fix sas_discover_devices return code handling
commit 198439e4 [SCSI] libsas: do not set res = 0 in sas_ex_discover_dev()
commit 19252de6 [SCSI] libsas: fix wide port hotplug issues

The above commits seem to have confused the return value of
sas_ex_discover_dev which is non-zero on failure and
sas_ex_join_wide_port which just indicates short circuiting discovery on
already established ports.  The result is random discovery failures
depending on configuration.

Calls to sas_ex_join_wide_port are the source of the trouble as its
return value is errantly assigned to 'res'.  Convert it to bool and stop
returning its result up the stack.

Cc: <stable@vger.kernel.org>
Tested-by: Dan Melnic <dan.melnic@amd.com>
Reported-by: Dan Melnic <dan.melnic@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:53 +01:00
Dan Williams 26f2f199ff [SCSI] libsas: continue revalidation
Continue running revalidation until no more broadcast devices are
discovered.  Fixes cases where re-discovery completes too early in a
domain with multiple expanders with pending re-discovery events.
Servicing BCNs can get backed up behind error recovery.

Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:52 +01:00
Jeff Skirvin b2311a2875 [SCSI] libsas: sas_rediscover_dev did not look at the SMP exec status.
The discovery function "sas_rediscover_dev" had two bugs: 1) it did
not pay attention to the return status from the SMP task execution;
2) the stack variable used for the returned SAS address was compared
against 0 without being initialized.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:52 +01:00
Dan Williams e7db822996 [SCSI] libsas: use ->lldd_I_T_nexus_reset for ->eh_bus_reset_handler
sas_eh_bus_reset_handler() amounts to sas_phy_reset() without
notification of the reset to the lldd.  If this is triggered from
eh-cmnd recovery there may be sas_tasks for the lldd to terminate, so
->lldd_I_T_nexus_reset is warranted.

Cc: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Luben Tuikov <ltuikov@yahoo.com>
Cc: Jack Wang <jack_wang@usish.com>
Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
[jacek: modify pm8001_I_T_nexus_reset to return -ENODEV]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:51 +01:00
Dan Williams 9524c68218 [SCSI] libsas: add sas_eh_abort_handler
When recovering failed eh-cmnds let the lldd attempt an abort via
scsi_abort_eh_cmnd before escalating.

Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:50 +01:00
Dan Williams 5db45bdc87 [SCSI] libsas: enforce eh strategy handlers only in eh context
The strategy handlers may be called in places that are problematic for
libsas (i.e. sata resets outside of domain revalidation filtering /
libata link recovery), or problematic for userspace (non-blocking ioctl
to sleeping reset functions).  However, these routines are also called
for eh escalations and recovery of scsi_eh_prep_cmnd(), so permit them
as long as we are running in the host's error handler, otherwise arrange
for them to be triggered in eh_context.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:50 +01:00
Maciej Trela 36fed49805 [SCSI] libsas: cleanup spurious calls to scsi_schedule_eh
eh is woken up automatically by the presence of failed commands,
scsi_schedule_eh is reserved for cases where there are no failed
commands.  This guarantees that host_eh_sceduled is only incremented
when an explicit eh request is made.

Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Maciej Trela <maciej.trela@intel.com>
[fixed spurious delete of sas_ata_task_abort]
Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:47 +01:00
Dan Williams e4a9c3732c [SCSI] libata, libsas: introduce sched_eh and end_eh port ops
When managing shost->host_eh_scheduled libata assumes that there is a
1:1 shost-to-ata_port relationship.  libsas creates a 1:N relationship
so it needs to manage host_eh_scheduled cumulatively at the host level.
The sched_eh and end_eh port port ops allow libsas to track when domain
devices enter/leave the "eh-pending" state under ha->lock (previously
named ha->state_lock, but it is no longer just a lock for ha->state
changes).

Since host_eh_scheduled indicates eh without backing commands pinning
the device it can be deallocated at any time.  Move the taking of the
domain_device reference under the port_lock to guarantee that the
ata_port stays around for the duration of eh.

Reviewed-by: Jacek Danecki <jacek.danecki@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:45 +01:00
Dan Williams 6ef1b512f4 [SCSI] libsas: fix taskfile corruption in sas_ata_qc_fill_rtf
fill_result_tf() grabs the taskfile flags from the originating qc which
sas_ata_qc_fill_rtf() promptly overwrites.  The presence of an
ata_taskfile in the sata_device makes it tempting to just copy the full
contents in sas_ata_qc_fill_rtf().  However, libata really only wants
the fis contents and expects the other portions of the taskfile to not
be touched by ->qc_fill_rtf.  To that end store a fis buffer in the
sata_device and use ata_tf_from_fis() like every other ->qc_fill_rtf()
implementation.

Cc: <stable@vger.kernel.org>
Reported-by: Praveen Murali <pmurali@logicube.com>
Tested-by: Praveen Murali <pmurali@logicube.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-08 09:49:14 +01:00
Dan Williams b4698d8858 [SCSI] Revert "[SCSI] libsas: fix sas port naming"
This reverts commit a692b0eec5.

Tom reports:

[    8.741033] ------------[ cut here ]------------
[    8.741038] WARNING: at fs/sysfs/dir.c:508 sysfs_add_one+0xc1/0xf0()
[    8.741040] Hardware name: To Be Filled By O.E.M.
[    8.741041] sysfs: cannot create duplicate filename

...and missing 2 out of 4 drives connected to mvsas.  Commit a692b0ee
made the assumption that all the phy ids an lldd registers to libsas are
unique.  However, in the "multi-chip" case mvsas does a rather annoying
duplication of phy ids in the array passed to libsas.  So, for example,
chip0 has phy0-3 at ha phy index 0-3 and chip1 has its phy0-3 at ha phy
index 4-7.  The more natural model would be to create a scsi_host (and
sas_ha) per chip (controller), but for now revert the naming fix which
unfortunately means dealing with unpredictable end-device names for a
bit longer.

Cc: Xiangliang Yu <yuxiangl@marvell.com>
Cc: Patrick Thomson <patrick.s.thomson@intel.com>
Reported-by: Tom Rini <trini@ti.com>
Tested-by: Tom Rini <trini@ti.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-04-23 12:15:53 +01:00
Dan Williams 7d1d865181 [SCSI] libsas: fix false positive 'device attached' conditions
Normalize phy->attached_sas_addr to return a zero-address in the case
when device-type == NO_DEVICE or the linkrate is invalid to handle
expanders that put non-zero sas addresses in the discovery response:

 sas: ex 5001b4da000f903f phy02:U:0 attached: 0100000000000000 (no device)
 sas: ex 5001b4da000f903f phy01:U:0 attached: 0100000000000000 (no device)
 sas: ex 5001b4da000f903f phy03:U:0 attached: 0100000000000000 (no device)
 sas: ex 5001b4da000f903f phy00:U:0 attached: 0100000000000000 (no device)

Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-04-23 12:14:09 +01:00
Dan Williams b202445925 [SCSI] libsas, libata: fix start of life for a sas ata_port
This changes the ordering of initialization and probing events from:
  1/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
  2/ allocate ata_port and schedule port probe in DISCE_PROBE
...to:
  1/ allocate ata_port in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
  2/ allocate rphy in PORTE_BYTES_DMAED, DISCE_REVALIDATE_DOMAIN
  3/ schedule port probe in DISCE_PROBE

This ordering prevents PHYE_SIGNAL_LOSS_EVENTS from sneaking in to
destrory ata devices before they have been fully initialized:

  BUG: unable to handle kernel paging request at 0000000000003b10
  IP: [<ffffffffa0053d7e>] sas_ata_end_eh+0x12/0x5e [libsas]
  ...
  [<ffffffffa004d1af>] sas_unregister_common_dev+0x78/0xc9 [libsas]
  [<ffffffffa004d4d4>] sas_unregister_dev+0x4f/0xad [libsas]
  [<ffffffffa004d5b1>] sas_unregister_domain_devices+0x7f/0xbf [libsas]
  [<ffffffffa004c487>] sas_deform_port+0x61/0x1b8 [libsas]
  [<ffffffffa004bed0>] sas_phye_loss_of_signal+0x29/0x2b [libsas]

...and kills the awkward "sata domain_device briefly existing in the
domain without an ata_port" state.

Reported-by: Michal Kosciowski <michal.kosciowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-04-23 12:11:47 +01:00
Dan Williams 0f3fce5cc7 [SCSI] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready
The check_ready implementation in the expander-attached ata device case
polls on sas_ex_phy_discover().  The effect is that the ex_phy fields
(critically ->attached_sas_addr) can change.  When ata_eh ends and
libsas comes along to revalidate the domain
sas_unregister_devs_sas_addr() can fail to lookup devices to remove, or
fail to re-add an ata device that ata_eh marked as disabled.  So change
the code to skip the sas_address and change count updates when ata_eh is
active.

Cc: Jack Wang <jack_wang@usish.com>
Tested-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Tested-by: Bartek Nowakowski <bartek.nowakowski@intel.com>
Tested-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-04-23 12:10:34 +01:00
Dan Williams 9487669fc2 [SCSI] libsas: unify domain_device sas_rphy lifetimes
Since the domain_device can out live the scsi_target we need the rphy to
follow suit otherwise we run into issues like:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
  IP: [<ffffffffa011561b>] sas_ata_printk+0x43/0x6f [libsas]
  PGD 0
  Oops: 0000 [#1] SMP
  CPU 1
  Modules linked in: ses enclosure isci libsas scsi_transport_sas fuse sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf microcode pcspkr igb joydev iTCO_wdt ioatdma iTCO_vendor_support i2c_i801 i2c_core dca wmi hed ipv6 pata_acpi ata_generic [last unloaded: scsi_wait_scan]

  Pid: 129, comm: kworker/u:3 Not tainted 3.3.0-rc5-isci+ #1 Intel Corporation SandyBridge Platform/To be filled by O.E.M.
  RIP: 0010:[<ffffffffa011561b>] [<ffffffffa011561b>] sas_ata_printk+0x43/0x6f [libsas]
  RSP: 0018:ffff88042232dd70 EFLAGS: 00010282
  RAX: 0000000000000000 RBX: ffff8804283165b8 RCX: ffff88042232dda0
  RDX: ffff88042232dd78 RSI: ffff8804283165b8 RDI: ffffffffa01188d7
  RBP: ffff88042232ddd0 R08: ffff880388454000 R09: ffff8803edfde1f8
  R10: ffff8803edfde1f8 R11: ffff8803edfde1f8 R12: ffff880428316750
  R13: ffff880388454000 R14: ffff8803f88b31d0 R15: ffff8803f8b21d50
  FS: 0000000000000000(0000) GS:ffff88042ee20000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000050 CR3: 0000000001a05000 CR4: 00000000000406e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process kworker/u:3 (pid: 129, threadinfo ffff88042232c000, task ffff88042230c920)
  Stack:
  0000000000000000 ffff880400000018 ffff88042232dde0 ffff88042232dda0
  ffffffffa01188c4 ffff88042ee93af0 ffff88042232ddb0 ffffffff8100e047
  ffff88042232de10 ffff880420e5a2c8 ffff8803f8b21d50 ffff8803edfde1f8
  Call Trace:
  [<ffffffff8100e047>] ? load_TLS+0xb/0xf
  [<ffffffffa01156ad>] async_sas_ata_eh+0x66/0x95 [libsas]
  [<ffffffff810655e1>] async_run_entry_fn+0x9e/0x131

Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-04-23 12:08:56 +01:00
Dan Williams ec236e5267 [SCSI] libsas: fix sas_get_port_device regression
Commit 899fcf4 "[SCSI] libsas: set attached device type and target
protocols for local phys" setup 'phy' to be dereferenced after
list_for_each_entry(phy, &port->phy_list, port_phy_el) (i.e. phy ==
&port->phy_list) resulting in reports like:

  BUG: unable to handle kernel NULL pointer dereference at 00000000000002b0
  IP: [<ffffffffa00ce948>] sas_discover_domain+0x29e/0x4fb [libsas]

...fix by deferring sas_phy_set_target() to the end of
sas_get_port_device().

Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-04-23 12:07:25 +01:00
Thomas Jackson 1699490db3 [SCSI] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys
If an expander reports 'PHY VACANT' for a phy index prior to the one
that generated a BCN libsas fails rediscovery.  Since a vacant phy is
defined as a valid phy index that will never have an attached device
just continue the search.

Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-04-23 12:06:16 +01:00
Dan Williams 22b9153faa [SCSI] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work
When requeuing work to a draining workqueue the last work instance may
not be idle, so sas_queue_work() must not touch work->entry.  Introduce
sas_work with a drain_node list_head to have a private list for
collecting work deferred due to drain collision.

Fixes reports like:
  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff810410d4>] process_one_work+0x2e/0x338

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-04-23 12:03:39 +01:00
Linus Torvalds 424a6f6ef9 SCSI updates on 20120319
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPZxSnAAoJEDeqqVYsXL0M0Y4IAMX0vrTVZbg6psA5/gMcWGRP
 CkFXEQ8n0PL2SCaj6BoDqamJFe5Nc7dnqxM0fGawB4S9vr3rHhiOlwO+NbV9zFYC
 2skBTpeL3sjgtN/jTBdfeeAa7xTYpu/XGyei0NS1A5c2AyMVXV0uYV2s4VNZxe44
 tVIn1OEzM2giZ9EB1OZslDMrg5XXm3MBIUECP0LbWUhBm/35caSFKzMXRwhh7WiK
 +AVmc2AZYtdEwuknDyiH7KlsaoB3vGL9pPrAUJzIgEhy2pOo2A7W72HfA4Fj+y6a
 uF9HBS5zciMp1+sGWry62AjNbWgin9BRlozBEO/lJhIfMGDV1nXEIJsOkOgkdoE=
 =1TxB
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6

SCSI updates from James Bottomley:
 "The update includes the usual assortment of driver updates (lpfc,
  qla2xxx, qla4xxx, bfa, bnx2fc, bnx2i, isci, fcoe, hpsa) plus a huge
  amount of infrastructure work in the SAS library and transport class
  as well as an iSCSI update.  There's also a new SCSI based virtio
  driver."

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (177 commits)
  [SCSI] qla4xxx: Update driver version to 5.02.00-k15
  [SCSI] qla4xxx: trivial cleanup
  [SCSI] qla4xxx: Fix sparse warning
  [SCSI] qla4xxx: Add support for multiple session per host.
  [SCSI] qla4xxx: Export CHAP index as sysfs attribute
  [SCSI] scsi_transport: Export CHAP index as sysfs attribute
  [SCSI] qla4xxx: Add support to display CHAP list and delete CHAP entry
  [SCSI] iscsi_transport: Add support to display CHAP list and delete CHAP entry
  [SCSI] pm8001: fix endian issue with code optimization.
  [SCSI] pm8001: Fix possible racing condition.
  [SCSI] pm8001: Fix bogus interrupt state flag issue.
  [SCSI] ipr: update PCI ID definitions for new adapters
  [SCSI] qla2xxx: handle default case in qla2x00_request_firmware()
  [SCSI] isci: improvements in driver unloading routine
  [SCSI] isci: improve phy event warnings
  [SCSI] isci: debug, provide state-enum-to-string conversions
  [SCSI] scsi_transport_sas: 'enable' phys on reset
  [SCSI] libsas: don't recover end devices attached to disabled phys
  [SCSI] libsas: fixup target_port_protocols for expanders that don't report sata
  [SCSI] libsas: set attached device type and target protocols for local phys
  ...
2012-03-22 12:55:29 -07:00
Cong Wang 77dfce076c scsi: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:19 +08:00
Dan Williams 26a2e68f81 [SCSI] libsas: don't recover end devices attached to disabled phys
If userspace has decided to disable a phy the kernel should honor that
and not inadvertantly re-enable the phy via error recovery.  This is
more straightforward in the sata case where link recovery (via
libata-eh) is separate from sas_task cancelling in libsas-eh.  Teach
libsas to accept -ENODEV as a successful response from I_T_nexus_reset
('successful' in terms of not escalating further).

This is a more comprehensive fix then "libsas: don't recover 'gone'
devices in sas_ata_hard_reset()", as it is no longer sata-specific.

aic94xx does check the return value from sas_phy_reset() so if the phy
is disabled we proceed with clearing the I_T_nexus.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:42:51 -06:00
Dan Williams 77c309f3cd [SCSI] libsas: fixup target_port_protocols for expanders that don't report sata
If discovery returns 0 for target_port_protocols but shows an attached
sata device, just report SAS_PROTOCOL_SATA in the identify data so
userspace can reliably search for sata devices in the domain.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:41:51 -06:00
Dan Williams 899fcf40f3 [SCSI] libsas: set attached device type and target protocols for local phys
Before:
$ cat /sys/class/sas_phy/phy-6\:3/device_type
none
$ cat /sys/class/sas_phy/phy-6\:3/target_port_protocols
none

After:
$ cat /sys/class/sas_phy/phy-6\:3/device_type
end device
$ cat /sys/class/sas_phy/phy-6\:3/target_port_protocols
sata

Also downgrade the phy_list_lock to _irq instead of _irqsave since
libsas will never call sas_get_port_device with interrupts disbled.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:40:33 -06:00
Dan Williams 9a10b33caf [SCSI] libsas: revert ata srst
libata issues follow up srsts when the controller has a hard time
recording the signature-fis after a reset, or if the link supports port
multipliers.  libsas does not support port multipliers and no current
libsas lldds appear to need help retrieving the signature fis.  Revert
it for now to remove confusion.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:39:25 -06:00
Dan Williams 840234745e [SCSI] libsas: fix lifetime of SAS_HA_FROZEN
Until all sas_tasks are known to no longer be in-flight this flag gates late
completions from colliding with error handling.  However, it must be cleared
prior to the submission of scsi_send_eh_cmnd() requests, otherwise those
commands will never be completed correctly.

This was spotted by slub debug:
 =============================================================================
 BUG sas_task: Objects remaining on kmem_cache_close()
 -----------------------------------------------------------------------------

 INFO: Slab 0xffffea001f0eba00 objects=34 used=1 fp=0xffff8807c3aecb00 flags=0x8000000000004080
 Pid: 22919, comm: modprobe Not tainted 3.2.0-isci+ #2
 Call Trace:
  [<ffffffff810fcdcd>] slab_err+0xb0/0xd2
  [<ffffffff810e1c50>] ? free_percpu+0x31/0x117
  [<ffffffff81100122>] ? kzalloc+0x14/0x16
  [<ffffffff81100122>] ? kzalloc+0x14/0x16
  [<ffffffff81100486>] kmem_cache_destroy+0x11d/0x270
  [<ffffffffa0112bdc>] sas_class_exit+0x10/0x12 [libsas]
  [<ffffffff81078fba>] sys_delete_module+0x1c4/0x23c
  [<ffffffff814797ba>] ? sysret_check+0x2e/0x69
  [<ffffffff8126479e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff81479782>] system_call_fastpath+0x16/0x1b
 INFO: Object 0xffff8807c3aed280 @offset=21120
 INFO: Allocated in sas_alloc_task+0x22/0x90 [libsas] age=4615311 cpu=2 pid=12966
  __slab_alloc.clone.3+0x1d1/0x234
  kmem_cache_alloc+0x52/0x10d
  sas_alloc_task+0x22/0x90 [libsas]
  sas_queuecommand+0x20e/0x230 [libsas]
  scsi_send_eh_cmnd+0xd1/0x30c
  scsi_eh_try_stu+0x4f/0x6b
  scsi_eh_ready_devs+0xba/0x6ef
  sas_scsi_recover_host+0xa35/0xab1 [libsas]
  scsi_error_handler+0x14b/0x5fa
  kthread+0x9d/0xa5
  kernel_thread_helper+0x4/0x10

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:38:09 -06:00
Dan Williams 9508a66f89 [SCSI] libsas: async ata scanning
libsas ata error handling is already async but this does not help the
scan case.  Move initial link recovery out from under host->scan_mutex,
and delay synchronization with eh until after all port probe/recovery
work has been queued.

Device ordering is maintained with scan order by still calling
sas_rphy_add() in order of domain discovery.

Since we now scan the domain list when invoking libata-eh we need to be
careful to check for fully initialized ata ports.

Acked-by: Jack Wang <jack_wang@usish.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:35:41 -06:00
Dan Williams 92625f9bff [SCSI] libsas: restore scan order
ata devices are always scanned after ssp.  Prior to the ata error
handling reworks libsas would tend to scan devices in ascending expander
phy order.  Restore this ordering by deferring ssp discovery to a
DISCE_PROBE event, and keep the probe order consistent with the
discovery order, not the placement of sata devices.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:34:19 -06:00
Dan Williams c666aae691 [SCSI] libsas: delete device on sas address changed
If the phy is attached to a new sas address unregister the first address
before processing the new attachment.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:33:39 -06:00
Dan Williams 354cf82980 [SCSI] libsas: let libata recover links that fail to transmit initial sig-fis
libsas fails to discover all sata devices in the domain.  If a device fails
negotiation and does not transmit a signature fis the link needs recovery.
libata already understands how to manage slow to come up links, so treat these
conditions as ata device attach events for the purposes of creating an
ata_port.  This allows libata to manage retrying link bring up.

Rediscovery is modified to be careful about checking changes in dev_type.  It
looks like libsas leaks old devices if the sas address changes, but that's a
fix for another patch.

Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:33:02 -06:00
Dan Williams a692b0eec5 [SCSI] libsas: fix sas port naming
Make sas-port naming consistent with the expander-attached case whereby
the phy-id is the last digit in the port name.  Otherwise we get the
random behavior of the allocation order.

Reported-by: Patrick Thomson <patrick.s.thomson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:29:06 -06:00
Dan Williams d214d81e88 [SCSI] libsas: improve debug statements
It's difficult to determine which domain_device is triggering error recovery,
so convert messages like:

  sas: ex 5001b4da000e703f phy08:T attached: 5001b4da000e7028
  sas: ex 5001b4da000e703f phy09:T attached: 5001b4da000e7029
  ...
  ata7: sas eh calling libata port error handler
  ata8: sas eh calling libata port error handler

...into:

  sas: ex 5001517e85cfefff phy05:T:9 attached: 5001517e85cfefe5 (stp)
  sas: ex 5001517e3b0af0bf phy11:T:8 attached: 5001517e3b0af0ab (stp)
  ...
  sas: ata7: end_device-21:1: dev error handler
  sas: ata8: end_device-20:0:5: dev error handler

which shows attached link rate, device type, and associates a
domain_device with its ata_port id to correlate messages emitted from
libata-eh.

As Doug notes, we can also take the opportunity to clarify expander phy
routing capabilities.

[dgilbert@interlog.com: clarify table2table with 'U']
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:28:24 -06:00
Maciej Trela fdfd9d1b89 [SCSI] libsas: kill spurious sas_put_device
Holdover from a patch rework, prior to the addition of SAS_DEV_DESTROY
we were holding a reference while the destruct was pending in case the
domain was torn down before the desctruct event ran.  That case is
covered by SAS_DEV_DESTROY, and the sas_put_device() just corrupts freed
memory, or worse frees the memory while another agent holds a reference.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:27:44 -06:00
Dan Williams 5d7f6d1071 [SCSI] libsas: fix sas_unregister_ports vs sas_drain_work
We need to hold drain_mutex across the unregistration as port down events
queue device removal as chained events, so we need to make sure no other
drainers are active.

[ 1118.673968] WARNING: at kernel/workqueue.c:996 __queue_work+0x11a/0x326()
[ 1118.681982] Hardware name: S2600CP
[ 1118.686193] Modules linked in: isci(-) libsas scsi_transport_sas nls_utf8
ipv6 uinput sg iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core ioatdma dca
sd_mod sr_mod cdrom ahci libahci libata [last unloaded: scsi_transport_sas]
[ 1118.709893] Pid: 6831, comm: rmmod Not tainted 3.2.0-isci+ #1
[ 1118.716727] Call Trace:
[ 1118.719867]  [<ffffffff8103e9f5>] warn_slowpath_common+0x85/0x9d
[ 1118.727000]  [<ffffffff8103ea27>] warn_slowpath_null+0x1a/0x1c
[ 1118.733942]  [<ffffffff81056d44>] __queue_work+0x11a/0x326
[ 1118.740481]  [<ffffffff81056f99>] queue_work_on+0x1b/0x22
[ 1118.746925]  [<ffffffff81057106>] queue_work+0x37/0x3e
[ 1118.753105]  [<ffffffffa0120e05>] ? sas_discover_event+0x55/0x82 [libsas]
[ 1118.761094]  [<ffffffff813217c3>] scsi_queue_work+0x42/0x44
[ 1118.767717]  [<ffffffffa0120e19>] sas_discover_event+0x69/0x82 [libsas]
[ 1118.775509]  [<ffffffffa0120f5b>] sas_unregister_dev+0xc3/0xcc [libsas]
[ 1118.783319]  [<ffffffffa0120fae>] sas_unregister_domain_devices+0x4a/0xc8 [libsas]
[ 1118.792731]  [<ffffffffa0120071>] sas_deform_port+0x60/0x1a6 [libsas]
[ 1118.800339]  [<ffffffffa01201ea>] sas_unregister_ports+0x33/0x44 [libsas]
[ 1118.808342]  [<ffffffffa011f7e5>] sas_unregister_ha+0x41/0x6b [libsas]
[ 1118.816055]  [<ffffffffa0134055>] isci_unregister+0x22/0x4d [isci]
[ 1118.823384]  [<ffffffffa0143040>] isci_pci_remove+0x2e/0x60 [isci]

Reported-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:27:09 -06:00
Dan Williams ab5266335b [SCSI] libsas: route local link resets through ata-eh
Similar to the conversion of the transport-class reset we want bsg
initiated resets to be managed by libata.

Reported-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:25:32 -06:00
Dan Williams d230ce691c [SCSI] libsas: fix mixed topology recovery
If we have a domain with sas and sata devices there may still be sas
recovery actions to take after peeling off the commands to send to
libata.

Reported-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:23:24 -06:00
Dan Williams 8abda4d28a [SCSI] libsas: close scsi_remove_target() vs libata-eh race
ata_port lifetime in libata follows the host.  In libsas it follows the
scsi_target.  Once scsi_remove_device() has caused all commands to be
completed it allows scsi_remove_target() to immediately proceed to
freeing the ata_port causing bug reports like:

[  848.393333] BUG: spinlock bad magic on CPU#4, kworker/u:2/5107
[  848.400262] general protection fault: 0000 [#1] SMP
[  848.406244] CPU 4
[  848.408310] Modules linked in: nls_utf8 ipv6 uinput i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma dca sg sd_mod sr_mod cdrom ahci libahci isci libsas libata scsi_transport_sas [last unloaded: scsi_wait_scan]
[  848.432060]
[  848.434137] Pid: 5107, comm: kworker/u:2 Not tainted 3.2.0-isci+ #8 Intel Corporation S2600CP/S2600CP
[  848.445310] RIP: 0010:[<ffffffff8126a68c>]  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[  848.454787] RSP: 0018:ffff8807f868dca0  EFLAGS: 00010002
[  848.461137] RAX: 0000000000000048 RBX: ffff8807fe86a630 RCX: ffffffff817d0be0
[  848.469520] RDX: 0000000000000000 RSI: ffffffff814af1cf RDI: 0000000000000002
[  848.477959] RBP: ffff8807f868dcb0 R08: 00000000ffffffff R09: 000000006b6b6b6b
[  848.486327] R10: 000000000003fb8c R11: ffffffff81a19448 R12: 6b6b6b6b6b6b6b6b
[  848.494699] R13: ffff8808027dc520 R14: 0000000000000000 R15: 000000000000001e
[  848.503067] FS:  0000000000000000(0000) GS:ffff88083fd00000(0000) knlGS:0000000000000000
[  848.512899] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  848.519710] CR2: 00007ff77d001000 CR3: 00000007f7a5d000 CR4: 00000000000406e0
[  848.528072] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  848.536446] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  848.544831] Process kworker/u:2 (pid: 5107, threadinfo ffff8807f868c000, task ffff8807ff348000)
[  848.555327] Stack:
[  848.557959]  ffff8807fe86a630 ffff8807fe86a630 ffff8807f868dcd0 ffffffff8126a6e0
[  848.567072]  ffffffff817c142f ffff8807fe86a630 ffff8807f868dcf0 ffffffff8126a703
[  848.576190]  ffff8808027dc520 0000000000000286 ffff8807f868dd10 ffffffff814af1bb
[  848.585281] Call Trace:
[  848.588409]  [<ffffffff8126a6e0>] spin_bug+0x26/0x28
[  848.594357]  [<ffffffff8126a703>] do_raw_spin_unlock+0x21/0x88
[  848.601283]  [<ffffffff814af1bb>] _raw_spin_unlock_irqrestore+0x2c/0x65
[  848.609089]  [<ffffffffa001c103>] ata_scsi_port_error_handler+0x548/0x557 [libata]
[  848.618331]  [<ffffffff81061813>] ? async_schedule+0x17/0x17
[  848.625060]  [<ffffffffa004f30f>] async_sas_ata_eh+0x45/0x69 [libsas]
[  848.632655]  [<ffffffff810618aa>] async_run_entry_fn+0x97/0x125
[  848.639670]  [<ffffffff81057439>] process_one_work+0x207/0x38d
[  848.646577]  [<ffffffff8105738c>] ? process_one_work+0x15a/0x38d
[  848.653681]  [<ffffffff810576f7>] worker_thread+0x138/0x21c
[  848.660305]  [<ffffffff810575bf>] ? process_one_work+0x38d/0x38d
[  848.667493]  [<ffffffff8105b098>] kthread+0x9d/0xa5
[  848.673382]  [<ffffffff8106e1bd>] ? trace_hardirqs_on_caller+0x12f/0x166
[  848.681304]  [<ffffffff814b7704>] kernel_thread_helper+0x4/0x10
[  848.688324]  [<ffffffff814af534>] ? retint_restore_args+0x13/0x13
[  848.695530]  [<ffffffff8105affb>] ? __init_kthread_worker+0x5b/0x5b
[  848.702929]  [<ffffffff814b7700>] ? gs_change+0x13/0x13
[  848.709155] Code: 00 00 48 8d 88 38 04 00 00 44 8b 80 84 02 00 00 31 c0 e8 cf 1b 24 00 41 83 c8 ff 44 8b 4b 08 48 c7 c1 e0 0b 7d 81 4d 85 e4 74 10 <45> 8b 84 24 84 02 00 00 49 8d 8c 24 38 04 00 00 8b 53 04 48 89
[  848.732467] RIP  [<ffffffff8126a68c>] spin_dump+0x5e/0x8c
[  848.738905]  RSP <ffff8807f868dca0>
[  848.743743] ---[ end trace 143161646eee8caa ]---

...so arrange for the ata_port to have the same end of life as the domain
device.

Reported-by: Marcin Tomczak <marcin.tomczak@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:22:25 -06:00
Dan Williams 7d05919aad [SCSI] libsas: mark all domain devices gone if root port disappears
If the top level expander is hot removed, mark all child devices as gone
before unregistration to short circuit futile recovery.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:20:55 -06:00
Dan Williams 45c73b6519 [SCSI] libsas: pre-clean commands that won the eh vs completion race
When scrolling forward through the eh list (in a clear_q scenario) it is
possible to encounter commands that won the completion vs eh race.  Rather
than sprinkle more "if (!task)" throughout the handler just make a pass
through the list and delete the race winners before handling the rest.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:20:01 -06:00
Dan Williams 43a5ab151f [SCSI] isci: stop interpreting ->lldd_lu_reset() as an ata soft-reset
Driving resets from libsas-eh is pre-mature as libata will make a
decision about performing a softreset.  Currently libata determines
whether to perform a softreset based on ata_eh_followup_srst_needed(),
and none of those conditions apply to isci.

Remove the srst implementation and translate ->lldd_lu_reset() for ata
devices as a request to drive a reset via libata-eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 15:13:40 -06:00
Dan Williams cb48d672bf [SCSI] libsas: don't recover 'gone' devices in sas_ata_hard_reset()
The commands that timeout when a disk is forcibly removed may trigger
libata to attempt recovery of the device.  If libsas has decided to
remove the device don't permit ata to continue to issue resets to its
last known phy.

The primary motivation for this patch is hotplug testing by writing 0 to
/sys/class/sas_phy/phyX/enable.  Without this check this test leads to
libata issuing a reset and re-enabling the device that wants to be torn
down.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 13:04:59 -06:00
Dan Williams f41a0c441c [SCSI] libsas: fix sas_find_local_phy(), take phy references
In the direct-attached case this routine returns the phy on which this
device was first discovered.  Which is broken if we want to support
wide-targets, as this phy reference can become stale even though the
port is still active.

In the expander-attached case this routine tries to lookup the phy by
scanning the attached sas addresses of the parent expander, and BUG_ONs
if it can't find it.  However since eh and the libsas workqueue run
independently we can still be attempting device recovery via eh after
libsas has recorded the device as detached.  This is even easier to hit
now that eh is blocked while device domain rediscovery takes place, and
that libata is fed more timed out commands increasing the chances that
it will try to recover the ata device.

Arrange for dev->phy to always point to a last known good phy, it may be
stale after the port is torn down, but it will catch up for wide port
reconfigurations, and never be NULL.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 13:01:06 -06:00
Dan Williams 3a9c5560f6 [SCSI] libsas: check for 'gone' expanders in smp_execute_task()
No sense in issuing or retrying commands to an expander that has been
removed.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 12:55:12 -06:00
Dan Williams 0508c2f3b7 [SCSI] libsas: don't mark expanders as gone when a child device is removed
Commit 56dd2c06 "[SCSI] libsas: Don't issue commands to devices that
have been hot-removed" marked the parent device of an end-device as gone
when all the phys to the end device have been deleted.

The expander device is still present until its parent is removed.  This
is a benign change until the smp_execute_task() path is taught to check
->gone.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 12:51:48 -06:00
Dan Williams 36a3994739 [SCSI] libsas: poll for ata device readiness after reset
Use ata_wait_after_reset() to poll for link recovery after a reset.
This combined with sas_ha->eh_mutex prevents expander rediscovery from
probing phys in an intermediate state.  Local discovery does not have a
mechanism to filter link status changes during this timeout, so it
remains the responsibility of lldds to prevent premature port teardown.
Although once all lldd's support ->lldd_ata_check_ready() that could be
used as a gate to local port teardown.

The signature fis is re-transmitted when the link comes back so we
should be revalidating the ata device class, but that is left to a future
patch.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-29 12:49:36 -06:00
Dan Williams 50824d6c56 [SCSI] libsas: async ata-eh
Once sas_ata_hard_reset() starts honoring the 'deadline' parameter a
pathological configuration could take 25 seconds per ata device
(serialized) to recover.  Run per-port recoveries in parallel.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:24:02 -06:00
Jeff Skirvin 89d3cf6ac3 [SCSI] libsas: add mutex for SMP task execution
SAS does not tag SMP requests, and at least one lldd (isci) does not permit
more than one in-flight request at a time.

[jejb: fix sas_init_dev tab issues while we're at it]
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:22:49 -06:00
Jeff Skirvin 1f4fe89c9c [SCSI] libsas: Remove redundant phy state notification calls.
In the case of an explicit sas_phy_enable call to disable a phy,
the LLDD provides the calls to sas_phy_disconnected and the
PHYE_LOSS_OF_SIGNAL event.

NOTE: This assumes that the lldd(s) generate the notification, which
appears to be the case, but only verfied on isci.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:20:09 -06:00
Dan Williams 2a559f4ba4 [SCSI] libsas: sas_phy_enable via transport_sas_phy_reset
Execute the link-reset triggered by sas_phy_enable via
transport_sas_phy_reset so that it can be managed by libata.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:18:01 -06:00
Dan Williams 81c757bc69 [SCSI] libsas: execute transport link resets with libata-eh via host workqueue
Link resets leave ata affiliations intact, so arrange for libsas to make
an effort to avoid dropping the device due to a slow-to-recover link.
Towards this end carry out reset in the host workqueue so that it can
check for ata devices and kick the reset request to libata.  Hard
resets, in contrast, bypass libata since they are meant for associating
an ata device with another initiator in the domain (tears down
affiliations).

Need to add a new transport_sas_phy_reset() since the current
sas_phy_reset() is a utility function to libsas lldds.  They are not
prepared for it to loop back into eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:13:51 -06:00
Dan Williams 0b3e09da13 [SCSI] libsas: perform sas-transport resets in shost->workq context
Extend the sas transport class to allow transport users to attach extra
data to a sas_phy (->hostdata).  Use this area in libsas to move resets
to workq context in preparation for scheduling ata device resets through
libata-eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:11:33 -06:00
Dan Williams b52df4174d [SCSI] libsas: use libata-eh-reset for sata rediscovery fis transmit failures
Since sata devices can take several seconds to recover the link on reset
the 0.5 seconds that libsas currently waits may not be enough.  Instead
if we are rediscovering a phy that was previously attached to a sata
device let libata handle any resets to encourage the device to transmit
the initial fis.

Once sas_ata_hard_reset() and lldds learn how to honor 'deadline' libsas
should stop encountering phys in an intermediate state, until then this
will loop until the fis is transmitted or ->attached_sas_addr gets
cleared, but in the more likely initial discovery case we keep existing
behavior.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:09:32 -06:00
Dan Williams 3a2cdf391b [SCSI] libsas: defer SAS_TASK_NEED_DEV_RESET commands to libata
lldds use the SAS_TASK_NEED_DEV_RESET interface to request that eh
perform a reset.  In the sata device case defer the commands that
triggered the reset to libata-eh context so it can perform its pre and
post reset management.

In the sas_ata_post_internal() case the reset request is falling on deaf
ears as the sas_task is immediately destroyed without any reset action.
Since it is currently a nop, and likely superfluous given the conversion
to new-style libata-eh, just drop the request.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:09:02 -06:00
Dan Williams 3944f50995 [SCSI] libsas: let libata handle command timeouts
libsas-eh if it successfully aborts an ata command will hide the timeout
condition (AC_ERR_TIMEOUT) from libata.  The command likely completes
with the all-zero task->task_status it started with.  Instead, interpret
a TMF_RESP_FUNC_COMPLETE as the end of the sas_task but keep the scmd
around for libata-eh to handle.

Tested-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:07:15 -06:00
Dan Williams 9095a64a9a [SCSI] libsas: fix timeout vs completion race
Until we have told the lldd to forget a task a timed out operation can
return from the hardware at any time.  Since completion frees the task
we need to make sure that no tasks run their normal completion handler
once eh has decided to manage the task.  Similar to
ata_scsi_cmd_error_handler() freeze completions to let eh judge the
outcome of the race.

Task collector mode is problematic because it presents a situation where
a task can be timed out and aborted before the lldd has even seen it.
For this case we need to guarantee that a task that an lldd has been
told to forget does not get queued after the lldd says "never seen it".
With sas_scsi_timed_out we achieve this with the ->task_queue_flush
mutex, rather than adding more time.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:06:08 -06:00
Dan Williams a3a142524a [SCSI] libsas: prevent double completion of scmds from eh
We invoke task->task_done() to free the task in the eh case, but at this
point we are prepared for scsi_eh_flush_done_q() to finish off the scmd.

Introduce sas_end_task() to capture the final response status from the
lldd and free the task.

Also take the opportunity to kill this warning.
drivers/scsi/libsas/sas_scsi_host.c: In function ‘sas_end_task’:
drivers/scsi/libsas/sas_scsi_host.c:102:3: warning: case value ‘2’ not in enumerated type ‘enum exec_status’ [-Wswitch]

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 14:04:52 -06:00
Dan Williams 3dff5721e4 [SCSI] libsas: close error handling vs sas_ata_task_done() race
Since sas_ata does not implement ->freeze(), completions for scmds and
internal commands can still arrive concurrent with
ata_scsi_cmd_error_handler() and sas_ata_post_internal() respectively.
By the time either of those is called libata has committed to completing
the qc, and the ATA_PFLAG_FROZEN flag tells sas_ata_task_done() it has
lost the race.

In the sas_ata_post_internal() case we take on the additional
responsibility of freeing the sas_task to close the race with
sas_ata_task_done() freeing the the task while sas_ata_post_internal()
is in the process of invoking ->lldd_abort_task().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:58:38 -06:00
Dan Williams e500a34b02 [SCSI] libsas: kill invocation of scsi_eh_finish_cmd from sas_ata_task_done
Prior to the conversion to the new-style libata-eh sas_ata_task_done()
may have been the last opportunity to clean up the scmd, but now
libata-eh explicitly handles this case.  It also races against sas-eh.
If a lldd completes a task after SAS_TASK_STATE_ABORTED is set it could
trigger a spurious decrement of shost->host_failed.  Current lldds have
the band-aid of checking SAS_TASK_STATE_ABORTED before calling
->task_done(), but better to just let the scmds escalate to libata for
race free cleanup.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:57:01 -06:00
Dan Williams b91bb29618 [SCSI] libsas: use ->set_dmamode to notify lldds of NCQ parameters
sas_discover_sata() notifies lldds of sata devices twice.  Once to allow
the 'identify' to be sent, and a second time to allow aic94xx (the only
libsas driver that cares about sata_dev.identify) to setup NCQ
parameters before the device becomes known to the midlayer.  Replace
this double notification and intervening 'identify' with an explicit
->lldd_ata_set_dmamode notification.  With this change all ata internal
commands are issued by libata, so we no longer need sas_issue_ata_cmd().

The data from the identify command only needs to be cached in one
location so ata_device.id replaces domain_device.sata_dev.identify.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:55:42 -06:00
Dan Williams 87c8331fcf [SCSI] libsas: prevent domain rediscovery competing with ata error handling
libata error handling provides for a timeout for link recovery.  libsas
must not rescan for previously known devices in this interval otherwise
it may remove a device that is simply waiting for its link to recover.
Let libata-eh make the determination of when the link is stable and
prevent libsas (host workqueue) from taking action while this
determination is pending.

Using a mutex (ha->disco_mutex) to flush and disable revalidation while
eh is running requires any discovery action that may block on eh be
moved to its own context outside the lock.  Probing ATA devices
explicitly waits on ata-eh and the cache-flush-io issued during device
removal may also pend awaiting eh completion.  Essentially any rphy
add/remove activity needs to run outside the lock.

This adds two new cleanup states for sas_unregister_domain_devices()
'allocated-but-not-probed', and 'flagged-for-destruction'.  In the
'allocated-but-not-probed' state  dev->rphy points to a rphy that is
known to have not been through a sas_rphy_add() event.  At domain
teardown check if this device is still pending probe and cleanup
accordingly.  Similarly if a device has already been queued for removal
then sas_unregister_domain_devices has nothing to do.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:52:34 -06:00
Dan Williams e139942d77 [SCSI] libsas: convert dev->gone to flags
In preparation for adding tracking of another device state "destroy".

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:51:23 -06:00
Dan Williams 312d3e5611 [SCSI] libsas: remove ata_port.lock management duties from lldds
Each libsas driver (mvsas, pm8001, and isci) has invented a different
method for managing the ap->lock.  The lock is held by the ata
->queuecommand() path.  mvsas drops it prior to acquiring any internal
locks which allows it to hold its internal lock across calls to
task->task_done().  This capability is important as it is the only way
the driver can flush task->task_done() instances to guarantee that it no
longer has any in-flight references to a domain_device at
->lldd_dev_gone() time.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:50:12 -06:00
Dan Williams b1124cd3ec [SCSI] libsas: introduce sas_drain_work()
When an lldd invokes ->notify_port_event() it can trigger a chain of libsas
events to:

  1/ form the port and find the direct attached device

  2/ if the attached device is an expander perform domain discovery

A call to flush_workqueue() will only flush the initial port formation work.
Currently libsas users need to call scsi_flush_work() up to the max depth of
chain (which will grow from 2 to 3 when ata discovery is moved to its own
discovery event).  Instead of open coding multiple calls switch to use
drain_workqueue() to flush sas work.

drain_workqueue() does not handle new work submitted during the drain so
libsas needs a bit of infrastructure to hold off unchained work submissions
while a drain is in flight.  A lldd ->notify() event is considered 'unchained'
while a sas_discover_event() is 'chained'.  As Tejun notes:

  "For now, I think it would be best to add private wrapper in libsas to
   support deferring unchained work items while draining."

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:48:51 -06:00
Dan Williams f8daa6e6d8 [SCSI] libsas: convert ha->state to flags
In preparation for adding new states (SAS_HA_DRAINING, SAS_HA_FROZEN),
convert ha->state into a set of flags.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:47:29 -06:00
Dan Williams b15ebe0b5d [SCSI] libsas: replace event locks with atomic bitops
The locks only served to make sure the pending event bitmask was updated
consistently.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:41:04 -06:00
Dan Williams 756f173fb5 [SCSI] libsas: fix leak of dev->sata_dev.identify_[packet_]device
These are never freed in the nominal path.  A domain_device has a
different lifetime than a sas_rphy we need a dev->rphy independent way
of identifying sata devices.

Reviewed-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:39:36 -06:00
Dan Williams 735f7d2fed [SCSI] libsas: fix domain_device leak
Arrange for the deallocation of a struct domain_device object when it no
longer has:
1/ any children
2/ references by any scsi_targets
3/ references by a lldd

The comment about domain_device lifetime in
Documentation/scsi/libsas.txt is stale as it appears mainline never had
a version of a struct domain_device that was registered as a kobject.
We now manage domain_device reference counts on behalf of external
agents.

Reviewed-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:37:47 -06:00
Dan Williams 6f4e75a49f [SCSI] libsas: kill sas_slave_destroy
Per commit 3e4ec344 "libata: kill ATA_FLAG_DISABLED" needing to set
ATA_DEV_NONE is a holdover from before libsas converted to the
"new-style" ata-eh.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:36:36 -06:00
Dan Williams 95ac7fd189 [SCSI] libsas: remove unused ata_task_resp fields
Commit 1e34c838 "[SCSI] libsas: remove spurious sata control register
read/write" removed the routines to fake the presence of the sata
control registers, now remove the unused data structure fields to kill
any remaining confusion.

Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 13:32:33 -06:00
Paul Gortmaker 09703660ed scsi: Add export.h for EXPORT_SYMBOL/THIS_MODULE as required
For the basic SCSI infrastructure files that are exporting symbols
but not modules themselves, add in the basic export.h header file
to allow the exports.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:23 -04:00
Linus Torvalds ec7ae51753 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (204 commits)
  [SCSI] qla4xxx: export address/port of connection (fix udev disk names)
  [SCSI] ipr: Fix BUG on adapter dump timeout
  [SCSI] megaraid_sas: Fix instance access in megasas_reset_timer
  [SCSI] hpsa: change confusing message to be more clear
  [SCSI] iscsi class: fix vlan configuration
  [SCSI] qla4xxx: fix data alignment and use nl helpers
  [SCSI] iscsi class: fix link local mispelling
  [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDA
  [SCSI] aacraid: use lower snprintf() limit
  [SCSI] lpfc 8.3.27: Change driver version to 8.3.27
  [SCSI] lpfc 8.3.27: T10 additions for SLI4
  [SCSI] lpfc 8.3.27: Fix queue allocation failure recovery
  [SCSI] lpfc 8.3.27: Change algorithm for getting physical port name
  [SCSI] lpfc 8.3.27: Changed worst case mailbox timeout
  [SCSI] lpfc 8.3.27: Miscellanous logic and interface fixes
  [SCSI] megaraid_sas: Changelog and version update
  [SCSI] megaraid_sas: Add driver workaround for PERC5/1068 kdump kernel panic
  [SCSI] megaraid_sas: Add multiple MSI-X vector/multiple reply queue support
  [SCSI] megaraid_sas: Add support for MegaRAID 9360/9380 12GB/s controllers
  [SCSI] megaraid_sas: Clear FUSION_IN_RESET before enabling interrupts
  ...
2011-10-28 16:44:18 -07:00
Dan Williams 1a34c06401 [SCSI] libsas: fix port->dev_list locking
port->dev_list maintains a list of devices attached to a given sas root port.
It needs to be mutated under a lock as contexts outside of the
single-threaded-libsas-workqueue access the list via sas_find_dev_by_rphy().
Fixup locations where the list was being mutated without a lock.

This is a follow-up to commit 5911e963 "[SCSI] libsas: remove expander
from dev list on error", where Luben noted [1]:

    > 2/ We have unlocked list manipulations in sas_ex_discover_end_dev(),
    > sas_unregister_common_dev(), and sas_ex_discover_end_dev()

    Yes, I can see that and that is very unfortunate.

[1]: http://marc.info/?l=linux-scsi&m=131480962006471&w=2

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-16 10:54:02 -05:00
Mark Salyzyn a73914c35b [SCSI] libsas: fix panic when single phy is disabled on a wide port
When a wide port is being utilized to a target, if one disables only one
of the
phys, we get an OS crash:

BUG: unable to handle kernel NULL pointer dereference at
0000000000000238
IP: [<ffffffff814ca9b1>] mutex_lock+0x21/0x50
PGD 4103f5067 PUD 41dba9067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/bus/pci/slots/5/address
CPU 0
Modules linked in: pm8001(U) ses enclosure fuse nfsd exportfs autofs4
ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl
auth_rpcgss 8021q fcoe libfcoe garp libfc scsi_transport_fc stp scsi_tgt
llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 sr_mod cdrom
dm_mirror dm_region_hash dm_log uinput sg i2c_i801 i2c_core iTCO_wdt
iTCO_vendor_support e1000e mlx4_ib ib_mad ib_core mlx4_en mlx4_core ext3
jbd mbcache sd_mod crc_t10dif usb_storage ata_generic pata_acpi ata_piix
libsas(U) scsi_transport_sas dm_mod [last unloaded: pm8001]

Modules linked in: pm8001(U) ses enclosure fuse nfsd exportfs autofs4
ipmi_devintf ipmi_si ipmi_msghandler nfs lockd fscache nfs_acl
auth_rpcgss 8021q fcoe libfcoe garp libfc scsi_transport_fc stp scsi_tgt
llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 sr_mod cdrom
dm_mirror dm_region_hash dm_log uinput sg i2c_i801 i2c_core iTCO_wdt
iTCO_vendor_support e1000e mlx4_ib ib_mad ib_core mlx4_en mlx4_core ext3
jbd mbcache sd_mod crc_t10dif usb_storage ata_generic pata_acpi ata_piix
libsas(U) scsi_transport_sas dm_mod [last unloaded: pm8001]
Pid: 5146, comm: scsi_wq_5 Not tainted
2.6.32-71.29.1.el6.lustre.7.x86_64 #1 Storage Server
RIP: 0010:[<ffffffff814ca9b1>]  [<ffffffff814ca9b1>]
mutex_lock+0x21/0x50
RSP: 0018:ffff8803e4e33d30  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000238 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8803e664c800 RDI: 0000000000000238
RBP: ffff8803e4e33d40 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000238 R14: ffff88041acb7200 R15: ffff88041c51ada0
FS:  0000000000000000(0000) GS:ffff880028200000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000238 CR3: 0000000410143000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process scsi_wq_5 (pid: 5146, threadinfo ffff8803e4e32000, task
ffff8803e4e294a0)
Stack:
 ffff8803e664c800 0000000000000000 ffff8803e4e33d70 ffffffffa001f06e
<0> ffff8803e4e33d60 ffff88041c51ada0 ffff88041acb7200 ffff88041bc0aa00
<0> ffff8803e4e33d90 ffffffffa0032b6c 0000000000000014 ffff88041acb7200
Call Trace:
 [<ffffffffa001f06e>] sas_port_delete_phy+0x2e/0xa0 [scsi_transport_sas]
 [<ffffffffa0032b6c>] sas_unregister_devs_sas_addr+0xac/0xe0 [libsas]
 [<ffffffffa0034914>] sas_ex_revalidate_domain+0x204/0x330 [libsas]
 [<ffffffffa00307f0>] ? sas_revalidate_domain+0x0/0x90 [libsas]
 [<ffffffffa0030855>] sas_revalidate_domain+0x65/0x90 [libsas]
 [<ffffffff8108c7d0>] worker_thread+0x170/0x2a0
 [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8108c660>] ? worker_thread+0x0/0x2a0
 [<ffffffff81091b36>] kthread+0x96/0xa0
 [<ffffffff810141ca>] child_rip+0xa/0x20
 [<ffffffff81091aa0>] ? kthread+0x0/0xa0
 [<ffffffff810141c0>] ? child_rip+0x0/0x20
Code: ff ff 85 c0 75 ed eb d6 66 90 55 48 89 e5 48 83 ec 10 48 89 1c 24
4c 89 64 24 08 0f 1f 44 00 00 48 89 fb e8 92 f4 ff ff 48 89 df <f0> ff
0f 79 05 e8 25 00 00 00 65 48 8b 04 25 08 cc 00 00 48 2d
RIP  [<ffffffff814ca9b1>] mutex_lock+0x21/0x50
 RSP <ffff8803e4e33d30>
CR2: 0000000000000238

The following patch is admittedly a band-aid, and does not solve the
root cause, but it still is a good candidate for hardening as a pointer
check before reference.

Signed-off-by: Mark Salyzyn <mark_salyzyn@us.xyratex.com>
Tested-by: Jack Wang <jack_wang@usish.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 13:28:55 -05:00
Dan Williams ac013ed1cb [SCSI] isci: export phy events via ->lldd_control_phy()
Allow the sas-transport-class to update events for local phys via a new
PHY_FUNC_GET_EVENTS command to ->lldd_control_phy().  Fixup drivers that
are not prepared for new enum phy_func values, and unify
->lldd_control_phy() error codes.

These are the SAS defined phy events that are reported in a
smp-report-phy-error-log command:
 * /sys/class/sas_phy/<phyX>/invalid_dword_count
 * /sys/class/sas_phy/<phyX>/running_disparity_error_count
 * /sys/class/sas_phy/<phyX>/loss_of_dword_sync_count
 * /sys/class/sas_phy/<phyX>/phy_reset_problem_count

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 13:24:26 -05:00
Dan Williams b50102d3e9 [SCSI] isci: atapi support
Based on original implementation from Jiangbi Liu and Maciej Trela.

ATAPI transfers happen in two-to-three stages.  The two stage atapi
commands are those that include a dma data transfer.  The data transfer
portion of these operations is handled by the hardware packet-dma
acceleration.  The three-stage commands do not have a data transfer and
are handled without hardware assistance in raw frame mode.

stage1: transmit host-to-device fis to notify the device of an incoming
atapi cdb.  Upon reception of the pio-setup-fis repost the task_context
to perform the dma transfer of the cdb+data (go to stage3), or repost
the task_context to transmit the cdb as a raw frame (go to stage 2).

stage2: wait for hardware notification of the cdb transmission and then
go to stage 3.

stage3: wait for the arrival of the terminating device-to-host fis and
terminate the command.

To keep the implementation simple we only support ATAPI packet-dma
protocol (for commands with data) to avoid needing to handle the data
transfer manually (like we do for SATA-PIO).  This may affect
compatibility for a small number of devices (see
ATA_HORKAGE_ATAPI_MOD16_DMA).

If the data-transfer underruns, or encounters an error the
device-to-host fis is expected to arrive in the unsolicited frame queue
to pass to libata for disposition.  However, in the DONE_UNEXP_FIS (data
underrun) case it appears we need to craft a response.  In the
DONE_REG_ERR case we do receive the UF and propagate it to libsas.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 13:20:03 -05:00
Jack Wang bb041a0e9c [SCSI] libsas: set sas_address and device type of rphy
Libsas forget to set the sas_address and device type of rphy lead to file
under /sys/class/sas_x show wrong value, fix that.

Signed-off-by: Jack Wang <jack_wang@usish.com>
Tested-by: Crystal Yu <crystal_yu@usish.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 12:51:06 -05:00
Dan Williams 97a1420d12 [SCSI] libsas: dynamic queue depth
The queue-depth for libsas-attached devices initializes to 32 and can
only be increased manually via sysfs to a max of 64, while mpt2sas
attached devices initialize to 254 and dynamically float via the
midlayer ->change_queue_depth interface.

No performance regression was observed with this change on the isci
driver.

Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 12:36:43 -05:00
Dan Williams f6e67035a9 [SCSI] libsas,libata: fix ->change_queue_{depth|type} for sata devices
Pass queue_depth change requests to libata, and prevent queue_type
changes for ATA devices.

Otherwise:
1/ we do not honor the libata specific restrictions on the queue depth
2/ libsas drivers that do not set sdev->tagged_supported are unable to
   change the queue_depth of ata devices via sysfs

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 12:30:30 -05:00
Luben Tuikov ffaac8f45b [SCSI] libsas: Allow expander T-T attachments
Allow expander table-to-table attachments for
expanders that support it.

Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-02 12:23:11 -05:00
Dan Williams 8ec6552f4a [SCSI] libsas: sgpio write support
Add SFF-8485 v0.7 / SAS-1 smp-write-gpio register support to libsas.
Defer SAS-2 support unless/until it defines an sgpio interface.

Minimum implementation needed to get the lights blinking.
try_test_sas_gpio_gp_bit() provides a common method to parse the
incoming write data (raw bitstream), and the to_sas_gpio_gp_bit() helper
routine can be used as a basis for the set/clear operations for the
'read' implementation.  Host implementations parse as many bits
(ODx.[012]) as are locally supported and report the number of registers
successfully written.  If the submitted data overruns the internal
number of registers available report the write as a success with the
number of bytes remaining reported in ->resid_len.

Example (assuming an active backplane) set the "identify" pattern for
the first 21 devices:

smp_write_gpio --count=2 --data=92,49,24,92,24,92,49,24 -t 4 --index=1 /dev/bsg/sas_hostX

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-09-22 14:59:09 +04:00
Mark Salyzyn 24926dadc4 [SCSI] libsas: fix failure to revalidate domain for anything but the first expander child.
In an enclosure model where there are chaining expanders to a large body
of storage, it was discovered that libsas, responding to a broadcast
event change, would only revalidate the domain of first child expander
in the list.

The issue is that the pointer value to the discovered source device was
used to break out of the loop, rather than the content of the pointer.

This still remains non-compliant as the revalidate domain code is
supposed to loop through all child expanders, and not stop at the first
one it finds that reports a change count. However, the design of this
routine does not allow multiple device discoveries and that would be a
more complicated set of patches reserved for another day. We are fixing
the glaring bug rather than refactoring the code.

Signed-off-by: Mark Salyzyn <msalyzyn@us.xyratex.com>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-09-22 11:23:56 +04:00
Christoph Hellwig ac81c6a832 [SCSI] libsas: fix sas_queuecommand return values
->queuecommand must return either 0, or one of the SCSI_MLQUEUE_* return
values.  Non-transient errors are indicated by setting cmd->result before
calling ->scsi_done and returning 0.  Fix libsas to adhere to this calling
convention.  Note that the DID_ERROR for returns from the low-level driver
might not be correct for all cases, but it's the best we can do with
the current layering in libsas.  I also suspect that the pre-existing
handling of -SAS_QUEUE_FULL should really be SCSI_MLQUEUE_HOST_BUSY, but
I'll leave that for a separate change.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-30 12:13:46 -07:00
Christoph Hellwig a923f756be [SCSI] libsas: reindent sas_queuecommand
Switch sas_queuecommand to a normal indentation and goto based error handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-30 12:12:22 -07:00
Christoph Hellwig 92bd401ba7 [SCSI] libsas: sas_queuecommand doesnt need host_lock
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-30 12:09:41 -07:00
Dan Williams 4fcf812ca3 [SCSI] libsas: export sas_alloc_task()
Now that isci has added a 3rd open coded user of this functionality just
share the libsas version.

Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:35:13 -06:00
Luben Tuikov 5911e963d3 [SCSI] libsas: remove expander from dev list on error
If expander discovery fails (sas_discover_expander()), remove the
expander from the port device list (sas_ex_discover_expander()),
before freeing it. Else the list is corrupted and, e.g., when we
attempt to send SMP commands to other devices, the kernel oopses.

Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Cc: stable@kernel.org
Reviewed-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-07-27 15:50:58 +04:00
Dave Jiang 1ca1e43e55 [SCSI] libsas: Add option for SATA soft reset
This allows a libsas driver to optionally provide a soft reset handler
for libata to drive.  The isci driver allows software to control the
assertion/deassertion of SRST.

[jejb: checkpatch.pl fixes]
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
2011-05-26 22:49:33 -05:00
Dan Williams 3673f4bf6a [SCSI] libsas: check dev->gone before submitting sata i/o
Head off doomed-to-fail i/o in sas_queuecommand before sending it down
the ata path.

Before:
sd 7:0:0:0: [sdd] Synchronizing SCSI cache
ata8: no sense translation for status: 0x00
ata8: translated ATA stat/err 0x00/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata8.00: device reported invalid CHS sector 0
ata8: status=0x00 { }
ata8: no sense translation for status: 0x00
ata8: translated ATA stat/err 0x00/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata8.00: device reported invalid CHS sector 0
ata8: status=0x00 { }
ata8: no sense translation for status: 0x00
ata8: translated ATA stat/err 0x00/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata8.00: device reported invalid CHS sector 0
ata8: status=0x00 { }
sd 7:0:0:0: [sdd]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 7:0:0:0: [sdd]  Sense Key : Aborted Command [current] [descriptor]
sd 7:0:0:0: [sdd]  Add. Sense: No additional sense information
sd 7:0:0:0: [sdd] Stopping disk

After:
sd 9:0:0:0: [sdd] Synchronizing SCSI cache
sd 9:0:0:0: [sdd]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 9:0:0:0: [sdd] Stopping disk
sd 9:0:0:0: [sdd] START_STOP FAILED
sd 9:0:0:0: [sdd]  Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

This is a cosmetic change as sata i/o can still leak to a gone device,
but this addresses the nominal hotplug case when releasing the target.

Acked-by: Jack Wang <jack_wang@usish.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
2011-05-26 22:49:33 -05:00
Dan Williams 90f1e10d08 [SCSI] libsas: fix/amend device gone notification in sas_deform_port()
Commit 56dd2c06 "libsas: Don't issue commands to devices that have been
hot-removed" edited Darrick's original patch to remove setting 'gone' in
the sas_deform_port() path because that prevented scsi sync cache
commands from being issued when the driver was unloaded.  However, this
allows true device gone notifications (as signaled port phy events) to
trigger sync cache commands to devices that are known to be unreachable.

Teach libsas which sas_deform_port() invocations are likely device gone
events.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
2011-05-26 22:49:32 -05:00
Xiangliang Yu bb650a1bef [SCSI] libsas: fix SATA NCQ error
Current version of libsas can not handle SATA NCQ error.
This patch handle SATA NCQ error as AHCI do.

Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
2011-05-24 12:34:01 -04:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Linus Torvalds a44f99c7ef Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: (25 commits)
  video: change to new flag variable
  scsi: change to new flag variable
  rtc: change to new flag variable
  rapidio: change to new flag variable
  pps: change to new flag variable
  net: change to new flag variable
  misc: change to new flag variable
  message: change to new flag variable
  memstick: change to new flag variable
  isdn: change to new flag variable
  ieee802154: change to new flag variable
  ide: change to new flag variable
  hwmon: change to new flag variable
  dma: change to new flag variable
  char: change to new flag variable
  fs: change to new flag variable
  xtensa: change to new flag variable
  um: change to new flag variables
  s390: change to new flag variable
  mips: change to new flag variable
  ...

Fix up trivial conflict in drivers/hwmon/Makefile
2011-03-20 18:14:55 -07:00
Linus Torvalds c55d267de2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits)
  [SCSI] scsi_dh_rdac: Add MD36xxf into device list
  [SCSI] scsi_debug: add consecutive medium errors
  [SCSI] libsas: fix ata list corruption issue
  [SCSI] hpsa: export resettable host attribute
  [SCSI] hpsa: move device attributes to avoid forward declarations
  [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26)
  [SCSI] sd: Logical Block Provisioning update
  [SCSI] Include protection operation in SCSI command trace
  [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try)
  [SCSI] target: Fix volume size misreporting for volumes > 2TB
  [SCSI] bnx2fc: Broadcom FCoE offload driver
  [SCSI] fcoe: fix broken fcoe interface reset
  [SCSI] fcoe: precedence bug in fcoe_filter_frames()
  [SCSI] libfcoe: Remove stale fcoe-netdev entries
  [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h
  [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument
  [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs
  [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out"
  [SCSI] libfc: Fixing a memory leak when destroying an interface
  [SCSI] megaraid_sas: Version and Changelog update
  ...

Fix up trivial conflicts due to whitespace differences in
drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}
2011-03-17 17:54:40 -07:00
matt mooney bfbec92075 scsi: change to new flag variable
Replace EXTRA_CFLAGS with ccflags-y.

Signed-off-by: matt mooney <mfm@muteddisk.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2011-03-17 14:05:35 +01:00
James Bottomley a82058a730 [SCSI] libsas: fix ata list corruption issue
I think this stems from a misunderstanding of how the ata error handler
works.  ata_scsi_cmd_error_handler() gets called with a passed in list
of commands to handle.  However, that list may still not be empty when
it exits.  The command ata_scsi_port_error_handler() must be called
(which takes no list) before the list will be completely emptied.  This
bites the sas error handler because the two are called from different
functions and the original list has gone out of scope before
ata_scsi_port_error_handler() is called. leading to some commands
dangling on bare stack, which is a potential memory corruption issue.
Fix this by manually deleting all outstanding commands from the on-stack
list before it goes out of scope.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-03-14 18:46:25 -05:00
Sergei Shtylyov 9cbe056f6c libata: remove ATA_FLAG_NO_LEGACY
All checks of ATA_FLAG_NO_LEGACY have been removed by the commits
c791c30670 ([libata] minor PCI IDE probe
fixes and cleanups) and f0d36efdc6 (libata:
update libata core layer to use devres), so I think it's time to finally
get rid of this flag...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02 02:36:46 -05:00
Sergei Shtylyov 3696df3099 libata: remove ATA_FLAG_MMIO
Commit 0d5ff56677 (libata: convert to iomap)
removed all checks of ATA_FLAG_MMIO but neglected to remove the flag itself.
Do it now, at last...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02 02:36:46 -05:00
Sergei Shtylyov c10f97b9d8 libata: remove ATA_FLAG_{SRST|SATA_RESET}
These flags are marked as obsolete and the checks for them have been removed
by commit 294440887b (libata-sff: kill unused
ata_bus_reset()), so I think it's time to finally get rid of them...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02 02:36:46 -05:00
Sergei Shtylyov 0f2e0330a8 ipr/sas_ata: use mode mask macros from <linux/ata.h>
Commit 14bdef982c ([libata] convert drivers to
use ata.h mode mask defines) didn't convert these two libata driver outside
drivers/ata/...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02 02:36:46 -05:00
James Bottomley 00dd4998a6 libsas: convert to libata new error handler
The conversion is quite complex given that the libata new error
handler has to be hooked into the current libsas timeout and error
handling.  The way this is done is to process all the failed commands
via libsas first, but if they have no underlying sas task (and they're
on a sata device) assume they are destined for the libata error
handler and send them accordingly.

Finally, activate the port recovery of the libata error handler for
each port known to the host.  This is somewhat suboptimal, since that
port may not need recovering, but given the current architecture of
the libata error handler, it's the only way; and the spurious
activation is harmless.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2011-03-02 02:36:45 -05:00