Commit Graph

378 Commits

Author SHA1 Message Date
Tony Battersby 6f4a16266f scsi-mq: fix requests that use a separate CDB buffer
This patch fixes code such as the following with scsi-mq enabled:

    rq = blk_get_request(...);
    blk_rq_set_block_pc(rq);

    rq->cmd = my_cmd_buffer; /* separate CDB buffer */

    blk_execute_rq_nowait(...);

Code like this appears in e.g. sg_start_req() in drivers/scsi/sg.c (for
large CDBs only).  Without this patch, scsi_mq_prep_fn() will set
rq->cmd back to rq->__cmd, causing the wrong CDB to be sent to the device.

Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-22 15:04:31 -05:00
Guenter Roeck 480cadc2b7 scsi: Fix qemu boot hang problem
The latest kernel fails to boot qemu arm images when using scsi
for disk access. Boot gets stuck after the following messages.

brd: module loaded
sym53c8xx 0000:00:0c.0: enabling device (0100 -> 0103)
sym0: <895a> rev 0x0 at pci 0000:00:0c.0 irq 93
sym0: No NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: SCSI BUS has been reset.
scsi host0: sym-2.2.3

Bisect points to commit 71e75c97f9 ("scsi: convert device_busy to
atomic_t"). Code inspection shows the following suspicious change
in scsi_request_fn.

out_delay:
-       if (sdev->device_busy == 0 && !scsi_device_blocked(sdev))
+       if (atomic_read(&sdev->device_busy) && !scsi_device_blocked(sdev))
		blk_delay_queue(q, SCSI_QUEUE_DELAY);
	}

'sdev->device_busy == 0' was replaced with 'atomic_read(&sdev->device_busy)',
meaning the logic was reversed. Changing this expression to
'!atomic_read(&sdev->device_busy)' fixes the problem.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-19 12:42:26 -05:00
Christoph Hellwig d285203cf6 scsi: add support for a blk-mq based I/O path.
This patch adds support for an alternate I/O path in the scsi midlayer
which uses the blk-mq infrastructure instead of the legacy request code.

Use of blk-mq is fully transparent to drivers, although for now a host
template field is provided to opt out of blk-mq usage in case any unforseen
incompatibilities arise.

In general replacing the legacy request code with blk-mq is a simple and
mostly mechanical transformation.  The biggest exception is the new code
that deals with the fact the I/O submissions in blk-mq must happen from
process context, which slightly complicates the I/O completion handler.
The second biggest differences is that blk-mq is build around the concept
of preallocated requests that also include driver specific data, which
in SCSI context means the scsi_cmnd structure.  This completely avoids
dynamic memory allocations for the fast path through I/O submission.

Due the preallocated requests the MQ code path exclusively uses the
host-wide shared tag allocator instead of a per-LUN one.  This only
affects drivers actually using the block layer provided tag allocator
instead of their own.  Unlike the old path blk-mq always provides a tag,
although drivers don't have to use it.

For now the blk-mq path is disable by defauly and must be enabled using
the "use_blk_mq" module parameter.  Once the remaining work in the block
layer to make blk-mq more suitable for slow devices is complete I hope
to make it the default and eventually even remove the old code path.

Based on the earlier scsi-mq prototype by Nicholas Bellinger.

Thanks to Bart Van Assche and Robert Elliot for testing, benchmarking and
various sugestions and code contributions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 17:16:28 -04:00
Christoph Hellwig c53c6d6a68 scatterlist: allow chaining to preallocated chunks
Blk-mq drivers usually preallocate their S/G list as part of the request,
but if we want to support the very large S/G lists currently supported by
the SCSI code that would tie up a lot of memory in the preallocated request
pool.  Add support to the scatterlist code so that it can initialize a
S/G list that uses a preallocated first chunks and dynamically allocated
additional chunks.  That way the scsi-mq code can preallocate a first
page worth of S/G entries as part of the request, and dynamically extend
the S/G list when needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 17:16:21 -04:00
Christoph Hellwig f6d47e74fc scsi: unwind blk_end_request_all and blk_end_request_err calls
Replace the calls to the various blk_end_request variants with opencode
equivalents.  Blk-mq is using a model that gives the driver control
between the bio updates and the actual completion, and making the old
code follow that same model allows us to keep the code more similar for
both paths.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 17:16:13 -04:00
Christoph Hellwig 2ccbb00808 scsi: only maintain target_blocked if the driver has a target queue limit
This saves us an atomic operation for each I/O submission and completion
for the usual case where the driver doesn't set a per-target can_queue
value.  Only a few iscsi hardware offload drivers set the per-target
can_queue value at the moment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 17:16:05 -04:00
Christoph Hellwig cd9070c9c5 scsi: fix the {host,target,device}_blocked counter mess
Seems like these counters are missing any sort of synchronization for
updates, as a over 10 year old comment from me noted.  Fix this by
using atomic counters, and while we're at it also make sure they are
in the same cacheline as the _busy counters and not needlessly stored
to in every I/O completion.

With the new model the _busy counters can temporarily go negative,
so all the readers are updated to check for > 0 values.  Longer
term every successful I/O completion will reset the counters to zero,
so the temporarily negative values will not cause any harm.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 17:15:48 -04:00
Christoph Hellwig 71e75c97f9 scsi: convert device_busy to atomic_t
Avoid taking the queue_lock to check the per-device queue limit.  Instead
we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.

Unlike the host and target busy counters this doesn't allow us to avoid the
queue_lock in the request_fn due to the way the interface works, but it'll
allow us to prepare for using the blk-mq code, which doesn't use the
queue_lock at all, and it at least avoids a queue_lock round trip in
scsi_device_unbusy, which is still important given how busy the queue_lock
is.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 07:43:45 -04:00
Christoph Hellwig 7466501608 scsi: convert host_busy to atomic_t
Avoid taking the host-wide host_lock to check the per-host queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 07:43:43 -04:00
Christoph Hellwig 7ae65c0f96 scsi: convert target_busy to an atomic_t
Avoid taking the host-wide host_lock to check the per-target queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 07:39:00 -04:00
Christoph Hellwig cf68d334dd scsi: push host_lock down into scsi_{host,target}_queue_ready
Prepare for not taking a host-wide lock in the dispatch path by pushing
the lock down into the places that actually need it.  Note that this
patch is just a preparation step, as it will actually increase lock
roundtrips and thus decrease performance on its own.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 07:38:53 -04:00
Christoph Hellwig 3b5382c459 scsi: set ->scsi_done before calling scsi_dispatch_cmd
The blk-mq code path will set this to a different function, so make the
code simpler by setting it up in a legacy-request specific place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 07:38:48 -04:00
Christoph Hellwig d0d3bbf96e scsi: centralize command re-queueing in scsi_dispatch_fn
Make sure we only have the logic for requeing commands in one place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 07:38:41 -04:00
Christoph Hellwig de3e8bf331 scsi: split __scsi_queue_insert
Factor out a helper to set the _blocked values, which we'll reuse for the
blk-mq code path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
2014-07-25 07:38:35 -04:00
Christoph Hellwig 6af7a4ffa2 scsi: add scsi_setup_cmnd helper
Factor out command setup code that will be shared with the blk-mq code path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Webb Scales <webbnh@hp.com>
2014-07-25 07:38:08 -04:00
Christoph Hellwig 4f1e576575 scsi: mark scsi_setup_blk_pc_cmnd static
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:16:29 +02:00
Christoph Hellwig 5158a899d8 scsi: set sc_data_direction in common code
The data direction fiel in the SCSI command is derived only from the block
request structure.  Move setting it up into common code instead of
duplicating it in the ULDs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:11:41 +02:00
Christoph Hellwig 3868cf8ea7 scsi: restructure command initialization for TYPE_FS requests
We should call the device handler prep_fn for all TYPE_FS requests,
not just simple read/write calls that are handled by the disk driver.

Restructure the common I/O code to call the prep_fn handler and zero
out the CDB, and just leave the call to scsi_init_io to the ULDs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:11:27 +02:00
Christoph Hellwig 635d98b1d0 scsi: move the nr_phys_segments assert into scsi_init_io
scsi_init_io should only be called for requests that transfer data,
so move the assert that a request has segments from the callers into
scsi_init_io.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:10:53 +02:00
Maurizio Lombardi e6c11dbb8d scsi_lib: remove the description string in scsi_io_completion()
During IO with fabric faults, one generally sees several "Unhandled error
code" messages in the syslog as shown below:

sd 4:0:6:2: [sdbw] Unhandled error code
sd 4:0:6:2: [sdbw] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 4:0:6:2: [sdbw] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
end_request: I/O error, dev sdbw, sector 0

This comes from scsi_io_completion (in scsi_lib.c) while handling error
codes other than DID_RESET or not deferred sense keys i.e. this is
actually handled by the SCSI mid layer. But what gets displayed here is
"Unhandled error code" which is quite misleading as it indicates
something that is not addressed by the mid layer.

The description string is based on the sense key and sometimes on the
additional sense code;
since the ACTION_FAIL case always prints the sense key and the
additional sense code, this patch removes the description string
completely because it does not add useful information.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:46 +02:00
Christoph Hellwig f1bea55d5a scsi: remove various exports that were only used by scsi_tgt
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-07-17 22:07:45 +02:00
Hannes Reinecke 91921e016a scsi: use dev_printk variants where possible
Using dev_printk variants prefixes the logging message with
the originating device, which makes debugging easier.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:42 +02:00
James Bottomley 89fb4cd1f7 scsi: handle flush errors properly
Flush commands don't transfer data and thus need to be special cased
in the I/O completion handler so that we can propagate errors to
the block layer and filesystem.

Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Reported-by: Steven Haber <steven@qumulo.com>
Tested-by: Steven Haber <steven@qumulo.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 21:56:34 +02:00
Linus Torvalds 23d4ed53b7 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "Final small batch of fixes to be included before -rc1.  Some general
  cleanups in here as well, but some of the blk-mq fixes we need for the
  NVMe conversion and/or scsi-mq.  The pull request contains:

   - Support for not merging across a specified "chunk size", if set by
     the driver.  Some NVMe devices perform poorly for IO that crosses
     such a chunk, so we need to support it generically as part of
     request merging avoid having to do complicated split logic.  From
     me.

   - Bump max tag depth to 10Ki tags.  Some scsi devices have a huge
     shared tag space.  Before we failed with EINVAL if a too large tag
     depth was specified, now we truncate it and pass back the actual
     value.  From me.

   - Various blk-mq rq init fixes from me and others.

   - A fix for enter on a dying queue for blk-mq from Keith.  This is
     needed to prevent oopsing on hot device removal.

   - Fixup for blk-mq timer addition from Ming Lei.

   - Small round of performance fixes for mtip32xx from Sam Bradshaw.

   - Minor stack leak fix from Rickard Strandqvist.

   - Two __init annotations from Fabian Frederick"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: add __init to blkcg_policy_register
  block: add __init to elv_register
  block: ensure that bio_add_page() always accepts a page for an empty bio
  blk-mq: add timer in blk_mq_start_request
  blk-mq: always initialize request->start_time
  block: blk-exec.c: Cleaning up local variable address returnd
  mtip32xx: minor performance enhancements
  blk-mq: ->timeout should be cleared in blk_mq_rq_ctx_init()
  blk-mq: don't allow queue entering for a dying queue
  blk-mq: bump max tag depth to 10K tags
  block: add blk_rq_set_block_pc()
  block: add notion of a chunk size for request merging
2014-06-11 08:41:17 -07:00
Linus Torvalds 1c54fc1efe SCSI for-linus on 20140609
This patch consists of the usual driver updates (qla2xxx, qla4xxx, lpfc,
 be2iscsi, fnic, ufs, NCR5380) The NCR5380 is the addition to maintained status
 of a long neglected driver for older hardware.  In addition there are a lot of
 minor fixes and cleanups and some more updates to make scsi mq ready.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJTlcwiAAoJEDeqqVYsXL0M5qsIALzVPLd4yxA16zCiaPQUeIV5
 mfYmwISFlN+qW3AcUeSH4D13YgegCjEBfqaDMWvIkgouxLy/7jpxtChutq3MCzUE
 cDT1B9+ZrzoqBISRNHEh/gx5F1MOF2VPuqG2pe0J90wyRCNzJscB6PbtWMAo86CA
 2eu7wq3K9FXxCC1qY0PzwBLXHqUcgk5GWiK9CM/k4W0NiTVeNmwPeh5i91IQnBHx
 E2l7NAXgNLyCf5tyeswvZ4pW0T3hlaswNmBB4qC8oJm4U6UqMN+tk4ML63Pz7uPe
 4mlHG0uI8Vbdi13iv1EDUZ9Vo8iqVrzP2UAhakgP9poKSGqE4d/MD0EKNGQB2Es=
 =cBY8
 -----END PGP SIGNATURE-----

Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This patch consists of the usual driver updates (qla2xxx, qla4xxx,
  lpfc, be2iscsi, fnic, ufs, NCR5380) The NCR5380 is the addition to
  maintained status of a long neglected driver for older hardware.  In
  addition there are a lot of minor fixes and cleanups and some more
  updates to make scsi mq ready"

* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (130 commits)
  include/scsi/osd_protocol.h: remove unnecessary __constant
  mvsas: Recognise device/subsystem 9485/9485 as 88SE9485
  Revert "be2iscsi: Fix processing cqe for cxn whose endpoint is freed"
  mptfusion: fix msgContext in mptctl_hp_hostinfo
  acornscsi: remove linked command support
  scsi/NCR5380: dprintk macro
  fusion: Remove use of DEF_SCSI_QCMD
  fusion: Add free msg frames to the head, not tail of list
  mpt2sas: Add free smids to the head, not tail of list
  mpt2sas: Remove use of DEF_SCSI_QCMD
  mpt2sas: Remove uses of serial_number
  mpt3sas: Remove use of DEF_SCSI_QCMD
  mpt3sas: Remove uses of serial_number
  qla2xxx: Use kmemdup instead of kmalloc + memcpy
  qla4xxx: Use kmemdup instead of kmalloc + memcpy
  qla2xxx: fix incorrect debug printk
  be2iscsi: Bump the driver version
  be2iscsi: Fix processing cqe for cxn whose endpoint is freed
  be2iscsi: Fix destroy MCC-CQ before MCC-EQ is destroyed
  be2iscsi: Fix memory corruption in MBX path
  ...
2014-06-09 18:54:06 -07:00
Jens Axboe f27b087b81 block: add blk_rq_set_block_pc()
With the optimizations around not clearing the full request at alloc
time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC
up to the user allocating the request.

Add a blk_rq_set_block_pc() that sets the command type to
REQ_TYPE_BLOCK_PC, and properly initializes the members associated
with this type of request. Update callers to use this function instead
of manipulating rq->cmd_type directly.

Includes fixes from Christoph Hellwig <hch@lst.de> for my half-assed
attempt.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-06-06 07:57:37 -06:00
Linus Torvalds 681a289548 Merge branch 'for-3.16/core' of git://git.kernel.dk/linux-block into next
Pull block core updates from Jens Axboe:
 "It's a big(ish) round this time, lots of development effort has gone
  into blk-mq in the last 3 months.  Generally we're heading to where
  3.16 will be a feature complete and performant blk-mq.  scsi-mq is
  progressing nicely and will hopefully be in 3.17.  A nvme port is in
  progress, and the Micron pci-e flash driver, mtip32xx, is converted
  and will be sent in with the driver pull request for 3.16.

  This pull request contains:

   - Lots of prep and support patches for scsi-mq have been integrated.
     All from Christoph.

   - API and code cleanups for blk-mq from Christoph.

   - Lots of good corner case and error handling cleanup fixes for
     blk-mq from Ming Lei.

   - A flew of blk-mq updates from me:

     * Provide strict mappings so that the driver can rely on the CPU
       to queue mapping.  This enables optimizations in the driver.

     * Provided a bitmap tagging instead of percpu_ida, which never
       really worked well for blk-mq.  percpu_ida relies on the fact
       that we have a lot more tags available than we really need, it
       fails miserably for cases where we exhaust (or are close to
       exhausting) the tag space.

     * Provide sane support for shared tag maps, as utilized by scsi-mq

     * Various fixes for IO timeouts.

     * API cleanups, and lots of perf tweaks and optimizations.

   - Remove 'buffer' from struct request.  This is ancient code, from
     when requests were always virtually mapped.  Kill it, to reclaim
     some space in struct request.  From me.

   - Remove 'magic' from blk_plug.  Since we store these on the stack
     and since we've never caught any actual bugs with this, lets just
     get rid of it.  From me.

   - Only call part_in_flight() once for IO completion, as includes two
     atomic reads.  Hopefully we'll get a better implementation soon, as
     the part IO stats are now one of the more expensive parts of doing
     IO on blk-mq.  From me.

   - File migration of block code from {mm,fs}/ to block/.  This
     includes bio.c, bio-integrity.c, bounce.c, and ioprio.c.  From me,
     from a discussion on lkml.

  That should describe the meat of the pull request.  Also has various
  little fixes and cleanups from Dave Jones, Shaohua Li, Duan Jiong,
  Fengguang Wu, Fabian Frederick, Randy Dunlap, Robert Elliott, and Sam
  Bradshaw"

* 'for-3.16/core' of git://git.kernel.dk/linux-block: (100 commits)
  blk-mq: push IPI or local end_io decision to __blk_mq_complete_request()
  blk-mq: remember to start timeout handler for direct queue
  block: ensure that the timer is always added
  blk-mq: blk_mq_unregister_hctx() can be static
  blk-mq: make the sysfs mq/ layout reflect current mappings
  blk-mq: blk_mq_tag_to_rq should handle flush request
  block: remove dead code in scsi_ioctl:blk_verify_command
  blk-mq: request initialization optimizations
  block: add queue flag for disabling SG merging
  block: remove 'magic' from struct blk_plug
  blk-mq: remove alloc_hctx and free_hctx methods
  blk-mq: add file comments and update copyright notices
  blk-mq: remove blk_mq_alloc_request_pinned
  blk-mq: do not use blk_mq_alloc_request_pinned in blk_mq_map_request
  blk-mq: remove blk_mq_wait_for_tags
  blk-mq: initialize request in __blk_mq_alloc_request
  blk-mq: merge blk_mq_alloc_reserved_request into blk_mq_alloc_request
  blk-mq: add helper to insert requests from irq context
  blk-mq: remove stale comment for blk_mq_complete_request()
  blk-mq: allow non-softirq completions
  ...
2014-06-02 09:29:34 -07:00
Christoph Hellwig a1b73fc194 scsi: reintroduce scsi_driver.init_command
Instead of letting the ULD play games with the prep_fn move back to
the model of a central prep_fn with a callback to the ULD.  This
already cleans up and shortens the code by itself, and will be required
to properly support blk-mq in the SCSI midlayer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-05-19 12:35:09 +02:00
Christoph Hellwig bc85dc500f scsi: remove scsi_end_request
By folding scsi_end_request into its only caller we can significantly clean
up the completion logic.  We can use simple goto labels now to only have
a single place to finish or requeue command there instead of the previous
convoluted logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-05-19 12:33:41 +02:00
Christoph Hellwig c682adf3e1 scsi: explicitly release bidi buffers
Instead of trying to guess when we have a BIDI buffer in scsi_release_buffers
add a function to explicitly free the BIDI ressoures in the one place that
handles them.  This avoids needing a special __scsi_release_buffers for the
case where we already have freed the request as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
2014-05-19 12:33:41 +02:00
Alan Stern 644373a421 [SCSI] Fix command result state propagation
We're seeing a case where the contents of scmd->result isn't being reset after
a SCSI command encounters an error, is resubmitted, times out and then gets
handled.  The error handler acts on the stale result of the previous error
instead of the timeout.  Fix this by properly zeroing the scmd->status before
the command is resubmitted.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-04-21 14:27:26 -07:00
Christoph Hellwig 68c03d9193 [SCSI] don't reference freed command in scsi_prep_return
Patch

commit 0479633686
Author: Christoph Hellwig <hch@infradead.org>
Date:   Thu Feb 20 14:20:55 2014 -0800

    [SCSI] do not manipulate device reference counts in scsi_get/put_command

Introduced a use after free:I in the kill case of scsi_prep_return we have to
release our device reference, but we do this trying to reference the just
freed command.  Use the local sdev pointer instead.

Fixes: 0479633686
Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-04-21 07:57:22 -07:00
Christoph Hellwig 5e012aad85 [SCSI] don't reference freed command in scsi_init_sgtable
Patch

commit 0479633686
Author: Christoph Hellwig <hch@infradead.org>
Date:   Thu Feb 20 14:20:55 2014 -0800

    [SCSI] do not manipulate device reference counts in scsi_get/put_command

Introduced a use after free: when scsi_init_io fails we have to release our
device reference, but we do this trying to reference the just freed command.
Add a local scsi_device pointer to fix this.

Fixes: 0479633686
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-04-21 07:57:21 -07:00
Jens Axboe b4f42e2831 block: remove struct request buffer member
This was used in the olden days, back when onions were proper
yellow. Basically it mapped to the current buffer to be
transferred. With highmem being added more than a decade ago,
most drivers map pages out of a bio, and rq->buffer isn't
pointing at anything valid.

Convert old style drivers to just use bio_data().

For the discard payload use case, just reference the page
in the bio.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-15 14:03:02 -06:00
Jens Axboe f89e0dd9d1 Linux 3.15-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTSv89AAoJEHm+PkMAQRiGd7AIAKL45dFRhX96W53uzGlpXtv7
 Ecs9CMY5mtsB/rqtV/NSQaUELlNb4Ilb4lITh7NaLWAZxDJ12GwVsIbFoaBj7Ypx
 tfBNNxffGGnTTn2C1PpPQmLytLBvqXIVHMaPpDvnHYJl6g9BihshLzyMrsrizqnA
 DPJ0xMdgGp6BLC4qm0ZH1mS2q+TyXLN2ZCjJ1lp6NNYwBnwOe/ABjnUa0Ztu7aii
 837N8h6nEuKHTr6DgrYHEgYVc3DPHPHaly/UJ8v4U30myzRv83YkD5DsBSUjSLj8
 CzxJEnECa1XljLJK7SjRHy5pSP9tcUlmjx3Fk7IpQixmT+A15cov6fQ967jNlDw=
 =Hxnc
 -----END PGP SIGNATURE-----

Merge tag 'v3.15-rc1' into for-3.16/core

We don't like this, but things have diverged with the blk-mq fixes
in 3.15-rc1. So merge it in.
2014-04-15 14:02:24 -06:00
Martin K. Petersen 2bfad21ecc scsi: Make sure cmd_flags are 64-bit
cmd_flags in struct request is now 64 bits wide but the scsi_execute
functions truncated arguments passed to int leading to errors. Make sure
the flags parameters are u64.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Jens Axboe <axboe@fb.com>
CC: Jan Kara <jack@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-09 20:26:20 -06:00
Jens Axboe 59c3d45e48 block: remove 'q' parameter from kblockd_schedule_*_work()
The queue parameter is never used, just get rid of it.

Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-09 10:17:00 -06:00
Christoph Hellwig 134997a041 [SCSI] remove a useless get/put_device pair in scsi_requeue_command
Avoid a spurious device get/put pair by cleaning up scsi_requeue_command
and folding scsi_unprep_request into it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-03-15 10:19:25 -07:00
Bart Van Assche 27e9e0f12a [SCSI] remove a useless get/put_device pair in scsi_next_command
Eliminate a get_device() / put_device() pair from scsi_next_command().
Both are atomic operations hence removing these slightly improves
performance.

[hch: slight changes due to different context]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-03-15 10:19:25 -07:00
Bart Van Assche 613be1f626 [SCSI] remove a useless get/put_device pair in scsi_request_fn
SCSI devices may only be removed by calling scsi_remove_device().
That function must invoke blk_cleanup_queue() before the final put
of sdev->sdev_gendev. Since blk_cleanup_queue() waits for the
block queue to drain and then tears it down, scsi_request_fn cannot
be active anymore after blk_cleanup_queue() has returned and hence
the get_device()/put_device() pair in scsi_request_fn is unnecessary.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-03-15 10:19:25 -07:00
Christoph Hellwig 0479633686 [SCSI] do not manipulate device reference counts in scsi_get/put_command
Many callers won't need this and we can optimize them away.  In addition
the handling in the __-prefixed variants was inconsistant to start with.

Based on an earlier patch from Bart Van Assche.

[jejb: fix kerneldoc probelm picked up by Fengguang Wu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-03-15 10:19:24 -07:00
Christoph Hellwig 21a05df547 [SCSI] avoid taking host_lock in scsi_run_queue unless nessecary
If we don't have starved devices we don't need to take the host lock
to iterate over them.  Also split the function up to be more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-03-15 10:19:24 -07:00
Eiichi Tsukata ee60b2c52e [SCSI] Add timeout to avoid infinite command retry
Currently, scsi error handling in scsi_io_completion() tries to
unconditionally requeue scsi command when device keeps some error state.
For example, UNIT_ATTENTION causes infinite retry with
action == ACTION_RETRY.
This is because retryable errors are thought to be temporary and the scsi
device will soon recover from those errors. Normally, such retry policy is
appropriate because the device will soon recover from temporary error state.

But there is no guarantee that device is able to recover from error state
immediately. Some hardware error can prevent device from recovering.

This patch adds timeout in scsi_io_completion() to avoid infinite command
retry in scsi_io_completion(). Once scsi command retry time is longer than
this timeout, the command is treated as failure.

Signed-off-by: Eiichi Tsukata <eiichi.tsukata.xh@hitachi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2014-03-15 10:19:19 -07:00
Russell King e83b366487 Fix uses of dma_max_pfn() when converting to a limiting address
We must use a 64-bit for this, otherwise overflowed bits get lost, and
that can result in a lower than intended value set.

Fixes: 8e0cb8a1f6 ("ARM: 7797/1: mmc: Use dma_max_pfn(dev) helper for bounce_limit calculations")
Fixes: 7d35496dd9 ("ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations")
Tested-Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-02-17 23:08:41 +00:00
Santosh Shilimkar 7d35496dd9 ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations
DMA bounce limit is the maximum direct DMA'able memory beyond which
bounce buffers has to be used to perform dma operations. SCSI driver
relies on dma_mask but its calculation is based on max_*pfn which
don't have uniform meaning across architectures. So make use of
dma_max_pfn() which is expected to return the DMAable maximum pfn
value across architectures.

Cc: linux-scsi@vger.kernel.org
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-10-31 14:49:26 +00:00
Linus Torvalds 357397a141 Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata changes from Tejun Heo:
 "Two interesting changes.

   - libata acpi handling has been restructured so that the association
     between ata devices and ACPI handles are less convoluted.  This
     change shouldn't change visible behavior.

   - Queued TRIM support, which enables sending TRIM to the device
     without draining in-flight RW commands, is added.  Currently only
     enabled for ahci (and likely to stay that way for the foreseeable
     future).

  Other changes are driver-specific updates / fixes"

* 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: bugfix: Remove __le32 in ata_tf_to_fis()
  libata: acpi: Remove ata_dev_acpi_handle stub in libata.h
  libata: Add support for queued DSM TRIM
  libata: Add support for SEND/RECEIVE FPDMA QUEUED
  libata: Add H2D FIS "auxiliary" port flag
  libata: Populate host-to-device FIS "auxiliary" field
  ata: acpi: rework the ata acpi bind support
  sata, highbank: send extra clock cycles in SGPIO patterns
  sata, highbank: set tx_atten override bits
  devicetree: create a separate binding description for sata_highbank
  drivers/ata/sata_rcar.c: simplify use of devm_ioremap_resource
  sata highbank: enable 64-bit DMA mask when using LPAE
  ata: pata_samsung_cf: add missing __iomem annotation
  ata: pata_arasan: Staticize local symbols
  sata_mv: Remove unneeded CONFIG_HAVE_CLK ifdefs
  ata: use dev_get_platdata()
  sata_mv: Remove unneeded forward declaration
  libata: acpi: remove dead code for ata_acpi_(un)bind
  libata: move 'struct ata_taskfile' and friends from ata.h to libata.h
2013-09-03 18:19:53 -07:00
Ewan D. Milne 279afdfe78 [SCSI] Generate uevents on certain unit attention codes
Generate a uevent when the following Unit Attention ASC/ASCQ
codes are received:

    2A/01  MODE PARAMETERS CHANGED
    2A/09  CAPACITY DATA HAS CHANGED
    38/07  THIN PROVISIONING SOFT THRESHOLD REACHED
    3F/03  INQUIRY DATA HAS CHANGED
    3F/0E  REPORTED LUNS DATA HAS CHANGED

Log kernel messages when the following Unit Attention ASC/ASCQ
codes are received that are not as specific as those above:

    2A/xx  PARAMETERS CHANGED
    3F/xx  TARGET OPERATING CONDITIONS HAVE CHANGED

Added logic to set expecting_lun_change for other LUNs on the target
after REPORTED LUNS DATA HAS CHANGED is received, so that duplicate
uevents are not generated, and clear expecting_lun_change when a
REPORT LUNS command completes, in accordance with the SPC-3
specification regarding reporting of the 3F 0E ASC/ASCQ UA.

[jejb: remove SPC3 test in scsi_report_lun_change and some docbook fixes and
       unused variable fix, both reported by Fengguang Wu]
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-26 18:52:27 +04:00
Hannes Reinecke 7e782af576 [SCSI] Return ENODATA on medium error
When a medium error is detected the SCSI stack should return
ENODATA to the upper layers.

[jejb: fix whitespace error]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-23 12:54:53 -04:00
Hannes Reinecke a9d6ceb838 [SCSI] return ENOSPC on thin provisioning failure
When the thin provisioning hard threshold is reached we
should return ENOSPC to inform upper layers about this fact.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-23 12:43:54 -04:00
Hannes Reinecke 0f7f6234d3 [SCSI] Document enhanced error codes
Document the various error codes returned on I/O failure.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-23 12:31:59 -04:00
Aaron Lu f1bc1e4c44 ata: acpi: rework the ata acpi bind support
Binding ACPI handle to SCSI device has several drawbacks, namely:
1 During ATA device initialization time, ACPI handle will be needed
  while SCSI devices are not created yet. So each time ACPI handle is
  needed, instead of retrieving the handle by ACPI_HANDLE macro,
  a namespace scan is performed to find the handle for the corresponding
  ATA device. This is inefficient, and also expose a restriction on
  calling path not holding any lock.
2 The binding to SCSI device tree makes code complex, while at the same
  time doesn't bring us any benefit. All ACPI handlings are still done
  in ATA module, not in SCSI.

Rework the ATA ACPI binding code to bind ACPI handle to ATA transport
devices(ATA port and ATA device). The binding needs to be done only once,
since the ATA transport devices do not go away with hotplug. And due to
this, the flush_work call in hotplug handler for ATA bay is no longer
needed.

Tested on an Intel test platform for binding and runtime power off for
ODD(ZPODD) and hard disk; on an ASUS S400C for binding and normal boot
and S3, where its SATA port node has _SDD and _GTF control methods when
configured as an AHCI controller and its PATA device node has _GTF
control method when configured as an IDE controller. SATA PMP binding
and ATA hotplug is not tested.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Tested-by: Dirk Griesbach <spamthis@freenet.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
2013-08-23 12:09:23 -04:00
Bart Van Assche 0516c08d10 [SCSI] enable destruction of blocked devices which fail LUN scanning
If something goes wrong during LUN scanning, e.g. a transport layer
failure occurs, then __scsi_remove_device() can get invoked by the
LUN scanning code for a SCSI device in state SDEV_CREATED_BLOCK and
before the SCSI device has been added to sysfs (is_visible == 0).
Make sure that even in this case the transition into state SDEV_DEL
occurs. This avoids that __scsi_remove_device() can get invoked a
second time by scsi_forget_host() if this last function is invoked
from another thread than the thread that performs LUN scanning.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-07-09 12:14:09 +01:00
James Bottomley e2eb7244bc [SCSI] Fix race between starved list and device removal
scsi_run_queue() examines all SCSI devices that are present on
the starved list. Since scsi_run_queue() unlocks the SCSI host
lock a SCSI device can get removed after it has been removed
from the starved list and before its queue is run. Protect
against that race condition by holding a reference on the
queue while running it.

Reported-by: Chanho Min <chanho.min@lge.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-07-09 12:14:08 +01:00
Lin Ming 9b21493c45 [SCSI] sd: use REQ_PM in sd's runtime suspend operation
With the introduction of REQ_PM, modify sd's runtime suspend operation
functions to use that flag so that the operations to put the device into
runtime suspended state(i.e. sync cache and stop device) will not affect
its runtime PM status.

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-06 12:48:17 -07:00
Rafael J. Wysocki 53540098b2 ACPI / glue: Add .match() callback to struct acpi_bus_type
USB uses the .find_bridge() callback from struct acpi_bus_type
incorrectly, because as a result of the way it is used by USB every
device in the system that doesn't have a bus type or parent is
passed to usb_acpi_find_device() for inspection.

What USB actually needs, though, is to call usb_acpi_find_device()
for USB ports that don't have a bus type defined, but have
usb_port_device_type as their device type, as well as for USB
devices.

To fix that replace the struct bus_type pointer in struct
acpi_bus_type used for matching devices to specific subsystems
with a .match() callback to be used for this purpose and update
the users of struct acpi_bus_type, including USB, accordingly.
Define the .match() callback routine for USB, usb_acpi_bus_match(),
in such a way that it will cover both USB devices and USB ports
and remove the now redundant .find_bridge() callback pointer from
usb_acpi_bus.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
2013-03-04 14:23:40 +01:00
Aaron Lu 6f4c827e68 [libata] scsi: no poll when ODD is powered off
When the ODD is powered off, any action the user did to the ODD that
would generate a media event will trigger an ACPI interrupt, so the
poll for media event is no longer necessary. And the poll will also
cause a runtime status change, which will stop the ODD from staying in
powered off state, so the poll should better be stopped.

But since we don't have access to the gendisk structure in LLDs, here
comes the disk_events_disable_depth for scsi device. This field is a
hint set by LLDs to convey information to upper layer drivers. A value
of 0 means media poll is necessary for the device, while values above 0
means media poll is not needed and should better be skipped. So we can
increase its value when we are to power off the ODD in ATA layer and
decrease its value when the ODD is powered on, effectively silence the
media events poll.

Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2013-01-25 15:36:43 -05:00
Linus Torvalds 60da5bf47d Merge branch 'for-3.8/core' of git://git.kernel.dk/linux-block
Pull block layer core updates from Jens Axboe:
 "Here are the core block IO bits for 3.8.  The branch contains:

   - The final version of the surprise device removal fixups from Bart.

   - Don't hide EFI partitions under advanced partition types.  It's
     fairly wide spread these days.  This is especially dangerous for
     systems that have both msdos and efi partition tables, where you
     want to keep them in sync.

   - Cleanup of using -1 instead of the proper NUMA_NO_NODE

   - Export control of bdi flusher thread CPU mask and default to using
     the home node (if known) from Jeff.

   - Export unplug tracepoint for MD.

   - Core improvements from Shaohua.  Reinstate the recursive merge, as
     the original bug has been fixed.  Add plugging for discard and also
     fix a problem handling non pow-of-2 discard limits.

  There's a trivial merge in block/blk-exec.c due to a fix that went
  into 3.7-rc at a later point than -rc4 where this is based."

* 'for-3.8/core' of git://git.kernel.dk/linux-block:
  block: export block_unplug tracepoint
  block: add plug for blkdev_issue_discard
  block: discard granularity might not be power of 2
  deadline: Allow 0ms deadline latency, increase the read speed
  partitions: enable EFI/GPT support by default
  bsg: Remove unused function bsg_goose_queue()
  block: Make blk_cleanup_queue() wait until request_fn finished
  block: Avoid scheduling delayed work on a dead queue
  block: Avoid that request_fn is invoked on a dead queue
  block: Let blk_drain_queue() caller obtain the queue lock
  block: Rename queue dead flag
  bdi: add a user-tunable cpu_list for the bdi flusher threads
  block: use NUMA_NO_NODE instead of -1
  block: recursive merge requests
  block CFQ: avoid moving request to different queue
2012-12-17 08:27:23 -08:00
Bart Van Assche 3f3299d5c0 block: Rename queue dead flag
QUEUE_FLAG_DEAD is used to indicate that queuing new requests must
stop. After this flag has been set queue draining starts. However,
during the queue draining phase it is still safe to invoke the
queue's request_fn, so QUEUE_FLAG_DYING is a better name for this
flag.

This patch has been generated by running the following command
over the kernel source tree:

git grep -lEw 'blk_queue_dead|QUEUE_FLAG_DEAD' |
    xargs sed -i.tmp -e 's/blk_queue_dead/blk_queue_dying/g'      \
        -e 's/QUEUE_FLAG_DEAD/QUEUE_FLAG_DYING/g';                \
sed -i.tmp -e "s/QUEUE_FLAG_DYING$(printf \\t)*5/QUEUE_FLAG_DYING$(printf \\t)5/g" \
    include/linux/blkdev.h;                                       \
sed -i.tmp -e 's/ DEAD/ DYING/g' -e 's/dead queue/a dying queue/' \
    -e 's/Dead queue/A dying queue/' block/blk-core.c

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Chanho Min <chanho.min@lge.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-12-06 14:30:58 +01:00
Martin K. Petersen 5db44863b6 [SCSI] sd: Implement support for WRITE SAME
Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk
driver.

 - We set the default maximum to 0xFFFF because there are several
   devices out there that only support two-byte block counts even with
   WRITE SAME(16). We only enable transfers bigger than 0xFFFF if the
   device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK
   LIMITS VPD.

 - max_write_same_blocks can be overriden per-device basis in sysfs.

 - The UNMAP discovery heuristics remain unchanged but the discard
   limits are tweaked to match the "real" WRITE SAME commands.

 - In the error handling logic we now distinguish between WRITE SAME
   with and without UNMAP set.

The discovery process heuristics are:

 - If the device reports a SCSI level of SPC-3 or greater we'll issue
   READ SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is
   supported. If that's the case we will use it.

 - If the device supports the block limits VPD and reports a MAXIMUM
   WRITE SAME LENGTH bigger than 0xFFFF we will use WRITE SAME(16).

 - Otherwise we will use WRITE SAME(10) unless the target LBA is beyond
   0xFFFFFFFF or the block count exceeds 0xFFFF.

 - no_write_same is set for ATA, FireWire and USB.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-11-13 22:45:42 -08:00
James Bottomley fe709ed827 Merge SCSI misc branch into isci-for-3.6 tag 2012-10-02 08:55:12 +01:00
Vikas Chaudhary 0e58076b37 [SCSI] scsi_lib: Set the device state from transport-offline to running
FC and iSCSI class set SCSI devices to transport-offline state after
fast_io_fail/replacement_timeout has fired, but after relogin, function
scsi_internal_device_unblock() is not setting scsi device state to running.
Due to this the devices even after being relogged in remain offline.

Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-09-14 17:59:22 +01:00
Mike Snitzer 27c419739b [SCSI] scsi_lib: fix scsi_io_completion's SG_IO error propagation
The following v3.4-rc1 commit unmasked an existing bug in scsi_io_completion's
SG_IO error handling: 47ac56d [SCSI] scsi_error: classify some ILLEGAL_REQUEST
sense as a permanent TARGET_ERROR

Given that certain ILLEGAL_REQUEST are now properly categorized as
TARGET_ERROR the host_byte is being set (before host_byte wasn't ever
set for these ILLEGAL_REQUEST).

In scsi_io_completion, initialize req->errors with cmd->result _after_
the SG_IO block that calls __scsi_error_from_host_byte (which may
modify the host_byte).

Before this fix:

    cdb to send: 12 01 01 00 00 00
ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[12, 01, 01, 00, 00, 00],
    mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=20000, flags=0,
    status=02, masked_status=01, sb[19]=[70, 00, 05, 00, 00, 00, 00, 0b,
    00, 00, 00, 00, 24, 00, 00, 00, 00, 00, 00], host_status=0x10,
    driver_status=0x8, resid=0, duration=0, info=0x1}) = 0
SCSI Status: Check Condition

Sense Information:
sense buffer empty

After:

    cdb to send: 12 01 01 00 00 00
ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[12, 01, 01, 00, 00, 00],
    mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=20000, flags=0,
    status=02, masked_status=01, sb[19]=[70, 00, 05, 00, 00, 00, 00, 0b,
    00, 00, 00, 00, 24, 00, 00, 00, 00, 00, 00], host_status=0,
    driver_status=0x8, resid=0, duration=0, info=0x1}) = 0
SCSI Status: Check Condition

Sense Information:
 Fixed format, current;  Sense key: Illegal Request
 Additional sense: Invalid field in cdb
 Raw sense data (in hex):
        70 00 05 00 00 00 00 0b  00 00 00 00 24 00 00 00
        00 00 00

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Babu Moger <babu.moger@netapp.com>
Cc: stable@vger.kernel.org # 3.4
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-08-22 09:42:31 +04:00
Jeff Garzik 8407884dd9 Merge branch 'master' [vanilla Linus master] into libata-dev.git/upstream
Two bits were appended to the end of the bitfield
list in struct scsi_device.  Resolve that conflict
by including both bits.

Conflicts:
	include/scsi/scsi_device.h
2012-07-25 15:58:48 -04:00
Bart Van Assche b485462aca [SCSI] Stop accepting SCSI requests before removing a device
Avoid that the code for requeueing SCSI requests triggers a
crash by making sure that that code isn't scheduled anymore
after a device has been removed.

Also, source code inspection of __scsi_remove_device() revealed
a race condition in this function: no new SCSI requests must be
accepted for a SCSI device after device removal started.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:41 +01:00
Bart Van Assche 84feb1664e [SCSI] Change return type of scsi_queue_insert() into void
The return value of scsi_queue_insert() is ignored by all its
callers, hence change the return type of this function into
void.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:41 +01:00
Bart Van Assche 940f5d47e2 [SCSI] Avoid dangling pointer in scsi_requeue_command()
When we call scsi_unprep_request() the command associated with the request
gets destroyed and therefore drops its reference on the device.  If this was
the only reference, the device may get released and we end up with a NULL
pointer deref when we call blk_requeue_request.

Reported-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>
[jejb: enhance commend and add commit log for stable]
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:40 +01:00
Bart Van Assche 67bd941300 [SCSI] Fix device removal NULL pointer dereference
Use blk_queue_dead() to test whether the queue is dead instead
of !sdev. Since scsi_prep_fn() may be invoked concurrently with
__scsi_remove_device(), keep the queuedata (sdev) pointer in
__scsi_remove_device(). This patch fixes a kernel oops that
can be triggered by USB device removal. See also
http://www.spinics.net/lists/linux-scsi/msg56254.html.

Other changes included in this patch:
- Swap the blk_cleanup_queue() and kfree() calls in
  scsi_host_dev_release() to make that code easier to grasp.
- Remove the queue dead check from scsi_run_queue() since the
  queue state can change anyway at any point in that function
  where the queue lock is not held.
- Remove the queue dead check from the start of scsi_request_fn()
  since it is redundant with the scsi_device_online() check.

Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:40 +01:00
Mike Christie d075498c98 [SCSI] remove old comment from block/unblock functions
We do not hold the host lock when calling these functions,
so remove comment.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:23 +01:00
Mike Christie 5d9fb5cc1b [SCSI] core, classes, mpt2sas: have scsi_internal_device_unblock take new state
This has scsi_internal_device_unblock/scsi_target_unblock take
the new state to set the devices as an argument instead of
always setting to running. The patch also converts users of these
functions.

This allows the FC and iSCSI class to transition devices from blocked
to transport-offline, so that when fast_io_fail/replacement_timeout
has fired we do not set the devices back to running. Instead, we
set them to SDEV_TRANSPORT_OFFLINE.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:22 +01:00
Mike Christie 1b8d262061 [SCSI] add new SDEV_TRANSPORT_OFFLINE state
This patch adds a new state SDEV_TRANSPORT_OFFLINE. It will
be used by transport classes to offline devices for cases like
when the fast_io_fail/recovery_tmo fires. In those cases we
want all IO to fail, and we have not yet escalated to dev_loss_tmo
behavior where we are removing the devices.

Currently to handle this state, transport classes are setting
the scsi_device's state to running, setting their internal
session/port structs state to something that indicates failed,
and then failing IO from some transport check in the queuecommand.

The reason for the new value is so that users can distinguish
between a device failure that is a result of a transport problem
vs the wide range of errors that devices get offlined for
when a scsi command times out and we offline the devices there.
It also fixes the confusion as to why the transport class is
failing IO, but has set the device state from blocked to running.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:21 +01:00
Holger Macht de50ada55b [SCSI] add wrapper to access and set scsi_bus_type in struct acpi_bus_type
For being able to bind ata devices against acpi devices, scsi_bus_type
needs to be set as bus in struct acpi_bus_type. So add wrapper to
scsi_lib to accomplish that.

Signed-off-by: Holger Macht <holger@homac.de>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2012-06-29 11:38:09 -04:00
Jun'ichi Nomura b7e94a1686 [SCSI] Fix dm-multipath starvation when scsi host is busy
block congestion control doesn't have any concept of fairness across
multiple queues.  This means that if SCSI reports the host as busy in
the queue congestion control it can result in an unfair starvation
situation in dm-mp if there are multiple multipath devices on the same
host.  For example:
http://www.redhat.com/archives/dm-devel/2012-May/msg00123.html

The fix for this is to report only the sdev busy state (and ignore the
host busy state) in the block congestion control call back.
The host is still congested, but the SCSI subsystem will sort out the
congestion in a fair way because it knows the relation between the
queues and the host.

[jejb: fixed up trailing whitespace]
Reported-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-05-23 09:34:17 +01:00
James Bottomley e346933365 isci update for 3.5
1/ Rework remote-node-context (RNC) handling for proper management of
    the silicon state machine in error handling and hot-plug conditions.
    Further details below, suffice to say if the RNC is mismanaged the
    silicon state machines may lock up.
 
 2/ Refactor the initialization code to be reused for suspend/resume support
 
 3/ Miscellaneous bug fixes to address discovery issues and hardware
    compatibility.
 
 RNC rework details from Jeff Skirvin:
 
 In the controller, devices as they appear on a SAS domain (or
 direct-attached SATA devices) are represented by memory structures known
 as "Remote Node Contexts" (RNCs).  These structures are transferred from
 main memory to the controller using a set of register commands; these
 commands include setting up the context ("posting"), removing the
 context ("invalidating"), and commands to control the scheduling of
 commands and connections to that remote device ("suspensions" and
 "resumptions").  There is a similar path to control RNC scheduling from
 the protocol engine, which interprets the results of command and data
 transmission and reception.
 
 In general, the controller chooses among non-suspended RNCs to find one
 that has work requiring scheduling the transmission of command and data
 frames to a target.  Likewise, when a target tries to return data back
 to the initiator, the state of the RNC is used by the controller to
 determine how to treat the incoming request. As an example, if the RNC
 is in the state "TX/RX Suspended", incoming SSP connection requests from
 the target will be rejected by the controller hardware.  When an RNC is
 "TX Suspended", it will not be selected by the controller hardware to
 start outgoing command or data operations (with certain priority-based
 exceptions).
 
 As mentioned above, there are two sources for management of the RNC
 states: commands from driver software, and the result of transmission
 and reception conditions of commands and data signaled by the controller
 hardware.  As an example of the latter, if an outgoing SSP command ends
 with a OPEN_REJECT(BAD_DESTINATION) status, the RNC state will
 transition to the "TX Suspended" state, and this is signaled by the
 controller hardware in the status to the completion of the pending
 command as well as signaled in a controller hardware event.  Examples of
 the former are included in the patch changelogs.
 
 Driver software is required to suspend the RNC in a "TX/RX Suspended"
 condition before any outstanding commands can be terminated.  Failure to
 guarantee this can lead to a complete hardware hang condition.  Earlier
 versions of the driver software did not guarantee that an RNC was
 correctly managed before I/O termination, and so operated in an unsafe
 way.
 
 Further, the driver performed unnecessary contortions to preserve the
 remote device command state and so was more complicated than it needed
 to be.  A simplifying driver assumption is that once an I/O has entered
 the error handler path without having completed in the target, the
 requirement on the driver is that all use of the sas_task must end.
 Beyond that, recovery of operation is dependent on libsas and other
 components to reset, rediscover and reconfigure the device before normal
 operation can restart.  In the driver, this simplifying assumption meant
 that the RNC management could be reduced to entry into the suspended
 state, terminating the targeted I/O request, and resuming the RNC as
 needed for device-specific management such as an SSP Abort Task or LUN
 Reset Management request.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPtYFXAAoJEB7SkWpmfYgCJkcP/1VvsuuitNy/YM9P1tb/RvQ7
 ytJzjGtWiAABHVWwjgB+Ng7hUTaP2r6l8KeNfwxpwXyNdBAUNysYEUBHAfPsKzKz
 espTmw3wVCnREajgKXZwFp9aTj8DcYFB6vKcC/ddACt3uRNjjA9En36+6797r8Vg
 YdebyjFX2FxwoUj0icTUiV/OXgb8w723imnCl8bfOhhFRi4eFZ4EJ23AdMkUya1i
 uYePAPJPSQJuU/87gNIx4JcR0qHJ1ziGPEY+XC47CzEeXbBTSPgWOwanQ6KPoXRJ
 XVxamfcKAjRdtwQ4m1vYBSE32RTdrjhujVbkGiPi6QaEbCLjhLSCIYyuS3XMckV+
 TCZ16o5kd/I6ZtZOeP4zZRGnNBkOPzY44qiJeKffjWDhTrFacx4XWJB/ftWPgEA5
 N2zFH3RM4sY0FUJ3I/Qe5CERNdCXMtcj+UAf3nHpAIVcv46Lp+qoSkdEx05uuuiN
 +D/dSlubktuvuzmB5WisL3qrjNEkkLTAGQpZs1j0ojLEBm0XAgV5EzqmHiZ0GOPD
 OQNFxeei9SlqgtKIIP0bymRispPrG2HVCOvExYMxzKR6fjxofZLAs/aWOsdhxgMq
 TlAyZJ6OmGI+KX68HHzoMpT9iquvmP64WGkfHzCx296BfSKiruLh/Jzt5gGwv+Z1
 5tlpnUr9dUTxx7qkQXvj
 =HYvO
 -----END PGP SIGNATURE-----

Merge tag 'isci-for-3.5' into misc

isci update for 3.5

1/ Rework remote-node-context (RNC) handling for proper management of
   the silicon state machine in error handling and hot-plug conditions.
   Further details below, suffice to say if the RNC is mismanaged the
   silicon state machines may lock up.

2/ Refactor the initialization code to be reused for suspend/resume support

3/ Miscellaneous bug fixes to address discovery issues and hardware
   compatibility.

RNC rework details from Jeff Skirvin:

In the controller, devices as they appear on a SAS domain (or
direct-attached SATA devices) are represented by memory structures known
as "Remote Node Contexts" (RNCs).  These structures are transferred from
main memory to the controller using a set of register commands; these
commands include setting up the context ("posting"), removing the
context ("invalidating"), and commands to control the scheduling of
commands and connections to that remote device ("suspensions" and
"resumptions").  There is a similar path to control RNC scheduling from
the protocol engine, which interprets the results of command and data
transmission and reception.

In general, the controller chooses among non-suspended RNCs to find one
that has work requiring scheduling the transmission of command and data
frames to a target.  Likewise, when a target tries to return data back
to the initiator, the state of the RNC is used by the controller to
determine how to treat the incoming request. As an example, if the RNC
is in the state "TX/RX Suspended", incoming SSP connection requests from
the target will be rejected by the controller hardware.  When an RNC is
"TX Suspended", it will not be selected by the controller hardware to
start outgoing command or data operations (with certain priority-based
exceptions).

As mentioned above, there are two sources for management of the RNC
states: commands from driver software, and the result of transmission
and reception conditions of commands and data signaled by the controller
hardware.  As an example of the latter, if an outgoing SSP command ends
with a OPEN_REJECT(BAD_DESTINATION) status, the RNC state will
transition to the "TX Suspended" state, and this is signaled by the
controller hardware in the status to the completion of the pending
command as well as signaled in a controller hardware event.  Examples of
the former are included in the patch changelogs.

Driver software is required to suspend the RNC in a "TX/RX Suspended"
condition before any outstanding commands can be terminated.  Failure to
guarantee this can lead to a complete hardware hang condition.  Earlier
versions of the driver software did not guarantee that an RNC was
correctly managed before I/O termination, and so operated in an unsafe
way.

Further, the driver performed unnecessary contortions to preserve the
remote device command state and so was more complicated than it needed
to be.  A simplifying driver assumption is that once an I/O has entered
the error handler path without having completed in the target, the
requirement on the driver is that all use of the sas_task must end.
Beyond that, recovery of operation is dependent on libsas and other
components to reset, rediscover and reconfigure the device before normal
operation can restart.  In the driver, this simplifying assumption meant
that the RNC management could be reduced to entry into the suspended
state, terminating the targeted I/O request, and resuming the RNC as
needed for device-specific management such as an SSP Abort Task or LUN
Reset Management request.
2012-05-21 12:17:30 +01:00
Dan Williams a7a20d1039 [SCSI] sd: limit the scope of the async probe domain
sd injects and synchronizes probe work on the global kernel-wide domain.
This runs into conflict with PM that wants to perform resume actions in
async context:

[  494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds.
[  494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  494.360809] kworker/u:3     D 0000000000000000     0   554      2 0x00000000
[  494.420739]  ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8
[  494.484392]  ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160
[  494.548038]  ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398
[  494.611685] Call Trace:
[  494.632649]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
[  494.674687]  [<ffffffff8104b968>] async_synchronize_cookie_domain+0xb6/0x112
[  494.734177]  [<ffffffff810461ff>] ? __init_waitqueue_head+0x50/0x50
[  494.787134]  [<ffffffff8131a224>] ? scsi_remove_target+0x48/0x48
[  494.837900]  [<ffffffff8104b9d9>] async_synchronize_cookie+0x15/0x17
[  494.891567]  [<ffffffff8104ba49>] async_synchronize_full+0x54/0x70  <-- here we wait for async contexts to complete
[  494.943783]  [<ffffffff8104b9f5>] ? async_synchronize_full_domain+0x1a/0x1a
[  495.002547]  [<ffffffffa00114b1>] sd_remove+0x2c/0xa2 [sd_mod]
[  495.051861]  [<ffffffff812fe94f>] __device_release_driver+0x86/0xcf
[  495.104807]  [<ffffffff812fe9bd>] device_release_driver+0x25/0x32  <-- here we take device_lock()

[  853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds.
[  853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  853.635119] kworker/u:4     D ffff88013097b5d0     0   549      2 0x00000000
[  853.695129]  ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8
[  853.758990]  ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000
[  853.822796]  0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000
[  853.886633] Call Trace:
[  853.907631]  [<ffffffff8149dd25>] schedule+0x5a/0x5c
[  853.949670]  [<ffffffff8149cc44>] __mutex_lock_common+0x220/0x351
[  854.001225]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
[  854.049082]  [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4
[  854.097011]  [<ffffffff8149ce48>] mutex_lock_nested+0x2f/0x36   <-- here we wait for device_lock()
[  854.145591]  [<ffffffff81304bd7>] device_resume+0x58/0x1c4
[  854.192066]  [<ffffffff81304d61>] async_resume+0x1e/0x45
[  854.237019]  [<ffffffff8104bc93>] async_run_entry_fn+0xc6/0x173  <-- ...while running in async context

Provide a 'scsi_sd_probe_domain' so that async probe actions actions can
be flushed without regard for the state of PM, and allow for the resume
path to handle devices that have transitioned from SDEV_QUIESCE to
SDEV_DEL prior to resume.

Acked-by: Alan Stern <stern@rowland.harvard.edu>
[alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
[jejb: remove unneeded config guards in include file]
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-05-17 09:10:46 +01:00
Lin Ming 6f381fa344 [SCSI] scsi_lib: use correct DMA device in __scsi_alloc_queue
Currently, __scsi_alloc_queue uses SCSI host's parent device
as DMA device to set segment boundary. But the parent device may not
refer to the DMA device. For example, for ATA disk, SCSI host's parent
device now refers to ATA port.

Since commit d139b9b([SCSI] scsi_lib_dma: fix bug with dma maps on
nested scsi objects), a new field Scsi_Host->dma_dev was introduced
to refer to the real DMA device.

Use ->dma_dev in __scsi_alloc_queue to correctly set segment
boundary.

Bug report: http://marc.info/?l=linux-ide&m=133177818318187&w=2

Reported-and-tested-by: Jörg Sommer <joerg@alea.gnuu.de>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-04-22 18:56:18 +01:00
Linus Torvalds 424a6f6ef9 SCSI updates on 20120319
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPZxSnAAoJEDeqqVYsXL0M0Y4IAMX0vrTVZbg6psA5/gMcWGRP
 CkFXEQ8n0PL2SCaj6BoDqamJFe5Nc7dnqxM0fGawB4S9vr3rHhiOlwO+NbV9zFYC
 2skBTpeL3sjgtN/jTBdfeeAa7xTYpu/XGyei0NS1A5c2AyMVXV0uYV2s4VNZxe44
 tVIn1OEzM2giZ9EB1OZslDMrg5XXm3MBIUECP0LbWUhBm/35caSFKzMXRwhh7WiK
 +AVmc2AZYtdEwuknDyiH7KlsaoB3vGL9pPrAUJzIgEhy2pOo2A7W72HfA4Fj+y6a
 uF9HBS5zciMp1+sGWry62AjNbWgin9BRlozBEO/lJhIfMGDV1nXEIJsOkOgkdoE=
 =1TxB
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6

SCSI updates from James Bottomley:
 "The update includes the usual assortment of driver updates (lpfc,
  qla2xxx, qla4xxx, bfa, bnx2fc, bnx2i, isci, fcoe, hpsa) plus a huge
  amount of infrastructure work in the SAS library and transport class
  as well as an iSCSI update.  There's also a new SCSI based virtio
  driver."

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (177 commits)
  [SCSI] qla4xxx: Update driver version to 5.02.00-k15
  [SCSI] qla4xxx: trivial cleanup
  [SCSI] qla4xxx: Fix sparse warning
  [SCSI] qla4xxx: Add support for multiple session per host.
  [SCSI] qla4xxx: Export CHAP index as sysfs attribute
  [SCSI] scsi_transport: Export CHAP index as sysfs attribute
  [SCSI] qla4xxx: Add support to display CHAP list and delete CHAP entry
  [SCSI] iscsi_transport: Add support to display CHAP list and delete CHAP entry
  [SCSI] pm8001: fix endian issue with code optimization.
  [SCSI] pm8001: Fix possible racing condition.
  [SCSI] pm8001: Fix bogus interrupt state flag issue.
  [SCSI] ipr: update PCI ID definitions for new adapters
  [SCSI] qla2xxx: handle default case in qla2x00_request_firmware()
  [SCSI] isci: improvements in driver unloading routine
  [SCSI] isci: improve phy event warnings
  [SCSI] isci: debug, provide state-enum-to-string conversions
  [SCSI] scsi_transport_sas: 'enable' phys on reset
  [SCSI] libsas: don't recover end devices attached to disabled phys
  [SCSI] libsas: fixup target_port_protocols for expanders that don't report sata
  [SCSI] libsas: set attached device type and target protocols for local phys
  ...
2012-03-22 12:55:29 -07:00
Cong Wang 77dfce076c scsi: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:19 +08:00
Martin K. Petersen 66a651aa7a [SCSI] Ensure discard failure gets treated as a target problem
The error reported up the stack for a discard failure did not clearly
indicate that the command was processed and subsequently failed by the
target device.

Return -EREMOTEIO so multipathing does not classify this condition as a
path failure.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 09:38:01 -06:00
Moger, Babu 2082ebc45a [SCSI] fix the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE)
This patch fixes the host byte settings DID_TARGET_FAILURE and
DID_NEXUS_FAILURE.  The function __scsi_error_from_host_byte, tries to reset
the host byte to DID_OK. But that does not happen because of the OR operation.

Here is the flow.

scsi_softirq_done-> scsi_decide_disposition -> __scsi_error_from_host_byte

Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition,
result will be set as DID_NEXUS_FAILURE (=0x11). Then in
__scsi_error_from_host_byte, when we do OR with DID_OK.  Purpose is to reset
it back to DID_OK. But that does not happen.  This patch fixes this issue.

Signed-off-by: Babu Moger <babu.moger@netapp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-02-19 08:08:59 -06:00
Shaohua Li 466c08c71a [SCSI] don't change sdev starvation list order without request dispatched
The sdev is deleted from starved list and then try to dispatch from this
device. It's quite possible the sdev can't eventually dispatch a request,
then the sdev will be in starved list tail. This isn't fair.
There are two cases here:
1. unplug path. scsi_request_fn() calls to scsi_target_queue_ready(), then
the dev is removed from starved list, but quite possible host queue isn't
ready, the dev is moved to starved list without dispatching any request.
2. scsi_run_queue path. It deletes the dev from starved list first (both
global and local starved lists), then handles the dev. Then we could have
the same process like case 1.

This patch fixes the first case. Case 2 isn't fixed, because there is a
rare case scsi_run_queue finds host isn't busy but scsi_request_fn finds
host is busy (other CPU is faster to get host queue depth). Not deleting
the dev from starved list in scsi_run_queue will keep scsi_run_queue
looping (though this is very rare case, because host will become busy).
Fortunately fixing case 1 already gives big improvement for starvation in
my test. In a 12 disk JBOD setup, running file creation under EXT4, this
gives 12% more throughput.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-01-16 11:54:04 +04:00
Hannes Reinecke 745718132c [SCSI] Silencing 'killing requests for dead queue'
When we tear down a device we try to flush all outstanding
commands in scsi_free_queue(). However the check in
scsi_request_fn() is imperfect as it only signals that
we _might start_ aborting commands, not that we've actually
aborted some.
So move the printk inside the scsi_kill_request function,
this will also give us a hint about which commands are aborted.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-11-09 12:07:07 -06:00
Linus Torvalds 32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Paul Gortmaker 09703660ed scsi: Add export.h for EXPORT_SYMBOL/THIS_MODULE as required
For the basic SCSI infrastructure files that are exporting symbols
but not modules themselves, add in the basic export.h header file
to allow the exports.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:23 -04:00
Bart Van Assche 3308511c93 [SCSI] Make scsi_free_queue() kill pending SCSI commands
Make sure that SCSI device removal via scsi_remove_host() does finish
all pending SCSI commands. Currently that's not the case and hence
removal of a SCSI host during I/O can cause a deadlock. See also
"blkdev_issue_discard() hangs forever if underlying storage device is
removed" (http://bugzilla.kernel.org/show_bug.cgi?id=40472). See also
http://lkml.org/lkml/2011/8/27/6.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@kernel.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-10-30 13:20:28 +04:00
James Smart 573e591353 [SCSI] scsi_lib: pause between error retries
During cable pull tests on our 16G FC adapter, we are seeing errors,
typically reads to close targets, which fail due to CRC or framing
errors caused by the cable being pull (return status DID_ERROR).
The adapter detects the error on one of the first frames received,
marks the FC exchange as dead (further frames go to bit bucket) and
signals the host of the error. This action is so quick, and coupled
with fast host CPUs, creates a scenario in which the midlayer sees
the failure and retries the io almost immediately. We've seen link
traces with the retry on the link while the original i/o is still
being processed by the target. We're also seeing the time window
for the "link to pull-apart" and the physical interface to report
disconnected to be in the few millisecond range. Which means, we're
encountering scenarios where the full retry count is exhausted
(all with error) by the midlayer before the link disconnect state
is detected.

We looked at 8G FC behavior and occasionally see the same behavior,
but as the link was slower, it rarely could exhaust all retries
before the link reported disconnect.

What is needed is a slight delay between io retries due to DID_ERROR
to cover this error.  It is inappropriate to put this delay in the
driver, as the error is indistinguishable from other link-related errors,
nor does the driver track whether the io is a retry or not. This is also
easier than tracking between-io-error bursts that are seen in this
scenario.

The patch below updates the retry path so that it inserts a delay as
if the target was busy.  The busy delay is on the order of 6ms. This
delay is sufficient to ensure the link down condition is reported
before the retry count is exhausted (at most 1 retry is seen).

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-07-27 14:06:01 +04:00
James Bottomley bfe159a512 [SCSI] fix crash in scsi_dispatch_cmd()
USB surprise removal of sr is triggering an oops in
scsi_dispatch_command().  What seems to be happening is that USB is
hanging on to a queue reference until the last close of the upper
device, so the crash is caused by surprise remove of a mounted CD
followed by attempted unmount.

The problem is that USB doesn't issue its final commands as part of
the SCSI teardown path, but on last close when the block queue is long
gone.  The long term fix is probably to make sr do the teardown in the
same way as sd (so remove all the lower bits on ejection, but keep the
upper disk alive until last close of user space).  However, the
current oops can be simply fixed by not allowing any commands to be
sent to a dead queue.

Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-07-21 14:21:18 -07:00
Linus Torvalds a2b9c1f620 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: don't delay blk_run_queue_async
  scsi: remove performance regression due to async queue run
  blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup
  block: rescan partitions on invalidated devices on -ENOMEDIA too
  cdrom: always check_disk_change() on open
  block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers
2011-05-18 06:49:02 -07:00
Jens Axboe 9937a5e2f3 scsi: remove performance regression due to async queue run
Commit c21e6beb removed our queue request_fn re-enter
protection, and defaulted to always running the queues from
kblockd to be safe. This was a known potential slow down,
but should be safe.

Unfortunately this is causing big performance regressions for
some, so we need to improve this logic. Looking into the details
of the re-enter, the real issue is on requeue of requests.

Requeue of requests upon seeing a BUSY condition from the device
ends up re-running the queue, causing traces like this:

scsi_request_fn()
        scsi_dispatch_cmd()
                scsi_queue_insert()
                        __scsi_queue_insert()
                                scsi_run_queue()
					scsi_request_fn()
						...

potentially causing the issue we want to avoid. So special
case the requeue re-run of the queue, but improve it to offload
the entire run of local queue and starved queue from a single
workqueue callback. This is a lot better than potentially
kicking off a workqueue run for each device seen.

This also fixes the issue of the local device going into recursion,
since the above mentioned commit never moved that queue run out
of line.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-17 11:04:44 +02:00
James Bottomley c055f5b261 [SCSI] fix oops in scsi_run_queue()
The recent commit closing the race window in device teardown:

commit 86cbfb5607
Author: James Bottomley <James.Bottomley@suse.de>
Date:   Fri Apr 22 10:39:59 2011 -0500

    [SCSI] put stricter guards on queue dead checks

is causing a potential NULL deref in scsi_run_queue() because the
q->queuedata may already be NULL by the time this function is called.
Since we shouldn't be running a queue that is being torn down, simply
add a NULL check in scsi_run_queue() to forestall this.

Tested-by: Jim Schutt <jaschut@sandia.gov>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-05-03 15:30:00 -05:00
Jens Axboe c21e6beba8 block: get rid of QUEUE_FLAG_REENTER
We are currently using this flag to check whether it's safe
to call into ->request_fn(). If it is set, we punt to kblockd.
But we get a lot of false positives and excessive punts to
kblockd, which hurts performance.

The only real abuser of this infrastructure is SCSI. So export
the async queue run and convert SCSI over to use that. There's
room for improvement in that SCSI need not always use the async
call, but this fixes our performance issue and they can fix that
up in due time.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-19 13:32:46 +02:00
Christoph Hellwig 24ecfbe27f block: add blk_run_queue_async
Instead of overloading __blk_run_queue to force an offload to kblockd
add a new blk_run_queue_async helper to do it explicitly.  I've kept
the blk_queue_stopped check for now, but I suspect it's not needed
as the check we do when the workqueue items runs should be enough.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18 11:41:33 +02:00
Linus Torvalds 6c51038900 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
  Documentation/iostats.txt: bit-size reference etc.
  cfq-iosched: removing unnecessary think time checking
  cfq-iosched: Don't clear queue stats when preempt.
  blk-throttle: Reset group slice when limits are changed
  blk-cgroup: Only give unaccounted_time under debug
  cfq-iosched: Don't set active queue in preempt
  block: fix non-atomic access to genhd inflight structures
  block: attempt to merge with existing requests on plug flush
  block: NULL dereference on error path in __blkdev_get()
  cfq-iosched: Don't update group weights when on service tree
  fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
  block: Require subsystems to explicitly allocate bio_set integrity mempool
  jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  fs: make fsync_buffers_list() plug
  mm: make generic_writepages() use plugging
  blk-cgroup: Add unaccounted time to timeslice_used.
  block: fixup plugging stubs for !CONFIG_BLOCK
  block: remove obsolete comments for blkdev_issue_zeroout.
  blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
  ...

Fix up conflicts in fs/{aio.c,super.c}
2011-03-24 10:16:26 -07:00
Linus Torvalds c55d267de2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits)
  [SCSI] scsi_dh_rdac: Add MD36xxf into device list
  [SCSI] scsi_debug: add consecutive medium errors
  [SCSI] libsas: fix ata list corruption issue
  [SCSI] hpsa: export resettable host attribute
  [SCSI] hpsa: move device attributes to avoid forward declarations
  [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26)
  [SCSI] sd: Logical Block Provisioning update
  [SCSI] Include protection operation in SCSI command trace
  [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try)
  [SCSI] target: Fix volume size misreporting for volumes > 2TB
  [SCSI] bnx2fc: Broadcom FCoE offload driver
  [SCSI] fcoe: fix broken fcoe interface reset
  [SCSI] fcoe: precedence bug in fcoe_filter_frames()
  [SCSI] libfcoe: Remove stale fcoe-netdev entries
  [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h
  [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument
  [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs
  [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out"
  [SCSI] libfc: Fixing a memory leak when destroying an interface
  [SCSI] megaraid_sas: Version and Changelog update
  ...

Fix up trivial conflicts due to whitespace differences in
drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}
2011-03-17 17:54:40 -07:00
Martin K. Petersen c98a0eb0e9 [SCSI] sd: Logical Block Provisioning update
SBC3r26 contains many changes to the Logical Block Provisioning
interfaces (formerly known as Thin Provisioning ditto). This patch
implements support for both the old and new schemes using the same
heuristic as before (whether the LBP VPD page is present).

The new code also allows the provisioning mode (i.e. choice of command)
to be overridden on a per-device basis via sysfs. Two additional modes
are supported in this version:

 - WRITE SAME(10) with the UNMAP bit set

 - WRITE SAME(10) without the UNMAP bit set. This allows us to support
   devices that predate the TP/LBP enhancements in SBC3 and which work
   by way zero-detection

Switching between modes has been consolidated in a helper function that
also updates the block layer topology according to the limitations of
the chosen command.

I experimented with trying WRITE SAME(16) if UNMAP fails, WRITE SAME(10)
if WRITE SAME(16) fails, etc. but found several devices that got
cranky. So for now we'll disable discard if one of the commands
fail. The user still has the option of selecting a different mode in
sysfs.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-03-14 18:37:34 -05:00
Martin K. Petersen 72f7d322fd [SCSI] Include protection operation in SCSI command trace
When debugging DIF/DIX it is very helpful to be able to see which DIX
operation is associated with the scsi_cmnd. Include the protection op in
the SCSI command trace.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-03-14 18:36:02 -05:00
Jens Axboe 4c63f5646e Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core
Conflicts:
	block/blk-core.c
	block/blk-flush.c
	drivers/md/raid1.c
	drivers/md/raid10.c
	drivers/md/raid5.c
	fs/nilfs2/btnode.c
	fs/nilfs2/mdt.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:58:35 +01:00
Jens Axboe a488e74976 scsi: convert to blk_delay_queue()
It was always abuse to reuse the plugging infrastructure for this,
convert it to the (new) real API for delaying queueing a bit. A
default delay of 3 msec is defined, to match the previous
behaviour.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:45:54 +01:00
Tejun Heo 1654e7411a block: add @force_kblockd to __blk_run_queue()
__blk_run_queue() automatically either calls q->request_fn() directly
or schedules kblockd depending on whether the function is recursed.
blk-flush implementation needs to be able to explicitly choose
kblockd.  Add @force_kblockd.

All the current users are converted to specify %false for the
parameter and this patch doesn't introduce any behavior change.

stable: This is prerequisite for fixing ide oops caused by the new
        blk-flush implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02 08:48:05 -05:00
Hannes Reinecke 63583cca74 [SCSI] Add detailed SCSI I/O errors
Instead of just passing 'EIO' for any I/O error we should be
notifying the upper layers with more details about the cause
of this error.

Update the possible I/O errors to:

- ENOLINK: Link failure between host and target
- EIO: Retryable I/O error
- EREMOTEIO: Non-retryable I/O error
- EBADE: I/O error restricted to the I_T_L nexus

'Retryable' in this context means that an I/O error _might_ be
restricted to the I_T_L nexus (vulgo: path), so retrying on another
nexus / path might succeed.

'Non-retryable' in general refers to a target failure, so this
error will always be generated regardless of the I_T_L nexus
it was send on.

I/O errors restricted to the I_T_L nexus might be retried
on another nexus / path, but they should _not_ be queued
if no paths are available.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-02-12 10:33:08 -06:00
Linus Torvalds 275220f0fc Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
  block: ensure that completion error gets properly traced
  blktrace: add missing probe argument to block_bio_complete
  block cfq: don't use atomic_t for cfq_group
  block cfq: don't use atomic_t for cfq_queue
  block: trace event block fix unassigned field
  block: add internal hd part table references
  block: fix accounting bug on cross partition merges
  kref: add kref_test_and_get
  bio-integrity: mark kintegrityd_wq highpri and CPU intensive
  block: make kblockd_workqueue smarter
  Revert "sd: implement sd_check_events()"
  block: Clean up exit_io_context() source code.
  Fix compile warnings due to missing removal of a 'ret' variable
  fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
  block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
  cfq-iosched: don't check cfqg in choose_service_tree()
  fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
  cdrom: export cdrom_check_events()
  sd: implement sd_check_events()
  sr: implement sr_check_events()
  ...
2011-01-13 10:45:01 -08:00