linux

Commit Graph

Author	SHA1	Message	Date
Christoph Hellwig	f6d47e74fc	scsi: unwind blk_end_request_all and blk_end_request_err calls Replace the calls to the various blk_end_request variants with opencode equivalents. Blk-mq is using a model that gives the driver control between the bio updates and the actual completion, and making the old code follow that same model allows us to keep the code more similar for both paths. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>	2014-07-25 17:16:13 -04:00
Christoph Hellwig	2ccbb00808	scsi: only maintain target_blocked if the driver has a target queue limit This saves us an atomic operation for each I/O submission and completion for the usual case where the driver doesn't set a per-target can_queue value. Only a few iscsi hardware offload drivers set the per-target can_queue value at the moment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>	2014-07-25 17:16:05 -04:00
Christoph Hellwig	cd9070c9c5	scsi: fix the {host,target,device}_blocked counter mess Seems like these counters are missing any sort of synchronization for updates, as a over 10 year old comment from me noted. Fix this by using atomic counters, and while we're at it also make sure they are in the same cacheline as the _busy counters and not needlessly stored to in every I/O completion. With the new model the _busy counters can temporarily go negative, so all the readers are updated to check for > 0 values. Longer term every successful I/O completion will reset the counters to zero, so the temporarily negative values will not cause any harm. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>	2014-07-25 17:15:48 -04:00
Christoph Hellwig	71e75c97f9	scsi: convert device_busy to atomic_t Avoid taking the queue_lock to check the per-device queue limit. Instead we do an atomic_inc_return early on to grab our slot in the queue, and if necessary decrement it after finishing all checks. Unlike the host and target busy counters this doesn't allow us to avoid the queue_lock in the request_fn due to the way the interface works, but it'll allow us to prepare for using the blk-mq code, which doesn't use the queue_lock at all, and it at least avoids a queue_lock round trip in scsi_device_unbusy, which is still important given how busy the queue_lock is. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>	2014-07-25 07:43:45 -04:00
Christoph Hellwig	7466501608	scsi: convert host_busy to atomic_t Avoid taking the host-wide host_lock to check the per-host queue limit. Instead we do an atomic_inc_return early on to grab our slot in the queue, and if necessary decrement it after finishing all checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>	2014-07-25 07:43:43 -04:00
Christoph Hellwig	7ae65c0f96	scsi: convert target_busy to an atomic_t Avoid taking the host-wide host_lock to check the per-target queue limit. Instead we do an atomic_inc_return early on to grab our slot in the queue, and if necessary decrement it after finishing all checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>	2014-07-25 07:39:00 -04:00
Christoph Hellwig	cf68d334dd	scsi: push host_lock down into scsi_{host,target}_queue_ready Prepare for not taking a host-wide lock in the dispatch path by pushing the lock down into the places that actually need it. Note that this patch is just a preparation step, as it will actually increase lock roundtrips and thus decrease performance on its own. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>	2014-07-25 07:38:53 -04:00
Christoph Hellwig	3b5382c459	scsi: set ->scsi_done before calling scsi_dispatch_cmd The blk-mq code path will set this to a different function, so make the code simpler by setting it up in a legacy-request specific place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>	2014-07-25 07:38:48 -04:00
Christoph Hellwig	d0d3bbf96e	scsi: centralize command re-queueing in scsi_dispatch_fn Make sure we only have the logic for requeing commands in one place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>	2014-07-25 07:38:41 -04:00
Christoph Hellwig	de3e8bf331	scsi: split __scsi_queue_insert Factor out a helper to set the _blocked values, which we'll reuse for the blk-mq code path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Webb Scales <webbnh@hp.com> Acked-by: Jens Axboe <axboe@kernel.dk> Tested-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Robert Elliott <elliott@hp.com>	2014-07-25 07:38:35 -04:00
Christoph Hellwig	6af7a4ffa2	scsi: add scsi_setup_cmnd helper Factor out command setup code that will be shared with the blk-mq code path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Webb Scales <webbnh@hp.com>	2014-07-25 07:38:08 -04:00
Christoph Hellwig	4f1e576575	scsi: mark scsi_setup_blk_pc_cmnd static Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-07-17 22:16:29 +02:00
Christoph Hellwig	5158a899d8	scsi: set sc_data_direction in common code The data direction fiel in the SCSI command is derived only from the block request structure. Move setting it up into common code instead of duplicating it in the ULDs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-07-17 22:11:41 +02:00
Christoph Hellwig	3868cf8ea7	scsi: restructure command initialization for TYPE_FS requests We should call the device handler prep_fn for all TYPE_FS requests, not just simple read/write calls that are handled by the disk driver. Restructure the common I/O code to call the prep_fn handler and zero out the CDB, and just leave the call to scsi_init_io to the ULDs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-07-17 22:11:27 +02:00
Christoph Hellwig	635d98b1d0	scsi: move the nr_phys_segments assert into scsi_init_io scsi_init_io should only be called for requests that transfer data, so move the assert that a request has segments from the callers into scsi_init_io. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-07-17 22:10:53 +02:00
Maurizio Lombardi	e6c11dbb8d	scsi_lib: remove the description string in scsi_io_completion() During IO with fabric faults, one generally sees several "Unhandled error code" messages in the syslog as shown below: sd 4:0:6:2: [sdbw] Unhandled error code sd 4:0:6:2: [sdbw] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK sd 4:0:6:2: [sdbw] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00 end_request: I/O error, dev sdbw, sector 0 This comes from scsi_io_completion (in scsi_lib.c) while handling error codes other than DID_RESET or not deferred sense keys i.e. this is actually handled by the SCSI mid layer. But what gets displayed here is "Unhandled error code" which is quite misleading as it indicates something that is not addressed by the mid layer. The description string is based on the sense key and sometimes on the additional sense code; since the ACTION_FAIL case always prints the sense key and the additional sense code, this patch removes the description string completely because it does not add useful information. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-17 22:07:46 +02:00
Christoph Hellwig	f1bea55d5a	scsi: remove various exports that were only used by scsi_tgt Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-07-17 22:07:45 +02:00
Hannes Reinecke	91921e016a	scsi: use dev_printk variants where possible Using dev_printk variants prefixes the logging message with the originating device, which makes debugging easier. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-17 22:07:42 +02:00
James Bottomley	89fb4cd1f7	scsi: handle flush errors properly Flush commands don't transfer data and thus need to be special cased in the I/O completion handler so that we can propagate errors to the block layer and filesystem. Signed-off-by: James Bottomley <JBottomley@Parallels.com> Reported-by: Steven Haber <steven@qumulo.com> Tested-by: Steven Haber <steven@qumulo.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de>	2014-07-17 21:56:34 +02:00
Linus Torvalds	23d4ed53b7	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block layer fixes from Jens Axboe: "Final small batch of fixes to be included before -rc1. Some general cleanups in here as well, but some of the blk-mq fixes we need for the NVMe conversion and/or scsi-mq. The pull request contains: - Support for not merging across a specified "chunk size", if set by the driver. Some NVMe devices perform poorly for IO that crosses such a chunk, so we need to support it generically as part of request merging avoid having to do complicated split logic. From me. - Bump max tag depth to 10Ki tags. Some scsi devices have a huge shared tag space. Before we failed with EINVAL if a too large tag depth was specified, now we truncate it and pass back the actual value. From me. - Various blk-mq rq init fixes from me and others. - A fix for enter on a dying queue for blk-mq from Keith. This is needed to prevent oopsing on hot device removal. - Fixup for blk-mq timer addition from Ming Lei. - Small round of performance fixes for mtip32xx from Sam Bradshaw. - Minor stack leak fix from Rickard Strandqvist. - Two __init annotations from Fabian Frederick" * 'for-linus' of git://git.kernel.dk/linux-block: block: add __init to blkcg_policy_register block: add __init to elv_register block: ensure that bio_add_page() always accepts a page for an empty bio blk-mq: add timer in blk_mq_start_request blk-mq: always initialize request->start_time block: blk-exec.c: Cleaning up local variable address returnd mtip32xx: minor performance enhancements blk-mq: ->timeout should be cleared in blk_mq_rq_ctx_init() blk-mq: don't allow queue entering for a dying queue blk-mq: bump max tag depth to 10K tags block: add blk_rq_set_block_pc() block: add notion of a chunk size for request merging	2014-06-11 08:41:17 -07:00
Linus Torvalds	1c54fc1efe	SCSI for-linus on 20140609 This patch consists of the usual driver updates (qla2xxx, qla4xxx, lpfc, be2iscsi, fnic, ufs, NCR5380) The NCR5380 is the addition to maintained status of a long neglected driver for older hardware. In addition there are a lot of minor fixes and cleanups and some more updates to make scsi mq ready. Signed-off-by: James Bottomley <JBottomley@Parallels.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJTlcwiAAoJEDeqqVYsXL0M5qsIALzVPLd4yxA16zCiaPQUeIV5 mfYmwISFlN+qW3AcUeSH4D13YgegCjEBfqaDMWvIkgouxLy/7jpxtChutq3MCzUE cDT1B9+ZrzoqBISRNHEh/gx5F1MOF2VPuqG2pe0J90wyRCNzJscB6PbtWMAo86CA 2eu7wq3K9FXxCC1qY0PzwBLXHqUcgk5GWiK9CM/k4W0NiTVeNmwPeh5i91IQnBHx E2l7NAXgNLyCf5tyeswvZ4pW0T3hlaswNmBB4qC8oJm4U6UqMN+tk4ML63Pz7uPe 4mlHG0uI8Vbdi13iv1EDUZ9Vo8iqVrzP2UAhakgP9poKSGqE4d/MD0EKNGQB2Es= =cBY8 -----END PGP SIGNATURE----- Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This patch consists of the usual driver updates (qla2xxx, qla4xxx, lpfc, be2iscsi, fnic, ufs, NCR5380) The NCR5380 is the addition to maintained status of a long neglected driver for older hardware. In addition there are a lot of minor fixes and cleanups and some more updates to make scsi mq ready" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (130 commits) include/scsi/osd_protocol.h: remove unnecessary __constant mvsas: Recognise device/subsystem 9485/9485 as 88SE9485 Revert "be2iscsi: Fix processing cqe for cxn whose endpoint is freed" mptfusion: fix msgContext in mptctl_hp_hostinfo acornscsi: remove linked command support scsi/NCR5380: dprintk macro fusion: Remove use of DEF_SCSI_QCMD fusion: Add free msg frames to the head, not tail of list mpt2sas: Add free smids to the head, not tail of list mpt2sas: Remove use of DEF_SCSI_QCMD mpt2sas: Remove uses of serial_number mpt3sas: Remove use of DEF_SCSI_QCMD mpt3sas: Remove uses of serial_number qla2xxx: Use kmemdup instead of kmalloc + memcpy qla4xxx: Use kmemdup instead of kmalloc + memcpy qla2xxx: fix incorrect debug printk be2iscsi: Bump the driver version be2iscsi: Fix processing cqe for cxn whose endpoint is freed be2iscsi: Fix destroy MCC-CQ before MCC-EQ is destroyed be2iscsi: Fix memory corruption in MBX path ...	2014-06-09 18:54:06 -07:00
Jens Axboe	f27b087b81	block: add blk_rq_set_block_pc() With the optimizations around not clearing the full request at alloc time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC up to the user allocating the request. Add a blk_rq_set_block_pc() that sets the command type to REQ_TYPE_BLOCK_PC, and properly initializes the members associated with this type of request. Update callers to use this function instead of manipulating rq->cmd_type directly. Includes fixes from Christoph Hellwig <hch@lst.de> for my half-assed attempt. Signed-off-by: Jens Axboe <axboe@fb.com>	2014-06-06 07:57:37 -06:00
Linus Torvalds	681a289548	Merge branch 'for-3.16/core' of git://git.kernel.dk/linux-block into next Pull block core updates from Jens Axboe: "It's a big(ish) round this time, lots of development effort has gone into blk-mq in the last 3 months. Generally we're heading to where 3.16 will be a feature complete and performant blk-mq. scsi-mq is progressing nicely and will hopefully be in 3.17. A nvme port is in progress, and the Micron pci-e flash driver, mtip32xx, is converted and will be sent in with the driver pull request for 3.16. This pull request contains: - Lots of prep and support patches for scsi-mq have been integrated. All from Christoph. - API and code cleanups for blk-mq from Christoph. - Lots of good corner case and error handling cleanup fixes for blk-mq from Ming Lei. - A flew of blk-mq updates from me: * Provide strict mappings so that the driver can rely on the CPU to queue mapping. This enables optimizations in the driver. * Provided a bitmap tagging instead of percpu_ida, which never really worked well for blk-mq. percpu_ida relies on the fact that we have a lot more tags available than we really need, it fails miserably for cases where we exhaust (or are close to exhausting) the tag space. * Provide sane support for shared tag maps, as utilized by scsi-mq * Various fixes for IO timeouts. * API cleanups, and lots of perf tweaks and optimizations. - Remove 'buffer' from struct request. This is ancient code, from when requests were always virtually mapped. Kill it, to reclaim some space in struct request. From me. - Remove 'magic' from blk_plug. Since we store these on the stack and since we've never caught any actual bugs with this, lets just get rid of it. From me. - Only call part_in_flight() once for IO completion, as includes two atomic reads. Hopefully we'll get a better implementation soon, as the part IO stats are now one of the more expensive parts of doing IO on blk-mq. From me. - File migration of block code from {mm,fs}/ to block/. This includes bio.c, bio-integrity.c, bounce.c, and ioprio.c. From me, from a discussion on lkml. That should describe the meat of the pull request. Also has various little fixes and cleanups from Dave Jones, Shaohua Li, Duan Jiong, Fengguang Wu, Fabian Frederick, Randy Dunlap, Robert Elliott, and Sam Bradshaw" * 'for-3.16/core' of git://git.kernel.dk/linux-block: (100 commits) blk-mq: push IPI or local end_io decision to __blk_mq_complete_request() blk-mq: remember to start timeout handler for direct queue block: ensure that the timer is always added blk-mq: blk_mq_unregister_hctx() can be static blk-mq: make the sysfs mq/ layout reflect current mappings blk-mq: blk_mq_tag_to_rq should handle flush request block: remove dead code in scsi_ioctl:blk_verify_command blk-mq: request initialization optimizations block: add queue flag for disabling SG merging block: remove 'magic' from struct blk_plug blk-mq: remove alloc_hctx and free_hctx methods blk-mq: add file comments and update copyright notices blk-mq: remove blk_mq_alloc_request_pinned blk-mq: do not use blk_mq_alloc_request_pinned in blk_mq_map_request blk-mq: remove blk_mq_wait_for_tags blk-mq: initialize request in __blk_mq_alloc_request blk-mq: merge blk_mq_alloc_reserved_request into blk_mq_alloc_request blk-mq: add helper to insert requests from irq context blk-mq: remove stale comment for blk_mq_complete_request() blk-mq: allow non-softirq completions ...	2014-06-02 09:29:34 -07:00
Christoph Hellwig	a1b73fc194	scsi: reintroduce scsi_driver.init_command Instead of letting the ULD play games with the prep_fn move back to the model of a central prep_fn with a callback to the ULD. This already cleans up and shortens the code by itself, and will be required to properly support blk-mq in the SCSI midlayer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-05-19 12:35:09 +02:00
Christoph Hellwig	bc85dc500f	scsi: remove scsi_end_request By folding scsi_end_request into its only caller we can significantly clean up the completion logic. We can use simple goto labels now to only have a single place to finish or requeue command there instead of the previous convoluted logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Hannes Reinecke <hare@suse.de>	2014-05-19 12:33:41 +02:00
Christoph Hellwig	c682adf3e1	scsi: explicitly release bidi buffers Instead of trying to guess when we have a BIDI buffer in scsi_release_buffers add a function to explicitly free the BIDI ressoures in the one place that handles them. This avoids needing a special __scsi_release_buffers for the case where we already have freed the request as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>	2014-05-19 12:33:41 +02:00
Alan Stern	644373a421	[SCSI] Fix command result state propagation We're seeing a case where the contents of scmd->result isn't being reset after a SCSI command encounters an error, is resubmitted, times out and then gets handled. The error handler acts on the stale result of the previous error instead of the timeout. Fix this by properly zeroing the scmd->status before the command is resubmitted. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2014-04-21 14:27:26 -07:00
Christoph Hellwig	68c03d9193	[SCSI] don't reference freed command in scsi_prep_return Patch commit `0479633686` Author: Christoph Hellwig <hch@infradead.org> Date: Thu Feb 20 14:20:55 2014 -0800 [SCSI] do not manipulate device reference counts in scsi_get/put_command Introduced a use after free:I in the kill case of scsi_prep_return we have to release our device reference, but we do this trying to reference the just freed command. Use the local sdev pointer instead. Fixes: `0479633686` Reported-by: Joe Lawrence <joe.lawrence@stratus.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2014-04-21 07:57:22 -07:00
Christoph Hellwig	5e012aad85	[SCSI] don't reference freed command in scsi_init_sgtable Patch commit `0479633686` Author: Christoph Hellwig <hch@infradead.org> Date: Thu Feb 20 14:20:55 2014 -0800 [SCSI] do not manipulate device reference counts in scsi_get/put_command Introduced a use after free: when scsi_init_io fails we have to release our device reference, but we do this trying to reference the just freed command. Add a local scsi_device pointer to fix this. Fixes: `0479633686` Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2014-04-21 07:57:21 -07:00
Jens Axboe	b4f42e2831	block: remove struct request buffer member This was used in the olden days, back when onions were proper yellow. Basically it mapped to the current buffer to be transferred. With highmem being added more than a decade ago, most drivers map pages out of a bio, and rq->buffer isn't pointing at anything valid. Convert old style drivers to just use bio_data(). For the discard payload use case, just reference the page in the bio. Signed-off-by: Jens Axboe <axboe@fb.com>	2014-04-15 14:03:02 -06:00
Jens Axboe	f89e0dd9d1	Linux 3.15-rc1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJTSv89AAoJEHm+PkMAQRiGd7AIAKL45dFRhX96W53uzGlpXtv7 Ecs9CMY5mtsB/rqtV/NSQaUELlNb4Ilb4lITh7NaLWAZxDJ12GwVsIbFoaBj7Ypx tfBNNxffGGnTTn2C1PpPQmLytLBvqXIVHMaPpDvnHYJl6g9BihshLzyMrsrizqnA DPJ0xMdgGp6BLC4qm0ZH1mS2q+TyXLN2ZCjJ1lp6NNYwBnwOe/ABjnUa0Ztu7aii 837N8h6nEuKHTr6DgrYHEgYVc3DPHPHaly/UJ8v4U30myzRv83YkD5DsBSUjSLj8 CzxJEnECa1XljLJK7SjRHy5pSP9tcUlmjx3Fk7IpQixmT+A15cov6fQ967jNlDw= =Hxnc -----END PGP SIGNATURE----- Merge tag 'v3.15-rc1' into for-3.16/core We don't like this, but things have diverged with the blk-mq fixes in 3.15-rc1. So merge it in.	2014-04-15 14:02:24 -06:00
Martin K. Petersen	2bfad21ecc	scsi: Make sure cmd_flags are 64-bit cmd_flags in struct request is now 64 bits wide but the scsi_execute functions truncated arguments passed to int leading to errors. Make sure the flags parameters are u64. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Jens Axboe <axboe@fb.com> CC: Jan Kara <jack@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-04-09 20:26:20 -06:00
Jens Axboe	59c3d45e48	block: remove 'q' parameter from kblockd_schedule_*_work() The queue parameter is never used, just get rid of it. Signed-off-by: Jens Axboe <axboe@fb.com>	2014-04-09 10:17:00 -06:00
Christoph Hellwig	134997a041	[SCSI] remove a useless get/put_device pair in scsi_requeue_command Avoid a spurious device get/put pair by cleaning up scsi_requeue_command and folding scsi_unprep_request into it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2014-03-15 10:19:25 -07:00
Bart Van Assche	27e9e0f12a	[SCSI] remove a useless get/put_device pair in scsi_next_command Eliminate a get_device() / put_device() pair from scsi_next_command(). Both are atomic operations hence removing these slightly improves performance. [hch: slight changes due to different context] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2014-03-15 10:19:25 -07:00
Bart Van Assche	613be1f626	[SCSI] remove a useless get/put_device pair in scsi_request_fn SCSI devices may only be removed by calling scsi_remove_device(). That function must invoke blk_cleanup_queue() before the final put of sdev->sdev_gendev. Since blk_cleanup_queue() waits for the block queue to drain and then tears it down, scsi_request_fn cannot be active anymore after blk_cleanup_queue() has returned and hence the get_device()/put_device() pair in scsi_request_fn is unnecessary. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2014-03-15 10:19:25 -07:00
Christoph Hellwig	0479633686	[SCSI] do not manipulate device reference counts in scsi_get/put_command Many callers won't need this and we can optimize them away. In addition the handling in the __-prefixed variants was inconsistant to start with. Based on an earlier patch from Bart Van Assche. [jejb: fix kerneldoc probelm picked up by Fengguang Wu] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2014-03-15 10:19:24 -07:00
Christoph Hellwig	21a05df547	[SCSI] avoid taking host_lock in scsi_run_queue unless nessecary If we don't have starved devices we don't need to take the host lock to iterate over them. Also split the function up to be more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2014-03-15 10:19:24 -07:00
Eiichi Tsukata	ee60b2c52e	[SCSI] Add timeout to avoid infinite command retry Currently, scsi error handling in scsi_io_completion() tries to unconditionally requeue scsi command when device keeps some error state. For example, UNIT_ATTENTION causes infinite retry with action == ACTION_RETRY. This is because retryable errors are thought to be temporary and the scsi device will soon recover from those errors. Normally, such retry policy is appropriate because the device will soon recover from temporary error state. But there is no guarantee that device is able to recover from error state immediately. Some hardware error can prevent device from recovering. This patch adds timeout in scsi_io_completion() to avoid infinite command retry in scsi_io_completion(). Once scsi command retry time is longer than this timeout, the command is treated as failure. Signed-off-by: Eiichi Tsukata <eiichi.tsukata.xh@hitachi.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2014-03-15 10:19:19 -07:00
Russell King	e83b366487	Fix uses of dma_max_pfn() when converting to a limiting address We must use a 64-bit for this, otherwise overflowed bits get lost, and that can result in a lower than intended value set. Fixes: `8e0cb8a1f6` ("ARM: 7797/1: mmc: Use dma_max_pfn(dev) helper for bounce_limit calculations") Fixes: `7d35496dd9` ("ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations") Tested-Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2014-02-17 23:08:41 +00:00
Santosh Shilimkar	7d35496dd9	ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations DMA bounce limit is the maximum direct DMA'able memory beyond which bounce buffers has to be used to perform dma operations. SCSI driver relies on dma_mask but its calculation is based on max_*pfn which don't have uniform meaning across architectures. So make use of dma_max_pfn() which is expected to return the DMAable maximum pfn value across architectures. Cc: linux-scsi@vger.kernel.org Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2013-10-31 14:49:26 +00:00
Linus Torvalds	357397a141	Merge branch 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata changes from Tejun Heo: "Two interesting changes. - libata acpi handling has been restructured so that the association between ata devices and ACPI handles are less convoluted. This change shouldn't change visible behavior. - Queued TRIM support, which enables sending TRIM to the device without draining in-flight RW commands, is added. Currently only enabled for ahci (and likely to stay that way for the foreseeable future). Other changes are driver-specific updates / fixes" * 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: libata: bugfix: Remove __le32 in ata_tf_to_fis() libata: acpi: Remove ata_dev_acpi_handle stub in libata.h libata: Add support for queued DSM TRIM libata: Add support for SEND/RECEIVE FPDMA QUEUED libata: Add H2D FIS "auxiliary" port flag libata: Populate host-to-device FIS "auxiliary" field ata: acpi: rework the ata acpi bind support sata, highbank: send extra clock cycles in SGPIO patterns sata, highbank: set tx_atten override bits devicetree: create a separate binding description for sata_highbank drivers/ata/sata_rcar.c: simplify use of devm_ioremap_resource sata highbank: enable 64-bit DMA mask when using LPAE ata: pata_samsung_cf: add missing __iomem annotation ata: pata_arasan: Staticize local symbols sata_mv: Remove unneeded CONFIG_HAVE_CLK ifdefs ata: use dev_get_platdata() sata_mv: Remove unneeded forward declaration libata: acpi: remove dead code for ata_acpi_(un)bind libata: move 'struct ata_taskfile' and friends from ata.h to libata.h	2013-09-03 18:19:53 -07:00
Ewan D. Milne	279afdfe78	[SCSI] Generate uevents on certain unit attention codes Generate a uevent when the following Unit Attention ASC/ASCQ codes are received: 2A/01 MODE PARAMETERS CHANGED 2A/09 CAPACITY DATA HAS CHANGED 38/07 THIN PROVISIONING SOFT THRESHOLD REACHED 3F/03 INQUIRY DATA HAS CHANGED 3F/0E REPORTED LUNS DATA HAS CHANGED Log kernel messages when the following Unit Attention ASC/ASCQ codes are received that are not as specific as those above: 2A/xx PARAMETERS CHANGED 3F/xx TARGET OPERATING CONDITIONS HAVE CHANGED Added logic to set expecting_lun_change for other LUNs on the target after REPORTED LUNS DATA HAS CHANGED is received, so that duplicate uevents are not generated, and clear expecting_lun_change when a REPORT LUNS command completes, in accordance with the SPC-3 specification regarding reporting of the 3F 0E ASC/ASCQ UA. [jejb: remove SPC3 test in scsi_report_lun_change and some docbook fixes and unused variable fix, both reported by Fengguang Wu] Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-08-26 18:52:27 +04:00
Hannes Reinecke	7e782af576	[SCSI] Return ENODATA on medium error When a medium error is detected the SCSI stack should return ENODATA to the upper layers. [jejb: fix whitespace error] Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-08-23 12:54:53 -04:00
Hannes Reinecke	a9d6ceb838	[SCSI] return ENOSPC on thin provisioning failure When the thin provisioning hard threshold is reached we should return ENOSPC to inform upper layers about this fact. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-08-23 12:43:54 -04:00
Hannes Reinecke	0f7f6234d3	[SCSI] Document enhanced error codes Document the various error codes returned on I/O failure. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-08-23 12:31:59 -04:00
Aaron Lu	f1bc1e4c44	ata: acpi: rework the ata acpi bind support Binding ACPI handle to SCSI device has several drawbacks, namely: 1 During ATA device initialization time, ACPI handle will be needed while SCSI devices are not created yet. So each time ACPI handle is needed, instead of retrieving the handle by ACPI_HANDLE macro, a namespace scan is performed to find the handle for the corresponding ATA device. This is inefficient, and also expose a restriction on calling path not holding any lock. 2 The binding to SCSI device tree makes code complex, while at the same time doesn't bring us any benefit. All ACPI handlings are still done in ATA module, not in SCSI. Rework the ATA ACPI binding code to bind ACPI handle to ATA transport devices(ATA port and ATA device). The binding needs to be done only once, since the ATA transport devices do not go away with hotplug. And due to this, the flush_work call in hotplug handler for ATA bay is no longer needed. Tested on an Intel test platform for binding and runtime power off for ODD(ZPODD) and hard disk; on an ASUS S400C for binding and normal boot and S3, where its SATA port node has _SDD and _GTF control methods when configured as an AHCI controller and its PATA device node has _GTF control method when configured as an IDE controller. SATA PMP binding and ATA hotplug is not tested. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Tested-by: Dirk Griesbach <spamthis@freenet.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-08-23 12:09:23 -04:00
Bart Van Assche	0516c08d10	[SCSI] enable destruction of blocked devices which fail LUN scanning If something goes wrong during LUN scanning, e.g. a transport layer failure occurs, then __scsi_remove_device() can get invoked by the LUN scanning code for a SCSI device in state SDEV_CREATED_BLOCK and before the SCSI device has been added to sysfs (is_visible == 0). Make sure that even in this case the transition into state SDEV_DEL occurs. This avoids that __scsi_remove_device() can get invoked a second time by scsi_forget_host() if this last function is invoked from another thread than the thread that performs LUN scanning. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-07-09 12:14:09 +01:00
James Bottomley	e2eb7244bc	[SCSI] Fix race between starved list and device removal scsi_run_queue() examines all SCSI devices that are present on the starved list. Since scsi_run_queue() unlocks the SCSI host lock a SCSI device can get removed after it has been removed from the starved list and before its queue is run. Protect against that race condition by holding a reference on the queue while running it. Reported-by: Chanho Min <chanho.min@lge.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-07-09 12:14:08 +01:00
Lin Ming	9b21493c45	[SCSI] sd: use REQ_PM in sd's runtime suspend operation With the introduction of REQ_PM, modify sd's runtime suspend operation functions to use that flag so that the operations to put the device into runtime suspended state(i.e. sync cache and stop device) will not affect its runtime PM status. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Aaron Lu <aaron.lu@intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2013-05-06 12:48:17 -07:00
Rafael J. Wysocki	53540098b2	ACPI / glue: Add .match() callback to struct acpi_bus_type USB uses the .find_bridge() callback from struct acpi_bus_type incorrectly, because as a result of the way it is used by USB every device in the system that doesn't have a bus type or parent is passed to usb_acpi_find_device() for inspection. What USB actually needs, though, is to call usb_acpi_find_device() for USB ports that don't have a bus type defined, but have usb_port_device_type as their device type, as well as for USB devices. To fix that replace the struct bus_type pointer in struct acpi_bus_type used for matching devices to specific subsystems with a .match() callback to be used for this purpose and update the users of struct acpi_bus_type, including USB, accordingly. Define the .match() callback routine for USB, usb_acpi_bus_match(), in such a way that it will cover both USB devices and USB ports and remove the now redundant .find_bridge() callback pointer from usb_acpi_bus. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Jeff Garzik <jgarzik@pobox.com>	2013-03-04 14:23:40 +01:00
Aaron Lu	6f4c827e68	[libata] scsi: no poll when ODD is powered off When the ODD is powered off, any action the user did to the ODD that would generate a media event will trigger an ACPI interrupt, so the poll for media event is no longer necessary. And the poll will also cause a runtime status change, which will stop the ODD from staying in powered off state, so the poll should better be stopped. But since we don't have access to the gendisk structure in LLDs, here comes the disk_events_disable_depth for scsi device. This field is a hint set by LLDs to convey information to upper layer drivers. A value of 0 means media poll is necessary for the device, while values above 0 means media poll is not needed and should better be skipped. So we can increase its value when we are to power off the ODD in ATA layer and decrease its value when the ODD is powered on, effectively silence the media events poll. Signed-off-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2013-01-25 15:36:43 -05:00
Linus Torvalds	60da5bf47d	Merge branch 'for-3.8/core' of git://git.kernel.dk/linux-block Pull block layer core updates from Jens Axboe: "Here are the core block IO bits for 3.8. The branch contains: - The final version of the surprise device removal fixups from Bart. - Don't hide EFI partitions under advanced partition types. It's fairly wide spread these days. This is especially dangerous for systems that have both msdos and efi partition tables, where you want to keep them in sync. - Cleanup of using -1 instead of the proper NUMA_NO_NODE - Export control of bdi flusher thread CPU mask and default to using the home node (if known) from Jeff. - Export unplug tracepoint for MD. - Core improvements from Shaohua. Reinstate the recursive merge, as the original bug has been fixed. Add plugging for discard and also fix a problem handling non pow-of-2 discard limits. There's a trivial merge in block/blk-exec.c due to a fix that went into 3.7-rc at a later point than -rc4 where this is based." * 'for-3.8/core' of git://git.kernel.dk/linux-block: block: export block_unplug tracepoint block: add plug for blkdev_issue_discard block: discard granularity might not be power of 2 deadline: Allow 0ms deadline latency, increase the read speed partitions: enable EFI/GPT support by default bsg: Remove unused function bsg_goose_queue() block: Make blk_cleanup_queue() wait until request_fn finished block: Avoid scheduling delayed work on a dead queue block: Avoid that request_fn is invoked on a dead queue block: Let blk_drain_queue() caller obtain the queue lock block: Rename queue dead flag bdi: add a user-tunable cpu_list for the bdi flusher threads block: use NUMA_NO_NODE instead of -1 block: recursive merge requests block CFQ: avoid moving request to different queue	2012-12-17 08:27:23 -08:00
Bart Van Assche	3f3299d5c0	block: Rename queue dead flag QUEUE_FLAG_DEAD is used to indicate that queuing new requests must stop. After this flag has been set queue draining starts. However, during the queue draining phase it is still safe to invoke the queue's request_fn, so QUEUE_FLAG_DYING is a better name for this flag. This patch has been generated by running the following command over the kernel source tree: git grep -lEw 'blk_queue_dead\|QUEUE_FLAG_DEAD' \| xargs sed -i.tmp -e 's/blk_queue_dead/blk_queue_dying/g' \ -e 's/QUEUE_FLAG_DEAD/QUEUE_FLAG_DYING/g'; \ sed -i.tmp -e "s/QUEUE_FLAG_DYING$(printf \\t)*5/QUEUE_FLAG_DYING$(printf \\t)5/g" \ include/linux/blkdev.h; \ sed -i.tmp -e 's/ DEAD/ DYING/g' -e 's/dead queue/a dying queue/' \ -e 's/Dead queue/A dying queue/' block/blk-core.c Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Jens Axboe <axboe@kernel.dk> Cc: Chanho Min <chanho.min@lge.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-12-06 14:30:58 +01:00
Martin K. Petersen	5db44863b6	[SCSI] sd: Implement support for WRITE SAME Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk driver. - We set the default maximum to 0xFFFF because there are several devices out there that only support two-byte block counts even with WRITE SAME(16). We only enable transfers bigger than 0xFFFF if the device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK LIMITS VPD. - max_write_same_blocks can be overriden per-device basis in sysfs. - The UNMAP discovery heuristics remain unchanged but the discard limits are tweaked to match the "real" WRITE SAME commands. - In the error handling logic we now distinguish between WRITE SAME with and without UNMAP set. The discovery process heuristics are: - If the device reports a SCSI level of SPC-3 or greater we'll issue READ SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is supported. If that's the case we will use it. - If the device supports the block limits VPD and reports a MAXIMUM WRITE SAME LENGTH bigger than 0xFFFF we will use WRITE SAME(16). - Otherwise we will use WRITE SAME(10) unless the target LBA is beyond 0xFFFFFFFF or the block count exceeds 0xFFFF. - no_write_same is set for ATA, FireWire and USB. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-11-13 22:45:42 -08:00
James Bottomley	fe709ed827	Merge SCSI misc branch into isci-for-3.6 tag	2012-10-02 08:55:12 +01:00
Vikas Chaudhary	0e58076b37	[SCSI] scsi_lib: Set the device state from transport-offline to running FC and iSCSI class set SCSI devices to transport-offline state after fast_io_fail/replacement_timeout has fired, but after relogin, function scsi_internal_device_unblock() is not setting scsi device state to running. Due to this the devices even after being relogged in remain offline. Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-09-14 17:59:22 +01:00
Mike Snitzer	27c419739b	[SCSI] scsi_lib: fix scsi_io_completion's SG_IO error propagation The following v3.4-rc1 commit unmasked an existing bug in scsi_io_completion's SG_IO error handling: `47ac56d` [SCSI] scsi_error: classify some ILLEGAL_REQUEST sense as a permanent TARGET_ERROR Given that certain ILLEGAL_REQUEST are now properly categorized as TARGET_ERROR the host_byte is being set (before host_byte wasn't ever set for these ILLEGAL_REQUEST). In scsi_io_completion, initialize req->errors with cmd->result _after_ the SG_IO block that calls __scsi_error_from_host_byte (which may modify the host_byte). Before this fix: cdb to send: 12 01 01 00 00 00 ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[12, 01, 01, 00, 00, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=20000, flags=0, status=02, masked_status=01, sb[19]=[70, 00, 05, 00, 00, 00, 00, 0b, 00, 00, 00, 00, 24, 00, 00, 00, 00, 00, 00], host_status=0x10, driver_status=0x8, resid=0, duration=0, info=0x1}) = 0 SCSI Status: Check Condition Sense Information: sense buffer empty After: cdb to send: 12 01 01 00 00 00 ioctl(3, SG_IO, {'S', SG_DXFER_NONE, cmd[6]=[12, 01, 01, 00, 00, 00], mx_sb_len=32, iovec_count=0, dxfer_len=0, timeout=20000, flags=0, status=02, masked_status=01, sb[19]=[70, 00, 05, 00, 00, 00, 00, 0b, 00, 00, 00, 00, 24, 00, 00, 00, 00, 00, 00], host_status=0, driver_status=0x8, resid=0, duration=0, info=0x1}) = 0 SCSI Status: Check Condition Sense Information: Fixed format, current; Sense key: Illegal Request Additional sense: Invalid field in cdb Raw sense data (in hex): 70 00 05 00 00 00 00 0b 00 00 00 00 24 00 00 00 00 00 00 Reported-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Babu Moger <babu.moger@netapp.com> Cc: stable@vger.kernel.org # 3.4 Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-08-22 09:42:31 +04:00
Jeff Garzik	8407884dd9	Merge branch 'master' [vanilla Linus master] into libata-dev.git/upstream Two bits were appended to the end of the bitfield list in struct scsi_device. Resolve that conflict by including both bits. Conflicts: include/scsi/scsi_device.h	2012-07-25 15:58:48 -04:00
Bart Van Assche	b485462aca	[SCSI] Stop accepting SCSI requests before removing a device Avoid that the code for requeueing SCSI requests triggers a crash by making sure that that code isn't scheduled anymore after a device has been removed. Also, source code inspection of __scsi_remove_device() revealed a race condition in this function: no new SCSI requests must be accepted for a SCSI device after device removal started. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:41 +01:00
Bart Van Assche	84feb1664e	[SCSI] Change return type of scsi_queue_insert() into void The return value of scsi_queue_insert() is ignored by all its callers, hence change the return type of this function into void. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:41 +01:00
Bart Van Assche	940f5d47e2	[SCSI] Avoid dangling pointer in scsi_requeue_command() When we call scsi_unprep_request() the command associated with the request gets destroyed and therefore drops its reference on the device. If this was the only reference, the device may get released and we end up with a NULL pointer deref when we call blk_requeue_request. Reported-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> [jejb: enhance commend and add commit log for stable] Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:40 +01:00
Bart Van Assche	67bd941300	[SCSI] Fix device removal NULL pointer dereference Use blk_queue_dead() to test whether the queue is dead instead of !sdev. Since scsi_prep_fn() may be invoked concurrently with __scsi_remove_device(), keep the queuedata (sdev) pointer in __scsi_remove_device(). This patch fixes a kernel oops that can be triggered by USB device removal. See also http://www.spinics.net/lists/linux-scsi/msg56254.html. Other changes included in this patch: - Swap the blk_cleanup_queue() and kfree() calls in scsi_host_dev_release() to make that code easier to grasp. - Remove the queue dead check from scsi_run_queue() since the queue state can change anyway at any point in that function where the queue lock is not held. - Remove the queue dead check from the start of scsi_request_fn() since it is redundant with the scsi_device_online() check. Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: <stable@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:40 +01:00
Mike Christie	d075498c98	[SCSI] remove old comment from block/unblock functions We do not hold the host lock when calling these functions, so remove comment. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:23 +01:00
Mike Christie	5d9fb5cc1b	[SCSI] core, classes, mpt2sas: have scsi_internal_device_unblock take new state This has scsi_internal_device_unblock/scsi_target_unblock take the new state to set the devices as an argument instead of always setting to running. The patch also converts users of these functions. This allows the FC and iSCSI class to transition devices from blocked to transport-offline, so that when fast_io_fail/replacement_timeout has fired we do not set the devices back to running. Instead, we set them to SDEV_TRANSPORT_OFFLINE. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:22 +01:00
Mike Christie	1b8d262061	[SCSI] add new SDEV_TRANSPORT_OFFLINE state This patch adds a new state SDEV_TRANSPORT_OFFLINE. It will be used by transport classes to offline devices for cases like when the fast_io_fail/recovery_tmo fires. In those cases we want all IO to fail, and we have not yet escalated to dev_loss_tmo behavior where we are removing the devices. Currently to handle this state, transport classes are setting the scsi_device's state to running, setting their internal session/port structs state to something that indicates failed, and then failing IO from some transport check in the queuecommand. The reason for the new value is so that users can distinguish between a device failure that is a result of a transport problem vs the wide range of errors that devices get offlined for when a scsi command times out and we offline the devices there. It also fixes the confusion as to why the transport class is failing IO, but has set the device state from blocked to running. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-07-20 08:58:21 +01:00
Holger Macht	de50ada55b	[SCSI] add wrapper to access and set scsi_bus_type in struct acpi_bus_type For being able to bind ata devices against acpi devices, scsi_bus_type needs to be set as bus in struct acpi_bus_type. So add wrapper to scsi_lib to accomplish that. Signed-off-by: Holger Macht <holger@homac.de> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2012-06-29 11:38:09 -04:00
Jun'ichi Nomura	b7e94a1686	[SCSI] Fix dm-multipath starvation when scsi host is busy block congestion control doesn't have any concept of fairness across multiple queues. This means that if SCSI reports the host as busy in the queue congestion control it can result in an unfair starvation situation in dm-mp if there are multiple multipath devices on the same host. For example: http://www.redhat.com/archives/dm-devel/2012-May/msg00123.html The fix for this is to report only the sdev busy state (and ignore the host busy state) in the block congestion control call back. The host is still congested, but the SCSI subsystem will sort out the congestion in a fair way because it knows the relation between the queues and the host. [jejb: fixed up trailing whitespace] Reported-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Tested-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-05-23 09:34:17 +01:00
James Bottomley	e346933365	isci update for 3.5 1/ Rework remote-node-context (RNC) handling for proper management of the silicon state machine in error handling and hot-plug conditions. Further details below, suffice to say if the RNC is mismanaged the silicon state machines may lock up. 2/ Refactor the initialization code to be reused for suspend/resume support 3/ Miscellaneous bug fixes to address discovery issues and hardware compatibility. RNC rework details from Jeff Skirvin: In the controller, devices as they appear on a SAS domain (or direct-attached SATA devices) are represented by memory structures known as "Remote Node Contexts" (RNCs). These structures are transferred from main memory to the controller using a set of register commands; these commands include setting up the context ("posting"), removing the context ("invalidating"), and commands to control the scheduling of commands and connections to that remote device ("suspensions" and "resumptions"). There is a similar path to control RNC scheduling from the protocol engine, which interprets the results of command and data transmission and reception. In general, the controller chooses among non-suspended RNCs to find one that has work requiring scheduling the transmission of command and data frames to a target. Likewise, when a target tries to return data back to the initiator, the state of the RNC is used by the controller to determine how to treat the incoming request. As an example, if the RNC is in the state "TX/RX Suspended", incoming SSP connection requests from the target will be rejected by the controller hardware. When an RNC is "TX Suspended", it will not be selected by the controller hardware to start outgoing command or data operations (with certain priority-based exceptions). As mentioned above, there are two sources for management of the RNC states: commands from driver software, and the result of transmission and reception conditions of commands and data signaled by the controller hardware. As an example of the latter, if an outgoing SSP command ends with a OPEN_REJECT(BAD_DESTINATION) status, the RNC state will transition to the "TX Suspended" state, and this is signaled by the controller hardware in the status to the completion of the pending command as well as signaled in a controller hardware event. Examples of the former are included in the patch changelogs. Driver software is required to suspend the RNC in a "TX/RX Suspended" condition before any outstanding commands can be terminated. Failure to guarantee this can lead to a complete hardware hang condition. Earlier versions of the driver software did not guarantee that an RNC was correctly managed before I/O termination, and so operated in an unsafe way. Further, the driver performed unnecessary contortions to preserve the remote device command state and so was more complicated than it needed to be. A simplifying driver assumption is that once an I/O has entered the error handler path without having completed in the target, the requirement on the driver is that all use of the sas_task must end. Beyond that, recovery of operation is dependent on libsas and other components to reset, rediscover and reconfigure the device before normal operation can restart. In the driver, this simplifying assumption meant that the RNC management could be reduced to entry into the suspended state, terminating the targeted I/O request, and resuming the RNC as needed for device-specific management such as an SSP Abort Task or LUN Reset Management request. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJPtYFXAAoJEB7SkWpmfYgCJkcP/1VvsuuitNy/YM9P1tb/RvQ7 ytJzjGtWiAABHVWwjgB+Ng7hUTaP2r6l8KeNfwxpwXyNdBAUNysYEUBHAfPsKzKz espTmw3wVCnREajgKXZwFp9aTj8DcYFB6vKcC/ddACt3uRNjjA9En36+6797r8Vg YdebyjFX2FxwoUj0icTUiV/OXgb8w723imnCl8bfOhhFRi4eFZ4EJ23AdMkUya1i uYePAPJPSQJuU/87gNIx4JcR0qHJ1ziGPEY+XC47CzEeXbBTSPgWOwanQ6KPoXRJ XVxamfcKAjRdtwQ4m1vYBSE32RTdrjhujVbkGiPi6QaEbCLjhLSCIYyuS3XMckV+ TCZ16o5kd/I6ZtZOeP4zZRGnNBkOPzY44qiJeKffjWDhTrFacx4XWJB/ftWPgEA5 N2zFH3RM4sY0FUJ3I/Qe5CERNdCXMtcj+UAf3nHpAIVcv46Lp+qoSkdEx05uuuiN +D/dSlubktuvuzmB5WisL3qrjNEkkLTAGQpZs1j0ojLEBm0XAgV5EzqmHiZ0GOPD OQNFxeei9SlqgtKIIP0bymRispPrG2HVCOvExYMxzKR6fjxofZLAs/aWOsdhxgMq TlAyZJ6OmGI+KX68HHzoMpT9iquvmP64WGkfHzCx296BfSKiruLh/Jzt5gGwv+Z1 5tlpnUr9dUTxx7qkQXvj =HYvO -----END PGP SIGNATURE----- Merge tag 'isci-for-3.5' into misc isci update for 3.5 1/ Rework remote-node-context (RNC) handling for proper management of the silicon state machine in error handling and hot-plug conditions. Further details below, suffice to say if the RNC is mismanaged the silicon state machines may lock up. 2/ Refactor the initialization code to be reused for suspend/resume support 3/ Miscellaneous bug fixes to address discovery issues and hardware compatibility. RNC rework details from Jeff Skirvin: In the controller, devices as they appear on a SAS domain (or direct-attached SATA devices) are represented by memory structures known as "Remote Node Contexts" (RNCs). These structures are transferred from main memory to the controller using a set of register commands; these commands include setting up the context ("posting"), removing the context ("invalidating"), and commands to control the scheduling of commands and connections to that remote device ("suspensions" and "resumptions"). There is a similar path to control RNC scheduling from the protocol engine, which interprets the results of command and data transmission and reception. In general, the controller chooses among non-suspended RNCs to find one that has work requiring scheduling the transmission of command and data frames to a target. Likewise, when a target tries to return data back to the initiator, the state of the RNC is used by the controller to determine how to treat the incoming request. As an example, if the RNC is in the state "TX/RX Suspended", incoming SSP connection requests from the target will be rejected by the controller hardware. When an RNC is "TX Suspended", it will not be selected by the controller hardware to start outgoing command or data operations (with certain priority-based exceptions). As mentioned above, there are two sources for management of the RNC states: commands from driver software, and the result of transmission and reception conditions of commands and data signaled by the controller hardware. As an example of the latter, if an outgoing SSP command ends with a OPEN_REJECT(BAD_DESTINATION) status, the RNC state will transition to the "TX Suspended" state, and this is signaled by the controller hardware in the status to the completion of the pending command as well as signaled in a controller hardware event. Examples of the former are included in the patch changelogs. Driver software is required to suspend the RNC in a "TX/RX Suspended" condition before any outstanding commands can be terminated. Failure to guarantee this can lead to a complete hardware hang condition. Earlier versions of the driver software did not guarantee that an RNC was correctly managed before I/O termination, and so operated in an unsafe way. Further, the driver performed unnecessary contortions to preserve the remote device command state and so was more complicated than it needed to be. A simplifying driver assumption is that once an I/O has entered the error handler path without having completed in the target, the requirement on the driver is that all use of the sas_task must end. Beyond that, recovery of operation is dependent on libsas and other components to reset, rediscover and reconfigure the device before normal operation can restart. In the driver, this simplifying assumption meant that the RNC management could be reduced to entry into the suspended state, terminating the targeted I/O request, and resuming the RNC as needed for device-specific management such as an SSP Abort Task or LUN Reset Management request.	2012-05-21 12:17:30 +01:00
Dan Williams	a7a20d1039	[SCSI] sd: limit the scope of the async probe domain sd injects and synchronizes probe work on the global kernel-wide domain. This runs into conflict with PM that wants to perform resume actions in async context: [ 494.237079] INFO: task kworker/u:3:554 blocked for more than 120 seconds. [ 494.294396] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 494.360809] kworker/u:3 D 0000000000000000 0 554 2 0x00000000 [ 494.420739] ffff88012e4d3af0 0000000000000046 ffff88013200c160 ffff88012e4d3fd8 [ 494.484392] ffff88012e4d3fd8 0000000000012500 ffff8801394ea0b0 ffff88013200c160 [ 494.548038] ffff88012e4d3ae0 00000000000001e3 ffffffff81a249e0 ffff8801321c5398 [ 494.611685] Call Trace: [ 494.632649] [<ffffffff8149dd25>] schedule+0x5a/0x5c [ 494.674687] [<ffffffff8104b968>] async_synchronize_cookie_domain+0xb6/0x112 [ 494.734177] [<ffffffff810461ff>] ? __init_waitqueue_head+0x50/0x50 [ 494.787134] [<ffffffff8131a224>] ? scsi_remove_target+0x48/0x48 [ 494.837900] [<ffffffff8104b9d9>] async_synchronize_cookie+0x15/0x17 [ 494.891567] [<ffffffff8104ba49>] async_synchronize_full+0x54/0x70 <-- here we wait for async contexts to complete [ 494.943783] [<ffffffff8104b9f5>] ? async_synchronize_full_domain+0x1a/0x1a [ 495.002547] [<ffffffffa00114b1>] sd_remove+0x2c/0xa2 [sd_mod] [ 495.051861] [<ffffffff812fe94f>] __device_release_driver+0x86/0xcf [ 495.104807] [<ffffffff812fe9bd>] device_release_driver+0x25/0x32 <-- here we take device_lock() [ 853.511341] INFO: task kworker/u:4:549 blocked for more than 120 seconds. [ 853.568693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 853.635119] kworker/u:4 D ffff88013097b5d0 0 549 2 0x00000000 [ 853.695129] ffff880132773c40 0000000000000046 ffff880130790000 ffff880132773fd8 [ 853.758990] ffff880132773fd8 0000000000012500 ffff88013288a0b0 ffff880130790000 [ 853.822796] 0000000000000246 0000000000000040 ffff88013097b5c8 ffff880130790000 [ 853.886633] Call Trace: [ 853.907631] [<ffffffff8149dd25>] schedule+0x5a/0x5c [ 853.949670] [<ffffffff8149cc44>] __mutex_lock_common+0x220/0x351 [ 854.001225] [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4 [ 854.049082] [<ffffffff81304bd7>] ? device_resume+0x58/0x1c4 [ 854.097011] [<ffffffff8149ce48>] mutex_lock_nested+0x2f/0x36 <-- here we wait for device_lock() [ 854.145591] [<ffffffff81304bd7>] device_resume+0x58/0x1c4 [ 854.192066] [<ffffffff81304d61>] async_resume+0x1e/0x45 [ 854.237019] [<ffffffff8104bc93>] async_run_entry_fn+0xc6/0x173 <-- ...while running in async context Provide a 'scsi_sd_probe_domain' so that async probe actions actions can be flushed without regard for the state of PM, and allow for the resume path to handle devices that have transitioned from SDEV_QUIESCE to SDEV_DEL prior to resume. Acked-by: Alan Stern <stern@rowland.harvard.edu> [alan: uplevel scsi_sd_probe_domain, clarify scsi_device_resume] Signed-off-by: Dan Williams <dan.j.williams@intel.com> [jejb: remove unneeded config guards in include file] Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-05-17 09:10:46 +01:00
Lin Ming	6f381fa344	[SCSI] scsi_lib: use correct DMA device in __scsi_alloc_queue Currently, __scsi_alloc_queue uses SCSI host's parent device as DMA device to set segment boundary. But the parent device may not refer to the DMA device. For example, for ATA disk, SCSI host's parent device now refers to ATA port. Since commit d139b9b([SCSI] scsi_lib_dma: fix bug with dma maps on nested scsi objects), a new field Scsi_Host->dma_dev was introduced to refer to the real DMA device. Use ->dma_dev in __scsi_alloc_queue to correctly set segment boundary. Bug report: http://marc.info/?l=linux-ide&m=133177818318187&w=2 Reported-and-tested-by: Jörg Sommer <joerg@alea.gnuu.de> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-04-22 18:56:18 +01:00
Linus Torvalds	424a6f6ef9	SCSI updates on 20120319 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQEcBAABAgAGBQJPZxSnAAoJEDeqqVYsXL0M0Y4IAMX0vrTVZbg6psA5/gMcWGRP CkFXEQ8n0PL2SCaj6BoDqamJFe5Nc7dnqxM0fGawB4S9vr3rHhiOlwO+NbV9zFYC 2skBTpeL3sjgtN/jTBdfeeAa7xTYpu/XGyei0NS1A5c2AyMVXV0uYV2s4VNZxe44 tVIn1OEzM2giZ9EB1OZslDMrg5XXm3MBIUECP0LbWUhBm/35caSFKzMXRwhh7WiK +AVmc2AZYtdEwuknDyiH7KlsaoB3vGL9pPrAUJzIgEhy2pOo2A7W72HfA4Fj+y6a uF9HBS5zciMp1+sGWry62AjNbWgin9BRlozBEO/lJhIfMGDV1nXEIJsOkOgkdoE= =1TxB -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 SCSI updates from James Bottomley: "The update includes the usual assortment of driver updates (lpfc, qla2xxx, qla4xxx, bfa, bnx2fc, bnx2i, isci, fcoe, hpsa) plus a huge amount of infrastructure work in the SAS library and transport class as well as an iSCSI update. There's also a new SCSI based virtio driver." * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (177 commits) [SCSI] qla4xxx: Update driver version to 5.02.00-k15 [SCSI] qla4xxx: trivial cleanup [SCSI] qla4xxx: Fix sparse warning [SCSI] qla4xxx: Add support for multiple session per host. [SCSI] qla4xxx: Export CHAP index as sysfs attribute [SCSI] scsi_transport: Export CHAP index as sysfs attribute [SCSI] qla4xxx: Add support to display CHAP list and delete CHAP entry [SCSI] iscsi_transport: Add support to display CHAP list and delete CHAP entry [SCSI] pm8001: fix endian issue with code optimization. [SCSI] pm8001: Fix possible racing condition. [SCSI] pm8001: Fix bogus interrupt state flag issue. [SCSI] ipr: update PCI ID definitions for new adapters [SCSI] qla2xxx: handle default case in qla2x00_request_firmware() [SCSI] isci: improvements in driver unloading routine [SCSI] isci: improve phy event warnings [SCSI] isci: debug, provide state-enum-to-string conversions [SCSI] scsi_transport_sas: 'enable' phys on reset [SCSI] libsas: don't recover end devices attached to disabled phys [SCSI] libsas: fixup target_port_protocols for expanders that don't report sata [SCSI] libsas: set attached device type and target protocols for local phys ...	2012-03-22 12:55:29 -07:00
Cong Wang	77dfce076c	scsi: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>	2012-03-20 21:48:19 +08:00
Martin K. Petersen	66a651aa7a	[SCSI] Ensure discard failure gets treated as a target problem The error reported up the stack for a discard failure did not clearly indicate that the command was processed and subsequently failed by the target device. Return -EREMOTEIO so multipathing does not classify this condition as a path failure. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-02-19 09:38:01 -06:00
Moger, Babu	2082ebc45a	[SCSI] fix the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE) This patch fixes the host byte settings DID_TARGET_FAILURE and DID_NEXUS_FAILURE. The function __scsi_error_from_host_byte, tries to reset the host byte to DID_OK. But that does not happen because of the OR operation. Here is the flow. scsi_softirq_done-> scsi_decide_disposition -> __scsi_error_from_host_byte Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition, result will be set as DID_NEXUS_FAILURE (=0x11). Then in __scsi_error_from_host_byte, when we do OR with DID_OK. Purpose is to reset it back to DID_OK. But that does not happen. This patch fixes this issue. Signed-off-by: Babu Moger <babu.moger@netapp.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-02-19 08:08:59 -06:00
Shaohua Li	466c08c71a	[SCSI] don't change sdev starvation list order without request dispatched The sdev is deleted from starved list and then try to dispatch from this device. It's quite possible the sdev can't eventually dispatch a request, then the sdev will be in starved list tail. This isn't fair. There are two cases here: 1. unplug path. scsi_request_fn() calls to scsi_target_queue_ready(), then the dev is removed from starved list, but quite possible host queue isn't ready, the dev is moved to starved list without dispatching any request. 2. scsi_run_queue path. It deletes the dev from starved list first (both global and local starved lists), then handles the dev. Then we could have the same process like case 1. This patch fixes the first case. Case 2 isn't fixed, because there is a rare case scsi_run_queue finds host isn't busy but scsi_request_fn finds host is busy (other CPU is faster to get host queue depth). Not deleting the dev from starved list in scsi_run_queue will keep scsi_run_queue looping (though this is very rare case, because host will become busy). Fortunately fixing case 1 already gives big improvement for starvation in my test. In a 12 disk JBOD setup, running file creation under EXT4, this gives 12% more throughput. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2012-01-16 11:54:04 +04:00
Hannes Reinecke	745718132c	[SCSI] Silencing 'killing requests for dead queue' When we tear down a device we try to flush all outstanding commands in scsi_free_queue(). However the check in scsi_request_fn() is imperfect as it only signals that we _might start_ aborting commands, not that we've actually aborted some. So move the printk inside the scsi_kill_request function, this will also give us a hint about which commands are aborted. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-11-09 12:07:07 -06:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Paul Gortmaker	09703660ed	scsi: Add export.h for EXPORT_SYMBOL/THIS_MODULE as required For the basic SCSI infrastructure files that are exporting symbols but not modules themselves, add in the basic export.h header file to allow the exports. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:31:23 -04:00
Bart Van Assche	3308511c93	[SCSI] Make scsi_free_queue() kill pending SCSI commands Make sure that SCSI device removal via scsi_remove_host() does finish all pending SCSI commands. Currently that's not the case and hence removal of a SCSI host during I/O can cause a deadlock. See also "blkdev_issue_discard() hangs forever if underlying storage device is removed" (http://bugzilla.kernel.org/show_bug.cgi?id=40472). See also http://lkml.org/lkml/2011/8/27/6. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@kernel.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-10-30 13:20:28 +04:00
James Smart	573e591353	[SCSI] scsi_lib: pause between error retries During cable pull tests on our 16G FC adapter, we are seeing errors, typically reads to close targets, which fail due to CRC or framing errors caused by the cable being pull (return status DID_ERROR). The adapter detects the error on one of the first frames received, marks the FC exchange as dead (further frames go to bit bucket) and signals the host of the error. This action is so quick, and coupled with fast host CPUs, creates a scenario in which the midlayer sees the failure and retries the io almost immediately. We've seen link traces with the retry on the link while the original i/o is still being processed by the target. We're also seeing the time window for the "link to pull-apart" and the physical interface to report disconnected to be in the few millisecond range. Which means, we're encountering scenarios where the full retry count is exhausted (all with error) by the midlayer before the link disconnect state is detected. We looked at 8G FC behavior and occasionally see the same behavior, but as the link was slower, it rarely could exhaust all retries before the link reported disconnect. What is needed is a slight delay between io retries due to DID_ERROR to cover this error. It is inappropriate to put this delay in the driver, as the error is indistinguishable from other link-related errors, nor does the driver track whether the io is a retry or not. This is also easier than tracking between-io-error bursts that are seen in this scenario. The patch below updates the retry path so that it inserts a delay as if the target was busy. The busy delay is on the order of 6ms. This delay is sufficient to ensure the link down condition is reported before the retry count is exhausted (at most 1 retry is seen). Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com> Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-07-27 14:06:01 +04:00
James Bottomley	bfe159a512	[SCSI] fix crash in scsi_dispatch_cmd() USB surprise removal of sr is triggering an oops in scsi_dispatch_command(). What seems to be happening is that USB is hanging on to a queue reference until the last close of the upper device, so the crash is caused by surprise remove of a mounted CD followed by attempted unmount. The problem is that USB doesn't issue its final commands as part of the SCSI teardown path, but on last close when the block queue is long gone. The long term fix is probably to make sr do the teardown in the same way as sd (so remove all the lower bits on ejection, but keep the upper disk alive until last close of user space). However, the current oops can be simply fixed by not allowing any commands to be sent to a dead queue. Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>	2011-07-21 14:21:18 -07:00
Linus Torvalds	a2b9c1f620	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: don't delay blk_run_queue_async scsi: remove performance regression due to async queue run blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup block: rescan partitions on invalidated devices on -ENOMEDIA too cdrom: always check_disk_change() on open block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers	2011-05-18 06:49:02 -07:00
Jens Axboe	9937a5e2f3	scsi: remove performance regression due to async queue run Commit `c21e6beb` removed our queue request_fn re-enter protection, and defaulted to always running the queues from kblockd to be safe. This was a known potential slow down, but should be safe. Unfortunately this is causing big performance regressions for some, so we need to improve this logic. Looking into the details of the re-enter, the real issue is on requeue of requests. Requeue of requests upon seeing a BUSY condition from the device ends up re-running the queue, causing traces like this: scsi_request_fn() scsi_dispatch_cmd() scsi_queue_insert() __scsi_queue_insert() scsi_run_queue() scsi_request_fn() ... potentially causing the issue we want to avoid. So special case the requeue re-run of the queue, but improve it to offload the entire run of local queue and starved queue from a single workqueue callback. This is a lot better than potentially kicking off a workqueue run for each device seen. This also fixes the issue of the local device going into recursion, since the above mentioned commit never moved that queue run out of line. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-05-17 11:04:44 +02:00
James Bottomley	c055f5b261	[SCSI] fix oops in scsi_run_queue() The recent commit closing the race window in device teardown: commit `86cbfb5607` Author: James Bottomley <James.Bottomley@suse.de> Date: Fri Apr 22 10:39:59 2011 -0500 [SCSI] put stricter guards on queue dead checks is causing a potential NULL deref in scsi_run_queue() because the q->queuedata may already be NULL by the time this function is called. Since we shouldn't be running a queue that is being torn down, simply add a NULL check in scsi_run_queue() to forestall this. Tested-by: Jim Schutt <jaschut@sandia.gov> Cc: stable@kernel.org Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2011-05-03 15:30:00 -05:00
Jens Axboe	c21e6beba8	block: get rid of QUEUE_FLAG_REENTER We are currently using this flag to check whether it's safe to call into ->request_fn(). If it is set, we punt to kblockd. But we get a lot of false positives and excessive punts to kblockd, which hurts performance. The only real abuser of this infrastructure is SCSI. So export the async queue run and convert SCSI over to use that. There's room for improvement in that SCSI need not always use the async call, but this fixes our performance issue and they can fix that up in due time. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-04-19 13:32:46 +02:00
Christoph Hellwig	24ecfbe27f	block: add blk_run_queue_async Instead of overloading __blk_run_queue to force an offload to kblockd add a new blk_run_queue_async helper to do it explicitly. I've kept the blk_queue_stopped check for now, but I suspect it's not needed as the check we do when the workqueue items runs should be enough. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-04-18 11:41:33 +02:00
Linus Torvalds	6c51038900	Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}	2011-03-24 10:16:26 -07:00
Linus Torvalds	c55d267de2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits) [SCSI] scsi_dh_rdac: Add MD36xxf into device list [SCSI] scsi_debug: add consecutive medium errors [SCSI] libsas: fix ata list corruption issue [SCSI] hpsa: export resettable host attribute [SCSI] hpsa: move device attributes to avoid forward declarations [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26) [SCSI] sd: Logical Block Provisioning update [SCSI] Include protection operation in SCSI command trace [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try) [SCSI] target: Fix volume size misreporting for volumes > 2TB [SCSI] bnx2fc: Broadcom FCoE offload driver [SCSI] fcoe: fix broken fcoe interface reset [SCSI] fcoe: precedence bug in fcoe_filter_frames() [SCSI] libfcoe: Remove stale fcoe-netdev entries [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out" [SCSI] libfc: Fixing a memory leak when destroying an interface [SCSI] megaraid_sas: Version and Changelog update ... Fix up trivial conflicts due to whitespace differences in drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}	2011-03-17 17:54:40 -07:00
Martin K. Petersen	c98a0eb0e9	[SCSI] sd: Logical Block Provisioning update SBC3r26 contains many changes to the Logical Block Provisioning interfaces (formerly known as Thin Provisioning ditto). This patch implements support for both the old and new schemes using the same heuristic as before (whether the LBP VPD page is present). The new code also allows the provisioning mode (i.e. choice of command) to be overridden on a per-device basis via sysfs. Two additional modes are supported in this version: - WRITE SAME(10) with the UNMAP bit set - WRITE SAME(10) without the UNMAP bit set. This allows us to support devices that predate the TP/LBP enhancements in SBC3 and which work by way zero-detection Switching between modes has been consolidated in a helper function that also updates the block layer topology according to the limitations of the chosen command. I experimented with trying WRITE SAME(16) if UNMAP fails, WRITE SAME(10) if WRITE SAME(16) fails, etc. but found several devices that got cranky. So for now we'll disable discard if one of the commands fail. The user still has the option of selecting a different mode in sysfs. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2011-03-14 18:37:34 -05:00
Martin K. Petersen	72f7d322fd	[SCSI] Include protection operation in SCSI command trace When debugging DIF/DIX it is very helpful to be able to see which DIX operation is associated with the scsi_cmnd. Include the protection op in the SCSI command trace. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2011-03-14 18:36:02 -05:00
Jens Axboe	4c63f5646e	Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:58:35 +01:00
Jens Axboe	a488e74976	scsi: convert to blk_delay_queue() It was always abuse to reuse the plugging infrastructure for this, convert it to the (new) real API for delaying queueing a bit. A default delay of 3 msec is defined, to match the previous behaviour. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:45:54 +01:00
Tejun Heo	1654e7411a	block: add @force_kblockd to __blk_run_queue() __blk_run_queue() automatically either calls q->request_fn() directly or schedules kblockd depending on whether the function is recursed. blk-flush implementation needs to be able to explicitly choose kblockd. Add @force_kblockd. All the current users are converted to specify %false for the parameter and this patch doesn't introduce any behavior change. stable: This is prerequisite for fixing ide oops caused by the new blk-flush implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-02 08:48:05 -05:00
Hannes Reinecke	63583cca74	[SCSI] Add detailed SCSI I/O errors Instead of just passing 'EIO' for any I/O error we should be notifying the upper layers with more details about the cause of this error. Update the possible I/O errors to: - ENOLINK: Link failure between host and target - EIO: Retryable I/O error - EREMOTEIO: Non-retryable I/O error - EBADE: I/O error restricted to the I_T_L nexus 'Retryable' in this context means that an I/O error _might_ be restricted to the I_T_L nexus (vulgo: path), so retrying on another nexus / path might succeed. 'Non-retryable' in general refers to a target failure, so this error will always be generated regardless of the I_T_L nexus it was send on. I/O errors restricted to the I_T_L nexus might be retried on another nexus / path, but they should _not_ be queued if no paths are available. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2011-02-12 10:33:08 -06:00
Linus Torvalds	275220f0fc	Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits) block: ensure that completion error gets properly traced blktrace: add missing probe argument to block_bio_complete block cfq: don't use atomic_t for cfq_group block cfq: don't use atomic_t for cfq_queue block: trace event block fix unassigned field block: add internal hd part table references block: fix accounting bug on cross partition merges kref: add kref_test_and_get bio-integrity: mark kintegrityd_wq highpri and CPU intensive block: make kblockd_workqueue smarter Revert "sd: implement sd_check_events()" block: Clean up exit_io_context() source code. Fix compile warnings due to missing removal of a 'ret' variable fs/block: type signature of major_to_index(int) to major_to_index(unsigned) block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p) cfq-iosched: don't check cfqg in choose_service_tree() fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors cdrom: export cdrom_check_events() sd: implement sd_check_events() sr: implement sr_check_events() ...	2011-01-13 10:45:01 -08:00
Linus Torvalds	da40d036fd	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (147 commits) [SCSI] arcmsr: fix write to device check [SCSI] lpfc: lower stack use in lpfc_fc_frame_check [SCSI] eliminate an unnecessary local variable from scsi_remove_target() [SCSI] libiscsi: use bh locking instead of irq with session lock [SCSI] libiscsi: do not take host lock in queuecommand [SCSI] be2iscsi: fix null ptr when accessing task hdr [SCSI] be2iscsi: fix gfp use in alloc_pdu [SCSI] libiscsi: add more informative failure message during iscsi scsi eh [SCSI] gdth: Add missing call to gdth_ioctl_free [SCSI] bfa: remove unused defintions and misc cleanups [SCSI] bfa: remove inactive functions [SCSI] bfa: replace bfa_assert with WARN_ON [SCSI] qla2xxx: Use sg_next to fetch next sg element while walking sg list. [SCSI] qla2xxx: Fix to avoid recursive lock failure during BSG timeout. [SCSI] qla2xxx: Remove code to not reset ISP82xx on failure. [SCSI] qla2xxx: Display mailbox register 4 during 8012 AEN for ISP82XX parts. [SCSI] qla2xxx: Don't perform a BIG_HAMMER if Get-ID (0x20) mailbox command fails on CNAs. [SCSI] qla2xxx: Remove redundant module parameter permission bits [SCSI] qla2xxx: Add sysfs node for displaying board temperature. [SCSI] qla2xxx: Code cleanup to remove unwanted comments and code. ...	2011-01-07 12:47:02 -08:00
Hillf Danton	fd01a6632d	[SCSI] fix the return value of scsi_target_queue_read() It seems that zero should be returned if scsi_target_is_busy(starget) is true, no matter if sdev is on the starved list. Signed-off-by: Hillf Danton <dhillf@gmail.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2010-12-21 12:37:28 -06:00
Linus Torvalds	7f8635cc9e	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: cciss: fix cciss_revalidate panic block: max hardware sectors limit wrapper block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead blk-throttle: Correct the placement of smp_rmb() blk-throttle: Trim/adjust slice_end once a bio has been dispatched block: check for proper length of iov entries earlier in blk_rq_map_user_iov() drbd: fix for spin_lock_irqsave in endio callback drbd: don't recvmsg with zero length	2010-12-20 09:19:46 -08:00
Martin K. Petersen	e692cb668f	block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead When stacking devices, a request_queue is not always available. This forced us to have a no_cluster flag in the queue_limits that could be used as a carrier until the request_queue had been set up for a metadevice. There were several problems with that approach. First of all it was up to the stacking device to remember to set queue flag after stacking had completed. Also, the queue flag and the queue limits had to be kept in sync at all times. We got that wrong, which could lead to us issuing commands that went beyond the max scatterlist limit set by the driver. The proper fix is to avoid having two flags for tracking the same thing. We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the block layer merging functions. The queue_limit 'no_cluster' is turned into 'cluster' to avoid double negatives and to ease stacking. Clustering defaults to being enabled as before. The queue flag logic is removed from the stacking function, and explicitly setting the cluster flag is no longer necessary in DM and MD. Reported-by: Ed Lin <ed.lin@promise.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-12-17 08:35:53 +01:00
Tejun Heo	9f8a2c23c6	scsi: replace sr_test_unit_ready() with scsi_test_unit_ready() The usage of TUR has been confusing involving several different commits updating different parts over time. Currently, the only differences between scsi_test_unit_ready() and sr_test_unit_ready() are, * scsi_test_unit_ready() also sets sdev->changed on NOT_READY. * scsi_test_unit_ready() returns 0 if TUR ended with UNIT_ATTENTION or NOT_READY. Due to the above two differences, sr is using its own sr_test_unit_ready(), but sd - the sole user of the above extra handling - doesn't even need them. Where scsi_test_unit_ready() is used in sd_media_changed(), the code is looking for device ready w/ media present state which is true iff TUR succeeds w/o sense data or UA, and when the device is not ready for whatever reason sd_media_changed() explicitly marks media as missing so there's no reason to set sdev->changed automatically from scsi_test_unit_ready() on NOT_READY. Drop both special handlings from scsi_test_unit_ready(), which makes it equivalant to sr_test_unit_ready(), and replace sr_test_unit_ready() with scsi_test_unit_ready(). Also, drop the unnecessary explicit NOT_READY check from sd_media_changed(). Checking return value is enough for testing device readiness. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-12-16 17:53:39 +01:00
James Bottomley	459dbf72e4	[SCSI] Eliminate error handler overload of the SCSI serial number The error handler is using the test cmd->serial_number == 0 in the abort routines to signal that the command to be aborted has already completed normally. This design was to close a race window in the original error handler where a command could go through the normal completion routines after it timed out but before error handling was started. Mike Anderson pointed out that when we converted our timeout and softirq completions, we picked up atomicity here because the block layer now mediates this with the REQ_ATOM_COMPLETE flag and guarantees that either the command times out or our done routine is called, but ensures we can't get both occurring. That makes the serial number zero check redundant and it can be removed. Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2010-12-09 09:41:16 -06:00
Mike Christie	986fe6c7f5	[SCSI] Fix regressions in scsi_internal_device_block Deleting a SCSI device on a blocked fc_remote_port (before fast_io_fail_tmo fires) results in a hanging thread: STACK: 0 schedule+1108 [0x5cac48] 1 schedule_timeout+528 [0x5cb7fc] 2 wait_for_common+266 [0x5ca6be] 3 blk_execute_rq+160 [0x354054] 4 scsi_execute+324 [0x3b7ef4] 5 scsi_execute_req+162 [0x3b80ca] 6 sd_sync_cache+138 [0x3cf662] 7 sd_shutdown+138 [0x3cf91a] 8 sd_remove+112 [0x3cfe4c] 9 __device_release_driver+124 [0x3a08b8] 10 device_release_driver+60 [0x3a0a5c] 11 bus_remove_device+266 [0x39fa76] 12 device_del+340 [0x39d818] 13 __scsi_remove_device+204 [0x3bcc48] 14 scsi_remove_device+66 [0x3bcc8e] 15 sysfs_schedule_callback_work+50 [0x260d66] 16 worker_thread+622 [0x162326] 17 kthread+160 [0x1680b0] 18 kernel_thread_starter+6 [0x10aaea] During the delete, the SCSI device is in moved to SDEV_CANCEL. When the FC transport class later calls scsi_target_unblock, this has no effect, since scsi_internal_device_unblock ignores SCSI devics in this state. It looks like all these are regressions caused by: `5c10e63c94` [SCSI] limit state transitions in scsi_internal_device_unblock Fix by rejecting offline and cancel in the state transition. Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> [jejb: Original patch by Christof Schmitt, modified by Mike Christie] Cc: Stable Tree <stable@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2010-10-25 09:48:32 -05:00
Linus Torvalds	e9dd2b6837	Merge branch 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits) cfq-iosched: Fix a gcc 4.5 warning and put some comments block: Turn bvec_k{un,}map_irq() into static inline functions block: fix accounting bug on cross partition merges block: Make the integrity mapped property a bio flag block: Fix double free in blk_integrity_unregister block: Ensure physical block size is unsigned int blkio-throttle: Fix possible multiplication overflow in iops calculations blkio-throttle: limit max iops value to UINT_MAX blkio-throttle: There is no need to convert jiffies to milli seconds blkio-throttle: Fix link failure failure on i386 blkio: Recalculate the throttled bio dispatch time upon throttle limit change blkio: Add root group to td->tg_list blkio: deletion of a cgroup was causes oops blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n block: set the bounce_pfn to the actual DMA limit rather than to max memory block: revert bad fix for memory hotplug causing bounces Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK block: set the bounce_pfn to the actual DMA limit rather than to max memory block: Prevent hang_check firing during long I/O cfq: improve fsync performance for small files ... Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h	2010-10-22 17:00:32 -07:00
Martin K. Petersen	13f05c8d8e	block/scsi: Provide a limit on the number of integrity segments Some controllers have a hardware limit on the number of protection information scatter-gather list segments they can handle. Introduce a max_integrity_segments limit in the block layer and provide a new scsi_host_template setting that allows HBA drivers to provide a value suitable for the hardware. Add support for honoring the integrity segment limit when merging both bios and requests. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>	2010-09-10 20:50:10 +02:00
James Bottomley	3a5c19c23d	[SCSI] fix use-after-free in scsi_init_io() we're using a pointer through a freed command to reset the request, which has shown up as an oops with slab poisoning: Reported-by: Tejun Heo <tj@kernel.org> Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2010-09-09 09:58:18 -05:00
Bartlomiej Zolnierkiewicz	d6e9fb46cd	scsi: remove superfluous NULL pointer check from scsi_kill_request() Dan's list included: drivers/scsi/scsi_lib.c +1365 scsi_kill_request(9) warning: variable derefenced in initializer 'cmd' drivers/scsi/scsi_lib.c +1365 scsi_kill_request(9) warning: variable derefenced before check 'cmd' We dereference cmd (and possible OOPS if cmd == NULL) before starting the request so just remove the superfluous debugging code altogether. [ bart: the potential NULL pointer dereference was finally fixed in (much later than mine) commit `03b1470` but my patch is still valid ] Reported-by: Dan Carpenter <error27@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Eugene Teo <eteo@redhat.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-11 08:59:00 -07:00
FUJITA Tomonori	610a63498f	scsi: fix discard page leak We leak a page allocated for discard on some error conditions (e.g. scsi_prep_state_check returns BLKPREP_DEFER in scsi_setup_blk_pc_cmnd). We unprep on requests that weren't prepped in the error path of scsi_init_io. It makes the error path to clean up scsi commands messy. Let's strictly apply the rule that we can't unprep on a request that wasn't prepped. Calling just scsi_put_command() in the error path of scsi_init_io() is enough. We don't set REQ_DONTPREP yet. scsi_setup_discard_cmnd can safely free a page on the error case with the above rule. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:24:28 +02:00
James Bottomley	28018c242a	block: implement an unprep function corresponding directly to prep Reviewed-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:23:47 +02:00
Christoph Hellwig	33659ebbae	block: remove wrappers for request type/flags Remove all the trivial wrappers for the cmd_type and cmd_flags fields in struct requests. This allows much easier grepping for different request types instead of unwinding through macros. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:17:56 +02:00
Linus Torvalds	b1bf936840	Merge branch 'for-2.6.34' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.34' of git://git.kernel.dk/linux-2.6-block: (38 commits) block: don't access jiffies when initialising io_context cfq: remove 8 bytes of padding from cfq_rb_root on 64 bit builds block: fix for "Consolidate phys_segment and hw_segment limits" cfq-iosched: quantum check tweak blktrace: perform cleanup after setup error blkdev: fix merge_bvec_fn return value checks cfq-iosched: requests "in flight" vs "in driver" clarification cciss: Fix problem with scatter gather elements in the scsi half of the driver cciss: eliminate unnecessary pointer use in cciss scsi code cciss: do not use void pointer for scsi hba data cciss: factor out scatter gather chain block mapping code cciss: fix scatter gather chain block dma direction kludge cciss: simplify scatter gather code cciss: factor out scatter gather chain block allocation and freeing cciss: detect bad alignment of scsi commands at build time cciss: clarify command list padding calculation cfq-iosched: rethink seeky detection for SSDs cfq-iosched: rework seeky detection block: remove padding from io_context on 64bit builds block: Consolidate phys_segment and hw_segment limits ...	2010-03-01 09:00:29 -08:00
Martin K. Petersen	8a78362c4e	block: Consolidate phys_segment and hw_segment limits Except for SCSI no device drivers distinguish between physical and hardware segment limits. Consolidate the two into a single segment limit. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-02-26 13:58:08 +01:00
Martin K. Petersen	086fa5ff08	block: Rename blk_queue_max_sectors to blk_queue_max_hw_sectors The block layer calling convention is blk_queue_<limit name>. blk_queue_max_sectors predates this practice, leading to some confusion. Rename the function to appropriately reflect that its intended use is to set max_hw_sectors. Also introduce a temporary wrapper for backwards compability. This can be removed after the merge window is closed. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2010-02-26 13:58:08 +01:00
Douglas Gilbert	e7efe5932b	[SCSI] skip sense logging for some ATA PASS-THROUGH cdbs Further to the lsml thread titled: "does scsi_io_completion need to dump sense data for ata pass through (ck_cond = 1) ?" This is a patch to skip logging when the sense data is associated with a SENSE_KEY of "RECOVERED_ERROR" and the additional sense code is "ATA PASS-THROUGH INFORMATION AVAILABLE". This only occurs with the SAT ATA PASS-THROUGH commands when CK_COND=1 (in the cdb). It indicates that the sense data contains ATA registers. Smartmontools uses such commands on ATA disks connected via SAT. Periodic checks such as those done by smartd cause nuisance entries into logs that are: - neither errors nor warnings - pointless unless the cdb that caused them are also logged Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2010-01-18 10:48:16 -06:00
Boaz Harrosh	63c43b0ec1	[SCSI] scsi_lib: Fix bug in completion of bidi commands Because of the terrible structuring of scsi-bidi-commands it breaks some of the life time rules of a scsi-command. It is now not allowed to free up the block-request before cleanup and partial deallocation of the scsi-command. (Which is not so for none bidi commands) The right fix to this problem would be to make bidi command a first citizen by allocating a scsi_sdb pointer at scsi command just like cmd->prot_sdb. The bidi sdb should be allocated/deallocated as part of the get/put_command (Again like the prot_sdb) and the current decoupling of scsi_cmnd and blk-request should be kept. For now make sure scsi_release_buffers() is called before the call to blk_end_request_all() which might cause the suicide of the block requests. At best the leak of bidi buffers, at worse a crash, as there is a race between the existence of the bidi_request and the free of the associated bidi_sdb. The reason this was never hit before is because only OSD has the potential of doing asynchronous bidi commands. (So does bsg but it is never used) And OSD clients just happen to do all their bidi commands synchronously, up until recently. CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2010-01-17 12:16:18 -06:00
Martin K. Petersen	d8705f11d8	[SCSI] Correctly handle thin provisioning write error A thin provisioned device may temporarily be out of sufficient allocation units to fulfill a write request. In that case it will return a space allocation in progress error. Wait a bit and retry the write. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2009-12-10 08:54:15 -06:00
Jiri Slaby	03b147083a	[SCSI] scsi_lib: fix potential NULL dereference Stanse found a potential NULL dereference in scsi_kill_request. Instead of triggering BUG() in 'if (unlikely(cmd == NULL))' branch, the kernel will Oops earlier on cmd dereference. Move the dereferences after the if. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2009-12-04 12:00:51 -06:00
Mike Christie	ad63082626	[SCSI] fix propogation of integrity errors When the Integrity check is done in scsi_io_completion it will set error to -EILSEQ. However, at this point error is no longer used, and blk_end_request_err has -EIO hardcoded. It looks like there was just porting mistake with this patch http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3e695f89c5debb735e4ff051e9e58d8fb4e95110 and we meant to send error upwards, so this patch changes the hard coded EIO to the error variable. I have only boot tested this patch. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2009-10-29 13:03:27 -04:00
Linus Torvalds	355bbd8cb8	Merge branch 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits) block: use blkdev_issue_discard in blk_ioctl_discard Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads block: don't assume device has a request list backing in nr_requests store block: Optimal I/O limit wrapper cfq: choose a new next_req when a request is dispatched Seperate read and write statistics of in_flight requests aoe: end barrier bios with EOPNOTSUPP block: trace bio queueing trial only when it occurs block: enable rq CPU completion affinity by default cfq: fix the log message after dispatched a request block: use printk_once cciss: memory leak in cciss_init_one() splice: update mtime and atime on files block: make blk_iopoll_prep_sched() follow normal 0/1 return convention cfq-iosched: get rid of must_alloc flag block: use interrupts disabled version of raise_softirq_irqoff() block: fix comment in blk-iopoll.c block: adjust default budget for blk-iopoll block: fix long lines in block/blk-iopoll.c block: add blk-iopoll, a NAPI like approach for block devices ...	2009-09-14 17:55:15 -07:00
Tejun Heo	da6c5c720c	scsi,block: update SCSI to handle mixed merge failures Update scsi_io_completion() such that it only fails requests till the next error boundary and retry the leftover. This enables block layer to merge requests with different failfast settings and still behave correctly on errors. Allow merge of requests of different failfast settings. As SCSI is currently the only subsystem which follows failfast status, there's no need to worry about other block drivers for now. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Niel Lambrechts <niel.lambrechts@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-09-11 14:33:30 +02:00
Martin K. Petersen	002b1eb2c0	[SCSI] Print failed commands When a request fails we print the sense data but not the actual command that failed. Add a printout of the operation + CDB for failed commands. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: James Bottomley <James.Bottomley@suse.de>	2009-08-22 17:51:51 -05:00
Hannes Reinecke	b391277a56	sd, sr: fix Driver 'sd' needs updating message If a SCSI ULD driver sets blk_queue_prep_rq(), it should clean it up itself on remove(), and not from the bus callbacks. This removes the need to hook into bus->remove(), which should not be used at the same time as driver->remove(). [jejb: fix sdkp initialisation problem due to mismerge] Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-06-21 12:01:27 -05:00
James Bottomley	82681a318f	[SCSI] Merge branch 'linus' Conflicts: drivers/message/fusion/mptsas.c fixed up conflict between req->data_len accessors and mptsas driver updates. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-06-12 10:02:03 -05:00
Takahiro Yasui	5c10e63c94	[SCSI] limit state transitions in scsi_internal_device_unblock scsi timeout on two or more devices may cause extremely long execution time for user applications because SDEV_OFFLINE state is changed to SDEV_RUNNING state during scsi error recovery procedures triggered by a bus reset or a host reset of scsi LLD, and scsi timeout can happens on the same devices many times. This happens because scsi_internal_device_unblock() changes device's state to SDEV_RUNNING even if a device in other states than SDEV_BLOCK, while the following two transitions are required in this function. SDEV_BLOCK -> SDEV_RUNNING SDEV_CREATED_BLOCK -> SDEV_CREATED Otherwise, it returns -EINVAL. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> [matthew@wil.cx: supplied rewritten base for patch] Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-05-23 15:44:05 -05:00
Jens Axboe	e4b636366c	Merge branch 'master' into for-2.6.31 Conflicts: drivers/block/hd.c drivers/block/mg_disk.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-22 20:25:34 +02:00
Boaz Harrosh	ac36552a52	scsi_lib: remove unused variable The last request completion cleanup in scsi_lib left an unused this_count variable in scsi_io_completion(). (It was used before in a code segment that now uses blk_end_request_all()) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-19 19:54:09 +02:00
Tejun Heo	e458824f9d	scsi: fix resid_len mis-conversion in scsi_end_request() Commit `c3a4d78c58` introduced rq->data_len and converted residual count users to it. While converting, it mistakenly converted scsi_end_request() to finish requests with residual count when it wants to do is fully complete the request. Fix it by using blk_end_request_all() instead. This bug was spotted by Boaz Harrosh. Signed-off-by: Tejun Heo <tj@kernel.org> Spotted-by: Boaz Harrosh <bharrosh@panasas.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-12 08:49:32 +02:00
FUJITA Tomonori	e6bb7a96c2	scsi: simplify the bidi completion Let's use blk_end_request_all() instead of blk_end_bidi_request(). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 11:06:47 +02:00
Tejun Heo	9934c8c045	block: implement and enforce request peek/start/fetch Till now block layer allowed two separate modes of request execution. A request is always acquired from the request queue via elv_next_request(). After that, drivers are free to either dequeue it or process it without dequeueing. Dequeue allows elv_next_request() to return the next request so that multiple requests can be in flight. Executing requests without dequeueing has its merits mostly in allowing drivers for simpler devices which can't do sg to deal with segments only without considering request boundary. However, the benefit this brings is dubious and declining while the cost of the API ambiguity is increasing. Segment based drivers are usually for very old or limited devices and as converting to dequeueing model isn't difficult, it doesn't justify the API overhead it puts on block layer and its more modern users. Previous patches converted all block low level drivers to dequeueing model. This patch completes the API transition by... * renaming elv_next_request() to blk_peek_request() * renaming blkdev_dequeue_request() to blk_start_request() * adding blk_fetch_request() which is combination of peek and start * disallowing completion of queued (not started) requests * applying new API to all LLDs Renamings are for consistency and to break out of tree code so that it's apparent that out of tree drivers need updating. [ Impact: block request issue API cleanup, no functional change ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Mike Miller <mike.miller@hp.com> Cc: unsik Kim <donari75@gmail.com> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: David S. Miller <davem@davemloft.net> Cc: Laurent Vivier <Laurent@lvivier.info> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: Pierre Ossman <drzeus@drzeus.cx> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: Stefan Weinhuber <wein@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:52:18 +02:00
Tejun Heo	1011c1b9f2	block: blk_rq_[cur_]_{sectors\|bytes}() usage cleanup With the previous changes, the followings are now guaranteed for all requests in any valid state. * blk_rq_sectors() == blk_rq_bytes() >> 9 * blk_rq_cur_sectors() == blk_rq_cur_bytes() >> 9 Clean up accessor usages. Notable changes are * nbd,i2o_block: end_all used instead of explicit byte count * scsi_lib: unnecessary conditional on request type removed [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:50:55 +02:00
Tejun Heo	b079041030	block: cleanup rq->data_len usages With recent unification of fields, it's now guaranteed that rq->data_len always equals blk_rq_bytes(). Convert all non-IDE direct users to accessors. IDE will be converted in a separate patch. Boaz: spotted incorrect data_len/resid_len conversion in osd. [ Impact: convert direct rq->data_len usages to blk_rq_bytes() ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Markus Lidel <Markus.Lidel@shadowconnect.com> Cc: Darrick J. Wong <djwong@us.ibm.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:50:55 +02:00
Tejun Heo	83096ebf12	block: convert to pos and nr_sectors accessors With recent cleanups, there is no place where low level driver directly manipulates request fields. This means that the 'hard' request fields always equal the !hard fields. Convert all rq->sectors, nr_sectors and current_nr_sectors references to accessors. While at it, drop superflous blk_rq_pos() < 0 test in swim.c. [ Impact: use pos and nr_sectors accessors ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Tested-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Grant Likely <grant.likely@secretlab.ca> Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Acked-by: Mike Miller <mike.miller@hp.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Tim Waugh <tim@cyberelk.net> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Dario Ballabio <ballabio_dario@emc.com> Cc: David S. Miller <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: unsik Kim <donari75@gmail.com> Cc: Laurent Vivier <Laurent@lvivier.info> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:50:54 +02:00
Tejun Heo	5b93629b45	block: implement blk_rq_pos/[cur_]sectors() and convert obvious ones Implement accessors - blk_rq_pos(), blk_rq_sectors() and blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors and rq->hard_cur_sectors respectively and convert direct references of the said fields to the accessors. This is in preparation of request data length handling cleanup. Geert : suggested adding const to struct request * parameter to accessors Sergei : spotted error in patch description [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Grant Likely <grant.likely@secretlab.ca> Ackec-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:50:53 +02:00
Tejun Heo	c3a4d78c58	block: add rq->resid_len rq->data_len served two purposes - the length of data buffer on issue and the residual count on completion. This duality creates some headaches. First of all, block layer and low level drivers can't really determine what rq->data_len contains while a request is executing. It could be the total request length or it coulde be anything else one of the lower layers is using to keep track of residual count. This complicates things because blk_rq_bytes() and thus [__]blk_end_request_all() relies on rq->data_len for PC commands. Drivers which want to report residual count should first cache the total request length, update rq->data_len and then complete the request with the cached data length. Secondly, it makes requests default to reporting full residual count, ie. reporting that no data transfer occurred. The residual count is an exception not the norm; however, the driver should clear rq->data_len to zero to signify the normal cases while leaving it alone means no data transfer occurred at all. This reverse default behavior complicates code unnecessarily and renders block PC on some drivers (ide-tape/floppy) unuseable. This patch adds rq->resid_len which is used only for residual count. While at it, remove now unnecessasry blk_rq_bytes() caching in ide_pc_intr() as rq->data_len is not changed anymore. Boaz : spotted missing conversion in osd Sergei : spotted too early conversion to blk_rq_bytes() in ide-tape [ Impact: cleanup residual count handling, report 0 resid by default ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Borislav Petkov <petkovbb@googlemail.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Eric Moore <Eric.Moore@lsi.com> Cc: Darrick J. Wong <djwong@us.ibm.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2009-05-11 09:50:53 +02:00
Tejun Heo	731ec497e5	block: kill rq->data Now that all block request data transfer is done via bio, rq->data isn't used. Kill it. While at it, make the roles of rq->special and buffer clear. [ Impact: drop now unncessary field from struct request ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Boaz Harrosh <bharrosh@panasas.com>	2009-04-28 07:37:36 +02:00
Tejun Heo	40cbbb781d	block: implement and use [__]blk_end_request_all() There are many [__]blk_end_request() call sites which call it with full request length and expect full completion. Many of them ensure that the request actually completes by doing BUG_ON() the return value, which is awkward and error-prone. This patch adds [__]blk_end_request_all() which takes @rq and @error and fully completes the request. BUG_ON() is added to to ensure that this actually happens. Most conversions are simple but there are a few noteworthy ones. * cdrom/viocd: viocd_end_request() replaced with direct calls to __blk_end_request_all(). * s390/block/dasd: dasd_end_request() replaced with direct calls to __blk_end_request_all(). * s390/char/tape_block: tapeblock_end_request() replaced with direct calls to blk_end_request_all(). [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mike Miller <mike.miller@hp.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Alex Dubov <oakad@yahoo.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-04-28 07:37:35 +02:00
Mike Christie	b4efdd586b	[SCSI] fix q->lock not held warning when target is busy We cannot call blk_plug_device from scsi_target_queue_ready because the q lock is not held. And we do not need to call it from there because when we return 0, the scsi_request_fn not_ready handling will plug the queue for us if needed. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-04-27 09:48:10 -05:00
James Bottomley	a9bddd7463	[SCSI] fix recovered error handling We have a problem with recovered error handling in that any command which goes down as BLOCK_PC but which returns a sense code of RECOVERED ERROR gets completed with -EIO. For actual SG_IO commands, this doesn't matter at all, since the error return code gets dropped in favour of req->errors which contain the SCSI completion code. However, if this command is part of the block system, then it will pay attention to the returned error code. In particularly if a SYNCHRONIZE CACHE from a barrier command completes with RECOVERED ERROR, the resulting -EIO on the barrier causes block to error the request and return it to the filesystem. Fix this by converting the -EIO for recovered error to zero, plus remove the printing of this from sd and sr so the message isn't double printed. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-04-03 09:22:55 -05:00
FUJITA Tomonori	f078727b25	[SCSI] remove scsi_req_map_sg No one uses scsi_execute_async with data transfer now. We can remove scsi_req_map_sg. Only scsi_eh_lock_door uses scsi_execute_async. scsi_eh_lock_door doesn't handle sense and the callback. So we can remove scsi_io_context too. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-12 12:58:10 -05:00
James Bottomley	126c098296	[SCSI] fix ABORTED_COMMAND looping forever problem Instead of terminating after five retries, commands terminated by ABORTED_COMMAND sense are retrying forever. The problem was introduced by: commit `b60af5b0ad` Author: Alan Stern <stern@rowland.harvard.edu> Date: Mon Nov 3 15:56:47 2008 -0500 [SCSI] simplify scsi_io_completion() Which introduced an error whereby ABORTED_COMMAND now gets erroneously retried in scsi_io_completion. Fix this by returning the behaviour back to the default no retry. Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com> Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-02-21 20:29:38 -06:00
James Bottomley	79ed242972	[SCSI] scsi_lib: fix DID_RESET status problems Andrew Vaszquez said: > There's a problem that is causing commands returned by the LLD with > a DID_RESET status to be reissued with cleared cmd->sdb data which > in our tests are manifesting in firmware detected overruns. Here's > a snippet of a READ_10 scsi_cmnd upon completion by the storage The problem is caused by: commit `b60af5b0ad` Author: Alan Stern <stern@rowland.harvard.edu> Date: Mon Nov 3 15:56:47 2008 -0500 [SCSI] simplify scsi_io_completion() Because scsi_release_buffers() is called before commands that go through the ACTION_RETRY and ACTION_DELAYED_RETRY legs are requeued. However, they're not re-prepared, so nothing ever reallocates the buffer resources to them. Fix this by releasing the buffers only if we're not going to go down these legs (but scsi_release_buffers() on all legs including two in scsi_end_request(); this latter needs a special version __scsi_release_buffers() because the final one can be called after the request has been freed, so the bidi test in scsi_release_buffers(), which touches the request has to be skipped). Reported-by: Andrew Vasquez <andrew.vasquez@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-07 15:15:44 -06:00
Martin K. Petersen	3e695f89c5	[SCSI] Fix error handling for DIF/DIX patch commit `b60af5b0ad` Author: Alan Stern <stern@rowland.harvard.edu> Date: Mon Nov 3 15:56:47 2008 -0500 [SCSI] simplify scsi_io_completion() broke DIX error handling. Also, we are now using EILSEQ to indicate integrity errors to the upper layers (as opposed to regular EIO failures). This allows filesystems to inspect buffers and decide whether to retry the I/O. Update scsi_io_completion() accordingly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-05 09:02:28 -06:00
James Bottomley	4f5299ac4e	[SCSI] scsi_lib: don't decrement busy counters when inserting commands A bug was introduced by commit `b60af5b0ad` Author: Alan Stern <stern@rowland.harvard.edu> Date: Mon Nov 3 15:56:47 2008 -0500 [SCSI] simplify scsi_io_completion() because the simplification uses scsi_queue_insert(). The problem with this function is that it expects to be called from the completion path while the command is still outstanding, so it decrements the device and host busy counts to do the requeue. The problem is that scsi_io_completion() is a path executed well after these counts have already been decremented, leading to a double decrement if the command goes down any error path leading to ACTION_DELAYED_RETRY. The fix is to allow a private function __scsi_queue_insert() with a flag to say whether the busy counters should be decremented. This is made static to scsi_lib.c to discourage other use. Reported-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-05 08:54:11 -06:00
Alan Stern	3dbf6a5404	[SCSI] Fix uninitialized variable error in scsi_io_completion This patch (as1191) adds a missing "default" case in scsi_io_completion(), thereby fixing an "uninitialized variable" error. It also adds a missing newline to a log entry. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-01-02 10:57:41 -06:00
FUJITA Tomonori	f4f4e47e4a	[SCSI] add residual argument to scsi_execute and scsi_execute_req scsi_execute() and scsi_execute_req() discard the residual length information. Some callers need it. This adds residual argument (optional) to scsi_execute and scsi_execute_req. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-12-29 11:24:24 -06:00
Alan Stern	b60af5b0ad	[SCSI] simplify scsi_io_completion() This patch (as1142b) consolidates a lot of repetitious code in scsi_io_completion(). It also fixes a few comments. Most importantly, however, it clearly distinguishes among the three sorts of retries that can be done when a command fails to complete: Unprepare the request and resubmit it, so that a new command will be created for it. Requeue the request directly so that it will be retried immediately using the same command. Requeue the request so that it will be retried following a short delay. Complete the remainder of the request with an I/O error. [jejb: Updates 1. For several error conditions, we would now print the sense twice in slightly different ways, so unify the location of sense printing. 2. I added more descriptions to actual failure conditions for better debugging 3. according to spec, ABORTED_COMMAND is supposed to be retried (except on DIF failure). Our old behaviour of erroring it looks to be a bug. 4. I'd prefer not to default initialise the action variable because that ensures that every leg of the error handler has an associated action and the compiler will warn if someone later accidentally misses one or removes one. ] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-12-29 11:24:18 -06:00
James Bottomley	02bd3499a3	[SCSI] scsi_lib: only call scsi_unprep_request() under queue lock It's called under that lock everywhere else and it does alter the request state, so it should be. This one occurance in scsi_requeue_command() could open a window where req->special is set to NULL while the requests is going through either timeout or completion processing leading to NULL pointer derefs of the sort complained of in bugzillas 12020 and 12195. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-12-13 14:31:03 -06:00
Mike Christie	2a3a59e5c9	[SCSI] Fix hang in starved list processing Close possible infinite loop with interrupts off when devices are added back to the starved list. Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=11898 Reported-by: <alex.shi@intel.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-11-16 08:13:58 -06:00
Kiyoshi Ueda	6c5121b78b	[SCSI] export busy state via q->lld_busy_fn() This patch implements q->lld_busy_fn() for scsi mid layer to export its busy state for request stacking drivers. For efficiency, no lock is taken to check the busy state of shost/starget/sdev, since the returned value is not guaranteed and may be changed after request stacking drivers call the function, regardless of taking lock or not. When scsi can't dispatch I/Os anymore and needs to kill I/Os (e.g. !sdev), scsi needs to return 'not busy'. Otherwise, request stacking drivers may hold requests forever. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-10-23 11:42:16 -05:00
Kiyoshi Ueda	9d11251709	[SCSI] refactor sdev/starget/shost busy checking This patch refactors the busy checking codes of scsi_device, Scsi_Host and scsi_target. There should be no functional change. This is a preparation for another patch which exports scsi's busy state to the block layer for request stacking drivers. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2008-10-23 11:42:16 -05:00

1 2 3 4 5 ...

424 Commits