Commit Graph

949147 Commits

Author SHA1 Message Date
Bean Huo b000862579 scsi: ufs: Remove several redundant goto statements
Link: https://lore.kernel.org/r/20200814095034.20709-3-huobean@gmail.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:52 -04:00
Bean Huo f273c54bb7 scsi: ufs: Change ufshcd_comp_devman_upiu() to ufshcd_compose_devman_upiu()
ufshcd_comp_devman_upiu() was poorly named leading people to think it was a
completion function. Rename it to ufshcd_compose_devman_upiu().

Link: https://lore.kernel.org/r/20200814095034.20709-2-huobean@gmail.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:52 -04:00
Saurav Kashyap 3079285bd7 scsi: qedf: Fix race between ELS completion and flushing ELS request
Fix race between ELS completion and flushing ELS request.

Link: https://lore.kernel.org/r/20200807110656.19965-8-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:52 -04:00
Saurav Kashyap 22ddec31b0 scsi: qedf: Don't process ELS completion if event is flushed or cleaned up
Don't process ELS completion if event is flushed or cleaned up.

Link: https://lore.kernel.org/r/20200807110656.19965-7-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:52 -04:00
Saurav Kashyap 1f6d1d4ca2 scsi: qedf: Initiate cleanup for ELS commands as well
Initiate cleanup for ELS commands as well.

Link: https://lore.kernel.org/r/20200807110656.19965-6-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:52 -04:00
Saurav Kashyap 39d0357dd5 scsi: qedf: Send cleanup even for RRQ on timeout
Send cleanup even for RRQ on timeout.

Link: https://lore.kernel.org/r/20200807110656.19965-5-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:51 -04:00
Saurav Kashyap b09ea43fec scsi: qedf: Do not kill timeout work for original I/O on RRQ completion
The timer is already cancelled when abort is completed, hence no need to
cancel it again.

Link: https://lore.kernel.org/r/20200807110656.19965-4-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:51 -04:00
Saurav Kashyap 7fb8ff0806 scsi: qedf: Check the validity of rjt frame before processing
This is reported by Klockwork.

Link: https://lore.kernel.org/r/20200807110656.19965-3-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:51 -04:00
Saurav Kashyap a521bbc38d scsi: qedf: Check for port type and role before processing an event
The rport lock gets initialized during offload. If a non-FCP or non-target
rport got logout then this rport will be uninitialized. KASAN was
complaining because of it.

=========
[   14.384434] the code is fine but needs lockdep annotation.
[   14.384482] turning off the locking correctness validator.
========

Link: https://lore.kernel.org/r/20200807110656.19965-2-jhasan@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:51 -04:00
Sai Prakash Ranjan 68bdb3db6c scsi: ufs-qcom: Remove unused MSM bus scaling APIs
MSM bus scaling has moved on to use interconnect framework and downstream
bus scaling APIs like msm_bus_scale*() do not exist anymore in the
kernel. Currently they are guarded by a config which also does not exist
and hence there are no build failures reported. Remove these unused
interfaces as they are currently no-ops and the scaling support that may be
added in future will use interconnect API.

Link: https://lore.kernel.org/r/20200804161033.15586-1-saiprakash.ranjan@codeaurora.org
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:51 -04:00
Don Brace ce60a2b827 scsi: smartpqi: Bump version to 1.2.16-010
Link: https://lore.kernel.org/r/159622931040.30579.9167901134341507088.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Gerry Morong <gerry.morong@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:51 -04:00
Kevin Barnett 8b664fefa3 scsi: smartpqi: Add RAID bypass counter
Add a counter to assist in verifying when RAID bypass is being used.

Link: https://lore.kernel.org/r/159622930468.30579.13153724465552773544.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:51 -04:00
Kevin Barnett 4d15ad3813 scsi: smartpqi: Support device deletion via sysfs
Support device deletion via sysfs.

I.e: echo 1 > /sys/block/sd<X>/device/delete

Link: https://lore.kernel.org/r/159622929885.30579.2727491506675011534.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:51 -04:00
Kevin Barnett 9e68cccc8e scsi: smartpqi: Avoid crashing kernel for controller issues
Eliminate kernel panics when getting invalid responses from controller.
Take controller offline instead of causing kernel panics.

Link: https://lore.kernel.org/r/159622929306.30579.16523318707596752828.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Prasad Munirathnam <Prasad.Munirathnam@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:51 -04:00
Mahesh Rajashekhara 244ca45e15 scsi: smartpqi: Update logical volume size after expansion
Have OS rescan after logical volume expansion to reflect new size.

Link: https://lore.kernel.org/r/159622928727.30579.298277463169866711.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:51 -04:00
Mahesh Rajashekhara 3af06083ba scsi: smartpqi: Add id support for SmartRAID 3152-8i
VID_9005, DID_028F, SVID_9005 and SDID_080A.

Link: https://lore.kernel.org/r/159622928143.30579.14769183842894725454.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:51 -04:00
Kevin Barnett ce14379350 scsi: smartpqi: Identify physical devices without issuing INQUIRY
Eliminate issuing INQUIRYs to problematic devices by using information
provided by controller.

Link: https://lore.kernel.org/r/159622927172.30579.3960527536810532094.stgit@brunhilda
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Reviewed-by: Scott Benesh <scott.benesh@microsemi.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:50 -04:00
Suganath Prabu S 0491bdc7ee scsi: mpt3sas: Update driver version to 35.100.00.00
Updated driver version to 35.100.00.00

Link: https://lore.kernel.org/r/1596096229-3341-8-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:50 -04:00
Suganath Prabu S 711a923c14 scsi: mpt3sas: Postprocessing of target and LUN reset
If driver has not received the interrupt for the aborted SCSI command
before processing the TM reply, driver polls all the reply descriptor pools
looking for the reply for the aborted SCSI command before marking TM as
FAILED. If it finds the reply, then it marks the TM as SUCCESS otherwise it
marks it FAILED.

scsih_tm_cmd_map_status() checks whether TM has aborted the timed out SCSI
command or not. If TM has aborted the IO, then it returns SUCCESS else it
returns FAILED.

Link: https://lore.kernel.org/r/1596096229-3341-7-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:50 -04:00
Suganath Prabu S 521e9c0b62 scsi: mpt3sas: Add functions to check if any cmd is outstanding on Target and LUN
Add helper functions to check whether any SCSI command is outstanding on
particular Target, LUN device.

Also add function parameters 'channel', 'id' to function
mpt3sas_scsih_issue_tm().

Link: https://lore.kernel.org/r/1596096229-3341-6-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:50 -04:00
Suganath Prabu S 5afa9d4444 scsi: mpt3sas: Rename and export interrupt mask/unmask functions
Rename Function _base_unmask_interrupts() to
mpt3sas_base_unmask_interrupts() and _base_mask_interrupts() to
mpt3sas_base_mask_interrupts(). Also add function declarion to
mpt3sas_base.h

Link: https://lore.kernel.org/r/1596096229-3341-5-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:50 -04:00
Suganath Prabu S 9e73ed2e4c scsi: mpt3sas: Cancel the running work during host reset
It is not recommended to issue back-to-back host reset without any delay.
However, if someone issues back-to-back host reset then we observe that
target devices get unregistered and re-register with SML.  And if OS drive
is behind the HBA when it gets unregistered, then file-system goes into
read-only mode.

Normally during host reset, driver marks accessible target devices as
responding and triggers the event MPT3SAS_REMOVE_UNRESPONDING_DEVICES to
remove any non-responding devices through FW worker thread. While
processing this event, driver unregisters the non-responding devices and
clears the responding flag for all the devices.

Currently, during host reset, driver is cancelling only those Firmware
event works which are pending in Firmware event workqueue. It is not
cancelling work which is currently running. Change the driver to cancel all
events.

Link: https://lore.kernel.org/r/1596096229-3341-4-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:50 -04:00
Suganath Prabu S af6ec1eee5 scsi: mpt3sas: Dump system registers for debugging
When controller fails to transition to READY state during driver probe,
dump the system interface register set. This will give snapshot of the
firmware status for debugging driver load issues.

Link: https://lore.kernel.org/r/1596096229-3341-3-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:50 -04:00
Suganath Prabu S f09219e48b scsi: mpt3sas: Memset config_cmds.reply buffer with zeros
Currently config_cmds.reply buffer is not memset to zero before posting
config page request message.  In some cases, for the current config
request, the previous config reply is getting processed and we will observe
PageType mismatch between request to reply buffer. It will be difficult to
debug this type of issue and it confuses by thinking that HBA Firmware
itself posted the wrong config reply.  So it is better to memset the
config_cmds.reply buffer with zeros before issuing the config request.

Link: https://lore.kernel.org/r/1596096229-3341-2-git-send-email-suganath-prabu.subramani@broadcom.com
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:50 -04:00
Can Guo 8bb2dde069 scsi: ufs: Properly release resources if a task is aborted successfully
In current UFS task abort hook, namely ufshcd_abort(), if one task is
aborted successfully, clk_gating.active_reqs held by this task is not
decreased, which makes clk_gating.active_reqs stay above zero forever, thus
clock gating would never happen. Instead of releasing resources of one task
"manually", use the existing func __ufshcd_transfer_req_compl().  This
change also eliminates a possible race of scsi_dma_unmap() from the real
completion in IRQ handler path.

Link: https://lore.kernel.org/r/1596975355-39813-10-git-send-email-cang@codeaurora.org
Fixes: 1ab27c9cf8 ("ufs: Add support for clock gating")
CC: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-20 21:41:25 -04:00
Quinn Tran dca93232b3 Revert "scsi: qla2xxx: Disable T10-DIF feature with FC-NVMe during probe"
FCP T10-PI and NVMe features are independent of each other. This patch
allows both features to co-exist.

This reverts commit 5da05a26b8.

Link: https://lore.kernel.org/r/20200806111014.28434-12-njavali@marvell.com
Fixes: 5da05a26b8 ("scsi: qla2xxx: Disable T10-DIF feature with FC-NVMe during probe")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:43:55 -04:00
Saurav Kashyap de7e619430 Revert "scsi: qla2xxx: Fix crash on qla2x00_mailbox_command"
FCoE adapter initialization failed for ISP8021 with the following patch
applied. In addition, reproduction of the issue the patch originally tried
to address has been unsuccessful.

This reverts commit 3cb182b3fa.

Link: https://lore.kernel.org/r/20200806111014.28434-11-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:43:55 -04:00
Quinn Tran 83949613fa scsi: qla2xxx: Fix null pointer access during disconnect from subsystem
NVMEAsync command is being submitted to QLA while the same NVMe controller
is in the middle of reset. The reset path has deleted the association and
freed aen_op->fcp_req.private. Add a check for this private pointer before
issuing the command.

...
 6 [ffffb656ca11fce0] page_fault at ffffffff8c00114e
    [exception RIP: qla_nvme_post_cmd+394]
    RIP: ffffffffc0d012ba  RSP: ffffb656ca11fd98  RFLAGS: 00010206
    RAX: ffff8fb039eda228  RBX: ffff8fb039eda200  RCX: 00000000000da161
    RDX: ffffffffc0d4d0f0  RSI: ffffffffc0d26c9b  RDI: ffff8fb039eda220
    RBP: 0000000000000013   R8: ffff8fb47ff6aa80   R9: 0000000000000002
    R10: 0000000000000000  R11: ffffb656ca11fdc8  R12: ffff8fb27d04a3b0
    R13: ffff8fc46dd98a58  R14: 0000000000000000  R15: ffff8fc4540f0000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 7 [ffffb656ca11fe08] nvme_fc_start_fcp_op at ffffffffc0241568 [nvme_fc]
 8 [ffffb656ca11fe50] nvme_fc_submit_async_event at ffffffffc0241901 [nvme_fc]
 9 [ffffb656ca11fe68] nvme_async_event_work at ffffffffc014543d [nvme_core]
10 [ffffb656ca11fe98] process_one_work at ffffffff8b6cd437
11 [ffffb656ca11fed8] worker_thread at ffffffff8b6cdcef
12 [ffffb656ca11ff10] kthread at ffffffff8b6d3402
13 [ffffb656ca11ff50] ret_from_fork at ffffffff8c000255

--
PID: 37824  TASK: ffff8fb033063d80  CPU: 20  COMMAND: "kworker/u97:451"
 0 [ffffb656ce1abc28] __schedule at ffffffff8be629e3
 1 [ffffb656ce1abcc8] schedule at ffffffff8be62fe8
 2 [ffffb656ce1abcd0] schedule_timeout at ffffffff8be671ed
 3 [ffffb656ce1abd70] wait_for_completion at ffffffff8be639cf
 4 [ffffb656ce1abdd0] flush_work at ffffffff8b6ce2d5
 5 [ffffb656ce1abe70] nvme_stop_ctrl at ffffffffc0144900 [nvme_core]
 6 [ffffb656ce1abe80] nvme_fc_reset_ctrl_work at ffffffffc0243445 [nvme_fc]
 7 [ffffb656ce1abe98] process_one_work at ffffffff8b6cd437
 8 [ffffb656ce1abed8] worker_thread at ffffffff8b6cdb50
 9 [ffffb656ce1abf10] kthread at ffffffff8b6d3402
10 [ffffb656ce1abf50] ret_from_fork at ffffffff8c000255

Link: https://lore.kernel.org/r/20200806111014.28434-10-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:40:15 -04:00
Saurav Kashyap dffa114533 scsi: qla2xxx: Check if FW supports MQ before enabling
OS boot during Boot from SAN was stuck at dracut emergency shell after
enabling NVMe driver parameter. For non-MQ support the driver was enabling
MQ. Add a check to confirm if FW supports MQ.

Link: https://lore.kernel.org/r/20200806111014.28434-9-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:40:15 -04:00
Arun Easi 897d68eb81 scsi: qla2xxx: Fix WARN_ON in qla_nvme_register_hba
qla_nvme_register_hba() puts out a warning when there are not enough queue
pairs available for FC-NVME.  Just fail the NVME registration rather than a
WARNING + call Trace.

Link: https://lore.kernel.org/r/20200806111014.28434-8-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:40:14 -04:00
Arun Easi 49030003a3 scsi: qla2xxx: Allow ql2xextended_error_logging special value 1 to be set anytime
ql2xextended_error_logging can now be set to 1 to get the default mask
value, as opposed to at module load time only.

Link: https://lore.kernel.org/r/20200806111014.28434-7-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:40:13 -04:00
Quinn Tran 81b9d1e19d scsi: qla2xxx: Reduce noisy debug message
Update debug level and message for ELS IOCB done.

Link: https://lore.kernel.org/r/20200806111014.28434-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:40:13 -04:00
Quinn Tran abb31aeaa9 scsi: qla2xxx: Fix login timeout
Multipath errors were seen during failback due to login timeout.  The
remote device sent LOGO, the local host tore down the session and did
relogin. The RSCN arrived indicates remote device is going through failover
after which the relogin is in a 20s timeout phase.  At this point the
driver is stuck in the relogin process.  Add a fix to delete the session as
part of abort/flush the login.

Link: https://lore.kernel.org/r/20200806111014.28434-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:40:12 -04:00
Quinn Tran 4709272f63 scsi: qla2xxx: Indicate correct supported speeds for Mezz card
Correct the supported speeds for 16G Mezz card.

Link: https://lore.kernel.org/r/20200806111014.28434-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:40:11 -04:00
Quinn Tran a117579d02 scsi: qla2xxx: Flush I/O on zone disable
Perform implicit logout to flush I/O on zone disable.

Link: https://lore.kernel.org/r/20200806111014.28434-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:40:11 -04:00
Quinn Tran 10ae30ba66 scsi: qla2xxx: Flush all sessions on zone disable
On Zone Disable, certain switches would ignore all commands. This causes
timeout for both switch scan command and abort of that command. On
detection of this condition, all sessions will be shutdown.

Link: https://lore.kernel.org/r/20200806111014.28434-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:40:10 -04:00
Enzo Matsumiya c314a014b1 scsi: qla2xxx: Use MBX_TOV_SECONDS for mailbox command timeout values
Improves readability of qla_mbx.c.

Link: https://lore.kernel.org/r/20200805200546.22497-1-ematsumiya@suse.de
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:19:06 -04:00
Douglas Gilbert 223f91b480 scsi: scsi_debug: Fix scp is NULL errors
John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures that were
mainly seen on aarch64 machines (e.g. RPi 4 with four A72 CPUs). The
problem was tracked down to a missing critical section on a "short circuit"
path. Namely, the time to process the current command so far has already
exceeded the requested command duration (i.e. the number of nanoseconds in
the ndelay parameter).

The random=1 parameter setting was pivotal in finding this error.  The
failure scenario involved first taking that "short circuit" path (due to a
very short command duration) and then taking the more likely
hrtimer_start() path (due to a longer command duration). With random=1 each
command's duration is taken from the uniformly distributed [0..ndelay)
interval.  The fio utility also helped by reliably generating the error
scenario at about once per minute on a RPi 4 (64 bit OS).

Link: https://lore.kernel.org/r/20200813155738.109298-1-dgilbert@interlog.com
Reported-by: John Garry <john.garry@huawei.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:13:17 -04:00
Steffen Maier 2d9a2c5f58 scsi: zfcp: Fix use-after-free in request timeout handlers
Before v4.15 commit 75492a5156 ("s390/scsi: Convert timers to use
timer_setup()"), we intentionally only passed zfcp_adapter as context
argument to zfcp_fsf_request_timeout_handler(). Since we only trigger
adapter recovery, it was unnecessary to sync against races between timeout
and (late) completion.  Likewise, we only passed zfcp_erp_action as context
argument to zfcp_erp_timeout_handler(). Since we only wakeup an ERP action,
it was unnecessary to sync against races between timeout and (late)
completion.

Meanwhile the timeout handlers get timer_list as context argument and do a
timer-specific container-of to zfcp_fsf_req which can have been freed.

Fix it by making sure that any request timeout handlers, that might just
have started before del_timer(), are completed by using del_timer_sync()
instead. This ensures the request free happens afterwards.

Space time diagram of potential use-after-free:

Basic idea is to have 2 or more pending requests whose timeouts run out at
almost the same time.

req 1 timeout     ERP thread        req 2 timeout
----------------  ----------------  ---------------------------------------
zfcp_fsf_request_timeout_handler
fsf_req = from_timer(fsf_req, t, timer)
adapter = fsf_req->adapter
zfcp_qdio_siosl(adapter)
zfcp_erp_adapter_reopen(adapter,...)
                  zfcp_erp_strategy
                  ...
                  zfcp_fsf_req_dismiss_all
                  list_for_each_entry_safe
                    zfcp_fsf_req_complete 1
                    del_timer 1
                    zfcp_fsf_req_free 1
                    zfcp_fsf_req_complete 2
                                    zfcp_fsf_request_timeout_handler
                    del_timer 2
                                    fsf_req = from_timer(fsf_req, t, timer)
                    zfcp_fsf_req_free 2
                                    adapter = fsf_req->adapter
                                              ^^^^^^^ already freed

Link: https://lore.kernel.org/r/20200813152856.50088-1-maier@linux.ibm.com
Fixes: 75492a5156 ("s390/scsi: Convert timers to use timer_setup()")
Cc: <stable@vger.kernel.org> #4.15+
Suggested-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:12:09 -04:00
Bean Huo d87a1f6d02 scsi: ufs: No need to send Abort Task if the task in DB was cleared
If the bit corresponding to a task in the Doorbell register has been
cleared, no need to poll the status of the task on the device side and to
send an Abort Task TM. Instead, let it directly goto cleanup.

In addition, to keep original debug output, move the goto below the debug
print.

Link: https://lore.kernel.org/r/20200811141859.27399-3-huobean@gmail.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:09:14 -04:00
Stanley Chu b10178ee7f scsi: ufs: Clean up completed request without interrupt notification
If somehow no interrupt notification is raised for a completed request and
its doorbell bit is cleared by host, UFS driver needs to cleanup its
outstanding bit in ufshcd_abort(). Otherwise, system may behave abnormally
in the following scenario:

After ufshcd_abort() returns, this request will be requeued by SCSI layer
with its outstanding bit set. Any future completed request will trigger
ufshcd_transfer_req_compl() to handle all "completed outstanding bits". At
this time the "abnormal outstanding bit" will be detected and the "requeued
request" will be chosen to execute request post-processing flow. This is
wrong because this request is still "alive".

Link: https://lore.kernel.org/r/20200811141859.27399-2-huobean@gmail.com
Reviewed-by: Can Guo <cang@codeaurora.org>
Acked-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:09:13 -04:00
Adrian Hunter 127d5f7c4b scsi: ufs: Improve interrupt handling for shared interrupts
For shared interrupts, the interrupt status might be zero, so check that
first.

Link: https://lore.kernel.org/r/20200811133936.19171-2-adrian.hunter@intel.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:03:29 -04:00
Adrian Hunter 6337f58cec scsi: ufs: Fix interrupt error message for shared interrupts
The interrupt might be shared, in which case it is not an error for the
interrupt handler to be called when the interrupt status is zero, so don't
print the message unless there was enabled interrupt status.

Link: https://lore.kernel.org/r/20200811133936.19171-1-adrian.hunter@intel.com
Fixes: 9333d77573 ("scsi: ufs: Fix irq return code")
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:03:29 -04:00
Adrian Hunter 8da76f71fe scsi: ufs-pci: Add quirk for broken auto-hibernate for Intel EHL
Intel EHL UFS host controller advertises auto-hibernate capability but it
does not work correctly. Add a quirk for that.

[mkp: checkpatch fix]

Link: https://lore.kernel.org/r/20200810141024.28859-1-adrian.hunter@intel.com
Fixes: 8c09d75276 ("scsi: ufshdc-pci: Add Intel PCI IDs for EHL")
Acked-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 22:00:26 -04:00
Stanley Chu 215d326702 scsi: ufs-mediatek: Fix incorrect time to wait link status
Fix incorrect calculation of "ms" based waiting time in function
ufs_mtk_setup_clocks().

Link: https://lore.kernel.org/r/20200809055702.20140-1-stanley.chu@mediatek.com
Fixes: 9006e3986f ("scsi: ufs-mediatek: Do not gate clocks if auto-hibern8 is not entered yet")
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 21:53:37 -04:00
Stanley Chu 93b6c5db06 scsi: ufs: Fix possible infinite loop in ufshcd_hold
In ufshcd_suspend(), after clk-gating is suspended and link is set
as Hibern8 state, ufshcd_hold() is still possibly invoked before
ufshcd_suspend() returns. For example, MediaTek's suspend vops may
issue UIC commands which would call ufshcd_hold() during the command
issuing flow.

Now if UFSHCD_CAP_HIBERN8_WITH_CLK_GATING capability is enabled,
then ufshcd_hold() may enter infinite loops because there is no
clk-ungating work scheduled or pending. In this case, ufshcd_hold()
shall just bypass, and keep the link as Hibern8 state.

Link: https://lore.kernel.org/r/20200809050734.18740-1-stanley.chu@mediatek.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Co-developed-by: Andy Teng <andy.teng@mediatek.com>
Signed-off-by: Andy Teng <andy.teng@mediatek.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 21:52:45 -04:00
Mike Christie fa39ab5184 scsi: fcoe: Fix I/O path allocation
ixgbe_fcoe_ddp_setup() can be called from the main I/O path and is called
with a spin_lock held, so we have to use GFP_ATOMIC allocation instead of
GFP_KERNEL.

Link: https://lore.kernel.org/r/1596831813-9839-1-git-send-email-michael.christie@oracle.com
cc: Hannes Reinecke <hare@suse.de>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 21:52:39 -04:00
Jing Xiangfeng 2138d1c918 scsi: ufs: ti-j721e-ufs: Fix error return in ti_j721e_ufs_probe()
Fix to return error code PTR_ERR() from the error handling case instead of
0.

Link: https://lore.kernel.org/r/20200806070135.67797-1-jingxiangfeng@huawei.com
Fixes: 22617e2163 ("scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes")
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 21:48:45 -04:00
Can Guo 5586dd8ea2 scsi: ufs: Fix a race condition between error handler and runtime PM ops
The current IRQ handler blocks SCSI requests before scheduling eh_work,
when error handler calls pm_runtime_get_sync, if ufshcd_suspend/resume
sends a SCSI cmd, most likely the SSU cmd, since SCSI requests are blocked,
pm_runtime_get_sync() will never return because ufshcd_suspend/resume is
blocked by the SCSI cmd.

 - In queuecommand path, hba->ufshcd_state check and ufshcd_send_command
   should stay under the same spin lock. This is to make sure that no more
   commands leak into doorbell after hba->ufshcd_state is changed.

 - Don't block SCSI requests before error handler starts to run, let error
   handler block SCSI requests when it is ready to start error recovery.

 - Don't let SCSI layer keep requeuing the SCSI cmds sent from HBA runtime
   PM ops, let them pass or fail them. Let them pass if eh_work is
   scheduled due to non-fatal errors. Fail them if eh_work is scheduled due
   to fatal errors, otherwise the cmds may eventually time out since UFS is
   in bad state, which gets error handler blocked for too long. If we fail
   the SCSI cmds sent from HBA runtime PM ops, HBA runtime PM ops fails
   too, but it does not hurt since error handler can recover HBA runtime PM
   error.

Link: https://lore.kernel.org/r/1596975355-39813-9-git-send-email-cang@codeaurora.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 20:54:56 -04:00
Can Guo c3be8d1ee1 scsi: ufs: Move dumps in IRQ handler to error handler
Performing dumps in the IRQ handler causes system stability issues. Move
dumps to the error handler and only print basic host registers here.

Link: https://lore.kernel.org/r/1596975355-39813-8-git-send-email-cang@codeaurora.org
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-08-17 20:54:55 -04:00