[ Upstream commit af984c8729 ]
A link bounce to a slow fabric may observe FDISC response delays lasting
longer than devloss tmo. Current logic decrements the final fabric node
kref during a devloss tmo event. This results in a NULL ptr dereference
crash if the FDISC completes for that fabric node after devloss tmo.
Fix by adding the NLP_IN_RECOV_POST_DEV_LOSS flag, which is set when
devloss tmo triggers and we've noticed that fabric node recovery has
already started or finished in between the time lpfc_dev_loss_tmo_callbk
queues lpfc_dev_loss_tmo_handler. If fabric node recovery succeeds, then
the driver reverses the devloss tmo marked kref put with a kref get. If
fabric node recovery fails, then the final kref put relies on the ELS
timing out or the REG_LOGIN cmpl routine.
Link: https://lore.kernel.org/r/20211020211417.88754-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1854f53ccd ]
If an FC link down transition while PLOGIs are outstanding to fabric well
known addresses, outstanding ABTS requests may result in a NULL pointer
dereference. Driver unload requests may hang with repeated "2878" log
messages.
The Link down processing results in ABTS requests for outstanding ELS
requests. The Abort WQEs are sent for the ELSs before the driver had set
the link state to down. Thus the driver is sending the Abort with the
expectation that an ABTS will be sent on the wire. The Abort request is
stalled waiting for the link to come up. In some conditions the driver may
auto-complete the ELSs thus if the link does come up, the Abort completions
may reference an invalid structure.
Fix by ensuring that Abort set the flag to avoid link traffic if issued due
to conditions where the link failed.
Link: https://lore.kernel.org/r/20211020211417.88754-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 79b20becce ]
An error is detected with the following report when unloading the driver:
"KASAN: use-after-free in lpfc_unreg_rpi+0x1b1b"
The NLP_REG_LOGIN_SEND nlp_flag is set in lpfc_reg_fab_ctrl_node(), but the
flag is not cleared upon completion of the login.
This allows a second call to lpfc_unreg_rpi() to proceed with nlp_rpi set
to LPFC_RPI_ALLOW_ERROR. This results in a use after free access when used
as an rpi_ids array index.
Fix by clearing the NLP_REG_LOGIN_SEND nlp_flag in
lpfc_mbx_cmpl_fc_reg_login().
Link: https://lore.kernel.org/r/20211020211417.88754-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f347c26836 ]
The following issue was observed running syzkaller:
BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:377 [inline]
BUG: KASAN: slab-out-of-bounds in sg_copy_buffer+0x150/0x1c0 lib/scatterlist.c:831
Read of size 2132 at addr ffff8880aea95dc8 by task syz-executor.0/9815
CPU: 0 PID: 9815 Comm: syz-executor.0 Not tainted 4.19.202-00874-gfc0fe04215a9 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xe4/0x14a lib/dump_stack.c:118
print_address_description+0x73/0x280 mm/kasan/report.c:253
kasan_report_error mm/kasan/report.c:352 [inline]
kasan_report+0x272/0x370 mm/kasan/report.c:410
memcpy+0x1f/0x50 mm/kasan/kasan.c:302
memcpy include/linux/string.h:377 [inline]
sg_copy_buffer+0x150/0x1c0 lib/scatterlist.c:831
fill_from_dev_buffer+0x14f/0x340 drivers/scsi/scsi_debug.c:1021
resp_report_tgtpgs+0x5aa/0x770 drivers/scsi/scsi_debug.c:1772
schedule_resp+0x464/0x12f0 drivers/scsi/scsi_debug.c:4429
scsi_debug_queuecommand+0x467/0x1390 drivers/scsi/scsi_debug.c:5835
scsi_dispatch_cmd+0x3fc/0x9b0 drivers/scsi/scsi_lib.c:1896
scsi_request_fn+0x1042/0x1810 drivers/scsi/scsi_lib.c:2034
__blk_run_queue_uncond block/blk-core.c:464 [inline]
__blk_run_queue+0x1a4/0x380 block/blk-core.c:484
blk_execute_rq_nowait+0x1c2/0x2d0 block/blk-exec.c:78
sg_common_write.isra.19+0xd74/0x1dc0 drivers/scsi/sg.c:847
sg_write.part.23+0x6e0/0xd00 drivers/scsi/sg.c:716
sg_write+0x64/0xa0 drivers/scsi/sg.c:622
__vfs_write+0xed/0x690 fs/read_write.c:485
kill_bdev:block_device:00000000e138492c
vfs_write+0x184/0x4c0 fs/read_write.c:549
ksys_write+0x107/0x240 fs/read_write.c:599
do_syscall_64+0xc2/0x560 arch/x86/entry/common.c:293
entry_SYSCALL_64_after_hwframe+0x49/0xbe
We get 'alen' from command its type is int. If userspace passes a large
length we will get a negative 'alen'.
Switch n, alen, and rlen to u32.
Link: https://lore.kernel.org/r/20211013033913.2551004-3-yebin10@huawei.com
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ee5d6e9ac ]
Correct kdump hangs when controller is locked up.
There are occasions when a controller reboot (controller soft reset) is
issued when a controller firmware crash dump is in progress.
This leads to incomplete controller firmware crash dump:
- When the controller crash dump is in progress, and a kdump is initiated,
the driver issues inbound doorbell reset to bring back the controller in
SIS mode.
- If the controller is in locked up state, the inbound doorbell reset does
not work causing controller initialization failures. This results in the
driver hanging waiting for SIS mode.
To avoid an incomplete controller crash dump, add in a controller crash
dump handshake:
- Controller will indicate start and end of the controller crash dump by
setting some register bits.
- Driver will look these bits when a kdump is initiated. If a controller
crash dump is in progress, the driver will wait for the controller crash
dump to complete before issuing the controller soft reset then complete
driver initialization.
Link: https://lore.kernel.org/r/20210928235442.201875-3-don.brace@microchip.com
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d4996c6eac ]
Pointers should be printed with %p or %px rather than cast to 'unsigned
long' and printed with %lx.
Change %lx to %p to print the hashed pointer.
Link: https://lore.kernel.org/r/20210929122538.1158235-1-qtxuning1999@sjtu.edu.cn
Signed-off-by: Guo Zhi <qtxuning1999@sjtu.edu.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 17b49bcbf8 ]
Several problems exist with scsi_mode_sense() buffer length handling:
1) The allocation length field of the MODE SENSE(10) command is 16-bits,
occupying bytes 7 and 8 of the CDB. With this command, access to mode
pages larger than 255 bytes is thus possible. However, the CDB
allocation length field is set by assigning len to byte 8 only, thus
truncating buffer length larger than 255.
2) If scsi_mode_sense() is called with len smaller than 8 with
sdev->use_10_for_ms set, or smaller than 4 otherwise, the buffer length
is increased to 8 and 4 respectively, and the buffer is zero filled
with these increased values, thus corrupting the memory following the
buffer.
Fix these 2 problems by using put_unaligned_be16() to set the allocation
length field of MODE SENSE(10) CDB and by returning an error when len is
too small.
Furthermore, if len is larger than 255B, always try MODE SENSE(10) first,
even if the device driver did not set sdev->use_10_for_ms. In case of
invalid opcode error for MODE SENSE(10), access to mode pages larger than
255 bytes are not retried using MODE SENSE(6). To avoid buffer length
overflows for the MODE_SENSE(10) case, check that len is smaller than 65535
bytes.
While at it, also fix the folowing:
* Use get_unaligned_be16() to retrieve the mode data length and block
descriptor length fields of the mode sense reply header instead of using
an open coded calculation.
* Fix the kdoc dbd argument explanation: the DBD bit stands for Disable
Block Descriptor, which is the opposite of what the dbd argument
description was.
Link: https://lore.kernel.org/r/20210820070255.682775-2-damien.lemoal@wdc.com
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 99154581b0 ]
When parsing the txq list in lpfc_drain_txq(), the driver attempts to pass
the requests to the adapter. If such an attempt fails, a local "fail_msg"
string is set and a log message output. The job is then added to a
completions list for cancellation.
Processing of any further jobs from the txq list continues, but since
"fail_msg" remains set, jobs are added to the completions list regardless
of whether a wqe was passed to the adapter. If successfully added to
txcmplq, jobs are added to both lists resulting in list corruption.
Fix by clearing the fail_msg string after adding a job to the completions
list. This stops the subsequent jobs from being added to the completions
list unless they had an appropriate failure.
Link: https://lore.kernel.org/r/20210910233159.115896-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 51e6ed83bb ]
Driver failed to release all memory allocated. This would lead to memory
leak during driver removal.
Properly free memory when the module is removed.
Link: https://lore.kernel.org/r/20210906170404.5682-5-Ajish.Koshy@microchip.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Ajish Koshy <Ajish.Koshy@microchip.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9ec5128a8b ]
The spec recommends that for transfer length larger than the max-single-cmd
attribute (bMAX_DATA_SIZE_FOR_HPB_SINGLE_CMD) it is possible to couple
pre-requests with the HPB-READ command. Being a recommendation, using
pre-requests can be perceived merely as a means of optimization. A common
practice was to send pre-requests for chunks within some interval, and
leave the READ10 untouched if larger.
Now that the pre-request flows have been removed, all the commands are
single commands. Properly handle this attribute and do not send HPB-READ
for transfer lengths larger than max-single-cmd.
[mkp: resolve conflict]
Fixes: 09d9e4d041 ("scsi: ufs: ufshpb: Remove HPB2.0 flows")
Link: https://lore.kernel.org/r/20211031123654.17719-1-avri.altman@wdc.com
Reviewed-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 351b3a849a ]
In ufshpb, pm_runtime_{get,put}_sync() are used to avoid unwanted runtime
suspend during query requests. Whereas commit b294ff3e34 ("scsi: ufs:
core: Enable power management for wlun") modified the driver core to use
ufshcd_rpm_{get,put}_sync() APIs.
Switch to these APIs in HPB module as well.
Link: https://lore.kernel.org/r/20210902003534epcms2p1937a0f0eeb48a441cb69f5ef13ff8430@epcms2p1
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5f7cf82c1d ]
When the value of error is printed, it will always be 0. We should print
the correct error code when scsi_bsg_register_queue() fails.
Link: https://lore.kernel.org/r/20211022010201.426746-1-liu.yun@linux.dev
Fixes: ead09dd3ae ("scsi: bsg: Simplify device registration")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9fd26c633e ]
Various EDIF bsgs did not properly fill out the reply_payload_rcv_len
field. This causes app to parse empty data in the return payload.
Link: https://lore.kernel.org/r/20211026115412.27691-13-njavali@marvell.com
Fixes: 7ebb336e45 ("scsi: qla2xxx: edif: Add start + stop bsgs")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0f6d600a26 ]
Currently, firmware limits ELS payload to FC frame size/2112. This patch
adjusts memory buffer size to be able to handle max ELS payload.
Link: https://lore.kernel.org/r/20211026115412.27691-11-njavali@marvell.com
Fixes: 84318a9f01 ("scsi: qla2xxx: edif: Add send, receive, and accept for auth_els")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b1af26c245 ]
On session down, driver will flush all stale messages and doorbell
events. This prevents authentication application from having to process
stale data.
Link: https://lore.kernel.org/r/20211026115412.27691-7-njavali@marvell.com
Fixes: 4de067e5df ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Co-developed-by: Karunakara Merugu <kmerugu@marvell.com>
Signed-off-by: Karunakara Merugu <kmerugu@marvell.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b492d6a488 ]
Current driver does unnecessary pause for each session to get to certain
state before allowing the app start call to return. In larger environment,
this introduces a long delay. Originally the delay was meant to
synchronize app and driver. However, the with current implementation the
two sides use various events to synchronize their state.
The same is applied to the authentication failure call.
Link: https://lore.kernel.org/r/20211026115412.27691-6-njavali@marvell.com
Fixes: 4de067e5df ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8e6d5df3cb ]
On app start, all sessions need to be reset to see if secure connection can
be made. Fix the broken check which prevents that process.
Link: https://lore.kernel.org/r/20211026115412.27691-5-njavali@marvell.com
Fixes: 4de067e5df ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0b7a9fd934 ]
When user uses issue_lip to do link bounce, driver sends additional target
reset to remote device before resetting the link. The target reset would
affect other paths with active I/Os. This patch will remove the unnecessary
target reset.
Link: https://lore.kernel.org/r/20211026115412.27691-4-njavali@marvell.com
Fixes: 5854771e31 ("[SCSI] qla2xxx: Add ISPFX00 specific bus reset routine")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bb2ca6b3f0 ]
For RSCN of type "Area, Domain, or Fabric", which indicate a portion or
entire fabric was disturbed, current driver does not set the scan_need flag
to indicate a session was affected by the disturbance. This in turn can
lead to I/O timeout and delay of relogin. Hence initiate relogin in the
event of fabric disturbance.
Link: https://lore.kernel.org/r/20211026115412.27691-2-njavali@marvell.com
Fixes: 1560bafdff ("scsi: qla2xxx: Use complete switch scan for RSCN events")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d305c253af ]
A prior patch introduced HBA_NEEDS_CFG_PORT flag logic, but in
lpfc_sli_brdrestart_s3() code path, right after HBA_NEEDS_CFG_PORT is set,
the phba->hba_flag is cleared in lpfc_sli_brdreset().
Fix by calling lpfc_sli_chipset_init() to wait for successful restart of
the HBA in lpfc_host_reset_handler() after lpfc_sli_brdrestart().
lpfc_sli_chipset_init() sets the HBA_NEEDS_CFG_PORT flag so that the
lpfc_sli_hba_setup() routine from lpfc_online() will execute
lpfc_sli_config_port() initialization step when the brdrestart is
successful.
Link: https://lore.kernel.org/r/20211020211417.88754-3-jsmart2021@gmail.com
Fixes: d2f2547efd ("scsi: lpfc: Fix auto sli_mode and its effect on CONFIG_PORT for SLI3")
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f4875d509a ]
This variable is just a temporary variable, used to do an endian
conversion. The problem is that the last byte is not initialized. After
the conversion is completely done, the last byte is discarded so it doesn't
cause a problem. But static checkers and the KMSan runtime checker can
detect the uninitialized read and will complain about it.
Link: https://lore.kernel.org/r/20211006073242.GA8404@kili
Fixes: 5036f0a0ec ("[SCSI] csiostor: Fix sparse warnings.")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4084a7235d ]
pm8001_mpi_get_nvmd_resp() handles a GET_NVMD_DATA response, not a
SET_NVMD_DATA response, as the log statement implies.
Fixes: 1f889b5871 ("scsi: pm80xx: Fix pm8001_mpi_get_nvmd_resp() race condition")
Link: https://lore.kernel.org/r/20210929025847.646999-1-ipylypiv@google.com
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e7dcc514a4 ]
IRQ polling thread calls ISR after enable_irq() to handle any missed I/O
completion. The atomic flag "in_used" was added to have the synchronization
between the IRQ polling thread and the interrupt context. There is a bug
around it leading to a race condition.
Below is the sequence:
- IRQ polling thread accesses ISR, fetches the reply descriptor.
- Real interrupt arrives and pre-empts polling thread (enable_irq() is
already called).
- Interrupt context picks the same reply descriptor as fetched by polling
thread, processes it, and exits.
- Polling thread resumes and processes the descriptor which is already
processed by interrupt thread leads to kernel crash.
Setting the "in_used" flag before fetching the reply descriptor ensures
synchronized access to ISR.
Link: https://www.spinics.net/lists/linux-scsi/msg159440.html
Link: https://lore.kernel.org/r/20210929124022.24605-2-sumit.saxena@broadcom.com
Fixes: 9bedd36e91 ("scsi: megaraid_sas: Handle missing interrupts while re-enabling IRQs")
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit edc0596cc0 ]
Commit aa53f580e6 ("scsi: ufs: Minor adjustments to error handling")
introduced a ufshcd_clear_ua_wluns() call in
ufshcd_err_handling_unprepare(). As explained in detail by Adrian Hunter,
this can trigger a deadlock. Avoid that deadlock by removing the code that
clears the unit attention. This is safe because the only software that
relies on clearing unit attentions is the Android Trusty software and
because support for handling unit attentions has been added in the Trusty
software.
See also https://lore.kernel.org/linux-scsi/20210930124224.114031-2-adrian.hunter@intel.com/
Note that "scsi: ufs: Retry START_STOP on UNIT_ATTENTION" is a prerequisite
for this commit.
Link: https://lore.kernel.org/r/20211001182015.1347587-3-jaegeuk@kernel.org
Fixes: aa53f580e6 ("scsi: ufs: Minor adjustments to error handling")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Bart Van Assche <bvanassche@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 68444d73d6 ]
Since commit 568dd99596 ("scsi: ufs: Rename the second ufshcd_probe_hba()
argument"), the second ufshcd_probe_hba() argument has been changed to
init_dev_params.
Link: https://lore.kernel.org/r/20210929200640.828611-3-huobean@gmail.com
Fixes: 568dd99596 ("scsi: ufs: Rename the second ufshcd_probe_hba() argument")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cbd9a3347c ]
dc395x_init_one()->adapter_init() might fail. In this case, the acb is
already cleaned up by adapter_init(), no need to do that in
adapter_uninit(acb) again.
[ 1.252251] dc395x: adapter init failed
[ 1.254900] RIP: 0010:adapter_uninit+0x94/0x170 [dc395x]
[ 1.260307] Call Trace:
[ 1.260442] dc395x_init_one.cold+0x72a/0x9bb [dc395x]
Link: https://lore.kernel.org/r/20210907040702.1846409-1-ztong0001@gmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reviewed-by: Finn Thain <fthain@linux-m68k.org>
Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b507357f79 ]
Currently, we hold off unregistering with NVMe transport layer until GID_FT
or ADISC completes upon receipt of RSCN. In the ADISC discovery routine,
for nodes not found in the GID_FT response, the nodes are unregistered from
the SCSI transport but not UNREG_RPI'd. Meaning outstanding WQEs continue
to be outstanding and were not failed back to the OS. If an NVMe device,
this mean there wasn't initial termination of the I/Os so they could be
issued on a different NVMe path.
Fix by unregistering the RPI so that I/O is cancelled.
Link: https://lore.kernel.org/r/20210910233159.115896-8-jsmart2021@gmail.com
Fixes: 0614568361 ("scsi: lpfc: Delay unregistering from transport until GIDFT or ADISC completes")
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 527d46e0b0 ]
Authentication application may be running and in the past tried to probe
driver (app_start) but was unsuccessful. This could be due to the bsg layer
not being ready to service the request. On a successful link up, driver
will use the netlink Link Up event to notify the app to retry the app_start
call.
In another case, app does not poll for new NPIV host. This link up event
would notify app of the presence of a new SCSI host.
Link: https://lore.kernel.org/r/20210908164622.19240-6-njavali@marvell.com
Fixes: 4de067e5df ("scsi: qla2xxx: edif: Add N2N support for EDIF")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b27a40534e ]
Commit 1f02beff22 ("scsi: pm80xx: Remove global lock from outbound queue
processing") introduced a lock per outbound queue. Prior to that change the
driver was using a global lock for all outbound queues.
While processing the I/O responses and events the driver takes the outbound
queue spinlock and is supposed to release it in pm8001_ccb_task_free_done()
before calling command done(). Since the older code was using a global
lock, pm8001_ccb_task_free_done() was releasing the global spin lock. The
change that split the lock per outbound queue did not consider this and
pm8001_ccb_task_free_done() was still releasing the global lock.
Link: https://lore.kernel.org/r/20210906170404.5682-3-Ajish.Koshy@microchip.com
Fixes: 1f02beff22 ("scsi: pm80xx: Remove global lock from outbound queue processing")
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Ajish Koshy <Ajish.Koshy@microchip.com>
Signed-off-by: Viswas G <Viswas.G@microchip.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e6a54d6f22 ]
devlink is a software interface that doesn't depend on any hardware
capabilities. The failure in SW means memory issues, wrong parameters,
programmer error e.t.c.
Like any other such interface in the kernel, the returned status of
devlink APIs should be checked and propagated further and not ignored.
Fixes: 755f982bb1 ("qed/qede: make devlink survive recovery")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit cd8a36a90b upstream.
A prior patch inadvertently caused lpfc_sli_sum_iocb() to exclude counting
of outstanding aborted I/Os and ABORT IOCBs. Thus,
lpfc_reset_flush_io_context() called from any TMF routine does not properly
wait to flush all outstanding FCP IOCBs leading to a block layer crash on
an invalid scsi_cmnd->request pointer.
kernel BUG at ../block/blk-core.c:1489!
RIP: 0010:blk_requeue_request+0xaf/0xc0
...
Call Trace:
<IRQ>
__scsi_queue_insert+0x90/0xe0 [scsi_mod]
blk_done_softirq+0x7e/0x90
__do_softirq+0xd2/0x280
irq_exit+0xd5/0xe0
do_IRQ+0x4c/0xd0
common_interrupt+0x87/0x87
</IRQ>
Fix by separating out the LPFC_IO_FCP, LPFC_IO_ON_TXCMPLQ,
LPFC_DRIVER_ABORTED, and CMD_ABORT_XRI_CN || CMD_CLOSE_XRI_CN checks into a
new lpfc_sli_validate_fcp_iocb_for_abort() routine when determining to
build an ABORT iocb.
Restore lpfc_reset_flush_io_context() functionality by including counting
of outstanding aborted IOCBs and ABORT IOCBs in lpfc_sli_sum_iocb().
Link: https://lore.kernel.org/r/20210910233159.115896-9-jsmart2021@gmail.com
Fixes: e136471135 ("scsi: lpfc: Fix illegal memory access on Abort IOCBs")
Cc: <stable@vger.kernel.org> # v5.12+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 982fc3965d upstream.
In a rarely executed path, FLOGI failure, there is a refcounting error. If
FLOGI completed with an error, typically a timeout, the initial completion
handler would remove the job reference. However, the job completion isn't
the actual end of the job/exchange as the timeout usually initiates an
ABTS, and upon that ABTS completion, a final completion is sent. The driver
removes the reference again in the final completion. Thus the imbalance.
In the buggy cases, if there was a link bounce while the delayed response
is outstanding, the fport node may be referenced again but there was no
additional reference as it is already present. The delayed completion then
occurs and removes the last reference freeing the node and causing issues
in the link up processed that is using the node.
Fix this scenario by removing the snippet that removed the reference in the
initial FLOGI completion. The bad snippet was poorly trying to identify the
FLOGI as OK to do so by realizing the node was not registered with either
SCSI or NVMe transport.
Link: https://lore.kernel.org/r/20210910233159.115896-3-jsmart2021@gmail.com
Fixes: 618e2ee146 ("scsi: lpfc: Fix FLOGI failure due to accessing a freed node")
Cc: <stable@vger.kernel.org> # v5.13+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 703535e6ae upstream.
No need to deduce command size in scsi_setup_scsi_cmnd() anymore as
appropriate checks have been added to scsi_fill_sghdr_rq() function and the
cmd_len should never be zero here. The code to do that wasn't correct
anyway, as it used uninitialized cmd->cmnd, which caused a null-ptr-deref
if the command size was zero as in the trace below. Fix this by removing
the unneeded code.
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 PID: 1822 Comm: repro Not tainted 5.15.0 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014
Call Trace:
blk_mq_dispatch_rq_list+0x7c7/0x12d0
__blk_mq_sched_dispatch_requests+0x244/0x380
blk_mq_sched_dispatch_requests+0xf0/0x160
__blk_mq_run_hw_queue+0xe8/0x160
__blk_mq_delay_run_hw_queue+0x252/0x5d0
blk_mq_run_hw_queue+0x1dd/0x3b0
blk_mq_sched_insert_request+0x1ff/0x3e0
blk_execute_rq_nowait+0x173/0x1e0
blk_execute_rq+0x15c/0x540
sg_io+0x97c/0x1370
scsi_ioctl+0xe16/0x28e0
sd_ioctl+0x134/0x170
blkdev_ioctl+0x362/0x6e0
block_ioctl+0xb0/0xf0
vfs_ioctl+0xa7/0xf0
do_syscall_64+0x3d/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xae
---[ end trace 8b086e334adef6d2 ]---
Kernel panic - not syncing: Fatal exception
Link: https://lore.kernel.org/r/20211103170659.22151-2-tadeusz.struk@linaro.org
Fixes: 2ceda20f0a ("scsi: core: Move command size detection out of the fast path")
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: <stable@vger.kernel.org> # 5.15, 5.14, 5.10
Reported-by: syzbot+5516b30f5401d4dcbcae@syzkaller.appspotmail.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5ae17501bc upstream.
The changes to issue the abort from the scmd->abort_work instead of the EH
thread introduced a problem if eh_deadline is used. If aborting the
command(s) is successful, and there are never any scmds added to the
shost->eh_cmd_q, there is no code path which will reset the ->last_reset
value back to zero.
The effect of this is that after a successful abort with no EH thread
activity, a subsequent timeout, perhaps a long time later, might
immediately be considered past a user-set eh_deadline time, and the host
will be reset with no attempt at recovery.
Fix this by resetting ->last_reset back to zero in scmd_eh_abort_handler()
if it is determined that the EH thread will not run to do this.
Thanks to Gopinath Marappan for investigating this problem.
Link: https://lore.kernel.org/r/20211029194311.17504-2-emilne@redhat.com
Fixes: e494f6a728 ("[SCSI] improved eh timeout handler")
Cc: stable@vger.kernel.org
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 20aaef52eb upstream.
Need to make sure the command size is valid before copying the command from
user space.
Link: https://lore.kernel.org/r/20211103170659.22151-1-tadeusz.struk@linaro.org
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <linux-scsi@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Cc: <stable@vger.kernel.org> # 5.15, 5.14, 5.10
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Host Performance Buffer feature allows UFS read commands to carry the
physical media addresses along with the LBAs, thus allowing less internal
L2P-table switches in the device. HPB1.0 allowed a single LBA, while
HPB2.0 increases this capacity up to 255 blocks.
Carrying more than a single record, the read operation is no longer purely
of type "read" but a "hybrid" command: Writing the physical address to the
device in one operation and reading back the required payload in another.
The JEDEC HPB spec defines two commands for this operation:
HPB-WRITE-BUFFER (0x2) to write the physical addresses to device, and
HPB-READ to read the payload.
With the current HPB design the UFS driver has no alternative but to divide
the READ request into 2 separate commands: HPB-WRITE-BUFFER and HPB-READ.
This causes a great deal of aggravation to the block layer guys who
demanded that we completely revert the entire HPB driver regardless of the
huge amount of corporate effort already invested in it.
As a compromise, remove only the pieces that implement the 2.0
specification. This is done as a matter of urgency for the final 5.15
release.
Link: https://lore.kernel.org/r/20211030062301.248-1-avri.altman@wdc.com
Tested-by: Avri Altman <avri.altman@wdc.com>
Tested-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Co-developed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Testing revealed a problem with how the reference tag was handled for
a WRITE_INSERT operation. The SCSI_PROT_REF_CHECK flag is not set when
the controller is asked to generate the protection information
(i.e. not DIX). And as a result the initial reference tag would not be
set in the WRITE_INSERT case.
Separate handling of the REF_CHECK and REF_INCREMENT flags to align
with both the DIX spec and the MPI implementation.
Link: https://lore.kernel.org/r/20211028034202.24225-1-martin.petersen@oracle.com
Fixes: b3e2c72af1 ("scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit a264cf5e81 ("scsi: ibmvfc: Fix command state accounting and stale
response detection") introduced a regression in detecting duplicate
responses. This was observed in test where a command was sent to the VIOS
and completed before ibmvfc_send_event() set the active flag to 1, which
resulted in the atomic_dec_if_positive() call in ibmvfc_handle_crq()
thinking this was a duplicate response, which resulted in scsi_done() not
getting called, so we then hit a SCSI command timeout for this command once
the timeout expires. This simply ensures the active flag gets set prior to
making the hcall to send the command to the VIOS, in order to close this
window.
Link: https://lore.kernel.org/r/20211019152129.16558-1-brking@linux.vnet.ibm.com
Fixes: a264cf5e81 ("scsi: ibmvfc: Fix command state accounting and stale response detection")
Cc: stable@vger.kernel.org
Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement the ->restore() PM operation and set the link to off, which will
force a full reset and restore. This ensures that Host Performance Booster
is reset after suspend-to-disk.
The Host Performance Booster feature caches logical-to-physical mapping
information in the host memory. After suspend-to-disk, such information is
not valid, so a full reset and restore is needed.
A full reset and restore is done if the SPM level is 5 or 6, but not for
other SPM levels, so this change fixes those cases.
A full reset and restore also restores base address registers, so that code
is removed.
Link: https://lore.kernel.org/r/20211018151004.284200-2-adrian.hunter@intel.com
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The sgl is freed in the target stack in target_release_cmd_kref() before
calling qlt_free_cmd() but there is an unmap of sgl in qlt_free_cmd() that
causes a panic if sgl is not yet DMA unmapped:
NIP dma_direct_unmap_sg+0xdc/0x180
LR dma_direct_unmap_sg+0xc8/0x180
Call Trace:
ql_dbg_prefix+0x68/0xc0 [qla2xxx] (unreliable)
dma_unmap_sg_attrs+0x54/0xf0
qlt_unmap_sg.part.19+0x54/0x1c0 [qla2xxx]
qlt_free_cmd+0x124/0x1d0 [qla2xxx]
tcm_qla2xxx_release_cmd+0x4c/0xa0 [tcm_qla2xxx]
target_put_sess_cmd+0x198/0x370 [target_core_mod]
transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod]
tcm_qla2xxx_complete_free+0x6c/0x90 [tcm_qla2xxx]
The sgl may be left unmapped in error cases of response sending. For
instance, qlt_rdy_to_xfer() maps sgl and exits when session is being
deleted keeping the sgl mapped.
This patch removes use-after-free of the sgl and ensures that the sgl is
unmapped for any command that was not sent to firmware.
Link: https://lore.kernel.org/r/20211018122650.11846-1-d.bogdanov@yadro.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 8c0eb596ba ("[SCSI] qla2xxx: Fix a memory leak in an error path of
qla2x00_process_els()"), intended to change:
bsg_job->request->msgcode == FC_BSG_HST_ELS_NOLOGIN
to:
bsg_job->request->msgcode != FC_BSG_RPT_ELS
but changed it to:
bsg_job->request->msgcode == FC_BSG_RPT_ELS
instead.
Change the == to a != to avoid leaking the fcport structure or freeing
unallocated memory.
Link: https://lore.kernel.org/r/20211012191834.90306-2-jgu@purestorage.com
Fixes: 8c0eb596ba ("[SCSI] qla2xxx: Fix a memory leak in an error path of qla2x00_process_els()")
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Joy Gu <jgu@purestorage.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver probing function should return < 0 for failure, otherwise
kernel will treat value > 0 as success.
Link: https://lore.kernel.org/r/1634522181-31166-1-git-send-email-zheyuma97@gmail.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>