There is a chance of the driver to be stuck in kdump if drives start
acting up in kdump discovery process and the kernel decides to send eh
resets, which would prompt rescan to be scheduled.
Do not perform a rescan in kdump context, since we do not expect a hotplug
event during kdump and all the devices are going to go away anyway.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add back the ability to scan for hotplug changes while eh was in progress.
Schedule a rescan for a later time in the eh recovery code and wait for
eh to complete in the rescan worker.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the driver fails to retrieve information from the fw (could happen when
the fw is not fully in its senses), the driver does nothing and change is
not processed correctly by the driver
Schedule host rescan in case of failure. This is only for SAFW, since
the information retrieval failure will happen on SAFW devices.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver uses scsi_scan_host to add new devices in the driver init path,
which adds all the fw exposed devices. The drivers resorts to queue
command checks to block out commands to _hidden_ devices.
Use the hotplug handler code to add new devices during driver init and
other areas, this is only for safw. For ARC scsi_scan_host will still
apply.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently driver will attempt to process hotplug events concurrently based
on the FW interrupt.
Protect safw update function with a scan mutex.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The device hotplug events are processed only after retrieving the updated
lun information from the fw. Does not make sense to keep them separate.
Merge both the hotplug handling and safw adapter setup code into single
function.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Resolve luns checks the if a sdev is already present in the os to figure
out if it needs to be removed. Internally the driver exposes HBA on bus
2 even though its bus 1 in the fw. Its mildly confusing.
Refactor out the sdev lookup into its function to check if sdev has been
added to the kernel or not. Add helper functions to add, remove and put
devices based on their fw bus and target number.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Added macros to loop through the MAX SUPPORTED Buses and Targets. This
will make the code a bit easier to read.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The hotplug handler code is duplicated for hba handling and container
handling.
Merged function to handle hba and container hot plug events into the
resolve luns functions. Added a bunch of helper functions to check the
validity of a given target and to check if bus, target is container
device.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Merge aac_get_containers to setup target function, so that information
about all the present devices can be retrieved in one shot.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add helper function to set queue depth from information retrieved from
the bmic phy structure.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Save the bmic information for each phy, so that it can processed in
target setup function.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Created inline function to retrieve lun info for each device from the
phy luns structure.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Move the function to get phy luns information to the top of function
to set target information
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove function call to process targets from the report phy luns function
and make it a function in its own right. This will help understand the
flow of the code.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add helper function to setup targets devices and create the base for the
upcoming patches
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rename variables and functions to make bmic identify, report phy luns
to make them consistent across code internal existing code bases
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Edit function that retrieves phy lun information to use common
bmic function
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
safw command submission is duplicated across many functions.
Move the safw submission code from bmic identify into its own function
for common use
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Ideally driver needs to wait for IO to be submitted or responded to before
shutdown.
Move code to wait for IO completion into shutdown path
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Refactored the reset_host store function to make consistent across code
bases
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It is possible to restart the controller via the use of the reset_host
sysfs variable. This does work for controllers that can no longer respond,
since driver will attempt to send down a shutdown in this path.
Check if the controller is able to receive commands before sending down
a shutdown
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver would hang when attempting to send reset from the ioctl interface,
since it would wait to retrieve the ioctl mutex at send shutdown.
Set adapter shutdown and unlock mutex before sending down reset request.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As part of the recovery process, the drivers removes offline devices (
done by the kernel) and then tries to add them back in the rescan code.
Removing the device is like taking a sledgehammer to a nail.
Set the device as running if it is marked offline.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver attempts to perform a device scan and device add after coming out
of reset. At times when the kdump kernel loads and it tries to perform
eh recovery, the device scan hangs since its commands are blocked because
of the eh recovery. This should have shown up in normal eh recovery path
(Should have been obvious)
Remove the code that performs scanning.I can live without the rescanning
support in the stable kernels but a hanging kdump/eh recovery needs to be
fixed.
Fixes: a2d0321dd5 (scsi: aacraid: Reload offlined drives after controller reset)
Cc: <stable@vger.kernel.org>
Reported-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Fixes: a2d0321dd5 (scsi: aacraid: Reload offlined drives after controller reset)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Check if the adapter can receive abort requests, before sending aborts
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When udev requests for a devices inquiry string, it might create multiple
threads causing a race condition on the shared inquiry resource string.
Created a buffer with the string for each thread.
Cc: <stable@vger.kernel.org>
Fixes: 3bc8070fb7 ([SCSI] aacraid: SMC vendor identification)
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix kernel-doc warnings in drivers/scsi/ that are related to iscsi
support interfaces.
Fixes these kernel-doc warnings: (tested by adding these files to a new
target.rst documentation file: WIP)
../drivers/scsi/libiscsi.c:2740: warning: No description found for parameter 'dd_size'
../drivers/scsi/libiscsi.c:2740: warning: No description found for parameter 'id'
../drivers/scsi/libiscsi.c:2961: warning: No description found for parameter 'cls_conn'
../drivers/scsi/iscsi_tcp.c:313: warning: No description found for parameter 'conn'
../drivers/scsi/iscsi_tcp.c:363: warning: No description found for parameter 'conn'
../drivers/scsi/libiscsi_tcp.c:810: warning: No description found for parameter 'tcp_conn'
../drivers/scsi/libiscsi_tcp.c:810: warning: No description found for parameter 'segment'
../drivers/scsi/libiscsi_tcp.c:887: warning: No description found for parameter 'offloaded'
../drivers/scsi/libiscsi_tcp.c:887: warning: No description found for parameter 'status'
../drivers/scsi/libiscsi_tcp.c:887: warning: Excess function parameter 'offload' description in 'iscsi_tcp_recv_skb'
../drivers/scsi/libiscsi_tcp.c:964: warning: Excess function parameter 'conn' description in 'iscsi_tcp_task_init'
../drivers/scsi/libiscsi_tcp.c:964: warning: Excess function parameter 'sc' description in 'iscsi_tcp_task_init'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: linux-scsi@vger.kernel.org
Cc: target-devel@vger.kernel.org
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-rdma@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
1. In IO path, setting of "ATA command pending" flag early before device
removal, invalid device handle etc., checks causes any new commands
to be always returned with SAM_STAT_BUSY and when the driver removes
the drive the SML issues SYNC Cache command and that command is
always returned with SAM_STAT_BUSY and thus making SYNC Cache command
to requeued.
2. If the driver gets an ATA PT command for a SATA drive then the driver
set "ATA command pending" flag in device specific data structure not
to allow any further commands until the ATA PT command is completed.
However, after setting the flag if the driver decides to return the
command back to upper layers without actually issuing to the firmware
(i.e., returns from qcmd failure return paths) then the corresponding
flag is not cleared and this prevents the driver from sending any new
commands to the drive.
This patch fixes above two issues by setting of "ATA command pending"
flag after checking for whether device deleted, invalid device handle,
device busy with task management. And by setting "ATA command pending"
flag to false in all of the qcmd failure return paths after setting the
flag.
Signed-off-by: Chaitra P B <chaitra.basappa@broadcom.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Several statements are indented too far, fix these
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
localport is being dereferenced to assign lport and then immediately
afterwards localport is being sanity checked to see if it is null. Fix
this by only dereferencing localport until after it has been null
checked.
Detected by CoverityScan, CID#1463038 ("Dereference before null check")
Fixes: 3a8cefbfc5ee ("scsi: lpfc: Beef up stat counters for debug")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The define names specified 64Bit/128Bit, not 64GBIT/128GBIT. Correct
the names.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The function sas_parse_addr() could be easily substituted by hex2bin()
which is in kernel library code.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If, for any reason, userland shuts down iscsi transport interfaces
before proper logouts - like when logging in to LUNs manually, without
logging out on server shutdown, or when automated scripts can't
umount/logout from logged LUNs - kernel will hang forever on its
sd_sync_cache() logic, after issuing the SYNCHRONIZE_CACHE cmd to all
still existent paths.
PID: 1 TASK: ffff8801a69b8000 CPU: 1 COMMAND: "systemd-shutdow"
#0 [ffff8801a69c3a30] __schedule at ffffffff8183e9ee
#1 [ffff8801a69c3a80] schedule at ffffffff8183f0d5
#2 [ffff8801a69c3a98] schedule_timeout at ffffffff81842199
#3 [ffff8801a69c3b40] io_schedule_timeout at ffffffff8183e604
#4 [ffff8801a69c3b70] wait_for_completion_io_timeout at ffffffff8183fc6c
#5 [ffff8801a69c3bd0] blk_execute_rq at ffffffff813cfe10
#6 [ffff8801a69c3c88] scsi_execute at ffffffff815c3fc7
#7 [ffff8801a69c3cc8] scsi_execute_req_flags at ffffffff815c60fe
#8 [ffff8801a69c3d30] sd_sync_cache at ffffffff815d37d7
#9 [ffff8801a69c3da8] sd_shutdown at ffffffff815d3c3c
This happens because iscsi_eh_cmd_timed_out(), the transport layer
timeout helper, would tell the queue timeout function (scsi_times_out)
to reset the request timer over and over, until the session state is
back to logged in state. Unfortunately, during server shutdown, this
might never happen again.
Other option would be "not to handle" the issue in the transport
layer. That would trigger the error handler logic, which would also need
the session state to be logged in again.
Best option, for such case, is to tell upper layers that the command was
handled during the transport layer error handler helper, marking it as
DID_NO_CONNECT, which will allow completion and inform about the
problem.
After the session was marked as ISCSI_STATE_FAILED, due to the first
timeout during the server shutdown phase, all subsequent cmds will fail
to be queued, allowing upper logic to fail faster.
Signed-off-by: Rafael David Tinoco <rafael.tinoco@canonical.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Advance the qed* drivers to use firmware 8.33.1.0:
Modify core driver (qed) to utilize the new FW and initialize the device
with it. This is the lion's share of the patch, and includes changes to FW
interface files, device initialization flows, FW interaction flows, and
debug collection flows.
Modify Ethernet driver (qede) to make use of new FW in fastpath.
Modify RoCE/iWARP driver (qedr) to make use of new FW in fastpath.
Modify FCoE driver (qedf) to make use of new FW in fastpath.
Modify iSCSI driver (qedi) to make use of new FW in fastpath.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Bason <Yuval.Bason@cavium.com>
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Manish Chopra <Manish.Chopra@cavium.com>
Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch renames defines and structures in the FW HSI files to allow a
distinction between different types of HW.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch refactors and reorders the FW API files in preparation of
upgrading the code to support new FW.
- Make use of the BIT macro in appropriate places.
- Whitespace changes to align values and code blocks.
- Comments are updated (spelling mistakes, removed if not clear).
- Group together code blocks which are related or deal with similar
matters.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two simple fixes, both of which cause I/O hangs. The storvsc one is
from the hyper-v which can hang under certain hot add/remove
conditions and the other is generally, where removing a target and a
device in close proximity can result in the release method being
executed twice (and subsequent list and other corruption and an
eventual panic).
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJaSBHsAAoJEAVr7HOZEZN4CGUP/jAF44WD9sV5E0Rzx4IrFJkr
V0n5EBNeTOE6Qqy4M5LuXkLWVBRIAnctSa73r52sMamh2vkp9t5Fp1GafYRlM+3y
7avZ9IUXnupbIjZeiI80aq+5OK1jEdMjI/up5lqF8XiifZ+Dqp6MHQW8hTHLUaqg
y+bT16XzZ5ziRv9wJoxIscMCqGZHucN6Dsye98mwaS4bZuIi0hdDk+piY02GlI0C
Nf77xZtNmQXVY5R8ydEe43ci1j5GwSbIg6MjbOmhCOhurnG4NX10QlQSM9zFPDHV
XKVQcLFpJvdNVOwvwgkuMpeqDCzlSg9n2W8HjQDFUTsFeG03t6ylVtI6iYMXbgVm
fiedJdOFk50dw0qkBypYu425fPkX5S/rg3zv+yDBO0vc1FZxMkXInnJsgL3CMzPH
xXpGpICNCtcFmLpCbgyxjc4fcfWHwSsgqW0fD/NWcP//CcsUsOAgmM9HJw9jp+sM
AqBonDGpm+E8EU1It2qxHl5uSfaGRL1aZ8kL0EbOpYM07XSIW/diZ0l0+Ma2MLq9
lyKJsFmRt1I65aW36cT79SHEtArjtCBn9z184MO/9GyhLz5JxSk3CZ4KWo5eci8n
PWZyArt7LyFojNZhjT3MEflidr/HlecB4M6XLbYtba1A8oYeNE3NwueCrPz3gjAC
BCYPbM8Nx9IIarOy3pw7
=Dqbu
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two simple fixes, both of which cause I/O hangs.
The storvsc one is from the hyper-v which can hang under certain hot
add/remove conditions and the other is generally, where removing a
target and a device in close proximity can result in the release
method being executed twice (and subsequent list and other corruption
and an eventual panic)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: storvsc: Fix scsi_cmd error assignments in storvsc_handle_error
scsi: core: check for device state in __scsi_remove_target()
Pull block fixes from Jens Axboe:
"It's been a few weeks, so here's a small collection of fixes that
should go into the current series.
This contains:
- NVMe pull request from Christoph, with a few important fixes.
- kyber hang fix from Omar.
- A blk-throttl fix from Shaohua, fixing a case where we double
charge a bio.
- Two call_single_data alignment fixes from me, fixing up some
unfortunate changes that went into 4.14 without being properly
reviewed on the block side (since nobody was CC'ed on the
patch...).
- A bounce buffer fix in two parts, one from me and one from Ming.
- Revert bdi debug error handling patch. It's causing boot issues for
some folks, and a week down the line, we're still no closer to a
fix. Revert this patch for now until it's figured out, then we can
retry for 4.16"
* 'for-linus' of git://git.kernel.dk/linux-block:
Revert "bdi: add error handle for bdi_debug_register"
null_blk: unalign call_single_data
block: unalign call_single_data in struct request
block-throttle: avoid double charge
block: fix blk_rq_append_bio
block: don't let passthrough IO go into .make_request_fn()
nvme: setup streams after initializing namespace head
nvme: check hw sectors before setting chunk sectors
nvme: call blk_integrity_unregister after queue is cleaned up
nvme-fc: remove double put reference if admin connect fails
nvme: set discard_alignment to zero
kyber: fix another domain token wait queue hang
Prior patch mixed up what argument in the macro was what, so min value
was placed as the "default" argument, and the default value was placed
as the "min" argument. Thus, when the default was applied, it looked
like the default was smaller than the allowed min.
Swap argument postions to correct.
[mkp: fixed checkpatch warning]
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When an I/O is returned with an srb_status of SRB_STATUS_INVALID_LUN
which has zero good_bytes it must be assigned an error. Otherwise the
I/O will be continuously requeued and will cause a deadlock in the case
where disks are being hot added and removed. sd_probe_async will wait
forever for its I/O to complete while holding scsi_sd_probe_domain.
Also returning the default error of DID_TARGET_FAILURE causes multipath
to not retry the I/O resulting in applications receiving I/O errors
before a failover can occur.
Signed-off-by: Cathy Avery <cavery@redhat.com>
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch fixes following warnings reported by smatch:
drivers/scsi/qla2xxx/qla_mid.c:586 qla25xx_delete_req_que()
error: we previously assumed 'req' could be null (see line 580)
drivers/scsi/qla2xxx/qla_mid.c:602 qla25xx_delete_rsp_que()
error: we previously assumed 'rsp' could be null (see line 596)
Fixes: 7867b98dce ("scsi: qla2xxx: Fix memory leak in dual/target mode")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver may sleep under a spinlock.
The function call path is:
qedi_cpu_offline (acquire the spinlock)
qedi_fp_process_cqes
qedi_mtask_completion
qedi_process_tmf_resp
kzalloc(GFP_KERNEL) --> may sleep
To fix it, GFP_KERNEL is replaced with GFP_ATOMIC.
This bug is found by my static analysis tool(DSAC) and checked by my
code review.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Simplify all arcmsr_hbaX_get_config routine by call a new
get_adapter_config function.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Waiting for iop firmware ready before issue get_config command to iop
for adapter type A and D.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 11.4.0.6
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If log verbose in not turned on, its hard to tell when certain error
paths get hit. Add stats counters and corresponding logic to
debugfs/sysfs to aid understanding what paths were traversed.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When unregistering a remote port the lpfc driver would eventually wait
for the remoteport_unreg done callback. But the driver never completed
the io aborts that would allow the connections to terminate thus the
unreg done callback was never issued. Turns out the coding style of the
driver allowed for the wait to occur on the same cpu that the deferred
isr is called on. The blocking for the wait, blocked the isr, and as the
isr didn't run, the io aborts wouldn't finish.
Turns out there was never a good reason to block waiting for the unreg
done in the first place. The driver can continue execution and the ref
counting within the driver will do the right thing.
Resolve by removing the wait and patching up a few cases where the ref
counting didn't look right - mainly cases where the remote port comes
back before the aborts had completed and the unreg done had been
called. Additionally, a few places which used pointer values to guide
driver actions weren't protected by lock, so correct those.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the lpfc discovery engine, when as a nvme target, where the driver
was performing mailbox io with the adapter for port login when a NVME
PRLI is received from the host. Rather than queue and eventually get
back to sending a response after the mailbox traffic, the driver
rejected the io with an error response.
Turns out this particular initiator didn't like the rejection values
(unable to process command/command in progress) so it never attempted a
retry of the PRLI. Thus the host never established nvme connectivity
with the lpfc target.
By changing the rejection values (to Logical Busy/nothing more), the
initiator accepted the response and would retry the PRLI, resulting in
nvme connectivity.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When enabled for both SCSI and NVME support, and connected pt2pt to a
SCSI only target, the driver nodelist entry for the remote port is left
in PRLI_ISSUE state and no SCSI LUNs are discovered. Works fine if only
configured for SCSI support.
Error was due to some of the prli points still reflecting the need to
send only 1 PRLI. On a lot of fabric configs, targets were NVME only,
which meant the fabric-reported protocol attributes were only telling
the driver one protocol or the other. Thus things worked fine. With
pt2pt, the driver must send a PRLI for both protocols as there are no
hints on what the target supports. Thus pt2pt targets were hitting the
multiple PRLI issues.
Complete the dual PRLI support. Track explicitly whether scsi (fcp) or
nvme prli's have been sent. Accurately track protocol support detected
on each node as reported by the fabric or probed by PRLI traffic.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Increased the sizes of the SCSI WQ's and CQ's so that SCSI operation is
similar to that used by NVME. However, size increase restricted only to
those newer adapters that can support the larger WQE size, thus bigger
queue sizes.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Handling a rcv'ed PRLI incorrectly can cause the ndlp to end up in the
wrong state or the driver to ACC and PRLI when it should send LS_RJT.
The cause was due to the driver not properly looking at the PRLI type
and taking the multiple protocol support into consideration.
Resolved by adding checks in the various PRLI receive points to validate
PRLI type and reject if not valid for the enabled protocols and mode
(host vs target).
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is all set to handle the defer_rcv api for the nvmet_fc
transport, yet didn't properly recognize the return status when the
defer_rcv occurred. The driver treated it simply as an error and aborted
the io. Several residual issues occurred at that point.
Finish the defer_rcv support: recognize the return status when the io
request is being handled in a deferred style. This stops the rogue
aborts; Replenish the async cmd rcv buffer in the deferred receive if
needed.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
NVME targets appear to randomly disconnect from the initiator when
running heavy IO.
The error is due to the host aggregate (across all controllers) io load
was beyond the maximum exchange count for nvme on the adapter. The
driver was properly returning a resource busy status, but the io load
was so great heartbeat commands would be bounced and not have a
successful retry within the fuzz amount for the nvme heartbeat (yes, a
very high io load!). Thus the target was terminating the controller due
to a keep alive failure.
Resolve by reserving a few exchanges (by counters) which can be used
when the adapter is out of normal exchanges and the command is a NVME
heartbeat command. As counters are used, while the reserved command is
outstanding, as soon as any other exchange completes, the counters are
adjusted and the reserved count is replenished. The heartbeat completes
execution in a normal fashion.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For v3 hw SAS, it supports configuring power state from D0 to D3 for entering
Low Power status and power state from D3 to D0 for quit Low Power status.
When power state from D0 to D3, HW will send FLR to clear the registers of
ECAM and BAR space, and when power state from D3 to D0, it will clear the
registers of ECAM space only.
So when suspend, need to do like controller reset (including disable
interrupts/DQ/PHY/BUS), and also release slots after FLR. When resume,
re-config the registers of BAR space.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In function sas_suspend_devices(), it requires callback lldd_port_deformed
callback to be implemented if lldd_port_deformed is implemented.
So add a stub for lldd_port_deformed.
Callback lldd_port_deformed was not required as the port deformation is done
elsewhere in the LLDD.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch fix SAS_QUEUE_FULL problem. The test situation is close port while
running IO.
In sas_eh_handle_sas_errors(), SCSI EH will free sas_task of the device if
lldd_I_T_nexus_reset() return TMF_RESP_FUNC_COMPLETE or -ENODEV. But in our
SAS driver, we only free slots of the device when the return value is
TMF_RESP_FUNC_COMPLETE. So if the return value is -ENODEV, the slot resource
will not free any more.
As an solution, we should also free slots of the device in
lldd_I_T_nexus_reset() if the return value is -ENODEV.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Two simple fixes: one for sparse warnings that were introduced by the
merge window conversion to blist_flags_t and the other to fix dropped
I/O during reset in aacraid.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJaOvWtAAoJEAVr7HOZEZN4pNYQAISqS0tbDt7bgT+e6I/PGy7D
0vxv13ELf+MV9AV2uJKDvpf73nNtzrMgEC9N2rCsQyKeietF4yW1N3tk8n5Me32g
vCD3YdwJFBwO3UaaNFkP+wpzh60RutMeBRUFAYeQu7LqBkEp4jOGx21N0fAb89wt
SUkwfib20XUs518Tuqsyzy0keNsH3sRNJUenoxXVnqNMqIobKpigxZORFMIJaloZ
2VyQhYqrL75iqLRHTUUpWorQC4Db/FTyl58oG7rG8JdRN0Mww3Hp8Jv2E8cn5e2z
Ze9J9Z/IUCxAV75muGR2GfXd9e5zgILOyLSwKcjxniElWWZbqTIYnEUlyElqBg5Z
4eWytQUmQTixeAqnNfnEYXpUiiJR3snKYCZpGhF/a7+Kzmid64GuOEhIQsroPy60
unO9LG50/WDsqWMFlSaJPoePnzOEDj4LrnZiedkroYQrAQq4I6QNAPcUE6ruYvka
czzbkqhuHs/jHe0rbiYtG6YjlU6FdV4XqCdx10ijX2oUVFxZeIkUHu1uCwqhqg24
p6UE2bEzCwpKMEOwVeNlRsC6BQKpxugJNGJPHS6WeFiVeFl/tHNpYh7L7jnzVAQH
C1L6RIGCK6jrzG49mn9mySNf6WmSfG7L3hqaHY5ngkz5sfdhR+6kjvnv8xFkyTK7
BJIyJBJBnDsaw/mDNRVt
=IuyM
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two simple fixes: one for sparse warnings that were introduced by the
merge window conversion to blist_flags_t and the other to fix dropped
I/O during reset in aacraid"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: aacraid: Fix I/O drop during reset
scsi: core: Use blist_flags_t consistently
As it turned out device_get() doesn't use kref_get_unless_zero(), so we
will be always getting a device pointer. Consequently, we need to check
for the device state in __scsi_remove_target() to avoid tripping over
deleted objects.
Fixes: fbce4d97fd ("scsi: fixup kernel warning during rmmod()")
Reported-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit caa4b02476e3(blk-map: call blk_queue_bounce from blk_rq_append_bio)
moves blk_queue_bounce() into blk_rq_append_bio(), but don't consider
the fact that the bounced bio becomes invisible to caller since the
parameter type is 'struct bio *'. Make it a pointer to a pointer to
a bio, so the caller sees the right bio also after a bounce.
Fixes: caa4b02476 ("blk-map: call blk_queue_bounce from blk_rq_append_bio")
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Michele Ballabio <barra_cuda@katamail.com>
(handling failure of blk_rq_append_bio(), only call bio_get() after
blk_rq_append_bio() returns OK)
Tested-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The most important one is the bfa fix because it's easy to oops the
kernel with this driver (this includes the commit that corrects the
compiler warning in the original), a regression in the new timespec
conversion in aacraid and a regression in the Fibre Channel ELS
handling patch. The other three are a theoretical problem with
termination in the vendor/host matching code and a use after free in
lpfc.
The additional patches are a fix for an I/O hang in the mq code under
certain circumstances and a rare oops in some debugging code.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJaM/8tAAoJEAVr7HOZEZN4tqIP/ijN1H+K6LQ2lID8ocfBXfUC
wWFplIjuIOsFzo17o6U3TetRClU2JMLkd7aUnvYiyIadzQxGSGbWTBxW13vobZWg
uJd3oMjyRzP0DGgY5F0JWT3/DGKKthnNnsam7DDPUfY20h959aPhq0jayo274Dps
DnZb6KtJhdKS3l/Bu7FEA8cOmh4pJyPfKf4lft25dFDUpJIt1f/iIA8SUbnq9hpA
VwiZherXoDikOx9eEwAurvQLQ98emBaI085QusxV7d3aii4nKTnKelillSeaY7rd
mhRAGPiz/8d6HlMxBLu0XVd+I7lj/9hmhJbQsy7ytW1I/oLhAt9FoHvDLzWxMHZj
Zhraj3WAXQNIMWBf2n4CfvLKWsl3O+rCUESE3a7UHOlT2sMz5roYBPcpJ3yIfaPs
YyDc6gwTORm9YHArKMccQN+aWYez3ysx33Su+mdYKTMK9HlqSMtoSLAxcobeUaqr
nQdV4LQ6qeK9ILJSFv9BcKW/tA6s7CHFzflD/9PoxmI8jdiUV4DebMeh7Kkcw5m3
yeXOeUnYPebisK73z5DtgKZ8GJT2rIftGaitIilGXq8Q0GG5mkOOU+ng3skXKO+R
DHHMOHURnzyg27cBcanb5MYTkvkNb1i/f84tBrdQ5AoZycmmzU44nDCf+4peHE8g
k5THgzBVQXeXJ3Vq+cJV
=9sav
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The most important one is the bfa fix because it's easy to oops the
kernel with this driver (this includes the commit that corrects the
compiler warning in the original), a regression in the new timespec
conversion in aacraid and a regression in the Fibre Channel ELS
handling patch.
The other three are a theoretical problem with termination in the
vendor/host matching code and a use after free in lpfc.
The additional patches are a fix for an I/O hang in the mq code under
certain circumstances and a rare oops in some debugging code"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: core: Fix a scsi_show_rq() NULL pointer dereference
scsi: MAINTAINERS: change FCoE list to linux-scsi
scsi: libsas: fix length error in sas_smp_handler()
scsi: bfa: fix type conversion warning
scsi: core: run queue if SCSI device queue isn't ready and queue is idle
scsi: scsi_devinfo: cleanly zero-pad devinfo strings
scsi: scsi_devinfo: handle non-terminated strings
scsi: bfa: fix access to bfad_im_port_s
scsi: aacraid: address UBSAN warning regression
scsi: libfc: fix ELS request handling
scsi: lpfc: Use after free in lpfc_rq_buf_free()
"FIB_CONTEXT_FLAG_TIMEDOUT" flag is set in aac_eh_abort to indicate
command timeout. Using the same flag in reset handler causes the command
to time out and the I/Os were dropped.
Define a new flag "FIB_CONTEXT_FLAG_EH_RESET" to make sure I/O is
properly handled in eh_reset handler.
[mkp: tweaked commit message]
Signed-off-by: Prasad B Munirathnam <prasad.munirathnam@microsemi.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use the type blist_flags_t for all variables that represent blacklist
flags. Additionally, suppress recently introduced sparse warnings
related to blacklist flags.
[mkp: fixed commit id]
Fixes: 5ebde4694e ("scsi: Use 'blist_flags_t' for scsi_devinfo flags")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We should do internal abort dev before TMF_ABORT_TASK_SET and TMF_LU_RESET.
Because we may only have done internal abort for single IO in the earlier part
of SCSI EH process. Even the internal abort to the single IO, we also don't
know whether it is successful.
Besides, we should release slots of the device in hisi_sas_abort_task_set() if
the abort is successful.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Normally, hardware should ensure that internal abort timeout will never
happen. If happen, it would be an SoC failure. What's more, HW will not
process any other commands if an internal abort hasn't return CQ, and they
will time out also.
So, we should judge the result of internal abort in SCSI EH, if it is failed,
we should give up to do TMF/softreset and return failure to the upper layer
directly.
This patch do following things to achieve this:
1. When internal abort timeout happened, we set return value to -EIO in
hisi_sas_internal_task_abort().
2. If prep_abort() is not support, let hisi_sas_internal_task_abort() return
TMF_RESP_FUNC_FAILED.
3. If hisi_sas_internal_task_abort() return an negative number, it can be
thought that it not executed properly or internal abort timeout. Then we
won't do behind TMF or softreset, and return failure directly.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We should do link reset of PHY when identify timeout or STP link timeout. They
are internal events of SOC and are notified to driver through interrupts of
CHL_INT2.
Besides, we should add an delay work to do link reset as it needs sleep. So,
this patch add an new PHY event HISI_PHYE_LINK_RESET for this.
Notes: v2 HW doesn't report the event of STP link timeout. So, we only need
to handle event of identify timeout for v2 HW.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use an general way to do delay work for a PHY. Then it will be easier to add
new delayed work for a PHY in future.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add port AXI errors handling for v2 hw. We do host controller reset for such
errors.
Besides, change port muli-bits ECC error handling, and we should also do host
reset for such error. So, this patch put them in the same struct with port AXI
error.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Change code format of int_chnl_int_v2_hw() to be consistent with v3 hw to
reduce an tag indent.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add some print at some places such as error info and cq of exception IO,
device found etc, and also adjust some log levels.
All this to assist debugging ability.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We use PCIe AER to support RAS feature for v3 hw. This driver should do
following two things to support this:
1. Enable RAS interrupts, so that errors can be reported to RAS module.
2. Realize err_handler for sas_v3_pci_driver. Then if non-fatal error is
detected, print error source and try to recover SAS controller.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For v3 hw, each NCQ will return a CQ, so it is no need to acquire IPTT from
ITCT, just acquire it from IPTT field of CQ.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sometimes it is required to know when the controller reset has completed and
also if it has completed successfully. For such places, we call
hisi_sas_controller_reset() directly before. That may lead to multiple calls
to this function.
This patch create a per-reset structure which contains a completion structure
and status flag to know when the reset completes and also the status. It is
also in hisi_hba.wq to do reset work.
As all host reset works are done in hisi_hba.wq, we don't worry multiple calls
to hisi_sas_controller_reset().
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Do a couple of changes for when HISI_SAS_RESET_BIT is set for HBA:
- Clearing ITCT is not necessary
- Remove internal abort as it will fail during reset
Flag sas_dev->dev_type is kept as SAS_PHY_UNUSED.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch do following optimizations to host controller reset:
1. Unblock scsi requests before rescanning topology, as SCSI command need be
used if new device is found during rescanning topology.
2. Remove drain_workqueue(hisi_hba->wq) and drain_workqueue(shost->work_q), as
there is no need to ensure that all PHYs event are done before exiting host
reset.
3. Improve message print level of host reset. Host reset is an important and
very few occurrence event. We should know its progress even when not
debugging.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently refreshing the PHY port id after reset is done in the rescan
topology function, which is quite late in the reset process. It could be moved
earlier in the process, as the port id can be refreshed once the PHYs become
ready.
In addition to this, we should set the hisi_sas_dev port id to 0xff (invalid
port id) if all PHYs of this port remain down for the same device.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In certain scenarios we may just want to clear the ITCT for a device, and not
free other resources like the SATA bitmap using in v2 hw.
To facilitate this, this patch relocates the code of clearing ITCT from
free_device() to a new hw interface clear_itct(). Then for some hw, we should
not realise free_device() if there's nothing left to do for it.
[mkp: typo]
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For function dma_unmap_sg(), the <nents> parameter should be number of
elements in the scatterlist prior to the mapping, not after the mapping.
Fix this usage.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
It is required to initialize the dq spinlock before use, which was not being
done, so fix it. This issue can be detected when CONFIG_DEBUG_SPINLOCK is
enabled.
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Avoid that scsi_show_rq() triggers a NULL pointer dereference if called
after sd_uninit_command(). Swap the NULL pointer assignment and the
mempool_free() call in sd_uninit_command() to make it less likely that
scsi_show_rq() triggers a use-after-free. Note: even with these changes
scsi_show_rq() can trigger a use-after-free but that's a lesser evil
than e.g. suppressing debug information for T10 PI Type 2 commands
completely. This patch fixes the following oops:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: scsi_format_opcode_name+0x1a/0x1c0
CPU: 1 PID: 1881 Comm: cat Not tainted 4.14.0-rc2.blk_mq_io_hang+ #516
Call Trace:
__scsi_format_command+0x27/0xc0
scsi_show_rq+0x5c/0xc0
__blk_mq_debugfs_rq_show+0x116/0x130
blk_mq_debugfs_rq_show+0xe/0x10
seq_read+0xfe/0x3b0
full_proxy_read+0x54/0x90
__vfs_read+0x37/0x160
vfs_read+0x96/0x130
SyS_read+0x55/0xc0
entry_SYSCALL_64_fastpath+0x1a/0xa5
[mkp: added Type 2]
Fixes: 0eebd005dd ("scsi: Implement blk_mq_ops.show_rq()")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
These duplicate includes have been found with scripts/checkincludes.pl
but they have been removed manually to avoid removing false positives.
Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Avoid that building with gcc 7 and W=1 triggers warnings similar to the
following:
drivers/scsi/qla2xxx/qla_isr.c:1189:27: warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Cc: Quinn Tran <quinn.tran@cavium.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The return value of smp_execute_task_sg() is the untransferred residual,
but bsg_job_done() requires the length of payload received. This makes
SMP passthrough commands from userland by sg ioctl to libsas get a wrong
response. The userland tools such as smp_utils failed because of these
wrong responses:
~#smp_discover /dev/bsg/expander-2\:13
response too short, len=0
~#smp_discover /dev/bsg/expander-2\:134
response too short, len=0
Fix this by passing the actual received length to bsg_job_done(). And if
smp_execute_task_sg() returns 0, this means received length is exactly
the buffer length.
[mkp: typo]
Fixes: 651a013649 ("scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough")
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reported-by: chenqilin <chenqilin2@huawei.com>
Tested-by: chenqilin <chenqilin2@huawei.com>
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
fnic_fcpio_icmnd_cmpl_handler() displays the value of sc with:
FNIC_SCSI_DBG(KERN_INFO...
"... sc = 0x%p"
"scsi_status ..."
...
As the literal strings get merged, the function uses %ps instead of the
intended raw %p format. Fix this by inserting a space.
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Clean up some comment typos and fix some errors in documentation.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The check for secs being less than zero is redundant for two reasons.
Firstly, secs is unsigned so the check is always going to be false.
Secondly, if secs was signed the proceeding calculation of secs is never
going to be negative. Hence we can remove this redundant check and day
and secs re-adjustment.
Detected by static analysis with smatch:
arcmsr_set_iop_datetime() warn: unsigned 'secs' is never less than zero.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The bfa driver has a number of real issues with string termination
that gcc-8 now points out:
drivers/scsi/bfa/bfad_bsg.c: In function 'bfad_iocmd_port_get_attr':
drivers/scsi/bfa/bfad_bsg.c:320:9: error: argument to 'sizeof' in 'strncpy' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_psymb_init':
drivers/scsi/bfa/bfa_fcs.c:775:9: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:781:9: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:788:9: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:801:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:808:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_nsymb_init':
drivers/scsi/bfa/bfa_fcs.c:837:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:844:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c:852:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_psymb_init':
drivers/scsi/bfa/bfa_fcs.c:778:2: error: 'strncat' output may be truncated copying 10 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:784:2: error: 'strncat' output may be truncated copying 30 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:803:3: error: 'strncat' output may be truncated copying 44 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:811:3: error: 'strncat' output may be truncated copying 16 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c: In function 'bfa_fcs_fabric_nsymb_init':
drivers/scsi/bfa/bfa_fcs.c:840:2: error: 'strncat' output may be truncated copying 10 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs.c:847:2: error: 'strncat' output may be truncated copying 30 bytes from a string of length 63 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_fdmi_get_hbaattr':
drivers/scsi/bfa/bfa_fcs_lport.c:2657:10: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs_lport.c:2659:11: error: argument to 'sizeof' in 'strncat' call is the same expression as the source; did you mean to use the size of the destination? [-Werror=sizeof-pointer-memaccess]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_lport_ms_gmal_response':
drivers/scsi/bfa/bfa_fcs_lport.c:3232:5: error: 'strncpy' output may be truncated copying 16 bytes from a string of length 247 [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_lport_ns_send_rspn_id':
drivers/scsi/bfa/bfa_fcs_lport.c:4670:3: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c:4682:3: error: 'strncat' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_lport_ns_util_send_rspn_id':
drivers/scsi/bfa/bfa_fcs_lport.c:5206:3: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c:5215:3: error: 'strncat' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcs_lport.c: In function 'bfa_fcs_fdmi_get_portattr':
drivers/scsi/bfa/bfa_fcs_lport.c:2751:2: error: 'strncpy' specified bound 128 equals destination size [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcbuild.c: In function 'fc_rspnid_build':
drivers/scsi/bfa/bfa_fcbuild.c:1254:2: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
drivers/scsi/bfa/bfa_fcbuild.c:1253:25: note: length computed here
drivers/scsi/bfa/bfa_fcbuild.c: In function 'fc_rsnn_nn_build':
drivers/scsi/bfa/bfa_fcbuild.c:1275:2: error: 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length [-Werror=stringop-truncation]
In most cases, this can be addressed by correctly calling strlcpy and
strlcat instead of strncpy/strncat, with the size of the destination
buffer as the last argument.
For consistency, I'm changing the other callers of strncpy() in this
driver the same way.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Although it is important to be able to trigger the code in the SCSI core
for SCSI_MLQUEUE_HOST_BUSY handling, currently it is nontrivial to
trigger that code. Hence this patch that adds a new error injection
option to the scsi_debug driver for making the .queue_rq()
implementation of this driver return SCSI_MLQUEUE_HOST_BUSY.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The Dell PV650F is a re-branded CLARiiON FC5700. And DGC/RAID,DISK
identifies all CLARiiON family.
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: SCSI ML <linux-scsi@vger.kernel.org>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add IBM 3542 and 3552, arrays: FAStT200 and FAStT500.
Add full STK OPENstorage family, arrays: 9176, D173, D178, D210, D220,
D240 and D280.
Add STK BladeCtlr family, arrays: B210, B220, B240 and B280.
These changes were done in multipath-tools time ago.
Cc: NetApp RDAC team <ng-eseries-upstream-maintainers@netapp.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christophe Varoqui <christophe.varoqui@opensvc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: SCSI ML <linux-scsi@vger.kernel.org>
Cc: device-mapper development <dm-devel@redhat.com>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 56f3d383f3 ("scsi: scsi_devinfo: Add TRY_VPD_PAGES to HITACHI
OPEN-V blacklist entry") modified some Hitachi entries:
HITACHI is always supporting VPD pages, even though it's claiming to
support SCSI Revision 3 only.
The same should have been done also for HP-rebranded.
[mkp: checkpatch and tweaked commit message]
Cc: Hannes Reinecke <hare@suse.de>
Cc: Takahiro Yasui <takahiro.yasui@hds.com>
Cc: Matthias Rudolph <Matthias.Rudolph@hitachivantara.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: SCSI ML <linux-scsi@vger.kernel.org>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 627511e3e6 ("[SCSI] scsi_devinfo: update Hitachi entries (v2)")
modified some Hitachi entries:
Four models, OPEN-/DF400/DF500/DISK-SUBSYSTEM, can handle
REPORT_LUN, and the BLIST_REPORTLUN2 flag needs to be set. And DF600
doesn't require any flags because it returns ANSI 03h (SPC).
The same should have been done also for HP counterparts.
[mkp: checkpatch and tweaked commit message]
Cc: Takahiro Yasui <takahiro.yasui@hds.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Matthias Rudolph <Matthias.Rudolph@hds.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
Cc: SCSI ML <linux-scsi@vger.kernel.org>
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
find_first_zero_bit()'s parameter 'size' is defined in bits, not in
bytes.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove this function since it has an empty body.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 651a013649 ("scsi: scsi_transport_sas: switch to bsg-lib for
SMP passthrough") removed the only call to scsi_initialize_rq() from
outside the SCSI core. Hence unexport scsi_initialize_rq().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When driver is loaded in Target/Dual mode, it creates QPair to support
MQ and allocates resources for each QPair. This Qpair initialization is
delayed until the FW personality is changed to Dual/Target mode by
issuing chip reset. At the time of chip reset firmware is re-initilized
in correct personality all the QPairs are initialized by sending
MBC_INITIALIZE_MULTIQ (001Fh).
This patch fixes memory leak by adding check to issue
MBC_INITIALIZE_MULTIQ command only while deleting rsp/req queue when the
flag is set for initiator mode, and clean up QPair resources correctly
during the driver unload. This MBX does not need to be issued for
Target/Dual mode because chip reset will reset ISP.
Fixes: d65237c7f0 ("scsi: qla2xxx: Fix mailbox failure while deleting Queue pairs")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This fix the spinlock recursion issue seen while unloading the driver.
14 [ffff9f2e21e03db8] native_queued_spin_lock_slowpath at ffffffffad0d8802
15 [ffff9f2e21e03dc0] do_raw_spin_lock at ffffffffad0d99e4
16 [ffff9f2e21e03dd8] _raw_spin_lock_irqsave at ffffffffad652471
17 [ffff9f2e21e03e00] qla2x00_els_dcmd_iocb_timeout at ffffffffc070cd63
18 [ffff9f2e21e03e40] qla2x00_sp_timeout at ffffffffc06f06d3 [qla2xxx]
19 [ffff9f2e21e03e68] call_timer_fn at ffffffffad0f97d8
20 [ffff9f2e21e03ed8] run_timer_softirq at ffffffffad0faf47
21 [ffff9f2e21e03f68] __softirqentry_text_start at ffffffffad655f32
Fixes: 6eb54715b5 ("qla2xxx: Added interface to send explicit LOGO.")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Giridhar Malavali <giridhar.malavali@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Clear loop id after delete to prevent session invalidation of stale
session.
Fixes: 726b854870 ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add correct value of scan_state field indicating state of the FC port
Fixes: 726b854870 ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Current code manually allocate an fcport structure that is not properly
initialize. Replace kzalloc with qla2x00_alloc_fcport, so that all
fields are initialized. Also set set scan flag to port found
Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Original code acquires hardware_lock to add Abort IOCB onto driver
request queue for processing. However, abort_command() will also acquire
hardware lock to look up sp pointer before issuing abort IOCB command
resulting into a deadlock. This patch safely removes the possible
deadlock scenario by removing extra spinlock.
Fixes: 6eb54715b5 ("qla2xxx: Added interface to send explicit LOGO.")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Get Port Database MBX cmd is to validate current Login state upon PRLI
completion. Current code looks at the last login state for re-validation
which was incorrect. This patch removed incorrect state check.
Fixes: 15f30a5752 ("qla2xxx: Use IOCB interface to submit non-critical MBX.")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Current driver design schedules relogin process via DPC thread every 1
second. In a large fabric, this DPC thread tries to schedule too many
jobs and might get overloaded. As a result of this processing of DPC
thread, it can schedule relogin earlier than 1 second.
Fixes: 726b854870 ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If user swaps one target port for another target port for same switch
port, the new target port is not being recognized by the driver. Current
code assumes that old Target port has recovered from link down. The fix
will ask switch what is the WWPN of a specific NportID (GPNID) rather
than assuming it's the same Target port which has came back.
Fixes: 726b854870 ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add check to make sure we are cleaning up global target host list only
for NPIV hosts
Fixes: bdbe24de28 ("scsi: qla2xxx: Cleanup NPIV host in target mode during config teardown")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Sawan Chandak <sawan.chandak@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch returns discovery state machine back to Login Complete.
Fixes: 726b854870 ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
GPNID is triggered by RSCN. For multiple RSCNs of the same affected
NPORT ID, serialize the GPNID to prevent confusion.
Fixes: 726b854870 ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When NPort Handle is in use, driver needs to mark the handle as used and
pick another. Instead, the code clears the handle and re-pick the same
handle.
Fixes: 726b854870 ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix co-existence between Block MQ and Target Mode. Block MQ and
initiator mode requires midlayer queue mapping to check for IRQ to be
affinitized. For target mode, it's not the case.
Fixes: 09620eeb62 ("scsi: qla2xxx: Add debug knob for user control workload")
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Move session delete from system work queue to driver's work queue for in
time processing.
Fixes: 726b854870 ("qla2xxx: Add framework for async fabric discovery")
Cc: <stable@vger.kernel.org> # 4.10+
Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While testing "sd: Micro-optimize READ / WRITE CDB encoding" patches it
was helpful to check various code paths associated with READ/WRITE 6, 10
and 16 byte cdb variants. There seems to be no user space "knobs" to
twiddle use_10_for_rw and friends in the scsi_device structure. So add
a parameter to scsi_debug called "cdb_len" for this purpose.
[mkp: fixed typo]
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since symbolic names for the device information keys alread exist,
associate an enumeration type with these symbolic values. This change
makes it clear what the valid values for the 'key' arguments are.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since scsi_get_device_flags_keyed() callers do not check whether or not
the returned value is an error code, change that function such that it
returns a flags value even if the 'key' argument is invalid. Note:
since commit 28a0bc4120 ("scsi: sd: Implement blacklist option for
WRITE SAME w/ UNMAP") bit 31 is a valid device information flag so
checking whether bit 31 is set in the return value is not sufficient to
tell the difference between an error code and a flags value.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If scsi_eh_scmd_add() is called concurrently with
scsi_host_queue_ready() while shost->host_blocked > 0 then it can
happen that neither function wakes up the SCSI error handler. Fix
this by making every function that decreases the host_busy counter
wake up the error handler if necessary and by protecting the
host_failed checks with the SCSI host lock.
Reported-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
References: https://marc.info/?l=linux-kernel&m=150461610630736
Fixes: commit 7466501608 ("scsi: convert host_busy to atomic_t")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Konstantin Khorenko <khorenko@virtuozzo.com>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A regression fix introduced a harmless type mismatch warning:
drivers/scsi/bfa/bfad_bsg.c: In function 'bfad_im_bsg_vendor_request':
drivers/scsi/bfa/bfad_bsg.c:3137:35: error: initialization of 'struct bfad_im_port_s *' from 'long unsigned int' makes pointer from integer without a cast [-Werror=int-conversion]
struct bfad_im_port_s *im_port = shost->hostdata[0];
^~~~~
drivers/scsi/bfa/bfad_bsg.c: In function 'bfad_im_bsg_els_ct_request':
drivers/scsi/bfa/bfad_bsg.c:3353:35: error: initialization of 'struct bfad_im_port_s *' from 'long unsigned int' makes pointer from integer without a cast [-Werror=int-conversion]
struct bfad_im_port_s *im_port = shost->hostdata[0];
This changes the code back to shost_priv() once more, but encapsulates
it in an inline function to document the rather unusual way of
using the private data only as a pointer to the previously allocated
structure.
I did not try to get rid of the extra indirection level entirely,
which would have been rather invasive and required reworking the entire
initialization sequence.
Fixes: 45349821ab ("scsi: bfa: fix access to bfad_im_port_s")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Before commit 0df21c86bd ("scsi: implement .get_budget and .put_budget
for blk-mq"), we run queue after 3ms if queue is idle and SCSI device
queue isn't ready, which is done in handling BLK_STS_RESOURCE. After
commit 0df21c86bd is introduced, queue won't be run any more under
this situation.
IO hang is observed when timeout happened, and this patch fixes the IO
hang issue by running queue after delay in scsi_dev_queue_ready, just
like non-mq. This issue can be triggered by the following script[1].
There is another issue which can be covered by running idle queue: when
.get_budget() is called on request coming from hctx->dispatch_list, if
one request just completes during .get_budget(), we can't depend on
SCSI's restart to make progress any more. This patch fixes the race too.
With this patch, we basically recover to previous behaviour (before
commit 0df21c86bd) of handling idle queue when running out of
resource.
[1] script for test/verify SCSI timeout
rmmod scsi_debug
modprobe scsi_debug max_queue=1
DEVICE=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
DISK_DIR=`ls -d /sys/block/$DEVICE/device/scsi_disk/*`
echo "using scsi device $DEVICE"
echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
echo "temporary write through" >$DISK_DIR/cache_type
echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
echo none > /sys/block/$DEVICE/queue/scheduler
dd if=/dev/$DEVICE of=/dev/null bs=1M iflag=direct count=1 &
sleep 5
echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
wait
echo "SUCCESS"
Fixes: 0df21c86bd ("scsi: implement .get_budget and .put_budget for blk-mq")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix report command result error when CHECK_CONDITION.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update driver version to v1.40.00.04-20171130
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add module parameter msix_enable so user has the option of disabling
MSI-X interrupts if there is a platform problem.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add module parameter msi_enable so user has the option of disabling MSI
interrupts if there is a platform problem.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Spin off duplicate code of timer init for message isr BH in arcmsr_probe
and arcmsr_resume as a function arcmsr_init_get_devmap_timer.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a function arcmsr_set_iop_datetime and driver option set_date_time
to set date and time to firmware.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add ACB_F_MSG_GET_CONFIG to acb->acb_flags for for message interrupt
checking before schedule work for get device map.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add driver option cmd_per_lun to set host->cmd_per_lun value by user.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replace constant ARCMSR_MAX_OUTSTANDING_CMD by variable
acb->maxOutstanding that was determined by user.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add driver option host_can_queue to set host->can_queue value by
user. It's value expands up to 1024.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replace constant ARCMSR_MAX_FREECCB_NUM by variable acb->maxFreeCCB that
was received from firmware.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update ARCMSR_MAX_OUTSTANDING_CMD and ARCMSR_MAX_FREECCB_NUM to 1024.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add code for ACB_ADAPTER_TYPE_E to support new adapter ARC-1884.
Signed-off-by: Ching Huang <ching2048@areca.com.tw>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We have a bunch of fixes for aacraid, a set of coherency fixes that
only affect non-coherent platforms and one coccinelle detected null
check after use.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJaJs8oAAoJEAVr7HOZEZN4MhkQAJZ/KfYI4CTrX45NAV3AlOT1
CW39vYT3ODjsw97uJkoJjbXKIqHemObP2JlcIjhdffd/Vrk1Yn88KclUktSjwhBV
X5wGprktrmCKKcL1iobSv5o2r/2TZpeMHBIgdC1LogCw7L5eCBibnx8GiU+OkXJM
2aGw8GPS+hySilTde20aL6OumQfLFuZzk8TeZ5bAAjyIgIJqw/1pyn+2Hy5EnyW7
n5RpC8qG+cNLXHHYaITHX666lSRM+DMdRGVNLxK9dzdQkaFpu7w598/aeM0zRJuS
IqAVlLowY+pt3C14ax9jxvGiZ96kuYClWBeWuw4oHGfLxNqNU2xd38xzJTkUoRX7
0F16+froi85DV8UWbDswakOfs0vAoW1kLES3nnwrZ6inQ9yHANEBrXY4jZ3HwcEy
ax81fYMrpd8kD9lI0mGiX5qoanfv08jTn5UfFYNddYFcCrKRymDCVZDw6p/9JFxV
Tkry526TxELziqsfKNHt2yFKKDJ8CjtQqFSUyeo/pBCo7X87aV5B3oFgtb4lxseb
yT7o+mo452jNuL8veMPe6vz21uTwbfQfof1wk4wV8bRydGwu7ofOdeILLgtQIieM
yb+8f/XGpg1Q+Y3pTfO46a/d76KxhEHVsgqQLwMQB7p+C9PZh5Fc3VdSyjmUFyEN
Dc6i3IbfiZVgKm9Tsouf
=boPv
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A bunch of fixes for aacraid, a set of coherency fixes that only
affect non-coherent platforms and one coccinelle detected null check
after use"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: libsas: align sata_device's rps_resp on a cacheline
scsi: use dma_get_cache_alignment() as minimum DMA alignment
scsi: dma-mapping: always provide dma_get_cache_alignment
scsi: ufs: ufshcd: fix potential NULL pointer dereference in ufshcd_config_vreg
scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path
scsi: aacraid: Perform initialization reset only once
scsi: aacraid: Check for PCI state of device in a generic way
Where applicable, changes pr_debug, pr_info, pr_err, etc. calls to the
dev_* versions. This adds the DRC index of the device to the
corresponding trace statement.
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Brad Warrum <bwarrum@us.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove one extraneous level of indentation on an assignment statement.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Trivial fix to spelling mistake in error message text.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Trivial fix to spelling mistake in error message text.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
KERN_CONT is now required for continued printks(). Add it.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
These would be used in the future in some specific drivers.
Signed-off-by: Kiwoong Kim <kwmad.kim@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
No Functional change just cleanup. Removed variable requeue_event and
made function as void.
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The PCI pool API is deprecated. This commit replaces the PCI pool old
API by the appropriate function with the DMA pool API.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove the duplicate copies of this simple function and use an
open-coded version.
drivers/scsi/fnic/fnic_debugfs.c:122:11-31: WARNING opportunity for simple_open, see also structure on line 223
Generated by: coccinelle/api/simple_open.cocci
Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We are testing if there is a match with the ses device in a loop by
calling ses_match_to_enclosure(), which will issue scsi receive
diagnostics commands to the ses device for every device on the same
host. On one of our boxes with 840 disks, it takes a long time to load
the driver:
[root@g1b-oss06 ~]# time modprobe ses
real 40m48.247s
user 0m0.001s
sys 0m0.196s
With the patch:
[root@g1b-oss06 ~]# time modprobe ses
real 0m17.915s
user 0m0.008s
sys 0m0.053s
Note that we still need to refresh page 10 when we see a new disk to
create the link.
Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Tested-by: Jason Ozolins <jason.ozolins@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cleanly fill memory for "vendor" and "model" with 0-bytes for the
"compatible" case rather than adding only a single 0 byte. This
simplifies the devinfo code a a bit, and avoids mistakes in other places
of the code (not in current upstream, but we had one such mistake in the
SUSE kernel).
[mkp: applied by hand and added braces]
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don't populate the read-only array card_types on the stack but instead
make it static and constify it. Makes the object code smaller by over
110 bytes:
Before:
text data bss dec hex filename
25625 5752 0 31377 7a91 drivers/scsi/wd719x.o
After:
text data bss dec hex filename
25447 5816 0 31263 7a1f drivers/scsi/wd719x.o
(gcc version 7.2.0 x86_64)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is no need to go through an intermediate timespec to convert to
ktime_t when we just want a simple multiplication. This gets rid of one
of the few users of jiffies_to_timespec, which I hope to remove as part
of the y2038 cleanup.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
device->scsi3addr[] is an array, not a pointer, so it can't be NULL.
I've removed the check.
[mkp: fixed typo]
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the driver version to 11.4.0.5
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The logic for sg_seg_cnt is a bit convoluted. This patch tries to clean
up a couple of areas, especially around the +2 and +1 logic.
This patch:
- Cleans up the lpfc_sg_seg_cnt attribute to specify a real minimum
rather than making the minimum be whatever the default is.
- Removes the hardcoding of +2 (for the number of elements we use in a
sgl for cmd iu and rsp iu) and +1 (an additional entry to compensate
for nvme's reduction of io size based on a possible partial page)
logic in sg list initialization. In the case where the +1 logic is
referenced in host and target io checks, use the values set in the
transport template as that value was properly set.
There can certainly be more done in this area and it will be addressed
in combined host/target driver effort.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During driver unload, the driver may crash due to NULL pointers. The
NULL pointers were due to the driver not protecting itself sufficiently
during some of the teardown paths. Additionally, the driver was not
waiting for and cleanup up nvme io resources. As such, the driver wasn't
making the callbacks to the transport, stalling the transports
association teardown.
This patch waits for io clean up before tearding down and adds checks
for possible NULL pointers.
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the driver is unloading, the nvme transport could be in the process
of submitting new requests, will send abort requests to terminate
associations, or may make LS-related requests. The driver's abort and
request entry points currently is ignorant of the unloading state and is
starting the requests even though the infrastructure to complete them
continues to teardown.
Change the entry points for new requests to check whether unloading and
if so, reject the requests. Abort routines check unloading, and if so,
noop the request. An abort is noop'd as the teardown paths are already
aborting/terminating the io outstanding at the time the teardown
initiated.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver's interaction with the host nvme transport has been incorrect
for a while. The driver did not wait for the unregister callbacks
(waited only 5 jiffies). Thus the driver may remove objects that may be
referenced by subsequent abort commands from the transport, and the
actual unregister callback was effectively a noop. This was especially
problematic if the driver was unloaded.
The driver now waits for the unregister callbacks, as it should, before
continuing with teardown.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver currently registers any remote port that has NVME support.
It should only be registering target ports.
Register only target ports.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During RSCN storms, the driver does not rediscover some targets. The
driver marks some RSCN as to be handled after the ones it's working
on. The driver missed processing some deferred RSCN.
Move where the driver checks for deferred RSCNs and initiate deferred
RSCN handling if the flag was set. Also revise nport state within the
RSCN confirm routine. Add some state data to a possible debug print to
aid future debugging.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
pt2pt ndlp ref count prematurely goes to 0. There was reference removed
that should only be removed if connected to a switch, not if in
point-to-point mode.
Add a mode check before the reference remove.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current default for async hw receive queues is 1, which presents
issues under heavy load as number of queues influence the available
async receive buffer limits.
Raise the default to the either the current hw limit (16) or the number
of hw qs configured (io channel value).
Revise the attribute definition for mrq to better reflect what we do for
hw queues. E.g. 0 means default to optimal (# of cpus), non-zero
specifies a specific limit. Before this change, mrq=0 meant target mode
was disabled. As 0 now has a different meaning, rework the if tests to
use the better nvmet_support check.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Display for lpfc/fnX/iDiag/queInfo isn't formatted perfectly. Corrected
the format strings for the queue info debug messages.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver does not respond to PLOGI from the direct attach target. The
driver uses incorrect S_ID in CONFIG_LINK, after FLOGI completion
Correct by issuing CONFIG_LINK with the correct S_ID after receiving the
PLOGI from the target
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Raise the maximum NVME sg list size allowed to 256 elements.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Performing an LS abort results in the following message being seen:
0603 Invalid CQ subtype 6: 00000300 22000002 ffff0016 d0050000
and the associated exchange is not properly freed.
The code did not recognize the exchange type that was aborted, thus it
was not properly handled.
Correct by adding the NVME LS ELS type to the exchange types that are
recognized.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In test cases where an instance of the driver is detached and
reattached, the driver will crash on reattachment. There is a compound
if statement that will skip over the bar setup if the pci_resource_start
call is not successful. The driver erroneously returns success to its
bar setup in this scenario even though the bars aren't properly
configured.
Rework the offending code segment for proper initialization steps. If
the pci_resource_start call fails, -ENOMEM is now returned.
Sample stack:
rport-5:0-10: blocked FC remote port time out: removing rport
BUG: unable to handle kernel NULL pointer dereference at (null)
... lpfc_sli4_wait_bmbx_ready+0x32/0x70 [lpfc]
...
... RIP: 0010:... ... lpfc_sli4_wait_bmbx_ready+0x32/0x70 [lpfc]
Call Trace:
... lpfc_sli4_post_sync_mbox+0x106/0x4d0 [lpfc]
... ? __alloc_pages_nodemask+0x176/0x420
... ? __kmalloc+0x2e/0x230
... lpfc_sli_issue_mbox_s4+0x533/0x720 [lpfc]
... ? mempool_alloc+0x69/0x170
... ? dma_generic_alloc_coherent+0x8f/0x140
... lpfc_sli_issue_mbox+0xf/0x20 [lpfc]
... lpfc_sli4_driver_resource_setup+0xa6f/0x1130 [lpfc]
... ? lpfc_pci_probe_one+0x23e/0x16f0 [lpfc]
... lpfc_pci_probe_one+0x445/0x16f0 [lpfc]
... local_pci_probe+0x45/0xa0
... work_for_cpu_fn+0x14/0x20
... process_one_work+0x17a/0x440
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
XRI_ABORTED_CQE completions were not being handled in the fast path.
They were being queued and deferred to the lpfc worker thread for
processing. This is an artifact of the driver design prior to moving
queue processing out of the isr and into a workq element. Now that queue
processing is already in a deferred context, remove this artifact and
process them directly.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hardware queues are a fast staging area to push commands into the
adapter. The adapter should drain them extremely quickly. However,
under heavy io load, the host cpu is pushing commands faster than the
drain rate of the adapter causing the driver to resource busy commands.
Enlarge the hardware queue (wq & cq) to support a larger number of queue
entries (4x the prior size) before backpressure. Enlarging the queue
requires larger contiguous buffers (16k) per logical page for the
hardware. This changed calling sequences that were expecting 4K page
sizes that now must pass a parameter with the page sizes. It also
required use of a new version of an adapter command that can vary the
page size values.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When the HBA is connected to a private loop, the driver reports FLOGI
loop-open failure as functional error. This is an expected condition.
Mark loop-open failure as a warning instead of error.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The pointer 'port' is being assigned but it is never read, hence it is
redundant and can be removed. Cleans up clang warning:
drivers/scsi/bfa/bfad_attr.c:505:2: warning: Value stored to 'port' is
never read.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Variable managed_request_id is being assigned but it is never read,
hence it is redundant and can be removed. Cleans up clang warning:
drivers/scsi/aacraid/linit.c:706:5: warning: Value stored to
'managed_request_id' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix kernel-doc function name and comments in st.c::read_ns_show():
change us to ns to match the function name.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is no need to #define the license of the driver, just put it in
the MODULE_LICENSE() line directly as a text string.
This allows tools that check that the module license matches the source
code license to work properly, as there is no need to unwind the
unneeded dereference, especially when the string is defined in a .h file
far away from the .c file it is used in.
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Varun Prakash <varun@chelsio.com>
Reported-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The TW_IOCTL_GET_LOCK ioctl uses do_gettimeofday() to check whether a
lock has expired. This can misbehave due to a concurrent settimeofday()
call, as it is based on 'real' time, and it will overflow in y2038 on
32-bit architectures, producing unexpected results when used across the
overflow time.
This changes it to using monotonic time, using ktime_get() to simplify
the code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The calculation of the number of seconds since Sunday 00:00:00 overflows
in 2106, meaning that we instead will return the seconds since Wednesday
06:28:16 afterwards.
Using 64-bit time stamps avoids this slight inconsistency, and the
deprecated do_gettimeofday(), replacing it with the simpler
ktime_get_real_seconds().
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
twl_aen_queue_event/twa_aen_queue_event, we use do_gettimeofday() to
read the lower 32 bits of the current time in seconds, to pass them to
the TW_IOCTL_GET_NEXT_EVENT ioctl or the 3ware_aen_read sysfs file.
This will overflow on all architectures in year 2106, there is not much
we can do about that without breaking the ABI. User space has 90 years
to learn to deal with it, so it's probably ok.
I'm changing it to use ktime_get_real_seconds() with a comment to
document what happens when.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
bfa_aen_entry_s is passed through a netlink socket that can be read by
either 32-bit or 64-bit processes, but the data format is different
between the two on current implementations.
Originally, this was using a 'struct timeval', which also suffers from
getting redefined with a new libc implementation.
With this patch, the layout gets fixed to having two 64-bit members for
the time, making it the same on 32-bit kernels and 64-bit kernels
running either compat or native user space including x32.
Provided that the new header file gets used to recompile any 32-bit
application binaries, this will fix running those on a 64-bit kernel
(with or without this patch) e.g. in a container environment, and it
will make binaries work that will be built against a future 32-bit glibc
that uses a 64-bit time_t, and avoid the y2038 overflow there.
However, this also breaks compatibility with any existing 32-bit binary
running on a native 32-bit kernel, those must be recompiled against the
new header, which in turn makes them incompatible with older kernels
unless the same change gets applied there.
Obviously this patch should only be applied when the benefits outweigh
the possible breakage. I'm posting it under the assumption that there
are no open-source tools using the netlink interface, and that users of
the binaries provided by qlogic for SLES10/11 and RHEL5/6 are not
actually being used on new future systems with 32-bit x86 kernels.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Anil Gurumurthy <Anil.Gurumurthy@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
bfa_aen_entry_s is passed to user space in a netlink message, but is
defined using a 'struct timeval' and an 'enum' that are not only
different between architectures, but also between 32-bit user space and
64-bit kernels they may run on, as well as depending on the particular C
library that defines timeval.
This changes the in-kernel definition to no longer use the timeval type
directly but instead use two open-coded 'unsigned long' members. This
keeps the existing ABI, but making the variable unsigned also helps make
it work after y2038, until it overflows in 2106.
Since the macro becomes overly complex at this point, I'm changing it to
an inline function for readability.
I'm not changing the 32-bit user-space ABI at this point, to keep the
changes separate, I deally this would be defined using the same binary
layout for all architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Anil Gurumurthy <Anil.Gurumurthy@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The bfa_get_log_time() returns a 64-bit timestamp that does not suffer
from the y2038 overflow on 64-bit systems. However, on 32-bit
architectures the timestamp will jump from 0x000000007fffffff to
0xffffffff80000000 in y2038 and produce wrong results.
The ktime_get_real_seconds() function does the same thing as
bfa_get_log_time() without that problem, so we can simply remove the
former use ktime_get_real_seconds() instead.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Anil Gurumurthy <Anil.Gurumurthy@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
io_profile_start_time() gets read using do_gettimeofday() and passed
down as a 32-bit value through multiple functions. This will overflow in
y2038 or y2106, depending on whether it gets interpreted as unsigned in
the end.
This changes do_gettimeofday() to ktime_get_real_seconds() and pushes
the point at which it overflows to where we actually assign it to the
bfa_fcpim_del_itn_stats_s structure, with an appropriate comment.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Anil Gurumurthy <Anil.Gurumurthy@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In bfa_ioc_send_enable, we use the deprecated do_gettimeofday() function
to read the current time. This is not a problem, since the firmware
interface is already limited to 32-bit timestamps, but it's better to
use ktime_get_seconds() and document what the limitation is.
I noticed that I did the same change in commit a5af839253 ("bna: avoid
writing uninitialized data into hw registers") for the ethernet
driver. That commit also changed the "disable" funtion to initialize the
data we pass to the firmware properly, so I'm doing the same thing here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Anil Gurumurthy <Anil.Gurumurthy@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We use the deprecated do_gettimeofday() function to read the current
time when resetting the statistics in both bfa_port and bfa_svc. This
works fine because overflow is handled correctly, but we want to get rid
of do_gettimeofday() and using a non-monotonic time suffers from
concurrent settimeofday calls and other problems.
This uses the ktime_get_seconds() function instead, which does what we
need here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Anil Gurumurthy <Anil.Gurumurthy@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
BFA_TRC_TS() calculates a 32-bit microsecond timestamp using the
deprecated do_gettimeofday() function. This overflows roughly every 71
minutes, so it's obviously not used as an absolute time stamp, but it
seems wrong to use a time base for it that will jump during
settimeofday() calls, leap seconds, or the y2038 overflow.
This converts it to ktime_get_ts64(), which has none of those problems
but is not synchronized to wall-clock time.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Anil Gurumurthy <Anil.Gurumurthy@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 114988
Addresses-Coverity-ID: 114989
Addresses-Coverity-ID: 114990
Addresses-Coverity-ID: 114991
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Make use of the swap macro and remove unnecessary variable tmp. This
makes the code easier to read and maintain.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
As reported by Meelis Roos, my previous patch causes an incorrect
calculation of the timeout, through an undefined signed integer
overflow:
[ 12.228155] UBSAN: Undefined behaviour in drivers/scsi/aacraid/commsup.c:2514:49
[ 12.228229] signed integer overflow:
[ 12.228283] 964297611 * 250 cannot be represented in type 'long int'
The problem is that doing a multiplication with HZ first and then
dividing by USEC_PER_SEC worked correctly for 32-bit microseconds,
but not for 32-bit nanoseconds, which would require up to 41 bits.
This reworks the calculation to first convert the nanoseconds into
jiffies, which should give us the same result as before and not overflow.
Unfortunately I did not understand the exact intention of the algorithm,
in particular the part where we add half a second, so it's possible that
there is still a preexisting problem in this function. I added a comment
that this would be handled more nicely using usleep_range(), which
generally works better for waking up at a particular time than the
current schedule_timeout() based implementation. I did not feel
comfortable trying to implement that without being sure what the
intent is here though.
Fixes: 820f188659 ("scsi: aacraid: use timespec64 instead of timeval")
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The modification of fc_lport_recv_els_req() in commit fcabb09e59 ("scsi:
libfc: directly call ELS request handlers") caused certain requests not to be
handled at all. Fix that.
Fixes: fcabb09e59 ("scsi: libfc: directly call ELS request handlers")
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The error message dereferences "rqb_entry" so we need to print it first
and then free the buffer.
Fixes: 6c621a2229 ("scsi: lpfc: Separate NVMET RQ buffer posting from IO resources SGL/iocbq/context")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>