Commit Graph

665 Commits

Author SHA1 Message Date
Chandrakanth Patil f0b9e7bdc3 scsi: megaraid_sas: Set affinity for high IOPS reply queues
High iops queues are mapped to non-managed IRQs. Set affinity of
non-managed irqs to local numa node.  Low latency queues are mapped to
managed IRQs.

Driver reserves some reply queues for high IOPS queues (through
pci_alloc_irq_vectors_affinity and .pre_vectors interface). The rest of
queues are for low latency.

Based on IO workload, driver will decide which group of reply queues
(either high IOPS queues or low latency queues) to be used.
High IOPS queues will be mapped to local numa node of controller and
low latency queues will be mapped to CPUs across numa nodes. In general,
high IOPS and low latency queues should fit into 128 reply queues
which is the max number of reply queues supported by Aero adapters.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:36 -04:00
Chandrakanth Patil ea836f40f8 scsi: megaraid_sas: Enable coalescing for high IOPS queues
Driver should enable interrupt coalescing (during driver load and after
Controller Reset) for High IOPS queues by masking appropriate bits in IOC
INIT frame.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:36 -04:00
Chandrakanth Patil 132147d7f6 scsi: megaraid_sas: Add support for High IOPS queues
Aero controllers support balanced performance mode through the ability to
configure queues with different properties.

Reply queues with interrupt coalescing enabled are called "high iops reply
queues" and reply queues with interrupt coalescing disabled are called "low
latency reply queues".

The driver configures a combination of high iops and low latency reply
queues if:

 - HBA is an AERO controller;

 - MSI-X vectors supported by the HBA is 128;

 - Total CPU count in the system more than high iops queue count;

 - Driver is loaded with default max_msix_vectors module parameter; and

 - System booted in non-kdump mode.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:35 -04:00
Chandrakanth Patil 5813685616 scsi: megaraid_sas: Add support for MPI toolbox commands
Added driver support to allow passthrough MPI toolbox type MFI commands to
firmware based on firmware capability.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:35 -04:00
Chandrakanth Patil 7fc557005c scsi: megaraid_sas: Offload Aero RAID5/6 division calculations to driver
For RAID5/RAID6 volumes configured behind Aero, driver will be doing 64bit
division operations on behalf of firmware as controller's ARM CPU is very
slow in this division. Later, driver calculates Q-ARM, P-ARM and Log-ARM and
passes those values to firmware by writing these values to RAID_CONTEXT.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:35 -04:00
Chandrakanth Patil 49f2bf1071 scsi: megaraid_sas: RAID1 PCI bandwidth limit algorithm is applicable for only Ventura
RAID1 PCI bandwidth limit algorithm is not applicable to Aero as it's PCIe
Gen4 adapter.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:35 -04:00
Chandrakanth Patil a4413a5859 scsi: megaraid_sas: megaraid_sas: Add check for count returned by HOST_DEVICE_LIST DCMD
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:35 -04:00
Chandrakanth Patil 59db5a931b scsi: megaraid_sas: Handle sequence JBOD map failure at driver level
Issue: This issue is applicable to scenario when JBOD sequence map is
unavailable (memory allocation for JBOD sequence map failed) to driver but
feature is supported by firmware.  If the driver sends a JBOD IO by not
adding 255 (MAX_PHYSICAL_DEVICES - 1) to device ID when underlying firmware
supports JBOD sequence map, it will lead to the IO failure.

Fix: For JBOD IOs, driver will not use the RAID map to fetch the devhandle
if JBOD sequence map is unavailable. Driver will set Devhandle to 0xffff
and Target ID to 'device ID + 255 (MAX_PHYSICAL_DEVICES - 1)'.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:35 -04:00
Chandrakanth Patil 798d44b04f scsi: megaraid_sas: Don't send FPIO to RL Bypass queue
Firmware does not expect FastPath IO sent through Region Lock Bypass queue.
Though firmware never exposes such settings when fastpath IO can be sent to
RL bypass queue but it's safer to remove dead code which directs fastpath
IO to RL Bypass queue.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:35 -04:00
Chandrakanth Patil ccf6c1f2e2 scsi: megaraid_sas: In probe context, retry IOC INIT once if firmware is in fault
Issue: Under certain conditions, controller goes in FAULT state after IOC
INIT fired to firmware. Such Fault can be recovered through controller
reset.

Fix: In driver probe context, if firmware fault is observed post IOC INIT,
driver would do controller reset followed by retry logic for IOC INIT
command.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:35 -04:00
Chandrakanth Patil 7fa3174b3e scsi: megaraid_sas: Release Mutex lock before OCR in case of DCMD timeout
Issue: There is possibility of few DCMDs timing out with 'reset_mutex' lock
held. As part of DCMD timeout handling, driver calls function
megasas_reset_fusion which also tries to acquire same lock 'reset_mutex'
and end up with deadlock.

Fix: Upon timeout of DCMDs (which are fired with 'reset_mutex' lock held),
driver will release 'reset_mutex' before calling OCR function and will
acquire lock again after OCR function returns.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:35 -04:00
Chandrakanth Patil a6ffd5bf68 scsi: megaraid_sas: Call disable_irq from process IRQ poll
On PowerPC architecture, calling disable_irq_nosync from IRQ context is not
providing the required effect.

In current megaraid_sas driver, disable_irq_nosync is being called from IRQ
context before enabling IRQ poll. But due to the issue seen on PPC, after
IRQ poll disable and legacy ISR is enabled, we are not seeing our ISR
getting called.

Fix: Call disable_irq from IRQ poll thread context instead of IRQ context.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:35 -04:00
Chandrakanth Patil 2181aacf46 scsi: megaraid_sas: Remove few debug counters from IO path
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:35 -04:00
Chandrakanth Patil dd80769923 scsi: megaraid_sas: Add support for Non-secure Aero PCI IDs
This patch will add support for non-secure Aero adapter PCI IDs.  Driver
will throw an error message when a non-secure type controller is
detected. Purpose of this interface is to avoid interacting with any
firmware which is not secured/signed by Broadcom. Any tampering on Firmware
component will be detected by hardware and it will be communicated to the
driver to avoid any further interaction with that component.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:34 -04:00
Chandrakanth Patil 5885571df7 scsi: megaraid_sas: Add 32 bit atomic descriptor support to AERO adapters
Aero adapters provides Atomic Request Descriptor as an alternative method
for posting an entry onto a request queue. The posting of an Atomic Request
Descriptor is an atomic operation, providing a safe mechanism for multiple
processors on the host to post requests without synchronization. This
Atomic Request Descriptor format is identical to first 32 bits of Default
Request Descriptor and uses only 32 bits.

If Aero adapters support Atomic descriptor, driver should use it for
posting IOs and DCMDs to firmware.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-27 00:07:34 -04:00
Gustavo A. R. Silva e58ed5002f scsi: megaraid_sas: Use struct_size() helper
One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with
memory for some number of elements for that array. For example:

struct MR_PD_CFG_SEQ_NUM_SYNC {
	...
        struct MR_PD_CFG_SEQ seq[1];
} __packed;

Make use of the struct_size() helper instead of an open-coded version in
order to avoid any potential type mistakes.

So, replace the following form:

sizeof(struct MR_PD_CFG_SEQ_NUM_SYNC) + (sizeof(struct MR_PD_CFG_SEQ) * (MAX_PHYSICAL_DEVICES - 1))

with:

struct_size(pd_sync, seq, MAX_PHYSICAL_DEVICES - 1)

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-20 15:37:03 -04:00
YueHaibing bc7625795c scsi: megaraid_sas: Remove unused including <linux/version.h>
Remove including <linux/version.h> that don't need it.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:25 -04:00
Tomas Henzl d635468349 scsi: megaraid_sas: use DEVICE_ATTR_{RO, RW}
Use existing macros.  No functional change.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:25 -04:00
Tomas Henzl ea14e46240 scsi: megaraid_sas: use octal permissions instead of constants
Checkpatch emits a warning when using symbolic permissions. Use octal
permissions instead.  No functional change.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:25 -04:00
Tomas Henzl deff370633 scsi: megaraid_sas: make max_sectors visible in sys
Support is easier with all driver parameters visible in sysfs.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:25 -04:00
YueHaibing cdf79db476 scsi: megaraid_sas: remove set but not used variables 'buff_addr' and 'ci_h'
Fixes gcc '-Wunused-but-set-variable' warnings:

drivers/scsi/megaraid/megaraid_sas_base.c: In function megasas_fw_crash_buffer_show:
drivers/scsi/megaraid/megaraid_sas_base.c:3138:16: warning: variable buff_addr set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_base.c: In function megasas_get_pd_list:
drivers/scsi/megaraid/megaraid_sas_base.c:4426:13: warning: variable ci_h set but not used [-Wunused-but-set-variable]

'buff_addr' is never used since inroduction in commit fc62b3fc90
("megaraid_sas : Firmware crash dump feature support")

'ci_h' is not used since commit 9b3d028f34 ("scsi: megaraid_sas:
Pre-allocate frequently used DMA buffers")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:25 -04:00
YueHaibing ed17190941 scsi: megaraid_sas: remove set but not used variable 'sge_sz'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/scsi/megaraid/megaraid_sas_base.c: In function megasas_create_frame_pool:
drivers/scsi/megaraid/megaraid_sas_base.c:4124:6: warning: variable sge_sz set but not used [-Wunused-but-set-variable]

It's not used any more since commit 200aed582d ("megaraid_sas: endianness
related bug fixes and code optimization")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:25 -04:00
YueHaibing de19212c28 scsi: megaraid_sas: remove set but not used variables 'host' and 'wait_time'
Fixes gcc '-Wunused-but-set-variable' warnings:

drivers/scsi/megaraid/megaraid_sas_base.c: In function megasas_suspend:
drivers/scsi/megaraid/megaraid_sas_base.c:7269:20: warning: variable host set but not used [-Wunused-but-set-variable]
drivers/scsi/megaraid/megaraid_sas_base.c: In function megasas_aen_polling:
drivers/scsi/megaraid/megaraid_sas_base.c:8397:15: warning: variable wait_time set but not used [-Wunused-but-set-variable]

'host' never used since introduction in commit 31ea708897 ("[SCSI]
megaraid_sas: add hibernation support")

'wait_time' never used since commit 11c71cb4ab ("megaraid_sas: Do
not allow PCI access during OCR")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:23 -04:00
YueHaibing dea98ba45a scsi: megaraid_sas: remove set but not used variable 'cur_state'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/scsi/megaraid/megaraid_sas_base.c: In function megasas_transition_to_ready:
drivers/scsi/megaraid/megaraid_sas_base.c:3900:6: warning: variable cur_state set but not used [-Wunused-but-set-variable]

Never used since commit 7218df69e3 ("[SCSI] megaraid_sas: use the
firmware boot timeout when waiting for commands")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:23 -04:00
Shivasharan S c9ac8e2466 scsi: megaraid_sas: Update driver version to 07.708.03.00
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:20 -04:00
Shivasharan S ba53572bf0 scsi: megaraid_sas: Export RAID map through debugfs
Create a debugfs interface for megaraid_sas driver.  Provide interface to
dump driver RAID map in debugfs.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:20 -04:00
Shivasharan S ce88418dce scsi: megaraid_sas: Fix MSI-X vector print
Print FW supported MSI-X vector count only if FW supports
MSI-X.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:20 -04:00
Shivasharan S 0a11c0b02a scsi: megaraid_sas: Add debug prints for device list
Add debug prints related to device list being returned by firmware.  The a
debug flag to activate these prints.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:20 -04:00
Shivasharan S f7331f1800 scsi: megaraid_sas: Add prints in suspend and resume path
Add prints in resume/suspend path to help in debugging hibernation
issues. The print gives an indication when the driver entry points are
called.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:20 -04:00
Shivasharan S 223d5818e7 scsi: megaraid_sas: Print firmware interrupt status
Add a print to dump the interrupt status in system log for debugging.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:20 -04:00
Shivasharan S b6661342f2 scsi: megaraid_sas: Print FW fault information
When driver detects a firmware fault during load, dump additional
information on fault code and subcode that will help in debugging.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:20 -04:00
Shivasharan S a6024a9e91 scsi: megaraid_sas: Export RAID map id through sysfs
Add a sysfs interface to get the raid map index that is being used by
driver.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:20 -04:00
Shivasharan S 9a5987101c scsi: megaraid_sas: Print BAR information from driver
Add prints for BAR address information during driver load.  This helps in
debugging issues with BAR address changing during OS boot.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:19 -04:00
Shivasharan S 3d1d9eb7f2 scsi: megaraid_sas: Dump system registers for debugging
When controller fails to transition to READY state during driver probe,
dump the system interface register set.  This will give snapshot of the
firmware status for debugging driver load issues.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:19 -04:00
Shivasharan S cfb9a30e5d scsi: megaraid_sas: Dump system interface regs from sysfs
Add a sysfs interface to dump the controller's system interface registers.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:19 -04:00
Shivasharan S 4fe55035f3 scsi: megaraid_sas: Add formatting option for megasas_dump
Add option to format the buffer that is being dumped.  Currently, the IO
frame and chain frame dumped in the syslog is getting split across multiple
lines based on the formatting.  Fix this by using KERN_CONT in printk.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:19 -04:00
Shivasharan S 2ce4350879 scsi: megaraid_sas: Enhance internal DCMD timeout prints
Add prints to identify the internal DCMD opcode that has timed out.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:19 -04:00
Shivasharan S 96c9603cf1 scsi: megaraid_sas: Enhance prints in OCR and TM path
This patch enhances the existing debug prints in reset and task management
path.

These debug prints in adapter reset path helps with debugging issues
related to IO timeouts that are seen frequently in the field.  Add
additional debug prints to dump the pending command frames before
initiating an adapter reset.  Also, print FastPath IOs that are
outstanding.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:19 -04:00
Shivasharan S 1d15d9098a scsi: megaraid_sas: Load balance completions across all MSI-X
Driver will use "reply descriptor post queues" in round robin fashion when
the combined MSI-X mode is not enabled. With this IO completions are
distributed and load balanced across all the available reply descriptor
post queues equally.

This is enabled only if combined MSI-X mode is not enabled in firmware.
This improves performance and also fixes soft lockups.

When load balancing is enabled, IRQ affinity from driver needs to be
disabled.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:19 -04:00
Shivasharan S 62a04f81e6 scsi: megaraid_sas: IRQ poll to avoid CPU hard lockups
Issue Description:

We have seen cpu lock up issues from field if system has a large (more than
96) logical cpu count.  SAS3.0 controller (Invader series) supports max 96
MSI-X vector and SAS3.5 product (Ventura) supports max 128 MSI-X vectors.

This may be a generic issue (if PCI device support completion on multiple
reply queues).

Let me explain it w.r.t megaraid_sas supported h/w just to simplify the
problem and possible changes to handle such issues.  MegaRAID controller
supports multiple reply queues in completion path.  Driver creates MSI-X
vectors for controller as "minimum of (FW supported Reply queues, Logical
CPUs)".  If submitter is not interrupted via completion on same CPU, there
is a loop in the IO path. This behavior can cause hard/soft CPU lockups, IO
timeout, system sluggish etc.

Example - one CPU (e.g. CPU A) is busy submitting the IOs and another CPU
(e.g. CPU B) is busy with processing the corresponding IO's reply
descriptors from reply descriptor queue upon receiving the interrupts from
HBA.  If CPU A is continuously pumping the IOs then always CPU B (which is
executing the ISR) will see the valid reply descriptors in the reply
descriptor queue and it will be continuously processing those reply
descriptor in a loop without quitting the ISR handler.

megaraid_sas driver will exit ISR handler if it finds unused reply
descriptor in the reply descriptor queue.  Since CPU A will be continuously
sending the IOs, CPU B may always see a valid reply descriptor (posted by
HBA Firmware after processing the IO) in the reply descriptor queue. In
worst case, driver will not quit from this loop in the ISR handler.
Eventually, CPU lockup will be detected by watchdog.

Above mentioned behavior is not common if "rq_affinity" set to 2 or
affinity_hint is honored by irqbalancer as "exact".  If rq_affinity is set
to 2, submitter will be always interrupted via completion on same CPU.  If
irqbalancer is using "exact" policy, interrupt will be delivered to
submitter CPU.

Problem statement:

If CPU count to MSI-X vectors (reply descriptor Queues) count ratio is not
1:1, we still have exposure of issue explained above and for that we don't
have any solution.

Exposure of soft/hard lockup is seen if CPU count is more than MSI-X
supported by device.

If CPUs count to MSI-X vectors count ratio is not 1:1, (Other way, if
CPU counts to MSI-X vector count ratio is something like X:1, where X > 1)
then 'exact' irqbalance policy OR rq_affinity = 2 won't help to avoid CPU
hard/soft lockups. There won't be any one to one mapping between
CPU to MSI-X vector instead one MSI-X interrupt (or reply descriptor queue)
is shared with group/set of CPUs and there is a possibility of having a
loop in the IO path within that CPU group and may observe lockups.

For example: Consider a system having two NUMA nodes and each node having
four logical CPUs and also consider that number of MSI-X vectors enabled on
the HBA is two, then CPUs count to MSI-X vector count ratio as 4:1.
e.g.
MSI-X vector 0 is affinity to CPU 0, CPU 1, CPU 2 & CPU 3 of NUMA node 0 and
MSI-X vector 1 is affinity to CPU 4, CPU 5, CPU 6 & CPU 7 of NUMA node 1.

numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3                 --> MSI-X 0
node 0 size: 65536 MB
node 0 free: 63176 MB
node 1 cpus: 4 5 6 7                 --> MSI-X 1
node 1 size: 65536 MB
node 1 free: 63176 MB

Assume that user started an application which uses all the CPUs of NUMA
node 0 for issuing the IOs.  Only one CPU from affinity list (it can be any
cpu since this behavior depends upon irqbalance) CPU0 will receive the
interrupts from MSI-X 0 for all the IOs. Eventually, CPU 0 IO submission
percentage will be decreasing and ISR processing percentage will be
increasing as it is more busy with processing the interrupts.  Gradually IO
submission percentage on CPU 0 will be zero and it's ISR processing
percentage will be 100% as IO loop has already formed within the
NUMA node 0, i.e. CPU 1, CPU 2 & CPU 3 will be continuously busy with
submitting the heavy IOs and only CPU 0 is busy in the ISR path as it
always find the valid reply descriptor in the reply descriptor queue.
Eventually, we will observe the hard lockup here.

Chances of occurring of hard/soft lockups are directly proportional to
value of X. If value of X is high, then chances of observing CPU lockups is
high.

Solution:

Use IRQ poll interface defined in "irq_poll.c".

megaraid_sas driver will execute ISR routine in softirq context and it will
always quit the loop based on budget provided in IRQ poll interface.
Driver will switch to IRQ poll only when more than a threshold number of
reply descriptors are handled in one ISR. Currently threshold is set as
1/4th of HBA queue depth.

In these scenarios (i.e. where CPUs count to MSI-X vectors count ratio is
X:1 (where X >  1)), IRQ poll interface will avoid CPU hard lockups due to
voluntary exit from the reply queue processing based on budget.
Note - Only one MSI-X vector is busy doing processing.

Select CONFIG_IRQ_POLL from driver Kconfig for driver compilation.

Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:19 -04:00
Shivasharan S 78409d4b47 scsi: megaraid_sas: Block PCI config space access from userspace during OCR
While an online controller reset(OCR) is in progress, there is short
duration where all access to controller's PCI config space from the host
needs to be blocked.  This is due to a hardware limitation of MegaRAID
controllers.

With this patch, driver will block all access to controller's config space
from userland applications by calling pci_cfg_access_lock() while OCR is in
progress and unlocking after controller comes back to ready state.

Added helper function which locks the config space before initiating OCR
and wait for controller to become READY.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:19 -04:00
Shivasharan S 44e8d6930f scsi: megaraid_sas: Rework code around controller reset
No functional change.  This patch reworks code around controller reset path
which gets rid of a couple of goto labels.  This is in preparation for the
next patch which adds PCI config space access locking while controller
reset is in progress.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:19 -04:00
Shivasharan S f10fb8523a scsi: megaraid_sas: fw_reset_no_pci_access required for MFI adapters only
fw_reset_no_pci_access is only applicable for MFI controllers and is not
used for Fusion controllers.

For all Fusion controllers, driver can check reset adapter bit in
status register before performing a chip reset without
setting "fw_reset_no_pci_access".

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:19 -04:00
Shivasharan S 4a0bcf362d scsi: megaraid_sas: Remove unused variable target_index
No functional change. Remove set but unused variable in
megasas_set_static_target_properties.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-06-18 19:46:19 -04:00
Linus Torvalds d1cd7c85f9 SCSI misc on 20190507
This is mostly update of the usual drivers: qla2xxx, qedf, smartpqi,
 hpsa, lpfc, ufs, mpt3sas, ibmvfc and hisi_sas.  Plus number of minor
 changes, spelling fixes and other trivia.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXNIK0yYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishbeFAP4wsOm6
 BNqDvLbF7gHAH+oSVAeENd9tepXG2xKbUHmLRgEA1wPaUxon8L8v/SSsqwewqMXP
 y6CnU6aO2iaViTNBsPs=
 =RX74
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is mostly update of the usual drivers: qla2xxx, qedf, smartpqi,
  hpsa, lpfc, ufs, mpt3sas, ibmvfc and hisi_sas. Plus number of minor
  changes, spelling fixes and other trivia"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (298 commits)
  scsi: qla2xxx: Avoid that lockdep complains about unsafe locking in tcm_qla2xxx_close_session()
  scsi: qla2xxx: Avoid that qlt_send_resp_ctio() corrupts memory
  scsi: qla2xxx: Fix hardirq-unsafe locking
  scsi: qla2xxx: Complain loudly about reference count underflow
  scsi: qla2xxx: Use __le64 instead of uint32_t[2] for sending DMA addresses to firmware
  scsi: qla2xxx: Introduce the dsd32 and dsd64 data structures
  scsi: qla2xxx: Check the size of firmware data structures at compile time
  scsi: qla2xxx: Pass little-endian values to the firmware
  scsi: qla2xxx: Fix race conditions in the code for aborting SCSI commands
  scsi: qla2xxx: Use an on-stack completion in qla24xx_control_vp()
  scsi: qla2xxx: Make qla24xx_async_abort_cmd() static
  scsi: qla2xxx: Remove unnecessary locking from the target code
  scsi: qla2xxx: Remove qla_tgt_cmd.released
  scsi: qla2xxx: Complain if a command is released that is owned by the firmware
  scsi: qla2xxx: target: Fix offline port handling and host reset handling
  scsi: qla2xxx: Fix abort handling in tcm_qla2xxx_write_pending()
  scsi: qla2xxx: Fix error handling in qlt_alloc_qfull_cmd()
  scsi: qla2xxx: Simplify qlt_send_term_imm_notif()
  scsi: qla2xxx: Fix use-after-free issues in qla2xxx_qpair_sp_free_dma()
  scsi: qla2xxx: Fix a qla24xx_enable_msix() error path
  ...
2019-05-08 10:12:46 -07:00
Colin Ian King efc372c1bf scsi: megaraid_sas: fix spelling mistake "oustanding" -> "outstanding"
There are a couple of spelling mistakes in some kernel info and notice
messages. Fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-04-18 20:34:46 -04:00
YueHaibing 7c3f8ca8e4 scsi: megaraid_sas: Make megasas_host_device_list_query() static
Fix sparse warning:

drivers/scsi/megaraid/megaraid_sas_base.c:4652:1: warning:
 symbol 'megasas_host_device_list_query' was not declared. Should it be static?

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-04-15 22:20:16 -04:00
Will Deacon fb24ea52f7 drivers: Remove explicit invocations of mmiowb()
mmiowb() is now implied by spin_unlock() on architectures that require
it, so there is no reason to call it from driver code. This patch was
generated using coccinelle:

	@mmiowb@
	@@
	- mmiowb();

and invoked as:

$ for d in drivers include/linux/qed sound; do \
spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done

NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
spin_unlock(). However, pairing each mmiowb() removal in this patch with
the corresponding call to spin_unlock() is not at all trivial, so there
is a small chance that this change may regress any drivers incorrectly
relying on mmiowb() to order MMIO writes between CPUs using lock-free
synchronisation. If you've ended up bisecting to this commit, you can
reintroduce the mmiowb() calls using wmb() instead, which should restore
the old behaviour on all architectures other than some esoteric ia64
systems.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-08 12:01:02 +01:00
Linus Torvalds 477558d7e8 SCSI misc on 20190315
This is the final round of mostly small fixes and performance
 improvements to our initial submit.  The main regression fix is the
 ia64 simscsi build failure which was missed in the serial number
 elimination conversion.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXIxBayYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pisherpAP4rxLpX
 bcUnQnEsvoxys/JyoK08Qfv1JebZo1B2MAZ62wD/VZ7LpOuzVLhsM2KhLFGRrs1/
 7D2K4tgtO2dQsFix7H0=
 =pcHl
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
 "This is the final round of mostly small fixes and performance
  improvements to our initial submit.

  The main regression fix is the ia64 simscsi build failure which was
  missed in the serial number elimination conversion"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits)
  scsi: ia64: simscsi: use request tag instead of serial_number
  scsi: aacraid: Fix performance issue on logical drives
  scsi: lpfc: Fix error codes in lpfc_sli4_pci_mem_setup()
  scsi: libiscsi: Hold back_lock when calling iscsi_complete_task
  scsi: hisi_sas: Change SERDES_CFG init value to increase reliability of HiLink
  scsi: hisi_sas: Send HARD RESET to clear the previous affiliation of STP target port
  scsi: hisi_sas: Set PHY linkrate when disconnected
  scsi: hisi_sas: print PHY RX errors count for later revision of v3 hw
  scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO
  scsi: hisi_sas: Change return variable type in phy_up_v3_hw()
  scsi: qla2xxx: check for kstrtol() failure
  scsi: lpfc: fix 32-bit format string warning
  scsi: lpfc: fix unused variable warning
  scsi: target: tcmu: Switch to bitmap_zalloc()
  scsi: libiscsi: fall back to sendmsg for slab pages
  scsi: qla2xxx: avoid printf format warning
  scsi: lpfc: resolve static checker warning in lpfc_sli4_hba_unset
  scsi: lpfc: Correct __lpfc_sli_issue_iocb_s4 lockdep check
  scsi: ufs: hisi: fix ufs_hba_variant_ops passing
  scsi: qla2xxx: Fix panic in qla_dfs_tgt_counters_show
  ...
2019-03-16 12:51:50 -07:00
Linus Torvalds 92fff53b71 SCSI misc on 20190306
This is mostly update of the usual drivers: arcmsr, qla2xxx, lpfc,
 hisi_sas, target/iscsi and target/core.  Additionally Christoph
 refactored gdth as part of the dma changes.  The major mid-layer
 change this time is the removal of bidi commands and with them the
 whole of the osd/exofs driver and filesystem.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXIC54SYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishT1GAPwJEV23
 ExPiPsnuVgKj49nLTagZ3rILRQcYNbL+MNYqxQEA0cT8FHzSDBfWY5OKPNE+RQ8z
 f69LpXGmMpuagKGvvd4=
 =Fhy1
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is mostly update of the usual drivers: arcmsr, qla2xxx, lpfc,
  hisi_sas, target/iscsi and target/core.

  Additionally Christoph refactored gdth as part of the dma changes. The
  major mid-layer change this time is the removal of bidi commands and
  with them the whole of the osd/exofs driver and filesystem. This is a
  major simplification for block and mq in particular"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (240 commits)
  scsi: cxgb4i: validate tcp sequence number only if chip version <= T5
  scsi: cxgb4i: get pf number from lldi->pf
  scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
  scsi: mpt3sas: Add missing breaks in switch statements
  scsi: aacraid: Fix missing break in switch statement
  scsi: kill command serial number
  scsi: csiostor: drop serial_number usage
  scsi: mvumi: use request tag instead of serial_number
  scsi: dpt_i2o: remove serial number usage
  scsi: st: osst: Remove negative constant left-shifts
  scsi: ufs-bsg: Allow reading descriptors
  scsi: ufs: Allow reading descriptor via raw upiu
  scsi: ufs-bsg: Change the calling convention for write descriptor
  scsi: ufs: Remove unused device quirks
  Revert "scsi: ufs: disable vccq if it's not needed by UFS device"
  scsi: megaraid_sas: Remove a bunch of set but not used variables
  scsi: clean obsolete return values of eh_timed_out
  scsi: sd: Optimal I/O size should be a multiple of physical block size
  scsi: MAINTAINERS: SCSI initiator and target tweaks
  scsi: fcoe: make use of fip_mode enum complete
  ...
2019-03-09 16:53:47 -08:00