Firmware AIF messages about cache loss and data recovery are being missed
by the driver since currently they are not captured but rather let go.
This patch to capture those messages and log them for the user.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Typically under error conditions, it is possible for aac_command_thread()
to miss the wakeup from kthread_stop() and go back to sleep, causing it
to hang aac_shutdown.
In the observed scenario, the adapter is not functioning correctly and so
aac_fib_send() never completes (or time-outs depending on how it was
called). Shortly after aac_command_thread() starts it performs
aac_fib_send(SendHostTime) which hangs. When aac_probe_one
/aac_get_adapter_info send time outs, kthread_stop is called which breaks
the command thread out of it's hang.
The code will still go back to sleep in schedule_timeout() without
checking kthread_should_stop() so it causes aac_probe_one to hang until
the schedule_timeout() which is 30 minutes.
Fixed by: Adding another kthread_should_stop() before schedule_timeout()
Cc: stable@vger.kernel.org
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
aac_fib_send has a special function case for initial commands during
driver initialization using wait < 0(pseudo sync mode). In this case,
the command does not sleep but rather spins checking for timeout.This
loop is calls cpu_relax() in an attempt to allow other processes/threads
to use the CPU, but this function does not relinquish the CPU and so the
command will hog the processor. This was observed in a KDUMP
"crashkernel" and that prevented the "command thread" (which is
responsible for completing the command from being timed out) from
starting because it could not get the CPU.
Fixed by replacing "cpu_relax()" call with "schedule()"
Cc: stable@vger.kernel.org
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
aac_fib_map_free() calls pci_free_consistent() without checking that
dev->hw_fib_va is not NULL and dev->max_fib_size is not zero.If they are
indeed NULL/0, this will result in a hang as pci_free_consistent() will
attempt to invalidate cache for the entire 64-bit address space
(which would take a very long time).
Fixed by adding a check to make sure that dev->hw_fib_va and
dev->max_fib_size are not NULL and 0 respectively.
Fixes: 9ad5204d6 - "[SCSI]aacraid: incorrect dma mapping mask during blinked recover or user initiated reset"
Cc: stable@vger.kernel.org
Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@pmcs.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver utilizes an array of atomic variables to keep track of IO
submissions to each vector. To submit an IO multiple threads iterate
through the array to find a vector which has empty slots to send an
IO. The reading and updating of the variable is not atomic, causing race
conditions when a thread uses a full vector to submit an IO.
Fixed by mapping each FIB to a vector, the submission path then uses
said vector to submit IO thereby removing the possibly of a race
condition.The vector assignment is started from 1 since vector 0 is
reserved for the use of AIF management FIBS.If the number of MSIx
vectors is 1 (MSI or INTx mode) then all the fibs are allocated to
vector 0.
Fixes: 495c0217 "aacraid: MSI-x support"
Cc: stable@vger.kernel.org # v4.1
Signed-off-by: Raghava Aditya Renukunta <raghavaaditya.renukunta@pmcs.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The method to allocate and free FIB's in the present code utilizes
spinlocks. Multiple IO's have to wait on the spinlock to acquire or free
fibs creating a performance bottleneck.
An alternative solution would be to use block layer tags to keep track
of the fibs allocated and freed. To this end aac_fib_alloc_tag was
created to utilize the blk layer tags to plug into the Fib pool.These
functions are used exclusively in the IO path. 8 fibs are reserved for
the use of AIF management software and utilize the previous spinlock
based implementations.
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@pmcs.com>
Reviewed-by: Shane Seymour <shane.seymour@hpe.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Calling kthread_run with a single name parameter causes it to be handled
as a format string. Many callers are passing potentially dynamic string
content, so use "%s" in those cases to avoid any potential accidents.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Series 7 Async. (performance) mode support added
- New scatter/gather list format for Series 7
- Driver converts s/g list to a firmware suitable list for best performance on
Series 7, this can be disabled with driver parameter "aac_convert_sgl" for
testing purposes
- New container read/write command structure for Series 7
- Fast response support for the SCSI pass-through path added
- Async. status response buffer changes
Signed-off-by: Mahesh Rajashekhara <Mahesh_Rajashekhara@pmc-sierra.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The loop that waited for syncronous fib commands was causing a CPU stall
when a timeout actually occured.
1) Switch to using a more accurate timeout mechanism.
2) Do not pace the loop with udelay(). Use cpu_relax() to allow for
scheduling to occur.
Signed-off-by: Ben Collins <bcollins@ubuntu.com>
Acked-by: Achim Leubner <Achim_Leubner@pmc-sierra.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Added Sync. mode to support Series 7/8/9 controller families: This is a
compatibility mode for all these controller families. The Async. (Performance)
mode can be changed in the future. First Async. mode version added for Series
7; Controller parameter aac_sync_mode added
Signed-off-by: Mahesh Rajashekhara <aacraid@pmc-sierra.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Added new hardware device 0x28b interface for PMC-Sierra's SRC based
controller family.
- new src.c file for 0x28b specific functions
- new XPORT header required
- sync. command interface: doorbell bits shifted (SRC_ODR_SHIFT, SRC_IDR_SHIFT)
- async. Interface: different inbound queue handling, no outbound I2O
queue available, using doorbell ("PmDoorBellResponseSent") and
response buffer on the host ("host_rrq") for status
- changed AIF (adapter initiated FIBs) interface: "DoorBellAifPending"
bit to inform about pending AIF, "AifRequest" command to read AIF,
"NoMoreAifDataAvailable" to mark the end of the AIFs
Signed-off-by: Mahesh Rajashekhara <aacraid@pmc-sierra.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Get rid of init_MUTEX[_LOCKED]() and use sema_init() instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: aacraid@adaptec.com
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Problem description:
--------------------
The problem reported by one of the customer was when a logical array
is deleted(from the SDK, from the GUI, from arcconf) then the
corresponding physical device (/dev/sdb, for example) is not removed
from the Linux namespace. So you end up with a "dead" device
entry. And some of the linux tools go slightly wonky.
Solution:
---------
Based on the notification from FW, the driver calls
"scsi_remove_device" for the DELETED drive. This call not only informs
the scsi device status to the SCSI mid layer and also it will remove
corresponding scsi device entries from the Linux sysfs.
Signed-off-by: Mahesh Rajashekhara <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Problem description:
--------------------
When the JBOD is created from the OS using Adaptec Storage Manager
utility device is not available under FDISK until a system restart is
done.
Solution:
---------
AIF handling: If there is a JBOD drive added to the system, identify
the old one with scsi_device_lookup() and remove it to enable a fresh
scsi_add_device(); else the new JBOD is not available until reboot.
Signed-off-by: Mahesh Rajashekhara <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
These particular problems were reported by Cisco and SAP and customers
as well. Cisco reported on RHEL4 U6 and SAP reported on SLES9 SP4 and
SLES10 SP2. We added these fixes on RHEL4 U6 and gave a private build
to IBM and Cisco. Cisco and IBM tested it for more than 15 days and
they reported that they did not see the issue so far. Before the fix,
Cisco used to see the issue within 5 days. We generated a patch for
SLES9 SP4 and SLES10 SP2 and submitted to Novell. Novell applied the
patch and gave a test build to SAP. SAP tested and reported that the
build is working properly.
We also tested in our lab using the tools "dishogsync", which is IO
stress tool and the tool was provided by Cisco.
Issue1: File System going into read-only mode
Root cause: The driver tends to not free the memory (FIB) when the
management request exits prematurely. The accumulation of such
un-freed memory causes the driver to fail to allocate anymore memory
(FIB) and hence return 0x70000 value to the upper layer, which puts
the file system into read only mode.
Fix details: The fix makes sure to free the memory (FIB) even if the
request exits prematurely hence ensuring the driver wouldn't run out
of memory (FIBs).
Issue2: False Raid Alert occurs
When the Physical Drives and Logical drives are reported as deleted or
added, even though there is no change done on the system
Root cause: Driver IOCTLs is signaled with EINTR while waiting on
response from the lower layers. Returning "EINTR" will never initiate
internal retry.
Fix details: The issue was fixed by replacing "EINTR" with
"ERESTARTSYS" for mid-layer retries.
Signed-off-by: Penchala Narasimha Reddy <ServeRAIDDriver@hcl.in>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Replace all DMA_31BIT_MASK macro with DMA_BIT_MASK(31)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As JBOD devices (really just Simple Single Drive Volumes exported to
the SCSI channel) are managed, they fail to update correctly when the
driver triggers a SCSI scan. In addition, the ability to change
multiple arrays or JBODs at the same time was resulting in dropped
scans, set up a mechanism to issue a list of single target scans on a
single configuration change notification from the Firmware.
Performed some additional sundry cosmetic code style cleanups.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
On Apr 21, 2008, at 8:42 PM, Yinghai Lu wrote:
> bisected to:
>
> commit e6990c6448
> Author: Mark Salyzyn <Mark_Salyzyn@adaptec.com>
> Date: Mon Apr 14 14:20:16 2008 -0400
>
> [SCSI] aacraid: Fix down_interruptible() to check the return value
The return value for down_interruptible was incorrectly checked!
updated patch enclosed.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Instead of ignoring the return value in aac_fib_send() return 2 to
indicate to the layers above that fib transmission was aborted due to
timeout.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The Adapter's Ignore Reset flag and insmod parameter boolean polarity
is incorrect in the driver.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The cards being added are supported in a limited sense already through
family matching, but we needed to add some functionality to the driver
to expose selectively the physical drives. These Physical drives are
specifically marked to not be part of any array and thus are declared
JBODs (Just a Bunch Of Drives) for generic SCSI access.
We report that this is the second patch in a set of two, but merely
depends on the stand-alone functionality of the first patch which adds
in that case the ability to report a driver feature flag via sysfs. We
leverage that functionality by reporting that this driver now supports
this new JBOD feature for the controller so that the array management
applications may react accordingly and guide the user as they manage
the controller.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
I was amazed at how much embedded space was present in the aacraid
driver source files. Just selected five files from the set to clean up
for now and the attached patch swelled to 73K in size!
- Removed trailing space or tabs
- Removed spaces embedded within tabs
- Replaced leading 8 spaces with tabs
- Removed spaces before )
- Removed ClusterCommand as it was unused (noticed it as one triggered by above)
- Replaced scsi_status comparison with 0x02, to compare against SAM_STATUS_CHECK_CONDITION.
- Replaced a long series of spaces with tabs
- Replaced some simple if...defined() with ifdef/ifndef
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Added support to respond to enclosure service events
(controller AIFs) to add, online or offline physical targets
reported to sg. Also added online and offlining of arrays.
Removed an automatic variable definition in a sub block that
hid an earlier definition, determined to be inert as the
sub-block use did not interfere. Bumped the driver versioning
to stamp the addition of this feature.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
In experiments in the lab we managed to trigger an Adapter firmware
panic (BlinkLED) coincidentally while several pass-through ioctl
command from the management software were outstanding on a bug only
present on a class of RAID Adapters that require a hardware reset
rather than a commanded reset. The net result was an attempt to time
out the management software command as if it came from the SCSI layer
resulting in an OS panic.
Adapters that use commanded reset, management commands are returned
failed by the Adapter correctly. The adapter firmware panic that
resulted in this condition was also resolved, and there were no
adapters in the field with this specific firmware bug so we do not
expect any field reports. This is a rare or unlikely corner condition,
and no reports have ever been forwarded from the field.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Big endian systems issues discovered in the aacraid driver. Somewhat
reverses a patch from November 7th of last year that removed swap
operations because they formerly were being assigned to an u8 array
when they should have been assigned to an le32 array.
This patch is largely inert for any little endian processor
architecture. It resolves a bug in delivering the BlinkLED AIF event
to registered applications when the adapter or associated hardware was
reset due to ill health. A rare corner case occurrence, also largely
unnoticed by any as it was a new (untested!) feature.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
aacraid.cache parameter, Disable Queue Flush commands:
bit 0 - Disable FUA in WRITE SCSI commands
bit 1 - Disable SYNCHRONIZE_CACHE SCSI command
bit 2 - Disable only if Battery not protecting adapter supplied Cache
e.g.: aacraid.cache=7 will disable the FUA and SYNCHRONIZE_CACHE
commands if the adapter has reported that it's cache is battery backed
up.
This parameter permits experimentation with tradeoffs between
performance and caching policy.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
As reported in http://bugzilla.kernel.org/show_bug.cgi?id=3D9133 it was
discovered that the PERC line of controllers lacked a key 64 bit
ScatterGather capable SCSI pass-through function. The adapters are still
capable of 64 bit ScatterGather I/O commands, but these two can not be
mixed. This problem was exacerbated by the introduction of the SCSI
Generic access to the DASD physical devices.
The fix for users before this patch is applied is aacraid.dacmode=3D0 on
the kernel command line to disable 64 bit I/O.
The enclosed patch introduces a new adapter quirk and tries to limp
along by enabling pass-through in situations where memory is 32 bit
addressable on 64 bit machines, or disable the pass-through functions
altogether. I expect that the check for 32 bit addressable memory to be
controversial in that it can be incorrect in non-Dell non-Intel systems
that PERC would never be installed under, the alternative is to disable
pass-through in all cases which could be reported as another regression.
Pass-through is used for SCSI Generic access to the physical devices, or
for the management applications to properly function.
In systems where this patch has disabled pass-through because it is
unsupportable in combination with I/O performance, the user can choose
to enable pass-through by turning off dacmode (aacraid.dacmode=3D0) or
limiting the discovered kernel memory (mem=3D4G) with an associated loss
in runtime performance. If we chose instead to turn off 64 bit dacmode
for the adapters with this quirk, then this would be reported as another
regression.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
On Wed, Nov 07, 2007 at 01:51:44PM -0500, Salyzyn, Mark wrote:
> Christoph Hellwig [mailto:hch@infradead.org] sez:
> > Did anyone run the driver through sparse to see if we have
> > more issues like this?
>
> There are some warnings from sparse, none like this one. I will deal
> with the warnings ...
Actually there are a lot of endianess warnings, fortunately most of them
harmless. The patch below fixes all of them up (including the ones in
the patch I replied to), except for aac_init_adapter which is really odd
and I don't know what to do.
[jejb fixed up rejections and checkpatch issues]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mark Salyzyn <mark_salyzyn@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Some of our vendors have requested that our adapters ignore the hardware
reset attempts during recovery and have enforced this with changes in
Adapter Firmware. Some of our customers have requested the option to be
able to reset the adapter under adverse adapter failure, we even had a
few defects reported here considering it a regression that the Adapter
could not be reset. This patch addresses this dichotomy. The user can
force the adapter to be reset if it supports the IOP_RESET_ALWAYS
command, in cases where the adapter has been programmed to ignore the
reset, by setting the aacraid.check_reset parameter to a value of -1.
The driver will not reset an Adapter that does not support the reset
command(s).
This patch also fixes and cleans up some of the logic associated with
resetting the adapter.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Noticed on PowerPC allmod config build:
drivers/scsi/aacraid/commsup.c:1342: warning: large integer implicitly truncated to unsigned type
drivers/scsi/aacraid/commsup.c:1343: warning: large integer implicitly truncated to unsigned type
drivers/scsi/aacraid/commsup.c:1344: warning: large integer implicitly truncated to unsigned type
Also fix some whitespace on the changed lines.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mark Salyzyn <mark_salyzyn@adaptec.com>
Signed-off-by: James <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Too generic, clashes with ISDN.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Incorrect dma mask was used for blinkled (firmware assert) recovery or
user initiated reset during initialization portion. Ensure that all
callers of aac_fib_map_free null out the fib allocation references to
prevent multiple free. Although serious sounding, no reports of these
problems have surfaced...
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add the ability for an application to issue a hardware reset to the
adapter via sysfs. Typical uses include restarting the adapter after it
has been flashed. Bumped revision number for the driver and added a
feature to periodically check the adapter's health (check_interval),
update the adapter's concept of time (update_interval) and block
checking/resetting of the adapter (check_reset).
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Inspired somewhat by Vignesh Babu <vignesh.babu@wipro.com> patch to
dpt_i2o.c to replace kmalloc/memset sequences with kzalloc, doing the
same for the aacraid driver.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add some likely() and unlikely() compiler hints in some of the aacraid
hardware interface layers. There should be no operational side effects
resulting from this patch and the changes should be mostly benign on x86
platforms.
Signed-off-by: Mark Salyzyn <aacraid@adaptec.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn,
This set of fixes improve error handling stability of the driver. A popular
manifestation of the problems is an NULL pointer reference in the interrupt
handler when referencing portions of the scsi command context, or in the
scsi_done handling when an offlined device is referenced.
The aacraid driver currently does not get notification of orphaned command
completions due to devices going offline. The driver also fails to handle the
commands that are finished by the error handler, and thus can complete again
later at the hands of the adapter causing situations of completion of an
invalid scsi command context. Test Unit Ready calls abort assuming that the
abort was successful, but are not, and thus when the interrupt from the adapter
occurs, they reference invalid command contexts. We add in a TIMED_OUT flag to
inform the aacraid FIB context that the interrupt service should merely release
the driver resources and not complete the command up. We take advantage of this
with the abort handler as well for select abortable commands. And we detect and
react if a command that can not be aborted is currently still outstanding to
the controller when reissued by the retry mechanism.
Signed-off-by: Mark Haverkamp <markh@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn,
Outstanding ioctl calls still have some problems with aborting cleanly
in the face of a reset iop recovery action should the adapter ever enter
into a Firmware Assert (BlinkLED) condition. The enclosed patch resolves
some uncovered flawed handling.
Signed-off-by: Mark Haverkamp <markh@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>