Thanks to James Smart for the inspiration.
Stall error handler if attempting recovery while an rport is blocked.
This avoids device offline scenarios due to errors in the error handler.
Also verify that VirtDevice is available before issuing scsi command.
VirtDevice is removed when fc transport removes a target.
See James Smart's patch of 08/17/2006 for greater detail.
http://marc.theaimsgroup.com/?l=linux-scsi&m=115583213624803&w=2
Also bump version number per Eric's request.
Signed-off-by: Michael Reed <mdr@sgi.com>
Acked-by: Eric Moore <eric.moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Acked-by: "Moore, Eric Dean" <Eric.Moore@lsil.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This flag denotes local attachment of the phy. There are two problems
with it:
1) It's actually redundant ... you can get the same information simply
by seeing whether a host is the phys parent
2) we condition a lot of phy parameters on it on the false assumption
that we can only control local phys. I'm wiring up phy resets in the
aic94xx now, and it will be able to reset non-local phys as well.
I fixed 2) by moving the local check into the reset and stats function
of the mptsas, since that seems to be the only HBA that can't
(currently) control non-local phys.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add host_supported_speeds, host_maxframe_size, host_speed, host_fabric_name,
host_port_type, host_port_state, and host_symbolic_name transport attributes
to fusion fibre channel.
Signed-off-by: Michael Reed <mdr@sgi.com>
Acked-by: Moore, Eric <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch corrects a problem in mptfc which can result in targets
being removed after executing an "lsiutil 99" reset of the fibre
channel ports.
The last rescan event was being processed before the setup reset work
due to an inappropriate optimization in the event processing logic.
Every rescan event is now queued for execution and the setup reset
work now executes in the proper sequence.
Signed-off-by: Michael Reed <mdr@sgi.com>
Acked-by: Moore, Eric <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Based upon a conversation I had with LSI's fibre channel firmware guru,
this patch adds another condition under which the driver waits for the
firmware link initialization / target discovery to complete.
Signed-off-by: Michael Reed <mdr@sgi.com>
Acked-by: Moore, Eric <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This takes advantage of the sas class backlink function to show which
port on an expander is used to communicate with the parent.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fix's to insure download boot could occur when
either channel of 1030 is reset. Necessary in order
for onboard controller in flashless environment
to become operational.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fix's to insure proper status is returned to midlayer
when a task abort failed to be aborted by controller
firmware.
Also sanity checks to prevent scsi cmd from being
double completed during error recovery.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sas nexus loss support for systems that suport failover.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fix panic for when mptctl is loading at the same time
when one of the fusion llds (mptsas/mptfc/mptspi) is loading.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Adding support for sas enclosures with smart drives.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Using the port_id for the channel is completely unnecessary since the
host_id/target_id are constructed to be globally unique. Also move
the mptsas driver on to virtual channel 1 for its raid devices.
Acked-by: "Moore, Eric" <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This allows us to be rid of the machinery in mptsas for creating and
tracking port numbers. Since mptsas is merely inventing the numbers,
the SAS transport class may as well do it instead.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Conflicts:
drivers/scsi/nsp32.c
drivers/scsi/pcmcia/nsp_cs.c
Removal of randomness flag conflicts with SA_ -> IRQF_ global
replacement.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
One of the current problems the mptsas driver has is that of "ghost"
devices (these are devices the firmware reports as existing, but what
they actually represent are the parents of a lower device), so for
example in my dual expander configuration, three expanders actually show
up, two for the real expanders but a third is created because the
firmware reports that the lower expander also has another expander
connected (which is simply the port going back to the upper expander).
The attached patch eliminates all these ghosts by not allocating any
devices for them if the SAS address is the SAS address of the parent.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The way mpt_interrupt() was coded, it was impossible for the unhandled
interrupt detection logic to ever trigger. All interrupt handlers should
return IRQ_NONE when they have nothing to do.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.com>
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Make two needlessly global functions static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.com>
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Conflicts:
drivers/scsi/aacraid/comminit.c
Fixed up by removing the now renamed CONFIG_IOMMU option from
aacraid
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* Adding 1078 ROC (Raid On Chip) Support - New host adapter
* Moving all PCI Vendor/Device ids to using internal defines; a request
from Christoph/James B. some time ago for when the next chip was added.
* Removing SAS 1066/1066E Vendor/Device IDs, as there are no plans to
manufacture that controller.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* Wide port support added - using James Bottomley's new SAS wide port API.
(There is a known problem in sas transport layer reported yesterday to
James. The Kobject dev.bus_ids for end devices are not unique across
expanders. I have added a work around in this patch, where I asigning
an unique port identifier for every port within the host - this solves
the problem, but I expect a fix from James in the sas transport).
* Adding target_alloc and target_destroy entry points, and moving code over
from the slave entry points.
* The renaming of some mptscsih_xxx functions declared in mptsas.c,
to mptsas_xxx.
* Target Reset moved from slave_destroy to hotplug work thread
handling (with regard to device removal). Also inhibit IO to end device
while device is being broken down . Talked to James Smart about this
at Linux Expo (with questions of how the fc transport handles this).
* Cleaning up the kzalloc's, and kfree's
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This ugly hack was long overdue to die.
It was a way to print out Sparc interrupts in a more freindly format,
since IRQ numbers were arbitrary opaque 32-bit integers which vectored
into PIL levels. These 32-bit integers were not necessarily in the
0-->NR_IRQS range, but the PILs they vectored to were.
The idea now is that we will increase NR_IRQS a little bit and use a
virtual<-->real IRQ number mapping scheme similar to PowerPC.
That makes this IRQ printing hack irrelevant, and furthermore only a
handful of drivers actually used __irq_itoa() making it even less
useful.
Signed-off-by: David S. Miller <davem@davemloft.net>
Bump driver version number to reflect addition of various
fibre channel patches.
Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The driver uses msleep_interruptible() in the code path responsible
for resetting the card's ports via the lsiutil command. If a
<ctrl-c> is received during the reset it can leave a port in such
a state that the only way to regain its use is to reboot the system.
Changing from msleep_interruptible() to msleep() corrects the problem.
Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
While doing board reset testing I was able to put the system in
an infinite request/response loop between the scsi layer and
mptscsih_qcmd() by aborting the reset. This patch installs
a "SETUP RESET" handler which calls fc_remote_port_delete()
for all registered rports. This blocks the target which
prevents the loop. Additionally, should the reset fail to
complete, the transport will now terminate i/o to the target.
Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The fibre channel firmware provides a timer which is similar in purpose
to the fibre channel transport's device loss timer. The effect of this
timer is to extend the total time that a target will be missing beyond
the value associated with the transport's timer. This patch changes
the firmware timer to a default of one second which significantly reduces
the lag between when a target goes missing and the notification of the
fibre channel transport.
Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Move fibre channel event and reset handling to mptfc. This will
result in fewer changes over time that need to be applied to
either mptbase.c or mptscsih.c.
Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
MPT fusion driver initialization fails while second kernel is booting,
after a system crash (if kdump kernel is configured). Oops message is
pasted below.
*****************************************************************************
Fusion MPT base driver 3.03.08
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SAS Host driver 3.03.08 ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 5 (level, low) -> IRQ 5
mptbase: Initiating ioc0 bringup
BUG: unable to handle kernel paging request at virtual address 00002608
printing eip:
c11782fd
*pde = 00000000
Oops: 0000 [#1]
Modules linked in:
CPU: 0
EIP: 0060:[<c11782fd>] Not tainted VLI
EFLAGS: 00010046 (2.6.17-rc1-16M #2)
EIP is at mptscsih_io_done+0x27/0x3a3
eax: c4fed000 ebx: c4fed000 ecx: 00002600 edx: 00000298
esi: c11782d6 edi: 00002600 ebp: 00000000 esp: c1332f74
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c1332000 task=c128f9c0) Stack: <0>0000006c 00000020 00000298 00002600 c4fed000 c4fed000 c11782d6 0000260 0
00000000 c1172c49 c4fed000 c1305b40 00000005 00000000 c1172d75 c48877e0
c1029687 00000000 c1307fb8 00000000 c1305a00 00000001 00000000 c1307fb8
Call Trace:
<c11782d6> mptscsih_io_done+0x0/0x3a3 <c1172c49> mpt_turbo_reply+0xbb/0xd3
<c1172d75> mpt_interrupt+0x22/0x2b <c1029687> misrouted_irq+0x63/0xcb
<c10297b3> note_interrupt+0x43/0x98 <c10292f9> __do_IRQ+0x68/0x8f
<c1003fac> do_IRQ+0x36/0x4e
=======================
<c1002aa6> common_interrupt+0x1a/0x20 <c1001150> mwait_idle+0x1a/0x2a
<c10010bf> cpu_idle+0x40/0x5c <c1308610> start_kernel+0x17a/0x17c Code: 5e 5f 5d c3 55 89 cd 57 56 53 83 ec 14 89 54 24 0c 89 44 24 10 8b 90 cc 00 00 00 8b 4c 24 0c 81 c2 98 02 00 00 85 ed 89 54 24 08 <0f> b7 79 08 89 fe 74 04 0f b7 75 08 66 39 f7 75 0d 8b 44 24 0c
*******************************************************************************
o Kdump capture kernel boot fails during initialization of MPT fusion driver.
(LSI Logic / Symbios Logic SAS1064E PCI-Express Fusion-MPT SAS (rev 01))
o Problem is easily reproducible, if system crashed while some disk activity
like cp operation was going on.
o After a system crash, devices are not shutdown and capture kernel starts
booting while skipping BIOS. Hence underlying device is left in operational
state. In this case scsi contoller was left with interrupt line asserted
reply FIFO was not empty. When driver starts initializing in the second
kernel, it receives the interrupt the moment request_irq() is called.
Interrupt handler, reads the message from reply FIFO and tries to access
the associated message frame and panics, as in the new kernel's context
that message frame is not valid at all.
o In this scenario, probably we should delay the request_irq() call. First
bring up the IOC, reset it if needed and then should register for irq.
o I have tested the patch with SAS1064E and 53c1030 controllers.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: "Moore, Eric Dean" <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
All registered reset callback handlers are called during reset processing.
The mptspi modules has its own reset callback handler, just recently
added for issuing domain validation after host reset. If either the mptsas or
mptfc driver are loaded, this callback could be called. Thus resulting
in domain validation being issued for sas or fibre end devices.
Fix this by having mptbase.c check the bus type against the driver
type and only call the reset handler if they match (or if it's a
non-bus specific reset handler).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
A race condition exists in mptfc between the thread registering a device
with the fc transport and the scan work generated by the transport.
This race existed prior to the application of the mptfc bug fix patch.
mptfc_register_dev() calls fc_remote_port_add() with the FC_RPORT_ROLE_TARGET
bit set in the rport ids passed to the function. Having this bit set causes
fc_remote_port_add() to schedule a scan of the device.
This scan can execute before mptfc_register_dev() can fill in the dd_data
in the rport structure. When this happens, mptfc_target_alloc() will fail
because dd_data is null.
Attached is a patch which fixes the problem. The patch changes the rport ids
passed to fc_remote_port_add() to not have the TARGET bit set. This prevents
the scan from being scheduled. After mptfc_register_dev() fills in the rport
dd_data field, fc_remote_port_rolechg() is called, changing the role of the
rport to TARGET. Thus, the scan is scheduled after dd_data is filled
in which prevents the failure in mptfc_target_alloc().
Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This is a bug fix for mptspi driver, where after a host reset or
resume, we revalidate the negotiation parameters for all devices.
This bug was introduced when the driver was ported to use the spi
transport layer.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Bug fix for stack overflow in EventDescriptionStr, (a function
for debuging firmware events). We allocated 50 bytes on local stack
for buff[], however there are places in the code where we've attempted
copying in greater than 50 bytes into buff[].
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
mptbase.h
bump version number to 3.03.09
remove unneeded flags
define workq and remove old fc specific locks
mptbase.c
initialize new lock and don't initialize two removed locks
mptscsih.c
when firmware reports target is no longer there, return
DID_REQUEUE for fc hosts so that i/o doesn't get killed until
the transport has an opportunity to manage the loss via its
dev loss timer
when the "eh_abort" routine is called, check to see if the
driver has the command or not before looking to see if a reset
is pending. James Smart and I talked about this and believe
that the API for this routine is: if driver doesn't have
command, return SUCCESS. This change helps prevent a target
from being taken offline. SUCCESS is returned because it's
likely that the command completed after error recovery timed
it out but before it could be aborted.
provide a routine to queue work to newly created workq, and
use it.
remove "ioc" from mptscsih_abort() it was only used one time.
the other references were via hd->ioc, so I just moved it....
net change in references to ioc via hd->ioc is zero
move hd->resetPending test and hd->timeouts increment to after
the test for whether the command to be aborted remains known
to the driver
Make certain that the workq exists before queuing work to it.
mptfc.c
no longer need to lock rport data structures as I was able to
single thread the code! I fixed up the debug code to
eliminate compilation messages due to type mismatch in the
printk. Got rid of some no longer needed rport flags.
Initialize and destroy the workq used for the rescan work.
simplify the logic regarding the increment of
fc_rescan_work_count. use post increment and test for zero
vs. pre increment and test for one; eliminate work_count
variable: queue_work can be called with the work_lock held as
it doesn't sleep
Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch handles case where raid hidden components
are not being removed when power turned off to device
attached to expander, as well as the case of
exposing raid components when power is turned back on
to devices attached to an expander. (This is a repost
of this patch, with mptsas_is_end_device declared
further up in the code.)
This patch contains some other miscellaneous bug fix's.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Driver panic when RAID logical volume was present when driver
loaded, or when a RAID logical volume was created on the fly.
This issue was created in due to recent scsi_transport_sas change,
when sas_read_port_mode_page was added into the mptsas drivers
slave_config entry point.
This new API expects that all sdev's to be assocated to an rphy, however
that is not the case for logical volumes, as they are created using
scsi_add_device, instead of sas_rphy_add().
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The conversion of mptsas should allow the elimination of the contained
flag in the sas transport class.
Acked-by: "Moore, Eric" <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>