linux

Commit Graph

Author	SHA1	Message	Date
Tejun Heo	ae01b2493c	libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65 NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM is used. Implement ATA_FLAG_NO_DIPM and apply it. This problem was reported by Stefan Bader in the following thread. http://thread.gmane.org/gmane.linux.ide/48841 stable: applicable to 2.6.37 and 38. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stefan Bader <stefan.bader@canonical.com> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@pobox.com>	2011-04-24 11:32:16 -04:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
James Bottomley	0e0b494ca8	libata: separate error handler into usable components Right at the moment, the libata error handler is incredibly monolithic. This makes it impossible to use from composite drivers like libsas and ipr which have to handle error themselves in the first instance. The essence of the change is to split the monolithic error handler into two components: one which handles a queue of ata commands for processing and the other which handles the back end of readying a port. This allows the upper error handler fine grained control in calling libsas functions (and making sure they only get called for ATA commands whose lower errors have been fixed up). Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2011-03-02 02:36:45 -05:00
James Bottomley	c34aeebc06	libata: fix eh locking The SCSI host eh_cmd_q should be protected by the host lock (not the port lock). This probably doesn't matter that much at the moment, since we try to serialise the add and eh pieces, but it might matter in future for more convenient error handling. Plus this switches libata to the standard eh pattern where you lock, remove from the cmd queue to a local list and unlock and then operate on the local list. Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2011-03-02 02:36:45 -05:00
Tejun Heo	eb0e85e36b	libata: fix hotplug for drivers which don't implement LPM ata_eh_analyze_serror() suppresses hotplug notifications if LPM is being used because LPM generates spurious hotplug events. It compared whether link->lpm_policy was different from ATA_LPM_MAX_POWER to determine whether LPM is enabled; however, this is incorrect as for drivers which don't implement LPM, lpm_policy is always ATA_LPM_UNKNOWN. This disabled hotplug detection for all drivers which don't implement LPM. Fix it by comparing whether lpm_policy is greater than ATA_LPM_MAX_POWER. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2011-03-02 02:34:20 -05:00
Tejun Heo	e5005b15c9	libata: issue DIPM enable commands with LPM state updated Low level drivers may behave differently depending on the current link->lpm_policy. During ata_eh_set_lpm(), DIPM enable commands are issued after the successful completion of ap->ops->set_lpm(), which means that the controller is already in the target state. This causes DIPM enable commands to be processed with mismatching controller power state and link->lpm_policy value. In ahci, link->lpm_policy is used to ignore certain PHY events if LPM is enabled; however, as DIPM commands are issued with stale link->lpm_policy, they sometimes end up triggering these conditions and get aborted leading to LPM configuration failure. Fix it by updating link->lpm_policy before issuing DIPM enable commands. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Kyle McMartin <kyle@mcmartin.ca> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-12-24 13:34:34 -05:00
Tejun Heo	c0c362b60e	libata: implement cross-port EH exclusion In libata, the non-EH code paths should always take and release ap->lock explicitly when accessing hardware or shared data structures. However, once EH is active, it's assumed that the port is owned by EH and EH methods don't explicitly take ap->lock unless race from irq handler or other code paths are expected. However, libata EH didn't guarantee exclusion among EHs for ports of the same host. IOW, multiple EHs may execute in parallel on multiple ports of the same controller. In many cases, especially in SATA, the ports are completely independent of each other and this doesn't cause problems; however, there are cases where different ports share the same resource, which lead to obscure timing related bugs such as the one fixed by commit `213373cf` (ata_piix: fix locking around SIDPR access). This patch implements exclusion among EHs of the same host. When EH begins, it acquires per-host EH ownership by calling ata_eh_acquire(). When EH finishes, the ownership is released by calling ata_eh_release(). EH ownership is also released whenever the EH thread goes to sleep from ata_msleep() or explicitly and reacquired after waking up. This ensures that while EH is actively accessing the hardware, it has exclusive access to it while allowing EHs to interleave and progress in parallel as they hit waiting stages, which dominate the time spent in EH. This achieves cross-port EH exclusion without pervasive and fragile changes while still allowing parallel EH for the most part. This was first reported by yuanding02@gmail.com more than three years ago in the following bugzilla. :-) https://bugzilla.kernel.org/show_bug.cgi?id=8223 Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Reported-by: yuanding02@gmail.com Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-10-21 20:21:05 -04:00
Tejun Heo	97750cebb3	libata: add @ap to ata_wait_register() and introduce ata_msleep() Add optional @ap argument to ata_wait_register() and replace msleep() calls with ata_msleep() which take optional @ap in addition to the duration. These will be used to implement EH exclusion. This patch doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-10-21 20:21:05 -04:00
Tejun Heo	6c8ea89cec	libata: implement LPM support for port multipliers Port multipliers can do DIPM on fan-out links fine. Implement support for it. Tested w/ SIMG 57xx and marvell PMPs. Both the host and fan-out links enter power save modes nicely. SIMG 37xx and 47xx report link offline on SStatus causing EH to detach the devices. Blacklisted. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-10-21 20:21:04 -04:00
Tejun Heo	6b7ae9545a	libata: reimplement link power management The current LPM implementation has the following issues. * Operation order isn't well thought-out. e.g. HIPM should be configured after IPM in SControl is properly configured. Not the other way around. * Suspend/resume paths call ata_lpm_enable/disable() which must only be called from EH context directly. Also, ata_lpm_enable/disable() were called whether LPM was in use or not. * Implementation is per-port when it should be per-link. As a result, it can't be used for controllers with slave links or PMP. * LPM state isn't managed consistently. After a link reset for whatever reason including suspend/resume the actual LPM state would be reset leaving ap->lpm_policy inconsistent. * Generic/driver-specific logic boundary isn't clear. Currently, libahci has to mangle stuff which libata EH proper should be handling. This makes the implementation unnecessarily complex and fragile. * Tied to ALPM. Doesn't consider DIPM only cases and doesn't check whether the device allows HIPM. * Error handling isn't implemented. Given the extent of mismatch with the rest of libata, I don't think trying to fix it piecewise makes much sense. This patch reimplements LPM support. * The new implementation is per-link. The target policy is still port-wide (ap->target_lpm_policy) but all the mechanisms and states are per-link and integrate well with the rest of link abstraction and can work with slave and PMP links. * Core EH has proper control of LPM state. LPM state is reconfigured when and only when reconfiguration is necessary. It makes sure that LPM state is reset when probing for new device on the link. Controller agnostic logic is now implemented in libata EH proper and driver implementation only has to deal with controller specifics. * Proper error handling. LPM config failure is attributed to the device on the link and LPM is disabled for the link if it fails repeatedly. * ops->enable/disable_pm() are replaced with single ops->set_lpm() which takes @policy and @hints. This simplifies driver specific implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-10-21 20:21:04 -04:00
Tejun Heo	c93b263e0d	libata: clean up lpm related symbols and sysfs show/store functions Link power management related symbols are in confusing state w/ mixed usages of lpm, ipm and pm. This patch cleans up lpm related symbols and sysfs show/store functions as follows. * lpm states - NOT_AVAILABLE, MIN_POWER, MAX_PERFORMANCE and MEDIUM_POWER are renamed to ATA_LPM_UNKNOWN and ATA_LPM_{MIN\|MAX\|MED}_POWER. * Pre/postfixes are unified to lpm. * sysfs show/store functions for link_power_management_policy were curiously named get/put and unnecessarily complex. Renamed to show/store and simplified. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-10-21 20:21:04 -04:00
Gwendal Grignou	d9027470b8	[libata] Add ATA transport class This is a scheleton for libata transport class. All information is read only, exporting information from libata: - ata_port class: one per ATA port - ata_link class: one per ATA port or 15 for SATA Port Multiplier - ata_device class: up to 2 for PATA link, usually one for SATA. Signed-off-by: Gwendal Grignou <gwendal@google.com> Reviewed-by: Grant Grundler <grundler@google.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-10-21 20:21:03 -04:00
Tejun Heo	e2f3d75fc0	libata: skip EH autopsy and recovery during suspend For some mysterious reason, certain hardware reacts badly to usual EH actions while the system is going for suspend. As the devices won't be needed until the system is resumed, ask EH to skip usual autopsy and recovery and proceed directly to suspend. Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: Stephan Diestelhorst <stephan.diestelhorst@amd.com> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-09-09 22:27:59 -04:00
Linus Torvalds	3b7433b8a8	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits) workqueue: mark init_workqueues() as early_initcall() workqueue: explain for_each_cwq_cpu() iterators fscache: fix build on !CONFIG_SYSCTL slow-work: kill it gfs2: use workqueue instead of slow-work drm: use workqueue instead of slow-work cifs: use workqueue instead of slow-work fscache: drop references to slow-work fscache: convert operation to use workqueue instead of slow-work fscache: convert object to use workqueue instead of slow-work workqueue: fix how cpu number is stored in work->data workqueue: fix mayday_mask handling on UP workqueue: fix build problem on !CONFIG_SMP workqueue: fix locking in retry path of maybe_create_worker() async: use workqueue for worker pool workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead workqueue: implement unbound workqueue workqueue: prepare for WQ_UNBOUND implementation libata: take advantage of cmwq and remove concurrency limitations workqueue: fix worker management invocation without pending works ... Fixed up conflicts in fs/cifs/ as per Tejun. Other trivial conflicts in include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c	2010-08-07 12:42:58 -07:00
FUJITA Tomonori	acad76272c	[libata] add ATA_CMD_DSM to ata_get_cmd_descript Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-08-01 19:36:03 -04:00
Tejun Heo	ad72cf9885	libata: take advantage of cmwq and remove concurrency limitations libata has two concurrency related limitations. a. ata_wq which is used for polling PIO has single thread per CPU. If there are multiple devices doing polling PIO on the same CPU, they can't be executed simultaneously. b. ata_aux_wq which is used for SCSI probing has single thread. In cases where SCSI probing is stalled for extended period of time which is possible for ATAPI devices, this will stall all probing. #a is solved by increasing maximum concurrency of ata_wq. Please note that polling PIO might be used under allocation path and thus needs to be served by a separate wq with a rescuer. #b is solved by using the default wq instead and achieving exclusion via per-port mutex. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jeff Garzik <jgarzik@pobox.com>	2010-07-02 10:59:24 +02:00
Tejun Heo	fe06e5f9b7	libata-sff: separate out BMDMA EH Some of error handling logic in ata_sff_error_handler() and all of ata_sff_post_internal_cmd() are for BMDMA. Create ata_bmdma_error_handler() and ata_bmdma_post_internal_cmd() and move BMDMA part into those. While at it, change DMA protocol check to ata_is_dma(), fix post_internal_cmd to call ap->ops->bmdma_stop instead of directly calling ata_bmdma_stop() and open code hardreset selection so that ata_std_error_handler() doesn't have to know about sff hardreset. As these two functions are BMDMA specific, there's no reason to check for bmdma_addr before calling bmdma methods if the protocol of the failed command is DMA. sata_mv and pata_mpc52xx now don't need to set .post_internal_cmd to ATA_OP_NULL and pata_icside and sata_qstor don't need to set it to their bmdma_stop routines. ata_sff_post_internal_cmd() becomes noop and is removed. This fixes p3 described in clean-up-BMDMA-initialization patch. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-05-19 13:36:46 -04:00
Tejun Heo	c429137a67	libata-sff: port_task is SFF specific port_task is tightly bound to the standard SFF PIO HSM implementation. Using it for any other purpose would be error-prone and there's no such user and if some drivers need such feature, it would be much better off using its own. Move it inside CONFIG_ATA_SFF and rename it to sff_pio_task. The only function which is exposed to the core layer is ata_sff_flush_pio_task() which is renamed from ata_port_flush_task() and now also takes care of resetting hsm_task_state to HSM_ST_IDLE, which is possible as it's now specific to PIO HSM. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-05-19 13:35:49 -04:00
Jeff Garzik	a09bf4cd53	libata: ensure NCQ error result taskfile is fully initialized before returning it via qc->result_tf. Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-04-22 21:59:13 -04:00
Tejun Heo	fa41efdae7	libata: fix locking around blk_abort_request() blk_abort_request() expectes queue lock to be held by the caller. Grab it before calling the function. Lack of this synchronization led to infinite loop on corrupt q->timeout_list. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-04-22 21:47:52 -04:00
Tejun Heo	534ead7092	libata: retry FS IOs even if it has failed with AC_ERR_INVALID libata currently doesn't retry if a command fails with AC_ERR_INVALID assuming that retrying won't get it any further even if retried. However, a failure may be classified as invalid through hardware glitch (incorrect reading of the error register or firmware bug) and there isn't whole lot to gain by not retrying as actually invalid commands will be failed immediately. Also, commands serving FS IOs are extremely unlikely to be invalid. Retry FS IOs even if it's marked invalid. Transient and incorrect invalid failure was seen while debugging firmware related issue on Samsung n130 on bko#14314. http://bugzilla.kernel.org/show_bug.cgi?id=14314 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Johannes Stezenbach <js@sig21.net> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2010-01-20 14:25:11 -05:00
Tejun Heo	6013efd886	libata: retry failed FLUSH if device didn't fail it If ATA device failed FLUSH, it means that the device failed to write out some amount of data and the error needs to be reported to upper layers. As retries can't recover the lost data, FLUSH failures need to be reported immediately in general. However, if FLUSH fails due to transmission errors, the FLUSH needs to be retried; otherwise, filesystems may switch to RO mode and/or raid array may drop a drive for a random transmission glitch. This condition can be rather easily reproduced on certain ahci controllers which go through a PHY event after powersave mode switch + ext4 combination. Powersave mode switch is often closely followed by flush from the filesystem failing the FLUSH with ATA bus error which makes the filesystem code believe that data is lost and drop to RO mode. This was reported in the following bugzilla bug. http://bugzilla.kernel.org/show_bug.cgi?id=14543 This patch makes libata EH retry FLUSH if it wasn't failed by the device. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-12-03 02:46:35 -05:00
Tejun Heo	4f7c287499	libata: fix PMP initialization Commit `842faa6c1a` fixed error handling during attach by not committing detected device class to dev->class while attaching a new device. However, this change missed the PMP class check in the configuration loop causing a new PMP device to go through ata_dev_configure() as if it were an ATA or ATAPI device. As PMP device doesn't have a regular IDENTIFY data, this makes ata_dev_configure() tries to configure a PMP device using an invalid data. For the most part, it wasn't too harmful and went unnoticed but this ends up clearing dev->flags which may have ATA_DFLAG_AN set by sata_pmp_attach(). This means that SATA_PMP_FEAT_NOTIFY ends up being disabled on PMPs and on PMPs which honor the flag breaks hotplug support. This problem was discovered and reported by Ethan Hsiao. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Ethan Hsiao <ethanhsiao@jmicron.com> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-10-16 06:21:54 -04:00
Tejun Heo	3b761d3d43	libata: fix incorrect link online check during probe While trying to work around spurious detection retries for non-existent devices on slave links, commit `816ab89782` incorrectly added link offline check logic before ata_eh_thaw() was called. This means that if an occupied link goes down briefly at the time that offline check was performed, device class will be cleared to ATA_DEV_NONE and libata wouldn't retry thus failing detection of the device. The offline check should be done after the port is thawed together with online check so that such link glitches can be detected by the interrupt handler and handled properly. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Tim Blechmann <tim@klingt.org> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-10-06 20:58:18 -04:00
Robert Hancock	6521148c64	libata: add command name parsing for error output This patch improve libata's output for error/notification messages to allow easier comprehension and debugging: When ATAPI commands issued through the SCSI layer fail, use SCSI functions to print the CDB in human-readable form instead of just dumping out the CDB in hex. Print out the name of the failed command (as defined by the ATA specification) in error handling output along with the raw register contents. When reporting status of ACPI taskfile commands executed on resume, also output the names of the commands being executed (or not) in readable form. Since the extra data for printing command names increases kernel size slightly, a config option has been added to allow disabling command name output (as well as some of the error register parsing) for those highly sensitive to kernel text size. Signed-off-by: Robert Hancock <hancockrwd@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-09-01 19:47:20 -04:00
Tejun Heo	1e641060c4	libata: clear eh_info on reset completion Resets are done with port frozen but some controllers still issue interrupts during reset and they may end up recording error conditions in ehi leading to unnecessary EH retrials. This patch makes ata_eh_reset() clear ehi on reset completion. As reset is the most severe recovery action, there's nothing to lose by clearing ehi on its completion. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Zdenek Kaspar <zkaspar82@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-09-01 19:47:19 -04:00
Jeff Garzik	54c38444fa	[libata] EH: freeze port before aborting commands Call the ->freeze() hook before aborting qc's, because some hardware requires special handling prior to accessing the taskfile registers (for diagnosis/analysis/reset). Most notably, hardware may wish to disable the DMA engine or interrupts in the ->freeze() hook. Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-09-01 19:47:19 -04:00
Bartlomiej Zolnierkiewicz	705d201414	libata: add missing NULL pointer check to ata_eh_reset() drivers/ata/libata-eh.c +2403 ata_eh_reset(80) warning: variable derefenced before check 'slave' Please note that this is _not_ a real bug at the moment since ata_eh_context structure is embedded into ata_list structure and the code alwas checks for 'slave' before accessing 'sehc'. Anyway lets add missing check and always have a valid 'sehc' pointer (which makes code easier to understand and prevents introducing some possible bugs in the future). Reported-by: Dan Carpenter <error27@gmail.com> Cc: corbet@lwn.net Cc: eteo@redhat.com Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-07-28 21:05:41 -04:00
Tejun Heo	fe2c4d018f	libata: fix follow-up SRST failure path ata_eh_reset() was missing error return handling after follow-up SRST allowing EH to continue the normal probing path after reset failure. This was discovered while testing new WD 2TB drives which take longer than 10 secs to spin up and cause the first follow-up SRST to time out. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-07-14 22:41:28 -04:00
Martin Olsson	98a1708de1	trivial: fix typos s/paramter/parameter/ and s/excute/execute/ in documentation and source comments. Signed-off-by: Martin Olsson <martin@minimum.se> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-06-12 18:01:46 +02:00
Tejun Heo	6f9c1ea2c1	libata: clear ering on resume Error timestamps are in jiffies which doesn't run while suspended and PHY events during resume isn't too uncommon. When the two are combined, it can lead to unnecessary speed downs if the machine is suspended and resumed repeatedly. Clear error history on resume. This was reported and verified in bnc#486803 by Vladimir Botka. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Vladimir Botka <vbotka@novell.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-05-11 14:30:59 -04:00
Tejun Heo	842faa6c1a	libata: fix attach error handling New device attach path in ata_eh_revalidate_and_attach() is divided into two separate loops because ATA requires IDENTIFY to be issued to slave first while the user expects to see device probe messages from the master device. new_mask is used to track which devices are the new ones between the first loop and the second. This usually works well but if an error occurs during configuration stage, ata_dev_revalidate_and_attach() returns with error code and forgets new_mask. On the retry run, dev->class is set and new_mask for the device is clear, so the device just gets revalidated and thus ends up skipping post-configuration procedure including scheduling of SCSI_HOTPLUG for the device. When this occurs, ATA part of probing works fine but SCSI probing usually doesn't happen and makes the device unreachable. The behavior has been around for a very long time but it has been uncovered with the recent addition of 1_5_GBPS horkage which uses -EAGAIN return value from ata_dev_configure() to restart the probing sequence after forcing cable speed. This can be fixed by making sure dev->class is permanently set only after all configurations are successfully complete. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Tim Connors <tconnors+linuxkml@astro.swin.edu.au> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-05-11 14:26:01 -04:00
Alan Cox	c96f1732e2	[libata] Improve timeout handling On a timeout call a device specific handler early in the recovery so that we can complete and process successful commands which timed out due to IRQ loss or the like rather more elegantly. [Revised to exclude the timeout handling on a few devices that inherit from SFF but are not SFF enough to use the default timeout handler] Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-03-24 22:52:39 -04:00
Tejun Heo	d6515e6ff4	libata: make sure port is thawed when skipping resets When SCR access is available and the link is offline, softreset is skipped as it only wastes time and some controllers don't respond very well. However, the skip path forgot to thaw the port, which not only blocks further event notification from the port but also causes repeated EH invocations on the same event on drivers which rely on ->thaw() to clear events if the IRQ is shared with another device or port. This problem has always been there but is uncovered by recent sata_nv nf2/3 change which dropped hardreset support while maintaining SCR access. nf2/3 doesn't clear hotplug event mask from the interrupt handler but relies on ->thaw() to clear them. When the hardreset was there, the reset action was never skipped and the port was always thawed but, with the hardreset gone, ->prereset() determines that there's no need for softreset and both ->softreset() and ->thaw() are skipped. This leads to stuck hotplug event in the IRQ status register triggering hotplug event whenever IRQ is delieverd on the same IRQ. As the controller shares the same IRQ for both ports, this happens on every IO if one port is occpupied and the other isn't. This patch fixes the problem by making sure that the port is thawed on reset-skip path. bko#11615 reports this problem. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Robert Hancock <hancockrwd@gmail.com> Reported-by: Dan Andresan <danyer@gmail.com> Reported-by: Arne Woerner <arne_woerner@yahoo.com> Reported-by: Stefan Lippers-Hollmann <s.L-H@gmx.de> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-03-05 07:25:43 -05:00
Tejun Heo	b535708146	libata: don't use on-stack sense buffer sense_buffer is used as DMA target and shouldn't be allocated on stack. Use ap->sector_buf instead. This problem is spotted by Chuck Ebbert. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-03-05 07:25:10 -05:00
Tejun Heo	cf9a590a9e	libata: add no penalty retry request for EH device handling routines Let -EAGAIN from EH device handling routines trigger EH retry without consuming its tries count. This will be used to implement link SPD horkage which requires hardreset to adjust SPD without affecting other EH decisions. As it bypasses the forward progress guarantee provided by the tries count, the requester is responsible for ensuring forward progress. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-02-02 23:04:19 -05:00
Tejun Heo	c2c7a89c5e	libata: improve probe failure handling When link is flaky at high speed, it isn't uncommon for a device to repeatedly fail probing sequence early after successfully negotiating high link speed. This often leads to consecutive hotplug events without successful probing. This patch improves libata EH such that it remembers probing trials and if there have been more than two unsuccessful trials in the past 60 seconds, slows down link speed to 1.5Gbps. As link speed negotiation is the duty of the PHY layer proper, the goal of this fallback mechanism is to provide the last resort when everything else fails, which unfortunately happens not too infrequently, so no fancy 6->3->1.5 speeding down or highest successful transmission speed seen kind of logics (yet). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-02-02 23:03:34 -05:00
Tejun Heo	a07d499b47	libata: add @spd_limit to sata_down_spd_limit() Add @spd_limit to sata_down_spd_limit() so that the caller can specify the SPD limit it wants. This parameter doesn't get in the way even when it's too low. The closest possible limit is applied. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-02-02 23:03:22 -05:00
Tejun Heo	99cf610aa4	libata: clear dev->ering in smarter way dev->ering used to be cleared together with the rest of ata_device in ata_dev_init() which is called whenever a probing event occurs. dev->ering is about to be used to track probing failures so it needs to remain persistent over multiple porbing events. This patch achieves this by doing the following. * Instead of CLEAR_OFFSET, define CLEAR_BEGIN and CLEAR_END and only clear between BEGIN and END. ering is moved after END. The split of persistent area is to allow hotter items remain at the head. * ering is explicitly cleared on ata_dev_disable() and when device attach succeeds. So, ering is persistent throug a device's life time (unless explicitly cleared of course) and also through periods inbetween disablement of an attached device and successful detection of the next one. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-02-02 23:03:17 -05:00
Tejun Heo	678afac678	libata: move ata_dev_disable() to libata-eh.c ata_dev_disable() is about to be more tightly integrated into EH logic. Move it to libata-eh.c. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-02-02 23:03:00 -05:00
Tejun Heo	d89293abd9	libata: fix EH device failure handling The dev->pio_mode > XFER_PIO_0 test is there to avoid unnecessary speed down warning messages but it accidentally disabled SATA link spd down during configuration phase after reset where PIO mode is always zero. This patch fixes the problem by moving the test where it belongs. This makes libata probing sequence behave better when the connection is flaky at higher link speeds which isn't too uncommon for eSATA devices. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2009-02-02 23:02:57 -05:00
Tejun Heo	ece180d1cf	libata: perform port detach in EH ata_port_detach() first made sure EH saw ATA_PFLAG_UNLOADING and then assumed EH context belongs to it and performed detach operation itself. However, UNLOADING doesn't disable all of EH and this could lead to problems including triggering WARN_ON()'s in EH path. This patch makes port detach behave more like other EH actions such that ata_port_detach() requests EH to detach and waits for completion. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-12-28 22:43:21 -05:00
Tejun Heo	1eca4365be	libata: beef up iterators There currently are the following looping constructs. * __ata_port_for_each_link() for all available links * ata_port_for_each_link() for edge links * ata_link_for_each_dev() for all devices * ata_link_for_each_dev_reverse() for all devices in reverse order Now there's a need for looping construct which is similar to __ata_port_for_each_link() but iterates over PMP links before the host link. Instead of adding another one with long name, do the following cleanup. * Implement and export ata_link_next() and ata_dev_next() which take @mode parameter and can be used to build custom loop. * Implement ata_for_each_link() and ata_for_each_dev() which take looping mode explicitly. The following iteration modes are implemented. * ATA_LITER_EDGE : loop over edge links * ATA_LITER_HOST_FIRST : loop over all links, host link first * ATA_LITER_PMP_FIRST : loop over all links, PMP links first * ATA_DITER_ENABLED : loop over enabled devices * ATA_DITER_ENABLED_REVERSE : loop over enabled devices in reverse order * ATA_DITER_ALL : loop over all devices * ATA_DITER_ALL_REVERSE : loop over all devices in reverse order This change removes exlicit device enabledness checks from many loops and makes it clear which ones are iterated over in which direction. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-12-28 22:43:20 -05:00
Tejun Heo	19b723218b	libata: fix last_reset timestamp handling ehc->last_reset is used to ensure that resets are not issued too close to each other. It's initialized to jiffies minus one minute on EH entry. However, when new links are initialized after PMP is probed, new links have zero for this timestamp resulting in long wait depending on the current jiffies. This patch makes last_set considered iff ATA_EHI_DID_RESET is set, in which case last_reset is always initialized. As an added precaution, WARN_ON() is added so that warning is printed if last_reset is in future. This problem is spotted and debugged by Shane Huang. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Shane Huang <Shane.Huang@amd.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-11-11 03:01:21 -05:00
Tejun Heo	90484ebfc9	libata: clear saved xfer_mode and ncq_enabled on device detach libata EH saves xfer_mode and ncq_enabled at start to later set DUBIOUS_XFER flag if it has changed. These values need to be cleared on device detach such that hot device swap doesn't accidentally miss DUBIOUS_XFER. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-10-27 23:55:40 -04:00
Tejun Heo	4a9c7b3359	libata: fix device iteration bugs There were several places where only enabled devices should be iterated over but device enabledness wasn't checked. * IDENTIFY data 40 wire check in cable_is_40wire() * xfer_mode/ncq_enabled saving in ata_scsi_error() * DUBIOUS_XFER handling in ata_set_mode() While at it, reformat comments in cable_is_40wire(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-10-27 23:55:12 -04:00
Tejun Heo	816ab89782	libata: set device class to NONE if phys_offline Reset methods don't have access to phys link status for slave links and may incorrectly indicate device presence causing unnecessary probe failures for unoccupied links. This patch clears device class to NONE during post-reset processing if phys link is offline. As on/offlineness semantics is strictly defined and used in multiple places by the core layer, this won't change behavior for drivers which don't use slave links. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-10-22 20:42:43 -04:00
Tejun Heo	a568d1d2e2	libata-eh: fix slave link EH action mask handling Slave link action mask is transferred to master link and all the EH actions are taken by the master link. ata_eh_about_to_do() and ata_eh_done() are called with ATA_EH_ALL_ACTIONS to clear the slave link actions during transfer. This always sets ATA_PFLAG_RECOVERED flag causing spurious "EH complete" messages. Don't set ATA_PFLAG_RECOVERED for slave link actions. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-10-22 20:40:21 -04:00
Tejun Heo	848e4c68c4	libata: transfer EHI control flags to slave ehc.i ATA_EHI_NO_AUTOPSY and ATA_EHI_QUIET are used to control the behavior of EH. As only the master link is visible outside EH, these flags are set only for the master link although they should also apply to the slave link, which causes spurious EH messages during probe and suspend/resume. This patch transfers those two flags to slave ehc.i before performing slave autopsy and reporting. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-10-22 20:40:19 -04:00
Linus Torvalds	e26feff647	Merge branch 'for-2.6.28' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.28' of git://git.kernel.dk/linux-2.6-block: (132 commits) doc/cdrom: Trvial documentation error, file not present block_dev: fix kernel-doc in new functions block: add some comments around the bio read-write flags block: mark bio_split_pool static block: Find bio sector offset given idx and offset block: gendisk integrity wrapper block: Switch blk_integrity_compare from bdev to gendisk block: Fix double put in blk_integrity_unregister block: Introduce integrity data ownership flag block: revert part of d7533ad0e132f92e75c1b2eb7c26387b25a583c1 bio.h: Remove unused conditional code block: remove end_{queued\|dequeued}_request() block: change elevator to use __blk_end_request() gdrom: change to use __blk_end_request() memstick: change to use __blk_end_request() virtio_blk: change to use __blk_end_request() blktrace: use BLKTRACE_BDEV_SIZE as the name size for setup structure block: add lld busy state exporting interface block: Fix blk_start_queueing() to not kick a stopped queue include blktrace_api.h in headers_install ...	2008-10-10 10:52:45 -07:00
Jens Axboe	242f9dcb8b	block: unify request timeout handling Right now SCSI and others do their own command timeout handling. Move those bits to the block layer. Instead of having a timer per command, we try to be a bit more clever and simply have one per-queue. This avoids the overhead of having to tear down and setup a timer for each command, so it will result in a lot less timer fiddling. Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-10-09 08:56:13 +02:00
Tejun Heo	11fc33da8d	libata-eh: clear UNIT ATTENTION after reset Resets make ATAPI devices raise UNIT ATTENTION which fails the next command. As resets can happen asynchronously for unrelated reasons, this sometimes disrupts innocent users. For example, reading DVD fails after the system wakes up from suspend or the other device sharing the channel went through bus error. Clearing UA has some problems as it might clear UA which the userland needs to know about. However, UA after resets can only be about the reset itself and benefits of clearing it overweights cons. Missing UA can only delay failure to one of the following commands anyway. For example, timeout while burning is in progress will trigger reset and reset the device state and probably corrupt the burning run. Although the userland application won't get the UA, its pending writes will fail. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-09-29 00:29:06 -04:00
Elias Oltmanns	45fabbb77b	libata: Implement disk shock protection support On user request (through sysfs), the IDLE IMMEDIATE command with UNLOAD FEATURE as specified in ATA-7 is issued to the device and processing of the request queue is stopped thereafter until the specified timeout expires or user space asks to resume normal operation. This is supposed to prevent the heads of a hard drive from accidentally crashing onto the platter when a heavy shock is anticipated (like a falling laptop expected to hit the floor). In fact, the whole port stops processing commands until the timeout has expired in order to avoid any resets due to failed commands on another device. Signed-off-by: Elias Oltmanns <eo@nebensachen.de> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-09-29 00:27:54 -04:00
Tejun Heo	b1c72916ab	libata: implement slave_link Explanation taken from the comment of ata_slave_link_init(). In libata, a port contains links and a link contains devices. There is single host link but if a PMP is attached to it, there can be multiple fan-out links. On SATA, there's usually a single device connected to a link but PATA and SATA controllers emulating TF based interface can have two - master and slave. However, there are a few controllers which don't fit into this abstraction too well - SATA controllers which emulate TF interface with both master and slave devices but also have separate SCR register sets for each device. These controllers need separate links for physical link handling (e.g. onlineness, link speed) but should be treated like a traditional M/S controller for everything else (e.g. command issue, softreset). slave_link is libata's way of handling this class of controllers without impacting core layer too much. For anything other than physical link handling, the default host link is used for both master and slave. For physical link handling, separate @ap->slave_link is used. All dirty details are implemented inside libata core layer. From LLD's POV, the only difference is that prereset, hardreset and postreset are called once more for the slave link, so the reset sequence looks like the following. prereset(M) -> prereset(S) -> hardreset(M) -> hardreset(S) -> softreset(M) -> postreset(M) -> postreset(S) Note that softreset is called only for the master. Softreset resets both M/S by definition, so SRST on master should handle both (the standard method will work just fine). As slave_link excludes PMP support and only code paths which deal with the attributes of physical link are affected, all the changes are localized to libata.h, libata-core.c and libata-eh.c. * ata_is_host_link() updated so that slave_link is considered as host link too. * iterator extended to iterate over the slave_link when using the underbarred version. * force param handling updated such that devno 16 is mapped to the slave link/device. * ata_link_on/offline() updated to return the combined result from master and slave link. ata_phys_link_on/offline() are the direct versions. * EH autopsy and report are performed separately for master slave links. Reset is udpated to implement the above described reset sequence. Except for reset update, most changes are minor, many of them just modifying dev->link to ata_dev_phys_link(dev) or using phys online test instead. After this update, LLDs can take full advantage of per-dev SCR registers by simply turning on slave link. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-09-29 00:25:28 -04:00
Tejun Heo	da0e21d3fa	libata: use ata_link_printk() when printing SError SError belongs to link not port. Use ata_link_printk() to print it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-08-22 02:19:44 -04:00
Tejun Heo	5dbfc9cb59	libata: always do follow-up SRST if hardreset returned -EAGAIN As an optimization, follow-up SRST used to be skipped if classification wasn't requested even when hardreset requested it via -EAGAIN. However, some hardresets can't wait for device readiness and skipping SRST can cause timeout or other failures during revalidation. Always perform follow-up SRST if hardreset returns -EAGAIN. This makes reset paths more predictable and thus less error-prone. While at it, move hardreset error checking such that it's done right after hardreset is finished. This simplifies followup SRST condition check a bit and makes the reset path easier to modify. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-08-22 02:19:41 -04:00
Tejun Heo	a674050e06	libata: fix EH action overwriting in ata_eh_reset() ehc->i.action got accidentally overwritten to ATA_EH_HARD/SOFTRESET in ata_eh_reset(). The original intention was to clear reset action which wasn't selected. This can cause unexpected behavior when other EH actions are scheduled together with reset. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-08-22 02:19:39 -04:00
Tejun Heo	05944bdf6f	libata: implement no[hs]rst force params Implement force params nohrst, nosrst and norst. This is to work around reset related problems and ease debugging. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-08-22 02:07:43 -04:00
Tejun Heo	3eabddb8ed	libata-eh: update atapi_eh_request_sense() to take @dev instead of @qc Update atapi_eh_request_sense() to take @dev, @sense_buf and @dfl_sense_key instead of taking @qc and extracting information from it. This change is to make the function more generic and allow it to be called from other places. While at it, make cdb initialization use initializer. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-07-14 15:59:33 -04:00
Tejun Heo	87fbc5a060	libata: improve EH internal command timeout handling ATA_TMOUT_INTERNAL which was 30secs were used for all internal commands which is way too long when something goes wrong. This patch implements command type based stepped timeouts. Different command types can use different timeouts and each command type can use different timeout values after timeouts. ie. the initial timeout is set to a value which should cover most of the cases but not too long so that run away cases don't delay things too much. After the first try times out, the second try can use longer timeout and if that one times out too, it can go for full 30sec timeout. IDENTIFYs use 5s - 10s - 30s timeout and all other commands use 5s - 10s timeouts. This patch significantly cuts down the needed time to handle failure cases while still allowing libata to work with nut job devices through retries. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-07-14 15:59:32 -04:00
Tejun Heo	d8af0eb604	libata: use ULONG_MAX to terminate reset timeout table This doesn't introduce any functional changes. This is to make reset timeout table consistent with to-be-added command timeout tables. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-07-14 15:59:32 -04:00
Tejun Heo	0a2c0f5615	libata: improve EH retry delay handling EH retries were delayed by 5 seconds to ensure that resets don't occur back-to-back. However, this 5 second delay is superflous or excessive in many cases. For example, after IDENTIFY times out, there's no reason to wait five more seconds before retrying. This patch adds ehc->last_reset timestamp and record the timestamp for the last reset trial or success and uses it to space resets by ATA_EH_RESET_COOL_DOWN which is 5 secs and removes unconditional 5 sec sleeps. As this change makes inter-try waits often shorter and they're redundant in nature, this patch also removes the "retrying..." messages. While at it, convert explicit rounding up division to DIV_ROUND_UP(). This change speeds up EH in many cases w/o sacrificing robustness. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-07-14 15:59:32 -04:00
Tejun Heo	341c2c958e	libata: consistently use msecs for time durations libata has been using mix of jiffies and msecs for time druations. This is getting confusing. As writing sub HZ values in jiffies is PITA and msecs_to_jiffies() can't be used as initializer, unify unit for all time durations to msecs. So, durations are in msecs and deadlines are in jiffies. ata_deadline() is added to compute deadline from a start time and duration in msecs. While at it, drop now superflous _msec suffix from arguments and rename @timeout to @deadline if it represents a fixed point in time rather than duration. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-07-14 15:59:32 -04:00
Tejun Heo	e0614db2a3	libata: ignore recovered PHY errors No reason to get overzealous about recovered comm and data errors. Some PHYs habitually sets them w/o no good reason and being draconian about these soft error conditions doesn't seem to help anybody. If need ever rises, we might need to add soft PHY error condition, say AC_ERR_MAYBE_ATA_BUS and use it only to determine whether speed down is necessary but I don't think that's very likely to happen. It's far more likely we'll get timeouts or fatal transmission errors if recovered errors are so prominent that they hamper operation. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-05-19 17:51:47 -04:00
Tejun Heo	f046519fc8	libata: kill hotplug related race condition Originally, whole reset processing was done while the port is frozen and SError was cleared during @postreset(). This had two race conditions. 1: hotplug could occur after reset but before SError is cleared and libata won't know about it. 2: hotplug could occur after all the reset is complete but before the port is thawed. As all events are cleared on thaw, the hotplug event would be lost. Commit `ac371987a8` kills the first race by clearing SError during link resume but before link onlineness test. However, this doesn't fix race #2 and in some cases clearing SError after SRST is a good idea. This patch solves this problem by cross checking link onlineness with classification result after SError is cleared and port is thawed. Reset is retried if link is online but all devices attached to the link are unknown. As all devices will be revalidated, this one-way check is enough to ensure that all devices are detected and revalidated reliably. This, luckily, also fixes the cases where host controller returns bogus status while harddrive is spinning up after hotplug making classification run before the device sends the first FIS and thus causes misdetection. Low level drivers can bypass the logic by setting class explicitly to ATA_DEV_NONE if ever necessary (currently none requires this). Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-05-19 17:51:47 -04:00
Tejun Heo	dc98c32cbe	libata: move reset freeze/thaw handling into ata_eh_reset() Previously reset freeze/thaw handling lived outside of ata_eh_reset() mainly because the original PMP reset code needed the port frozen while resetting all the fan-out ports, which is no longer the case. This patch moves freeze/thaw handling into ata_eh_reset(). @prereset() and @postreset() are now called w/o freezing the port although @prereset() an be called frozen if the port is frozen prior to entering ata_eh_reset(). This makes code simpler and will help removing hotplug event related races. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-05-19 17:51:47 -04:00
Tejun Heo	932648b007	libata: reorganize ata_eh_reset() no reset method path Reorganize ata_eh_reset() such that @prereset() is called even when no reset method is available and if block is used instead of goto to skip actual reset. This makes no reset case behave better (readiness wait) and future changes easier. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-05-19 17:51:47 -04:00
Mark Lord	10acf3b0d3	libata: export ata_eh_analyze_ncq_error Export ata_eh_analyze_ncq_error() for subsequent use by sata_mv, as suggested by Tejun. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-05-06 11:37:58 -04:00
Mark Lord	a6116c9e60	libata-eh set tf flags in NCQ EH result_tf Fix mis-reporting of NCQ errors by ensuring that result_tf->flags is properly initialized in libata-eh. This allows ata_gen_ata_sense() to report the failed block number correctly to SCSI after a media error during NCQ. This patch may also be a candidate for backporting to earlier kernels. Without this fix, SCSI will fail I/O on the entire request rather than just the bad sector. That can be bad for a request that was merged from many independent read reads from different tasks. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-04-25 01:11:37 -04:00
Tejun Heo	4f7faa3f2b	libata: make EH fail gracefully if no reset method is available When no reset method is available, libata currently oopses. Although the condition can't happen unless there's a bug in a low level driver, oopsing isn't the best way to report the error condition. Complain, dump stack and fail reset instead. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-04-17 15:44:26 -04:00
Tejun Heo	45db2f6c95	libata: move link onlineness check out of softreset methods Currently, SATA softresets should do link onlineness check before actually performing SRST protocol but it doesn't really belong to softreset. This patch moves onlineness check in softreset to ata_eh_reset() and ata_eh_followup_srst_needed() to clean up code and help future sata_mv changes which need clear separation between SCR and TF accesses. sata_fsl is peculiar in that its softreset really isn't softreset but combination of hardreset and softreset. This patch adds dummy private ->prereset to keep the current behavior but the driver really should implement separate hard and soft resets and return -EAGAIN from hardreset if it should be follwed by softreset. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-04-17 15:44:25 -04:00
Tejun Heo	2a0c15ca39	libata: kill dead code paths in reset path Some code paths which had been made obsolete by recent reset simplification were still around. Kill them. * ata_eh_reset() checked for ATA_DEV_UNKNOWN to determine classification failure. This is no longer applicable. * ata_do_reset() should convert ATA_DEV_UNKNOWN to ATA_DEV_NONE regardless of reset result (e.g. -EAGAIN). * LLDs don't need to convert ATA_DEV_UNKNOWN to ATA_DEV_NONE. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-04-17 15:44:25 -04:00
Tejun Heo	071f44b1d2	libata: implement PMP helpers Implement helpers to test whether PMP is supported, attached and determine pmp number to use when issuing SRST to a link. While at it, move ata_is_host_link() so that it's together with the two new PMP helpers. This change simplifies LLDs and helps making PMP support optional. Signed-off-by: Tejun Heo <htejun@gmail.com>	2008-04-17 15:44:25 -04:00
Tejun Heo	305d2a1ab1	libata: unify mechanism to request follow-up SRST Previously, there were two ways to trigger follow-up SRST from hardreset method - returning -EAGAIN and leaving all device classes unmodified. Drivers never used the latter mechanism and the only use case for the former was when hardreset couldn't classify. Drop the latter mechanism and let -EAGAIN mean "perform follow-up SRST if classification is required". This change removes unnecessary follow-up SRSTs and simplifies reset implementations. Signed-off-by: Tejun Heo <htejun@gmail.com>	2008-04-17 15:44:23 -04:00
Tejun Heo	5958e3025f	libata: move PMP SCR access failure during reset to ata_eh_reset() If PMP fan-out reset fails and SCR isn't accessible, PMP should be reset. This used to be tested by sata_pmp_std_hardreset() and communicated to EH by -ERESTART. However, this logic is generic and doesn't really have much to do with specific hardreset implementation. This patch moves SCR access failure detection logic to ata_eh_reset() where it belongs. As this makes sata_pmp_std_hardreset() identical to sata_std_hardreset(), the function is killed and replaced with the standard method. Signed-off-by: Tejun Heo <htejun@gmail.com>	2008-04-17 15:44:23 -04:00
Tejun Heo	57c9efdfb3	libata: implement and use sata_std_hardreset() Implement sata_std_hardreset(), which simply wraps around sata_link_hardreset(). sata_std_hardreset() becomes new standard hardreset method for sata_port_ops and sata_sff_hardreset() moves from ata_base_port_ops to ata_sff_port_ops, which is where it really belongs. ata_is_builtin_hardreset() is added so that both ata_std_error_handler() and ata_sff_error_handler() skip both builtin hardresets if SCR isn't accessible. piix_sidpr_hardreset() in ata_piix.c is identical to sata_std_hardreset() in functionality and got replaced with the standard function. Signed-off-by: Tejun Heo <htejun@gmail.com>	2008-04-17 15:44:23 -04:00
Tejun Heo	9363c3825e	libata: rename SFF functions SFF functions have confusing names. Some have sff prefix, some have bmdma, some std, some pci and some none. Unify the naming by... * SFF functions which are common to both BMDMA and non-BMDMA are prefixed with ata_sff_. * SFF functions which are specific to BMDMA are prefixed with ata_bmdma_. * SFF functions which are specific to PCI but apply to both BMDMA and non-BMDMA are prefixed with ata_pci_sff_. * SFF functions which are specific to PCI and BMDMA are prefixed with ata_pci_bmdma_. * Drop generic prefixes from LLD specific routines. For example, bfin_std_dev_select -> bfin_dev_select. The following renames are noteworthy. ata_qc_issue_prot() -> ata_sff_qc_issue() ata_pci_default_filter() -> ata_bmdma_mode_filter() ata_dev_try_classify() -> ata_sff_dev_classify() This rename is in preparation of separating SFF support out of libata core layer. This patch strictly renames functions and doesn't introduce any behavior difference. Signed-off-by: Tejun Heo <htejun@gmail.com>	2008-04-17 15:44:21 -04:00
Tejun Heo	03faab7827	libata: implement ATA_QCFLAG_RETRY Currently whether a command should be retried after failure is determined inside ata_eh_finish(). Add ATA_QCFLAG_RETRY and move the logic into ata_eh_autopsy(). This makes things clearer and helps extending retry determination logic. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-04-17 15:44:20 -04:00
Tejun Heo	a1efdaba2d	libata: make reset related methods proper port operations Currently reset methods are not specified directly in the ata_port_operations table. If a LLD wants to use custom reset methods, it should construct and use a error_handler which uses those reset methods. It's done this way for two reasons. First, the ops table already contained too many methods and adding four more of them would noticeably increase the amount of necessary boilerplate code all over low level drivers. Second, as ->error_handler uses those reset methods, it can get confusing. ie. By overriding ->error_handler, those reset ops can be made useless making layering a bit hazy. Now that ops table uses inheritance, the first problem doesn't exist anymore. The second isn't completely solved but is relieved by providing default values - most drivers can just override what it has implemented and don't have to concern itself about higher level callbacks. In fact, there currently is no driver which actually modifies error handling behavior. Drivers which override ->error_handler just wraps the standard error handler only to prepare the controller for EH. I don't think making ops layering strict has any noticeable benefit. This patch makes ->prereset, ->softreset, ->hardreset, ->postreset and their PMP counterparts propoer ops. Default ops are provided in the base ops tables and drivers are converted to override individual reset methods instead of creating custom error_handler. * ata_std_error_handler() doesn't use sata_std_hardreset() if SCRs aren't accessible. sata_promise doesn't need to use separate error_handlers for PATA and SATA anymore. * softreset is broken for sata_inic162x and sata_sx4. As libata now always prefers hardreset, this doesn't really matter but the ops are forced to NULL using ATA_OP_NULL for documentation purpose. * pata_hpt374 needs to use different prereset for the first and second PCI functions. This used to be done by branching from hpt374_error_handler(). The proper way to do this is to use separate ops and port_info tables for each function. Converted. Signed-off-by: Tejun Heo <htejun@gmail.com>	2008-04-17 15:44:18 -04:00
Tejun Heo	b558edddb1	libata: kill ata_ehi_schedule_probe() ata_ehi_schedule_probe() was created to hide details of link-resuming reset magic. Now that all the softreset workarounds are gone, scheduling probe is very simple - set probe_mask and request RESET. Kill ata_ehi_schedule_probe() and open code it. This also increases consistency as ata_ehi_schedule_probe() couldn't cover individual device probings so they were open-coded even when the helper existed. While at it, define ATA_ALL_DEVICES as mask of all possible devices on a link and always use it when requesting probe on link level for simplicity and consistency. Setting extra bits in the probe_mask doesn't hurt anybody. Signed-off-by: Tejun Heo <htejun@gmail.com>	2008-04-17 15:44:16 -04:00
Tejun Heo	672b2d65ba	libata: kill ATA_EHI_RESUME_LINK ATA_EHI_RESUME_LINK has two functions - promote reset to hardreset if ATA_LFLAG_HRST_TO_RESUME is set and preventing EH from shortcutting reset action when probing is requested. The former is gone now and the latter can easily be achieved by making EH to perform at least one reset if reset is requested, which also makes more sense than depending on RESUME_LINK flag. As ATA_EHI_RESUME_LINK was the only EHI reset modifier, this also kills reset modifier handling. Signed-off-by: Tejun Heo <htejun@gmail.com>	2008-04-17 15:44:16 -04:00
Tejun Heo	cf48062658	libata: prefer hardreset When both soft and hard resets are available, libata preferred softreset till now. The logic behind it was to be softer to devices; however, this doesn't really help much. Rationales for the change: * BIOS may freeze lock certain things during boot and softreset can't unlock those. This by itself is okay but during operation PHY event or other error conditions can trigger hardreset and the device may end up with different configuration. For example, after a hardreset, previously unlockable HPA can be unlocked resulting in different device size and thus revalidation failure. Similar condition can occur during or after resume. * Certain ATAPI devices require hardreset to recover after certain error conditions. On PATA, this is done by issuing the DEVICE RESET command. On SATA, COMRESET has equivalent effect. The problem is that DEVICE RESET needs its own execution protocol. For SFF controllers with bare TF access, it can be easily implemented but more advanced controllers (e.g. ahci and sata_sil24) require specialized implementations. Simply using hardreset solves the problem nicely. * COMRESET initialization sequence is the norm in SATA land and many SATA devices don't work properly if only SRST is used. For example, some PMPs behave this way and libata works around by always issuing hardreset if the host supports PMP. Like the above example, libata has developed a number of mechanisms aiming to promote softreset to hardreset if softreset is not going to work. This approach is time consuming and error prone. Also, note that, dependingon how you read the specs, it could be argued that PMP fan-out ports require COMRESET to start operation. In fact, all the PMPs on the market except one don't work properly if COMRESET is not issued to fan-out ports after PMP reset. * COMRESET is an integral part of SATA connection and any working device should be able to handle COMRESET properly. After all, it's the way to signal hardreset during reboot. This is the most used and recommended (at least by the ahci spec) method of resetting devices. So, this patch makes libata prefer hardreset over softreset by making the following changes. * Rename ATA_EH_RESET_MASK to ATA_EH_RESET and use it whereever ATA_EH_{SOFT\|HARD}RESET used to be used. ATA_EH_{SOFT\|HARD}RESET is now only used to tell prereset whether soft or hard reset will be issued. * Strip out now unneeded promote-to-hardreset logics from ata_eh_reset(), ata_std_prereset(), sata_pmp_std_prereset() and other places. Signed-off-by: Tejun Heo <htejun@gmail.com>	2008-04-17 15:44:15 -04:00
Tejun Heo	3ec25ebd69	libata: ATA_EHI_LPM should be ATA_EH_LPM EH actions are ATA_EH_* not ATA_EHI_*. Rename ATA_EHI_LPM to ATA_EH_LPM. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-03-29 12:21:31 -04:00
Tejun Heo	eec59f76e9	libata: allow LLDs w/o any reset method Some old SFF controllers don't have any way to reset the channel. Currently, this isn't supported and libata EH causes an oops. Allow LLDs w/o any reset method and just assume ATA class in such cases. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-03-10 20:50:34 -04:00
Tejun Heo	3326732570	libata: implement libata.force module parameter This patch implements libata.force module parameter which can selectively override ATA port, link and device configurations including cable type, SATA PHY SPD limit, transfer mode and NCQ. For example, you can say "use 1.5Gbps for all fan-out ports attached to the second port but allow 3.0Gbps for the PMP device itself, oh, the device attached to the third fan-out port chokes on NCQ and shouldn't go over UDMA4" by the following. libata.force=2:1.5g,2.15:3.0g,2.03:noncq,udma4 Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-02-20 12:12:28 -05:00
Tejun Heo	75f9cafc2d	libata: fix off-by-one in error categorization ATA_ECAT_DUBIOUS_BASE was too high by one and thus all DUBIOUS error categorizations were wrong. This passed test because only ATA_BUS and UNK_DEV were used during testing and the ones after them - ATA_BUS and an overflowed entry - behaved similarly. This patch fixes the problem by adding DUBIOUS_NONE category and use it as base. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-01-23 05:24:17 -05:00
Tejun Heo	0dc36888d4	libata: rename ATA_PROT_ATAPI_* to ATAPI_PROT_* ATA_PROT_ATAPI_* are ugly and naming schemes between ATA_PROT_* and ATA_PROT_ATAPI_* are inconsistent causing confusion. Rename them to ATAPI_PROT_* and make them consistent with ATA counterpart. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-01-23 05:24:14 -05:00
Andrew Morton	e6a73ab1c8	drivers/ata/libata-eh.c: fix printk warning drivers/ata/libata-eh.c: In function `ata_port_pbar_desc': drivers/ata/libata-eh.c:215: warning: long long unsigned int format, long unsigned int arg (arg 4) Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-01-23 05:24:13 -05:00
Jeff Garzik	e39eec13ff	[libata] Build fix WRT ata_is_xxx() new API introduction Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-01-23 05:24:11 -05:00
Tejun Heo	76326ac1ac	libata: implement fast speed down for unverified data transfer mode It's very likely that the configured data transfer mode is the wrong one if device fails data transfers right after initial data transfer mode configuration (including NCQ on/off and xfermode). libata EH needs to speed down fast before upper layers give up on probing. This patch implement fast speed down rules to handle such cases better. Error occured while data transfer hasn't been verified trigger fast back-to-back speed down actions until data transfer works. This change will make cable mis-detection and other initial configuration problems corrected before partition scanning code gives up. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-01-23 05:24:11 -05:00
Tejun Heo	00115e0f5b	libata: implement ATA_DFLAG_DUBIOUS_XFER ATA_DFLAG_DUBIOUS_XFER is set whenever data transfer speed or method changes and gets cleared when data transfer command succeeds in the newly configured transfer mode. This will be used to improve speed down logic. Signed-off-by: Tejun Heo <htejun@gmail.com< Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-01-23 05:24:11 -05:00
Tejun Heo	663f99b86a	libata: adjust speed down rules Speed down rules were too conservative. Adjust them a bit. * More than 10 timeouts can't happen in 5 minutes as command timeout is 30secs. Lower the limit for rule #1 to 6. * 10 timeouts is too high for rule #3 too. Lower it to 6. * SATAPI can benefit from falling back to PIO too. Allow SATAPI devices to fall back to PIO. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-01-23 05:24:11 -05:00
Tejun Heo	3884f7b0a8	libata: clean up EH speed down implementation Clean up EH speed down implementation. * is_io boolean variable is replaced eflags. is_io is ATA_EFLAG_IS_IO. * Error categories now have names. * Better comments. * Reorder 5min and 10min rules in ata_eh_speed_down_verdict() * Use local variable @link to cache @dev->link in ata_eh_speed_down() These changes are to improve readability and ease further changes. This patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-01-23 05:24:10 -05:00
Tejun Heo	6f1d1e3a03	libata: move ata_set_mode() to libata-eh.c Move ata_set_mode() to libata-eh.c. ata_set_mode() is surely an EH action and will be more tightly coupled with the rest of error handling. Move it to libata-eh.c. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-01-23 05:24:10 -05:00
Tejun Heo	02c05a27e8	libata: factor out ata_eh_schedule_probe() Factor out ata_eh_schedule_probe() from ata_eh_handle_dev_fail() and ata_eh_recover(). This is to improve maintainability and make future changes easier. In the previous revision, ata_dev_enabled() test was accidentally dropped while factoring out. This problem was spotted by Bartlomiej. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-01-23 05:24:10 -05:00
Shaohua Li	bd3adca52b	libata-acpi: add ACPI _PSx method ACPI spec (ver 3.0a, p289) requires IDE power on/off executes ACPI _PSx methods. As recently most PATA drivers use libata, this patch adds _PSx method support in libata. ACPI spec doesn't mention if SATA requires the same _PSx method. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Len Brown <len.brown@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-01-23 05:24:09 -05:00
Tejun Heo	4ccd3329a2	libata: don't normalize UNKNOWN to NONE after reset After non-classifying reset, ehc->classes[] could contain ATA_DEV_UNKNOWN which used to be normalized to ATA_DEV_NONE for consistency. However, this causes unfortunate side effect for drivers which have non-classifying hardresets (e.g. sata_nv) by making hardreset report ATA_DEV_NONE for non-classifying resets and thus makes EH believe that the port is unoccupied and recovery can be skipped. The end result is that after a device is swapped with another one, the new device isn't attached after the old one is detached. This patch makes ata_eh_reset() not normalize UNKNOWN to NONE after non-classifying resets. This fixes the above problem. As UNKNOWN and NONE are handled differently by only EH hotplug logic, this doesn't cause other behavior changes. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-01-10 16:53:22 -05:00
Tejun Heo	2695e36616	libata-pmp: propagate timeout to host link Timeout on downstream command may indicate transmission problem on host link. Propagate timeouts to host link. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2008-01-10 16:53:16 -05:00
Tejun Heo	f2dfc1a12b	libata: update atapi_eh_request_sense() such that lbam/lbah contains buffer size While updating lbam/h for ATAPI commands, atapi_eh_request_sense() was left out. Update it. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-12-17 20:33:15 -05:00
Tejun Heo	abb6a88974	libata: report protocol and full CDB on error Protocol and CDB allocation size field are important in determining what went wrong with ATAPI commands. Report them on failure. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-12-01 17:35:58 -05:00

1 2 3 4 5

239 Commits