linux/drivers
Michael d213ad0836 [media] ivtv: Fix corrective action taken upon DMA ERR interrupt to avoid hang
After upgrading the kernel from stock Ubuntu 7.10 to
10.04, with no hardware changes, I started getting the dreaded DMA
TIMEOUT errors, followed by inability to encode until the machine was
rebooted.

I came across a post from Andy in March
(http://www.gossamer-threads.com/lists/ivtv/users/40943#40943) where he
speculates that perhaps the corrective actions being taken after a DMA
ERROR are not sufficient to recover the situation.  After some testing
I suspect that this is indeed the case, and that in fact the corrective
action may be what hangs the card's DMA engine, rather than the
original error.

Specifically these DMA ERROR IRQs seem to present with two different
values in the IVTV_REG_DMASTATUS register: 0x11 and 0x13.  The current
corrective action is to clear that status register back to 0x01 or
0x03, and then issue the next DMA request.  In the case of a 0x13 this
seems to result in a minor glitch in the encoded stream due to the
failed transfer that was not retried, but otherwise things continue OK.
In the case of a 0x11 the card's DMA write engine is never heard from
again, and a DMA TIMEOUT follows shortly after.  0x11 is the killer.

I suspect that the two cases need to be handled differently.  The
difference is in bit 1 (0x02), which is set when the error is about to
be successfully recovered, and clear when things are about to go bad.

Bit 1 of DMASTATUS is described differently in different places either
as a positive "write finished", or an inverted "write busy".  If we
take the first definition, then when an error arises with state 0x11,
it means that the write did not complete.   It makes sense to start a
new transfer, as in the current code.  But if we take the second
definition, then 0x11 means "an error but the write engine is still
busy".  Trying to feed it a new transfer in this situation might not be
a good idea.

As an experiment, I added code to ignore the DMA ERROR IRQ if DMASTATUS
is 0x11.  I.e., don't start a new transfer, don't clear our flags, etc.
The hope was that the card would complete the transfer and issue a ENC
DMA COMPLETE, either successfully or with an error condition there.
However the card still hung.

The only remaining corrective action being taken with a 0x11 status was
then the write back to the status register to clear the error, i.e.
DMASTATUS = DMASTATUS & ~3.  This would have the effect of clearing the
error bit 4, while leaving the lower bits indicating DMA write busy.

Strangely enough, removing this write to the status register solved the
problem!  If the DMA ERROR IRQ with DMASTATUS=0x11 is completely
ignored, with no corrective action at all, then the card will complete
the transfer and issue a new IRQ.  If the status register is written to
when it has the value 0x11, then the DMA engine hangs.  Perhaps it's
illegal to write to
DMASTATUS while the read or write busy bit is set?  At any rate, it
appears that the current corrective action is indeed making things
worse rather than better.

I put together a patch that modifies ivtv_irq_dma_err to do the
following:

- Don't write back to IVTV_REG_DMASTATUS.
- If write-busy is asserted, leave the card alone.  Just extend the
timeout slightly.
- If write-busy is de-asserted, retry the current transfer.

This has completely fixed my DMA TIMEOUT woes.  DMA ERR events still
occur, but now they seem to be correctly handled.  0x11 events no
longer hang the card, and 0x13 events no longer result in a glitch in
the stream, as the failed transfer is retried.  I'm happy.

I've inlined the patch below in case it is of interest.  As described
above, I have a theory about why it works (based on a different
interpretation of bit 1 of DMASTATUS), but I can't guarantee that my
theory is correct.  There may be another explanation, or it may be a
fluke.  Maybe ignoring that IRQ entirely would be equally effective?
Maybe the status register read/writeback sequence is race condition if
the card changes it in the mean time?  Also as I am using a PVR-150
only, I have not been able to test it on other cards, which may be
especially relevant for 350s that support concurrent decoding.
Hopefully the patch does not break the DMA READ path.

Mike

[awalls@md.metrocast.net: Modified patch to add a verbose comment, make minor
brace reformats, and clear the error flags in the IVTV_REG_DMASTATUS iff both
read and write DMA were not in progress.  Mike's conjecture about a race
condition with the writeback is correct; it can confuse the DMA engine.]

[Comment and analysis from the ML post by Michael <mike@rsy.com>]
Signed-off-by: Andy Walls <awalls@md.metrocast.net>
Cc: stable@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-03-02 13:45:42 -03:00
..
accessibility
acpi ACPI / PM: Call suspend_nvs_free() earlier during resume 2011-01-20 18:30:17 -08:00
amba
ata kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
atm Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
auxdisplay
base kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
block Merge branch 'for-2.6.38/drivers' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:50:24 -08:00
bluetooth
cdrom Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block 2011-01-13 10:45:01 -08:00
char Merge branch 'akpm' 2011-01-20 17:02:14 -08:00
clk
clocksource
connector
cpufreq kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
cpuidle Merge branch 'cpuidle-perf-events' into idle-test 2011-01-12 18:06:19 -05:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2011-01-13 10:25:58 -08:00
dca dca: remove unneeded NULL check 2011-01-13 08:03:09 -08:00
dio
dma Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx 2011-01-17 10:54:41 -08:00
edac Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
eisa
firewire Merge branches 'fixes' and 'fwnet' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 2011-01-21 13:34:39 -08:00
firmware kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
gpio Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 2011-01-14 09:08:00 -08:00
gpu Merge branch 'akpm' 2011-01-20 17:02:14 -08:00
hid kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
hwmon hwmon: (lm93) Add support for LM94 2011-01-18 12:22:54 -08:00
i2c i2c: Encourage move to dev_pm_ops by warning on use of legacy methods 2011-01-14 22:03:50 +01:00
ide kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
idle Merge branch 'cpuidle-perf-events' into idle-test 2011-01-12 18:06:19 -05:00
ieee802154
infiniband kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
input kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
isdn Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
leds drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking 2011-01-20 17:02:06 -08:00
lguest lguest: compile fixes 2011-01-20 21:37:29 +10:30
macintosh powerpc/macintosh: Fix wrong test in fan_{read,write}_reg() 2011-01-21 14:08:34 +11:00
mca
md block: restore multiple bd_link_disk_holder() support 2011-01-14 18:44:22 +01:00
media [media] ivtv: Fix corrective action taken upon DMA ERR interrupt to avoid hang 2011-03-02 13:45:42 -03:00
memstick memstick: factor out transfer initiating functionality in mspro_block.c 2011-01-13 08:03:22 -08:00
message
mfd mfd: ab8500-core chip version cut 2.0 support 2011-01-14 12:38:18 +01:00
misc misc: Make AB8500_PWM driver depend on U8500 due to PWM breakage 2011-01-14 12:38:12 +01:00
mmc mmc: sdhci-of: fix build on non-powerpc platforms 2011-01-14 00:22:44 -07:00
mtd Merge git://git.infradead.org/mtd-2.6 2011-01-17 11:15:30 -08:00
net kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
nfc drivers/nfc/pn544.c: fix min_t warnings 2011-01-16 17:28:21 -08:00
nubus
of dt/flattree: Return virtual address from early_init_dt_alloc_memory_arch() 2011-01-15 22:01:58 -07:00
oprofile
parisc
parport
pci kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
pcmcia kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
platform Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 2011-01-13 20:15:35 -08:00
pnp Merge branch 'pnp' into release 2011-01-12 04:59:44 -05:00
power Merge git://git.infradead.org/battery-2.6 2011-01-14 09:25:59 -08:00
pps pps: add parallel port PPS signal generator 2011-01-13 08:03:21 -08:00
ps3
rapidio rapidio: add new idt sRIO switches 2011-01-13 08:03:18 -08:00
regulator regulator: Support MAX8998/LP3974 DVS-GPIO 2011-01-14 12:38:16 +01:00
rtc mfd: Support LP3974 RTC 2011-01-14 12:38:16 +01:00
s390 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2011-01-19 20:25:45 -08:00
sbus sparc: explicitly cast negative phandle checks to s32 2011-01-03 20:02:06 -07:00
scsi Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-post-merge-2.6 2011-01-14 16:29:49 -08:00
sfi SFI: use ioremap_cache() instead of ioremap() 2011-01-11 23:27:25 -05:00
sh headers: kobject.h redux 2011-01-10 08:51:44 -08:00
sn
spi Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus 2011-01-18 14:28:48 -08:00
ssb kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
staging [media] lirc_zilog: z8 on usb doesn't like back-to-back i2c_master_send 2011-01-31 12:18:42 -02:00
target [SCSI] target: Add LIO target core v4.0.0-rc6 2011-01-14 10:12:29 -06:00
tc
telephony Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2011-01-13 10:05:56 -08:00
thermal Merge branch 'misc' into release 2011-01-12 05:14:15 -05:00
tty Merge branch 'akpm' 2011-01-20 17:02:14 -08:00
uio
usb kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT 2011-01-20 17:02:05 -08:00
uwb
vhost vhost: fix signed/unsigned comparison 2011-01-10 10:03:39 +02:00
video backlight: fix 88pm860x_bl macro collision 2011-01-20 17:02:06 -08:00
virtio virtio: remove virtio-pci root device 2011-01-20 21:37:30 +10:30
vlynq
w1 w1: DS2423 counter driver and documentation 2011-01-13 08:03:22 -08:00
watchdog watchdog: Add MCF548x watchdog driver. 2011-01-12 13:51:35 +00:00
xen Merge branch 'xen/xenbus' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen 2011-01-20 16:37:28 -08:00
zorro
Kconfig [SCSI] target: Add LIO target core v4.0.0-rc6 2011-01-14 10:12:29 -06:00
Makefile Merge branch 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 2011-01-20 16:39:23 -08:00