linux/drivers/media/video/ivtv
Michael d213ad0836 [media] ivtv: Fix corrective action taken upon DMA ERR interrupt to avoid hang
After upgrading the kernel from stock Ubuntu 7.10 to
10.04, with no hardware changes, I started getting the dreaded DMA
TIMEOUT errors, followed by inability to encode until the machine was
rebooted.

I came across a post from Andy in March
(http://www.gossamer-threads.com/lists/ivtv/users/40943#40943) where he
speculates that perhaps the corrective actions being taken after a DMA
ERROR are not sufficient to recover the situation.  After some testing
I suspect that this is indeed the case, and that in fact the corrective
action may be what hangs the card's DMA engine, rather than the
original error.

Specifically these DMA ERROR IRQs seem to present with two different
values in the IVTV_REG_DMASTATUS register: 0x11 and 0x13.  The current
corrective action is to clear that status register back to 0x01 or
0x03, and then issue the next DMA request.  In the case of a 0x13 this
seems to result in a minor glitch in the encoded stream due to the
failed transfer that was not retried, but otherwise things continue OK.
In the case of a 0x11 the card's DMA write engine is never heard from
again, and a DMA TIMEOUT follows shortly after.  0x11 is the killer.

I suspect that the two cases need to be handled differently.  The
difference is in bit 1 (0x02), which is set when the error is about to
be successfully recovered, and clear when things are about to go bad.
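
As a rough illustration of the two observed values, the sketch below splits a raw status word into the bits discussed here. The macro, struct, and function names are hypothetical and are not taken from the driver; only the bit positions (0x10, 0x02, 0x01) come from the 0x11 and 0x13 observations above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions come from the report above. */
#define DMASTATUS_BIT0  0x01u  /* set in both 0x11 and 0x13 */
#define DMASTATUS_BIT1  0x02u  /* set in 0x13, clear in 0x11 */
#define DMASTATUS_ERR   0x10u  /* set in both 0x11 and 0x13 */

struct dma_status_bits {
    bool err;
    bool bit1;
    bool bit0;
};

/* Split a raw DMASTATUS value into the three bits discussed here. */
static void dma_status_decode(uint32_t status, struct dma_status_bits *out)
{
    assert(out != NULL);
    out->err  = (status & DMASTATUS_ERR)  != 0;
    out->bit1 = (status & DMASTATUS_BIT1) != 0;
    out->bit0 = (status & DMASTATUS_BIT0) != 0;
}

/* True for the 0x11 pattern: error flagged while bit 1 is clear. */
static bool dma_status_is_killer(uint32_t status)
{
    struct dma_status_bits b;

    dma_status_decode(status, &b);
    return b.err && !b.bit1;
}
```

With this reading, dma_status_is_killer(0x11) is true and dma_status_is_killer(0x13) is false.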

Bit 1 of DMASTATUS is described differently in different places, either
as a positive "write finished", or an inverted "write busy".  If we
take the first definition, then when an error arises with state 0x11,
it means that the write did not complete.   It makes sense to start a
new transfer, as in the current code.  But if we take the second
definition, then 0x11 means "an error but the write engine is still
busy".  Trying to feed it a new transfer in this situation might not be
a good idea.

As an experiment, I added code to ignore the DMA ERROR IRQ if DMASTATUS
is 0x11.  I.e., don't start a new transfer, don't clear our flags, etc.
The hope was that the card would complete the transfer and issue an ENC
DMA COMPLETE, either successfully or with an error condition there.
However the card still hung.

The only remaining corrective action being taken with a 0x11 status was
then the write back to the status register to clear the error, i.e.
DMASTATUS = DMASTATUS & 3, which turns 0x11 into 0x01.  This would have
the effect of clearing the error bit 4, while leaving the lower bits
indicating DMA write busy.

Strangely enough, removing this write to the status register solved the
problem!  If the DMA ERROR IRQ with DMASTATUS=0x11 is completely
ignored, with no corrective action at all, then the card will complete
the transfer and issue a new IRQ.  If the status register is written to
when it has the value 0x11, then the DMA engine hangs.  Perhaps it's
illegal to write to DMASTATUS while the read or write busy bit is set?
At any rate, it
appears that the current corrective action is indeed making things
worse rather than better.

I put together a patch that modifies ivtv_irq_dma_err to do the
following:

- Don't write back to IVTV_REG_DMASTATUS.
- If write-busy is asserted, leave the card alone.  Just extend the
timeout slightly.
- If write-busy is de-asserted, retry the current transfer.
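
The three rules above can be sketched as a small decision function.  This is a minimal illustration rather than the patch itself: the enum and function names are hypothetical, and it assumes the "inverted write busy" reading of bit 1 (0x02), i.e. bit clear means the write engine is still busy.  Note that nothing here writes back to the status register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical action codes for a DMA ERR interrupt on the write path. */
enum dma_err_action {
    DMA_ERR_WAIT,   /* leave the card alone, extend the timeout */
    DMA_ERR_RETRY   /* write engine idle: retry the current transfer */
};

/* Assumption: bit 1 (0x02) clear means "write engine still busy". */
static enum dma_err_action dma_err_decide(uint32_t status)
{
    if ((status & 0x02u) == 0)
        return DMA_ERR_WAIT;
    return DMA_ERR_RETRY;
}

/* Quick self-check against the two values reported above. */
static void dma_err_self_check(void)
{
    assert(dma_err_decide(0x11) == DMA_ERR_WAIT);
    assert(dma_err_decide(0x13) == DMA_ERR_RETRY);
}
```

In a real handler the WAIT case would re-arm the DMA timeout timer and the RETRY case would restart the transfer; both steps are left out here because they depend on driver internals.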

This has completely fixed my DMA TIMEOUT woes.  DMA ERR events still
occur, but now they seem to be correctly handled.  0x11 events no
longer hang the card, and 0x13 events no longer result in a glitch in
the stream, as the failed transfer is retried.  I'm happy.

I've inlined the patch below in case it is of interest.  As described
above, I have a theory about why it works (based on a different
interpretation of bit 1 of DMASTATUS), but I can't guarantee that my
theory is correct.  There may be another explanation, or it may be a
fluke.  Maybe ignoring that IRQ entirely would be equally effective?
Maybe the status register read/writeback sequence is a race condition if
the card changes it in the meantime?  Also as I am using a PVR-150
only, I have not been able to test it on other cards, which may be
especially relevant for 350s that support concurrent decoding.
Hopefully the patch does not break the DMA READ path.

Mike

[awalls@md.metrocast.net: Modified patch to add a verbose comment, make minor
brace reformats, and clear the error flags in the IVTV_REG_DMASTATUS iff both
read and write DMA were not in progress.  Mike's conjecture about a race
condition with the writeback is correct; it can confuse the DMA engine.]
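
The "iff both read and write DMA were not in progress" condition might look roughly like the sketch below.  The function name is hypothetical, and two things are assumed rather than confirmed by this message: that bit 0 (0x01) set means no read DMA is in progress and bit 1 (0x02) set means no write DMA is in progress, and that the value written back is the low two bits with the error bit dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decide whether the error flags may be cleared.  Returns true and stores
 * the value to write back only when both low bits are set (assumed to mean
 * neither read nor write DMA is in progress).  Otherwise returns false and
 * leaves *writeback untouched, so the caller performs no register write.
 */
static bool dma_err_clear_value(uint32_t status, uint32_t *writeback)
{
    uint32_t low = status & 0x03u;

    assert(writeback != NULL);
    if (low != 0x03u)
        return false;   /* a transfer may still be running: do not write */
    *writeback = low;   /* error bit 0x10 dropped, low bits kept */
    return true;
}
```

Under these assumptions a 0x13 status yields a writeback of 0x03, while 0x11 yields no register write at all.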

[Comment and analysis from the ML post by Michael <mike@rsy.com>]
Signed-off-by: Andy Walls <awalls@md.metrocast.net>
Cc: stable@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-03-02 13:45:42 -03:00
Kconfig [media] rc: rename the remaining things to rc_core 2010-12-29 08:16:50 -02:00
Makefile
ivtv-cards.c [media] ivtv: Add Adaptec Remote Controller 2010-12-30 12:17:08 -02:00
ivtv-cards.h [media] ivtv: Add Adaptec Remote Controller 2010-12-30 12:17:08 -02:00
ivtv-controls.c V4L/DVB: ivtv: convert to the new control framework 2010-08-08 23:43:06 -03:00
ivtv-controls.h V4L/DVB: ivtv: convert to the new control framework 2010-08-08 23:43:06 -03:00
ivtv-driver.c [media] ivtv, cx18: Make ioremap failure messages more useful for users 2010-12-29 08:17:09 -02:00
ivtv-driver.h V4L/DVB: V4L2: avoid name conflicts in macros 2010-10-21 01:04:48 -02:00
ivtv-fileops.c [media] ivtv: Return EFAULT when copy_from_user() fails in ivtv_write_vbi_from_user() 2010-12-29 08:17:10 -02:00
ivtv-fileops.h V4L/DVB (10135): v4l2: introduce v4l2_file_operations. 2009-01-02 17:11:12 -02:00
ivtv-firmware.c V4L/DVB: ivtv: convert to the new control framework 2010-08-08 23:43:06 -03:00
ivtv-firmware.h V4L/DVB: ivtv: Add firmare monitoring and debug mode to ignore firmware problems 2010-08-02 14:47:55 -03:00
ivtv-gpio.c V4L/DVB: ivtv: convert gpio subdev to new control framework 2010-08-08 23:43:05 -03:00
ivtv-gpio.h V4L/DVB (9835): ivtv/ivtvfb: convert to v4l2_device/v4l2_subdev. 2008-12-30 09:38:44 -02:00
ivtv-i2c.c [media] v4l2-subdev: remove core.s_config and v4l2_i2c_new_subdev_cfg() 2011-01-19 11:45:30 -02:00
ivtv-i2c.h V4L/DVB (13443): ivtv: Defer legacy I2C IR probing until after setup of known I2C devices 2009-12-05 18:41:56 -02:00
ivtv-ioctl.c V4L/DVB: v4l: remove unused i2c-id.h headers 2010-10-21 01:06:05 -02:00
ivtv-ioctl.h V4L/DVB (9325): ivtv: switch to unlocked_ioctl. 2008-10-21 14:31:28 -02:00
ivtv-irq.c [media] ivtv: Fix corrective action taken upon DMA ERR interrupt to avoid hang 2011-03-02 13:45:42 -03:00
ivtv-irq.h ivtv: use kthread_worker instead of workqueue 2010-06-29 10:07:09 +02:00
ivtv-mailbox.c V4L/DVB: ivtv: Automatic firmware reload 2010-08-02 14:48:13 -03:00
ivtv-mailbox.h V4L/DVB: ivtv: Automatic firmware reload 2010-08-02 14:48:13 -03:00
ivtv-queue.c ivtv: Fix PCI DMA direction 2009-06-05 08:05:23 -07:00
ivtv-queue.h V4L/DVB (10488): ivtv: cleanup naming conventions 2009-03-30 12:42:41 -03:00
ivtv-routing.c V4L/DVB (11380): v4l2-subdev: change s_routing prototype 2009-04-06 21:44:27 -03:00
ivtv-routing.h
ivtv-streams.c V4L/DVB: ivtv: convert to the new control framework 2010-08-08 23:43:06 -03:00
ivtv-streams.h
ivtv-udma.c V4L/DVB: ivtv: Fix race condition for queued udma transfers 2010-02-26 15:10:41 -03:00
ivtv-udma.h V4L/DVB (10488): ivtv: cleanup naming conventions 2009-03-30 12:42:41 -03:00
ivtv-vbi.c [media] ivtv: Return EFAULT when copy_from_user() fails in ivtv_write_vbi_from_user() 2010-12-29 08:17:10 -02:00
ivtv-vbi.h [media] ivtv: Return EFAULT when copy_from_user() fails in ivtv_write_vbi_from_user() 2010-12-29 08:17:10 -02:00
ivtv-version.h V4L/DVB: ivtv: Increment driver version due to firmware loading changes 2010-08-02 14:48:30 -03:00
ivtv-yuv.c V4L/DVB (11674): ivtv: fix incorrect bit tests 2009-05-09 18:53:10 -03:00
ivtv-yuv.h V4L/DVB (7963): ivtv: trivial annotations 2008-06-05 06:35:49 -03:00
ivtvfb.c V4L/DVB: ivtvfb: prevent reading uninitialized stack memory 2010-09-27 22:22:11 -03:00