linux/arch/arm
Grygorii Strashko fe8291e82b ARM: OMAP2+: omap-device: fix race deferred probe of omap_hsmmc vs omap_device_late_init
Kernel fails to boot 50% of times (form build to build) with
RT-patchset applied due to the following race - on late boot
stages deferred_probe_work_func->omap_hsmmc_probe races with omap_device_late_ini.

The same issue has been reported now on linux-next (4.3) by Keerthy [1]

late_initcall
 - deferred_probe_initcal() tries to re-probe all pending driver's probe.

- later on, some driver is probing in this case It's cpsw.c
  (but could be any other drivers)
  cpsw_init
  - platform_driver_register
    - really_probe
       - driver_bound
         - driver_deferred_probe_trigger
  and boot proceed.
  So, at this moment we have deferred_probe_work_func scheduled.

late_initcall_sync
  - omap_device_late_init
    - omap_device_idle

CPU1					CPU2
  - deferred_probe_work_func
    - really_probe
      - omap_hsmmc_probe
	- pm_runtime_get_sync
					late_initcall_sync
					- omap_device_late_init
						if (od->_driver_status != BUS_NOTIFY_BOUND_DRIVER) {
							if (od->_state == OMAP_DEVICE_STATE_ENABLED) {
								- omap_device_idle [ops - IP is disabled]
	- [fail]
	- pm_runtime_put_sync
          - omap_hsmmc_runtime_suspend [ooops!]

== log ==
 omap_hsmmc 480b4000.mmc: unable to get vmmc regulator -517
 davinci_mdio 48485000.mdio: davinci mdio revision 1.6
 davinci_mdio 48485000.mdio: detected phy mask fffffff3
 libphy: 48485000.mdio: probed
 davinci_mdio 48485000.mdio: phy[2]: device 48485000.mdio:02, driver unknown
 davinci_mdio 48485000.mdio: phy[3]: device 48485000.mdio:03, driver unknown
 omap_hsmmc 480b4000.mmc: unable to get vmmc regulator -517
 cpsw 48484000.ethernet: Detected MACID = b4:99:4c:c7:d2:48
 cpsw 48484000.ethernet: cpsw: Detected MACID = b4:99:4c:c7:d2:49
 hctosys: unable to open rtc device (rtc0)
 omap_hsmmc 480b4000.mmc: omap_device_late_idle: enabled but no driver.  Idling
 ldousb: disabling
 Unhandled fault: imprecise external abort (0x1406) at 0x00000000
 [00000000] *pgd=00000000
 Internal error: : 1406 [#1] PREEMPT SMP ARM
 Modules linked in:
 CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 4.1.2-rt1-00467-g6da3c0a-dirty #5
 Hardware name: Generic DRA74X (Flattened Device Tree)
 Workqueue: deferwq deferred_probe_work_func
 task: ee6ddb00 ti: edd3c000 task.ti: edd3c000
 PC is at omap_hsmmc_runtime_suspend+0x1c/0x12c
 LR is at _od_runtime_suspend+0xc/0x24
 pc : [<c0471998>]    lr : [<c0029590>]    psr: a0000013
 sp : edd3dda0  ip : ee6ddb00  fp : c07be540
 r10: 00000000  r9 : c07be540  r8 : 00000008
 r7 : 00000000  r6 : ee646c10  r5 : ee646c10  r4 : edd79380
 r3 : fa0b4100  r2 : 00000000  r1 : 00000000  r0 : ee646c10
 Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
 Control: 10c5387d  Table: 8000406a  DAC: 00000015
 Process kworker/u4:1 (pid: 58, stack limit = 0xedd3c218)
 Stack: (0xedd3dda0 to 0xedd3e000)
 dda0: ee646c70 ee646c10 c0029584 00000000 00000008 c0029590 ee646c70 ee646c10
 ddc0: c0029584 c03adfb8 ee646c10 00000004 0000000c c03adff0 ee646c10 00000004
 dde0: 0000000c c03ae4ec 00000000 edd3c000 ee646c10 00000004 ee646c70 00000004
 de00: fa0b4000 c03aec20 ee6ddb00 ee646c10 00000004 ee646c70 ee646c10 fffffdfb
 de20: edd79380 00000000 fa0b4000 c03aee90 fffffdfb edd79000 ee646c00 c0474290
 de40: 00000000 edda24c0 edd79380 edc81f00 00000000 00000200 00000001 c06dd488
 de60: edda3960 ee646c10 ee646c10 c0824cc4 fffffdfb c0880c94 00000002 edc92600
 de80: c0836378 c03a7f84 ee646c10 c0824cc4 00000000 c0880c80 c0880c94 c03a6568
 dea0: 00000000 ee646c10 c03a66ac ee4f8000 00000000 00000001 edc92600 c03a4b40
 dec0: ee404c94 edc83c4c ee646c10 ee646c10 ee646c44 c03a63c4 ee646c10 ee646c10
 dee0: c0814448 c03a5aa8 ee646c10 c0814220 edd3c000 c03a5ec0 c0814250 ee6be400
 df00: edd3c000 c004e5bc ee6ddb01 00000078 ee6ddb00 ee4f8000 ee6be418 edd3c000
 df20: ee4f8028 00000088 c0836045 ee4f8000 ee6be400 c004e928 ee4f8028 00000000
 df40: c004e8ec 00000000 ee6bf1c0 ee6be400 c004e8ec 00000000 00000000 00000000
 df60: 00000000 c0053450 2e56fa97 00000000 afdffbd7 ee6be400 00000000 00000000
 df80: edd3df80 edd3df80 00000000 00000000 edd3df90 edd3df90 edd3dfac ee6bf1c0
 dfa0: c0053384 00000000 00000000 c000f668 00000000 00000000 00000000 00000000
 dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
 dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 f1fc9d7e febfbdff
 [<c0471998>] (omap_hsmmc_runtime_suspend) from [<c0029590>] (_od_runtime_suspend+0xc/0x24)
 [<c0029590>] (_od_runtime_suspend) from [<c03adfb8>] (__rpm_callback+0x24/0x3c)
 [<c03adfb8>] (__rpm_callback) from [<c03adff0>] (rpm_callback+0x20/0x80)
 [<c03adff0>] (rpm_callback) from [<c03ae4ec>] (rpm_suspend+0xe4/0x618)
 [<c03ae4ec>] (rpm_suspend) from [<c03aee90>] (__pm_runtime_idle+0x60/0x80)
 [<c03aee90>] (__pm_runtime_idle) from [<c0474290>] (omap_hsmmc_probe+0x6bc/0xa7c)
 [<c0474290>] (omap_hsmmc_probe) from [<c03a7f84>] (platform_drv_probe+0x44/0xa4)
 [<c03a7f84>] (platform_drv_probe) from [<c03a6568>] (driver_probe_device+0x170/0x2b4)
 [<c03a6568>] (driver_probe_device) from [<c03a4b40>] (bus_for_each_drv+0x64/0x98)
 [<c03a4b40>] (bus_for_each_drv) from [<c03a63c4>] (device_attach+0x70/0x88)
 [<c03a63c4>] (device_attach) from [<c03a5aa8>] (bus_probe_device+0x84/0xac)
 [<c03a5aa8>] (bus_probe_device) from [<c03a5ec0>] (deferred_probe_work_func+0x58/0x88)
 [<c03a5ec0>] (deferred_probe_work_func) from [<c004e5bc>] (process_one_work+0x134/0x464)
 [<c004e5bc>] (process_one_work) from [<c004e928>] (worker_thread+0x3c/0x4fc)
 [<c004e928>] (worker_thread) from [<c0053450>] (kthread+0xcc/0xe4)
 [<c0053450>] (kthread) from [<c000f668>] (ret_from_fork+0x14/0x2c)
 Code: e594302c e593202c e584205c e594302c (e5932128)
 ---[ end trace 0000000000000002 ]---

The issue happens because omap_device_late_init() do not take into
account that some drivers are present, but their probes were not
finished successfully and where deferred instead. This is the valid
case, and omap_device_late_init() should not idle such devices.

To fix this issue, the value of omap_device->_driver_status field
should be checked not only for BUS_NOTIFY_BOUND_DRIVER (driver is
present and has been bound to device successfully), but also checked
for BUS_NOTIFY_BIND_DRIVER (driver about to be bound) - which means
driver is present and there was try to bind it to device.

[1] http://www.spinics.net/lists/arm-kernel/msg441880.html
Cc: Tero Kristo <t-kristo@ti.com>
Cc: Keerthy <j-keerthy@ti.com>
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2015-09-01 13:59:24 -07:00
..
boot ARM: DT updates for v4.3 2015-09-01 13:09:20 -07:00
common Merge branch 'queue/irq/arm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next/cleanup 2015-08-05 17:24:11 +02:00
configs ARM: SoC defconfig updates for v4.3 2015-09-01 13:17:43 -07:00
crypto
firmware
include ARM: SoC platform updates for v4.3 2015-09-01 12:18:40 -07:00
kernel ARM: SoC cleanups for v4.3 2015-09-01 12:10:20 -07:00
kvm
lib ARM: 8414/1: __copy_to_user_memcpy: fix mmap semaphore usage 2015-08-18 13:59:59 +01:00
mach-alpine
mach-asm9260
mach-at91 ARM: SoC platform updates for v4.3 2015-09-01 12:18:40 -07:00
mach-axxia
mach-bcm ARM: SoC platform updates for v4.3 2015-09-01 12:18:40 -07:00
mach-berlin
mach-clps711x ARM: appropriate __init annotation for const data 2015-07-28 13:55:27 +02:00
mach-cns3xxx ARM: appropriate __init annotation for const data 2015-07-28 13:55:27 +02:00
mach-davinci ARM: SoC platform updates for v4.3 2015-09-01 12:18:40 -07:00
mach-digicolor ARM: appropriate __init annotation for const data 2015-07-28 13:55:27 +02:00
mach-dove Merge branch 'queue/irq/arm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next/cleanup 2015-08-05 17:24:11 +02:00
mach-ebsa110 ARM: kill off set_irq_flags usage 2015-07-28 13:58:13 +02:00
mach-efm32
mach-ep93xx ARM/fb: ep93xx: switch framebuffer to use modedb only 2015-08-13 12:25:44 +02:00
mach-exynos ARM: SoC cleanups for v4.3 2015-09-01 12:10:20 -07:00
mach-footbridge ARM: kill off set_irq_flags usage 2015-07-28 13:58:13 +02:00
mach-gemini ARM: gemini: Setup timer3 as free running timer 2015-08-13 11:41:52 +02:00
mach-highbank
mach-hisi
mach-imx ARM: SoC platform updates for v4.3 2015-09-01 12:18:40 -07:00
mach-integrator
mach-iop13xx ARM: kill off set_irq_flags usage 2015-07-28 13:58:13 +02:00
mach-iop32x ARM: kill off set_irq_flags usage 2015-07-28 13:58:13 +02:00
mach-iop33x ARM: kill off set_irq_flags usage 2015-07-28 13:58:13 +02:00
mach-ixp4xx ARM: kill off set_irq_flags usage 2015-07-28 13:58:13 +02:00
mach-keystone
mach-ks8695 ARM: kill off set_irq_flags usage 2015-07-28 13:58:13 +02:00
mach-lpc18xx
mach-lpc32xx Merge branch 'queue/irq/arm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next/cleanup 2015-08-05 17:24:11 +02:00
mach-mediatek ARM: mediatek: Add regmap to mediatek Kconfig 2015-07-23 20:01:42 +02:00
mach-meson
mach-mmp ARM: appropriate __init annotation for const data 2015-07-28 13:55:27 +02:00
mach-moxart
mach-mv78xx0
mach-mvebu ARM: SoC driver updates for v4.3 2015-09-01 13:00:04 -07:00
mach-mxs ARM: appropriate __init annotation for const data 2015-07-28 13:55:27 +02:00
mach-netx ARM: kill off set_irq_flags usage 2015-07-28 13:58:13 +02:00
mach-nomadik ARM: nomadik: move l2x0 setup to device tree 2015-08-11 15:29:59 +02:00
mach-nspire
mach-omap1 ARM: kill off set_irq_flags usage 2015-07-28 13:58:13 +02:00
mach-omap2 ARM: OMAP2+: omap-device: fix race deferred probe of omap_hsmmc vs omap_device_late_init 2015-09-01 13:59:24 -07:00
mach-orion5x ARM: DT updates for v4.3 2015-09-01 13:09:20 -07:00
mach-picoxcell
mach-prima2
mach-pxa ARM: SoC platform updates for v4.3 2015-09-01 12:18:40 -07:00
mach-qcom
mach-realview ARM: appropriate __init annotation for const data 2015-07-28 13:55:27 +02:00
mach-rockchip ARM: rockchip: pm: Fix PTR_ERR() argument 2015-08-24 12:39:14 +02:00
mach-rpc ARM: kill off set_irq_flags usage 2015-07-28 13:58:13 +02:00
mach-s3c24xx Samsung cleanup for v4.3 2015-08-11 14:59:02 +02:00
mach-s3c64xx ARM: SoC cleanups for v4.3 2015-09-01 12:10:20 -07:00
mach-s5pv210
mach-sa1100 ARM: kill off set_irq_flags usage 2015-07-28 13:58:13 +02:00
mach-shmobile ARM: SoC driver updates for v4.3 2015-09-01 13:00:04 -07:00
mach-socfpga
mach-spear ARM: SoC cleanups for v4.3 2015-09-01 12:10:20 -07:00
mach-sti ARM: SoC platform updates for v4.3 2015-09-01 12:18:40 -07:00
mach-stm32
mach-sunxi
mach-tegra ARM: SoC driver updates for v4.3 2015-09-01 13:00:04 -07:00
mach-u300
mach-uniphier ARM: uniphier: drop v7_invalidate_l1 call at secondary entry 2015-08-13 12:12:10 +02:00
mach-ux500 ARM: SoC cleanups for v4.3 2015-09-01 12:10:20 -07:00
mach-versatile
mach-vexpress
mach-vt8500
mach-w90x900 ARM: kill off set_irq_flags usage 2015-07-28 13:58:13 +02:00
mach-zx ARM: SoC platform updates for v4.3 2015-09-01 12:18:40 -07:00
mach-zynq ARM: zynq: reserve space for jump target in secondary trampoline 2015-07-31 10:24:41 +02:00
mm
net
nwfpe
oprofile
plat-iop
plat-omap
plat-orion Merge branch 'queue/irq/arm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next/cleanup 2015-08-05 17:24:11 +02:00
plat-pxa
plat-samsung ARM: SAMSUNG: remove keypad-core header in plat-samsung 2015-07-30 02:02:06 +09:00
plat-versatile
probes
tools
vdso ARM: 8405/1: VDSO: fix regression with toolchains lacking ld.bfd executable 2015-07-31 18:54:45 +01:00
vfp
xen
Kconfig ARM: SoC cleanups for v4.3 2015-09-01 12:10:20 -07:00
Kconfig-nommu
Kconfig.debug The i.MX SoC changes for 4.3: 2015-08-18 13:10:05 -07:00
Makefile ARM: 8418/1: add boot image dependencies to not generate invalid images 2015-08-18 13:59:59 +01:00