linux/drivers/net/ethernet/cisco/enic
Firo Yang 0f90522591 enic: prevent waking up stopped tx queues over watchdog reset
In recent months, our customer reported several kernel crashes, all
preceded by the following message:
NETDEV WATCHDOG: eth2 (enic): transmit queue 0 timed out
Error message of one of those crashes:
BUG: unable to handle kernel paging request at ffffffffa007e090

After analyzing several vmcores, I found that most of the crashes were
caused by memory corruption, and that all the corrupted memory areas
had been overwritten by network packet data. Moreover, I also found
that the tx queues were enabled during the watchdog reset.

After going through the source code, I found that in enic_stop(),
the tx queues stopped by netif_tx_disable() could be woken up in
a small time window between netif_tx_disable() and
napi_disable() by the following code path:
napi_poll->
  enic_poll_msix_wq->
     vnic_cq_service->
        enic_wq_service->
           netif_wake_subqueue(enic->netdev, q_number)->
              test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state)
In turn, the upper network stack could queue skbs to the ENIC NIC through
enic_hard_start_xmit(), which might introduce a race condition.
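
The ordering problem described above can be sketched as a small,
single-threaded userspace model. This is only an illustration, not
kernel code: the model_* names and struct are hypothetical, the wake
condition is simplified (the real enic_wq_service() checks more than
this), and the "disable polling first" ordering is an assumption about
how such a window can be closed, not something stated in the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one tx queue and its completion poller. */
struct model_queue {
	bool napi_enabled; /* completion polling may still run */
	bool drv_xoff;     /* stands in for __QUEUE_STATE_DRV_XOFF */
};

/* Stands in for netif_tx_disable(): mark the queue stopped. */
static void model_tx_disable(struct model_queue *q)
{
	assert(q != NULL);
	q->drv_xoff = true;
}

/* Stands in for napi_disable(): no further polls will run. */
static void model_napi_disable(struct model_queue *q)
{
	assert(q != NULL);
	q->napi_enabled = false;
}

/*
 * Stands in for napi_poll -> ... -> netif_wake_subqueue():
 * a poll that is still allowed to run clears the stopped bit.
 */
static void model_wq_poll(struct model_queue *q)
{
	assert(q != NULL);
	if (q->napi_enabled)
		q->drv_xoff = false;
}

/* Would the stack be able to hand an skb to the driver right now? */
static bool model_can_xmit(const struct model_queue *q)
{
	assert(q != NULL);
	return !q->drv_xoff;
}

/* Stop tx first, then polling: a poll can land in the window. */
bool stop_tx_first(void)
{
	struct model_queue q = { .napi_enabled = true, .drv_xoff = false };

	model_tx_disable(&q);
	model_wq_poll(&q);      /* poll runs inside the window */
	model_napi_disable(&q);
	return model_can_xmit(&q);
}

/* Stop polling first, then tx: a late poll can no longer wake the queue. */
bool stop_napi_first(void)
{
	struct model_queue q = { .napi_enabled = true, .drv_xoff = false };

	model_napi_disable(&q);
	model_wq_poll(&q);      /* no-op, polling is already disabled */
	model_tx_disable(&q);
	return model_can_xmit(&q);
}
```

In this model, stop_tx_first() leaves the queue able to transmit after
the stop sequence has finished, which mirrors the reported symptom,
while stop_napi_first() leaves it stopped.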

Our customer confirmed that this kind of kernel crash has not occurred
for over 90 days since they applied this patch.

Signed-off-by: Firo Yang <firo.yang@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-12 09:43:26 -08:00
Kconfig treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
Makefile treewide: Add SPDX license identifier - Makefile/Kconfig 2019-05-21 10:50:46 +02:00
cq_desc.h
cq_enet_desc.h
enic.h enic: set IG desc cache flag in open 2018-03-04 18:19:26 -05:00
enic_api.c
enic_api.h
enic_clsf.c drivers: net: Remove unnecessary semicolon 2019-03-01 23:13:49 -08:00
enic_clsf.h treewide: setup_timer() -> timer_setup() (2 field) 2017-11-21 15:57:09 -08:00
enic_dev.c
enic_dev.h
enic_ethtool.c net: core: dev: Add extack argument to dev_open() 2018-12-06 13:26:06 -08:00
enic_main.c enic: prevent waking up stopped tx queues over watchdog reset 2020-02-12 09:43:26 -08:00
enic_pp.c
enic_pp.h
enic_res.c enic: fix UDP rss bits 2018-06-06 09:09:09 -04:00
enic_res.h ethernet: use core min/max MTU checking 2016-10-18 11:34:22 -04:00
rq_enet_desc.h
vnic_cq.c
vnic_cq.h
vnic_dev.c net: cisco: enic: Replace GFP_ATOMIC with GFP_KERNEL 2018-08-04 13:08:06 -07:00
vnic_dev.h enic: fix UDP rss bits 2018-06-06 09:09:09 -04:00
vnic_devcmd.h enic: fix UDP rss bits 2018-06-06 09:09:09 -04:00
vnic_enet.h enic: add devcmds for vxlan offload 2017-02-09 17:24:29 -05:00
vnic_intr.c
vnic_intr.h
vnic_nic.h enic: fix UDP rss bits 2018-06-06 09:09:09 -04:00
vnic_resource.h
vnic_rq.c net: cisco: enic: Replace GFP_ATOMIC with GFP_KERNEL 2018-08-04 13:08:06 -07:00
vnic_rq.h enic: Remove local ndo_busy_poll() implementation. 2017-02-03 17:28:21 -05:00
vnic_rss.h
vnic_stats.h
vnic_vic.c
vnic_vic.h
vnic_wq.c net: cisco: enic: Replace GFP_ATOMIC with GFP_KERNEL 2018-08-04 13:08:06 -07:00
vnic_wq.h
wq_enet_desc.h