linux/drivers
Wei Hu (Xavier) d061effc36 RDMA/hns: Fix the Oops during rmmod or insmod ko when reset occurs
In the reset process, the hns3 NIC driver notifies the RoCE driver to
perform reset related processing by calling the .reset_notify() interface
registered by the RoCE driver in hip08 SoC.

In the current version, if a reset occurs simultaneously during the
execution of rmmod or insmod ko, there may be Oops error as below:

 Internal error: Oops: 86000007 [#1] PREEMPT SMP
 Modules linked in: hns_roce(O) hns3(O) hclge(O) hnae3(O) [last unloaded: hns_roce_hw_v2]
 CPU: 0 PID: 14 Comm: kworker/0:1 Tainted: G           O      4.19.0-ge00d540 #1
 Hardware name: Huawei Technologies Co., Ltd.
 Workqueue: events hclge_reset_service_task [hclge]
 pstate: 60c00009 (nZCv daif +PAN +UAO)
 pc : 0xffff00000100b0b8
 lr : 0xffff00000100aea0
 sp : ffff000009afbab0
 x29: ffff000009afbab0 x28: 0000000000000800
 x27: 0000000000007ff0 x26: ffff80002f90c004
 x25: 00000000000007ff x24: ffff000008f97000
 x23: ffff80003efee0a8 x22: 0000000000001000
 x21: ffff80002f917ff0 x20: ffff8000286ea070
 x19: 0000000000000800 x18: 0000000000000400
 x17: 00000000c4d3225d x16: 00000000000021b8
 x15: 0000000000000400 x14: 0000000000000400
 x13: 0000000000000000 x12: ffff80003fac6e30
 x11: 0000800036303000 x10: 0000000000000001
 x9 : 0000000000000000 x8 : ffff80003016d000
 x7 : 0000000000000000 x6 : 000000000000003f
 x5 : 0000000000000040 x4 : 0000000000000000
 x3 : 0000000000000004 x2 : 00000000000007ff
 x1 : 0000000000000000 x0 : 0000000000000000
 Process kworker/0:1 (pid: 14, stack limit = 0x00000000af8f0ad9)
 Call trace:
  0xffff00000100b0b8
  0xffff00000100b3a0
  hns_roce_init+0x624/0xc88 [hns_roce]
  0xffff000001002df8
  0xffff000001006960
  hclge_notify_roce_client+0x74/0xe0 [hclge]
  hclge_reset_service_task+0xa58/0xbc0 [hclge]
  process_one_work+0x1e4/0x458
  worker_thread+0x40/0x450
  kthread+0x12c/0x130
  ret_from_fork+0x10/0x18
 Code: bad PC value

In the reset process, we will release the resources firstly, and after the
hardware reset is completed, we will reapply resources and reconfigure the
hardware.

We can solve this problem by modifying both the NIC and the RoCE
driver. We can modify the concurrent processing in the NIC driver to avoid
calling the .reset_notify and .uninit_instance ops at the same time. And
we need to modify the RoCE driver to record the reset stage and the
driver's init/uninit state, and check the state in the .reset_notify,
.init_instance. and uninit_instance functions to avoid NULL pointer
operation.

Fixes: cb7a94c9c8 ("RDMA/hns: Add reset process for RoCE in hip08")
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04 16:13:50 -07:00
..
accessibility
acpi acpi/nfit: Fix command-supported detection 2019-01-21 09:58:31 -08:00
amba
android binderfs: switch from d_add() to d_instantiate() 2019-01-22 12:25:54 +01:00
ata SCSI fixes on 20190125 2019-01-26 15:03:43 -08:00
atm atm: he: fix sign-extension overflow on large shift 2019-01-17 11:27:00 -08:00
auxdisplay auxdisplay: charlcd: fix x/y command parsing 2018-12-21 21:27:21 +01:00
base PM-runtime: Fix deadlock with ktime_get() 2019-01-30 22:49:06 +01:00
bcma
block for-linus-20190118 2019-01-20 09:12:50 +12:00
bluetooth Bluetooth: hci_bcm: Handle specific unknown packets after firmware loading 2018-12-19 13:43:42 +01:00
bus ARM: SoC driver updates 2018-12-31 17:32:35 -08:00
cdrom gdrom: fix a memory leak bug 2018-12-29 08:20:44 -07:00
char Char/Misc driver fixes for 5.0-rc4 2019-01-25 13:03:34 -10:00
clk clk: qcom: gcc: Use active only source for CPUSS clocks 2019-01-24 11:41:48 -08:00
clocksource arch/csky patches for 4.21-rc1 2019-01-05 09:50:07 -08:00
connector
cpufreq Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-sleep' 2019-01-11 10:09:51 +01:00
cpuidle cpuidle: poll_state: Fix default time limit 2019-01-30 22:57:42 +01:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2019-01-31 23:09:00 -08:00
dax mm, devm_memremap_pages: fix shutdown handling 2018-12-28 12:11:47 -08:00
dca
devfreq PM / devfreq: add devfreq_suspend/resume() functions 2018-12-11 11:40:13 +09:00
dio
dma cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
dma-buf drivers/dma-buf/udmabuf.c: convert to use vm_fault_t 2019-01-04 13:13:46 -08:00
edac EDAC, altera: Fix S10 persistent register offset 2019-01-24 17:13:59 +01:00
eisa
extcon
firewire scsi: communicate max segment size to the DMA mapping code 2019-01-22 20:40:59 -05:00
firmware efi/arm64: Fix debugfs crash by adding a terminator for ptdump marker 2019-02-02 11:27:29 +01:00
fmc
fpga Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
fsi
gnss
gpio gpio: vf610: Mask all GPIO interrupts 2019-01-28 15:28:43 +01:00
gpu Merge tag 'drm-msm-fixes-2019-01-24' of git://people.freedesktop.org/~robclark/linux into drm-fixes 2019-01-25 07:45:00 +10:00
hid HID: core: simplify active collection tracking 2019-01-16 14:29:48 +01:00
hsi
hv vmbus: fix subchannel removal 2019-01-09 19:20:31 -05:00
hwmon hwmon: (tmp421) Correct the misspelling of the tmp442 compatible attribute in OF device ID table 2019-01-17 12:54:52 -08:00
hwspinlock hwspinlock: fix return value check in stm32_hwspinlock_probe() 2019-01-03 11:42:10 -08:00
hwtracing intel_th: msu: Fix an off-by-one in attribute store 2018-12-19 20:21:06 +01:00
i2c i2c: tegra: Fix Maximum transfer size 2019-01-11 00:15:04 +01:00
i3c i3c: master: dw: fix deadlock 2019-01-26 11:14:25 +01:00
ide ide: ensure atapi sense request aren't preempted 2019-01-31 08:25:09 -07:00
idle
iio - New Device Support 2019-01-15 06:24:36 +12:00
infiniband RDMA/hns: Fix the Oops during rmmod or insmod ko when reset occurs 2019-02-04 16:13:50 -07:00
input Mostly driver fixes, but there's a core framework fix in here too. 2019-01-31 23:22:57 -08:00
iommu IOMMU Fixes for Linux v5.0-rc4 2019-01-30 09:30:03 -08:00
ipack
irqchip xtensa fixes for v5.0-rc5 2019-02-01 16:56:30 -08:00
isdn isdn: avm: Fix string plus integer warning from Clang 2019-01-19 10:01:03 -08:00
leds leds: lp5523: fix a missing check of return value of lp55xx_read 2019-01-17 22:27:39 +01:00
lightnvm lightnvm: pblk: fix use-after-free bug 2018-12-22 14:45:35 -07:00
macintosh Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
mailbox mailbox: tegra-hsp: Use device-managed registration API 2018-12-21 22:31:26 -06:00
mcb
md md/raid5: fix 'out of memory' during raid cache recovery 2019-01-28 11:44:40 -08:00
media media: vim2m: only cancel work if it is for right context 2019-01-16 11:13:25 -05:00
memory ARM: SoC: late updates 2019-01-05 11:30:37 -08:00
memstick MMC core: 2018-12-28 16:52:18 -08:00
message scsi: flip the default on use_clustering 2018-12-18 23:13:12 -05:00
mfd mfd: Fix unmet dependency warning for MFD_TPS68470 2019-01-29 10:55:34 +01:00
misc Char/Misc driver fixes for 5.0-rc4 2019-01-25 13:03:34 -10:00
mmc mmc: mediatek: fix incorrect register setting of hs400_cmd_int_delay 2019-01-28 12:49:28 +01:00
mtd mtd: rawnand: denali: get ->setup_data_interface() working again 2019-01-18 10:27:01 +01:00
mux
net Linux 5.0-rc5 2019-02-04 14:53:42 -07:00
nfc
ntb cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
nubus
nvdimm libnvdimm/security: Require nvdimm_security_setup_events() to succeed 2019-01-21 09:57:43 -08:00
nvme for-linus-20190125 2019-01-26 12:42:41 -08:00
nvmem
of OF: properties: add missing of_node_put 2019-01-16 12:49:53 -06:00
opp cpufreq: scpi/scmi: Fix freeing of dynamic OPPs 2019-01-04 12:19:40 +01:00
oprofile
parisc Kconfig file consolidation for v4.21 2018-12-29 13:40:29 -08:00
parport
pci pci-v5.0-fixes-3 2019-01-31 23:06:17 -08:00
pcmcia Included in this update: 2019-01-05 11:23:17 -08:00
perf drivers/perf: hisi: Fixup one DDRC PMU register offset 2019-01-04 10:13:27 +00:00
phy USB/PHY fixes for 5.0-rc4 2019-01-25 12:57:09 -10:00
pinctrl Pin control bulk changes for the v4.21 kernel cycle: 2019-01-01 13:19:16 -08:00
platform platform/x86: Fix unmet dependency warning for SAMSUNG_Q10 2019-01-29 10:59:07 +01:00
pnp Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
power power supply and reset changes for the v4.21 series 2018-12-28 20:22:45 -08:00
powercap
pps Kconfig updates for v4.21 2018-12-29 13:03:29 -08:00
ps3
ptp ptp: check that rsv field is zero in struct ptp_sys_offset_extended 2019-01-08 16:22:56 -05:00
pwm pwm: imx: Add ipg clock operation 2018-12-24 12:06:56 +01:00
rapidio cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
ras treewide: surround Kconfig file paths with double quotes 2018-12-22 00:25:54 +09:00
regulator Merge remote-tracking branch 'regulator/topic/coupled' into regulator-next 2018-12-21 13:43:35 +00:00
remoteproc virtio: don't allocate vqs when names[i] = NULL 2019-01-14 20:15:19 -05:00
reset reset: uniphier-glue: Add AHCI reset control support in glue layer 2019-01-07 16:38:51 +01:00
rpmsg
rtc RTC for 4.21 2019-01-01 13:24:31 -08:00
s390 SCSI fixes on 20190201 2019-02-02 10:12:53 -08:00
sbus Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next 2018-12-26 10:32:18 -08:00
scsi SCSI fixes on 20190201 2019-02-02 10:12:53 -08:00
sfi
sh
siox
slimbus
sn
soc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-16 05:13:36 +12:00
soundwire
spi cross-tree: phase out dma_zalloc_coherent() 2019-01-08 07:58:37 -05:00
spmi
ssb
staging staging: android: ion: Support cpu access during dma_buf_detach 2019-01-22 11:38:09 +01:00
target scsi: tcmu: fix use after free 2019-01-22 20:54:00 -05:00
tc
tee OP-TEE dynamic shm log message 2018-12-31 13:06:30 -08:00
thermal Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux 2019-01-23 16:23:41 +13:00
thunderbolt
tty RISC-V Fixes for 5.0-rc5 2019-02-02 10:26:14 -08:00
uio Char/Misc driver patches for 4.21-rc1 2018-12-28 20:54:57 -08:00
usb USB-serial fixes for 5.0-rc3 2019-01-18 12:58:20 +01:00
uwb
vfio vfio-pci/nvlink2: Fix ancient gcc warnings 2019-01-23 08:20:43 -07:00
vhost vhost: fix OOB in get_rx_bufs() 2019-01-28 22:53:09 -08:00
video TTY/Serial driver fixes for 5.0-rc4 2019-01-25 12:58:40 -10:00
virt
virtio virtio-balloon: tweak config_changed implementation 2019-01-14 20:15:20 -05:00
visorbus
vlynq
vme
w1 treewide: surround Kconfig file paths with double quotes 2018-12-22 00:25:54 +09:00
watchdog watchdog: tqmx86: Fix a couple IS_ERR() vs NULL bugs 2019-01-07 10:10:35 +01:00
xen arm64/xen: fix xen-swiotlb cache flushing 2019-01-23 22:14:56 +01:00
zorro
Kconfig Kconfig file consolidation for v4.21 2018-12-29 13:40:29 -08:00
Makefile