linux/kernel
Juergen Gross a5aabace5f locking/csd_lock: Add more data to CSD lock debugging
In order to help identifying problems with IPI handling and remote
function execution add some more data to IPI debugging code.

There have been multiple reports of CPUs looping long times (many
seconds) in smp_call_function_many() waiting for another CPU executing
a function like tlb flushing. Most of these reports have been for
cases where the kernel was running as a guest on top of KVM or Xen
(there are rumours of that happening under VMWare, too, and even on
bare metal).

Finding the root cause hasn't been successful yet, even after more than
2 years of chasing this bug by different developers.

Commit:

  35feb60474 ("kernel/smp: Provide CSD lock timeout diagnostics")

tried to address this by adding some debug code and by issuing another
IPI when a hang was detected. This helped mitigating the problem
(the repeated IPI unlocks the hang), but the root cause is still unknown.

Current available data suggests that either an IPI wasn't sent when it
should have been, or that the IPI didn't result in the target CPU
executing the queued function (due to the IPI not reaching the CPU,
the IPI handler not being called, or the handler not seeing the queued
request).

Try to add more diagnostic data by introducing a global atomic counter
which is being incremented when doing critical operations (before and
after queueing a new request, when sending an IPI, and when dequeueing
a request). The counter value is stored in percpu variables which can
be printed out when a hang is detected.

The data of the last event (consisting of sequence counter, source
CPU, target CPU, and event type) is stored in a global variable. When
a new event is to be traced, the data of the last event is stored in
the event related percpu location and the global data is updated with
the new event's data. This allows to track two events in one data
location: one by the value of the event data (the event before the
current one), and one by the location itself (the current event).

A typical printout with a detected hang will look like this:

csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
	csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
        csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
        csd: cnt(00008cd): ffff->0006 idle
        csd: cnt(0003668): 0001->0006 queue
        csd: cnt(0003669): 0001->0006 ipi
        csd: cnt(0003e0f): 0007->000a queue
        csd: cnt(0003e10): 0001->ffff ping
        csd: cnt(0003e71): 0003->0000 ping
        csd: cnt(0003e72): ffff->0006 gotipi
        csd: cnt(0003e73): ffff->0006 handle
        csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
        csd: cnt(0003e7f): 0004->0006 ping
        csd: cnt(0003e80): 0001->ffff pinged
        csd: cnt(0003eb2): 0005->0001 noipi
        csd: cnt(0003eb3): 0001->0006 queue
        csd: cnt(0003eb4): 0001->0006 noipi
        csd: cnt now: 0003f00

The idea is to print only relevant entries. Those are all events which
are associated with the hang (so sender side events for the source CPU
of the hanging request, and receiver side events for the target CPU),
and the related events just before those (for adding data needed to
identify a possible race). Printing all available data would be
possible, but this would add large amounts of data printed on larger
configurations.

Signed-off-by: Juergen Gross <jgross@suse.com>
[ Minor readability edits. Breaks col80 but is far more readable. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20210301101336.7797-4-jgross@suse.com
2021-03-06 12:49:48 +01:00
..
bpf idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
cgroup idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
configs staging: ION: remove some references to CONFIG_ION 2021-01-06 17:39:38 +01:00
debug kgdb: fix to kill breakpoints on initmem after boot 2021-02-26 09:41:05 -08:00
dma Merge branch 'stable/for-linus-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb 2021-02-26 13:59:32 -08:00
entry entry: Explicitly flush pending rcuog wakeup before last rescheduling point 2021-02-17 14:12:43 +01:00
events kernel: delete repeated words in comments 2021-02-26 09:41:03 -08:00
gcov init/gcov: allow CONFIG_CONSTRUCTORS on UML to fix module gcov 2021-02-05 11:03:47 -08:00
irq Driver core / debugfs update for 5.12-rc1 2021-02-24 10:13:55 -08:00
kcsan kcsan: Rewrite kcsan_prandom_u32_max() without prandom_u32_state() 2021-01-04 14:39:07 -08:00
livepatch kallsyms: refactor {,module_}kallsyms_on_each_symbol 2021-02-08 12:22:08 +01:00
locking kernel: delete repeated words in comments 2021-02-26 09:41:03 -08:00
power Merge branches 'powercap' and 'pm-misc' 2021-02-15 18:50:01 +01:00
printk Merge branch 'printk-rework' into for-linus 2021-02-22 13:43:55 +01:00
rcu Scheduler updates for v5.12: 2021-02-21 12:35:04 -08:00
sched kernel: delete repeated words in comments 2021-02-26 09:41:03 -08:00
time These are the latest RCU updates for v5.12: 2021-02-21 12:04:41 -08:00
trace tracing: Skip selftests if tracing is disabled 2021-03-04 09:51:25 -05:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt preempt: Introduce CONFIG_PREEMPT_DYNAMIC 2021-02-17 14:12:24 +01:00
Makefile kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE 2021-02-16 09:59:41 +01:00
acct.c kernel/acct.c: use #elif instead of #end and #elif 2020-12-15 22:46:15 -08:00
async.c treewide: Remove uninitialized_var() usage 2020-07-16 12:35:15 -07:00
audit.c audit: Remove leftover reference to the audit_tasklet 2021-01-15 11:58:10 -05:00
audit.h audit: change unnecessary globals into statics 2020-08-17 20:26:58 -04:00
audit_fsnotify.c audit_alloc_mark(): don't open-code ERR_CAST() 2021-02-23 10:25:27 -05:00
audit_tree.c fsnotify: generalize handle_inode_event() 2020-12-03 14:58:35 +01:00
audit_watch.c fsnotify: generalize handle_inode_event() 2020-12-03 14:58:35 +01:00
auditfilter.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
auditsc.c idmapped-mounts-v5.12 2021-02-23 13:39:45 -08:00
backtracetest.c treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD() 2020-07-30 11:15:58 -07:00
bounds.c
capability.c capability: handle idmapped mounts 2021-01-24 14:27:16 +01:00
compat.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
configs.c
context_tracking.c
cpu.c cpu/hotplug: Add lockdep_is_cpus_held() 2021-01-06 16:24:59 -08:00
cpu_pm.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
crash_core.c kdump: append uts_namespace.name offset to VMCOREINFO 2020-12-15 22:46:18 -08:00
crash_dump.c
cred.c
delayacct.c
dma.c
exec_domain.c
exit.c kernel/io_uring: cancel io_uring before task works 2020-12-30 19:36:54 -07:00
extable.c
fail_function.c fault-injection: handle EI_ETYPE_TRUE 2020-12-15 22:46:19 -08:00
fork.c kernel: provide create_io_thread() helper 2021-03-04 15:45:03 -07:00
freezer.c
futex.c Merge branch 'linus' into locking/core, to pick up upstream fixes 2021-02-12 12:54:58 +01:00
gen_kheaders.sh
groups.c groups: simplify struct group_info allocation 2021-02-26 09:41:03 -08:00
hung_task.c kernel/hung_task.c: make type annotations consistent 2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c irq_work: Optimize irq_work_single() 2020-11-24 16:47:49 +01:00
jump_label.c jump_label: Fix usage in module __init 2020-12-18 16:53:12 +01:00
kallsyms.c kallsyms: only build {,module_}kallsyms_on_each_symbol when required 2021-02-08 12:24:04 +01:00
kcmp.c Merge branch 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:36:48 -08:00
kcov.c kernel: make kcov_common_handle consider the current context 2020-11-02 18:00:20 -08:00
kexec.c LSM: Introduce kernel_post_load_data() hook 2020-10-05 13:37:03 +02:00
kexec_core.c Merge branch 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-02-21 09:29:23 -08:00
kexec_elf.c
kexec_file.c ima: Free IMA measurement buffer after kexec syscall 2021-02-10 15:49:38 -05:00
kexec_internal.h kexec: move machine_kexec_post_load() to public interface 2021-02-22 12:33:26 +00:00
kheaders.c
kmod.c kmod: remove redundant "be an" in the comment 2020-08-12 10:58:01 -07:00
kprobes.c kprobes: Fix to delay the kprobes jump optimization 2021-02-19 14:57:12 -05:00
ksysfs.c
kthread.c - Correct the marking of kthreads which are supposed to run on a specific, 2021-01-24 10:09:20 -08:00
latencytop.c
module-internal.h
module.c module: potential uninitialized return in module_kallsyms_on_each_symbol() 2021-02-10 16:57:04 +01:00
module_signature.c module: harden ELF info handling 2021-01-19 10:24:45 +01:00
module_signing.c module: harden ELF info handling 2021-01-19 10:24:45 +01:00
notifier.c notifier: Fix broken error handling pattern 2020-09-01 09:58:03 +02:00
nsproxy.c fixes-v5.11 2020-12-14 16:40:27 -08:00
padata.c padata: fix possible padata_works_lock deadlock 2020-09-04 17:51:55 +10:00
panic.c panic: don't dump stack twice on warn 2020-11-14 11:26:04 -08:00
params.c Modules updates for v5.11 2020-12-17 13:01:31 -08:00
pid.c Merge branch 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:36:48 -08:00
pid_namespace.c fixes-v5.11 2020-12-14 16:40:27 -08:00
profile.c
ptrace.c kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals 2021-02-21 17:25:22 -07:00
range.c kernel.h: split out min()/max() et al. helpers 2020-10-16 11:11:19 -07:00
reboot.c reboot: hide from sysfs not applicable settings 2020-12-15 22:46:19 -08:00
regset.c regset: kill ->get() 2020-07-27 14:31:12 -04:00
relay.c relay: allow the use of const callback structs 2020-12-15 22:46:18 -08:00
resource.c resource: Move devmem revoke code to resource framework 2021-01-12 14:26:31 +01:00
resource_kunit.c resource: provide meaningful MODULE_LICENSE() in test suite 2020-11-25 18:52:35 +01:00
rseq.c
scftorture.c scftorture: Add debug output for wrong-CPU warning 2021-01-04 13:53:41 -08:00
scs.c scs: switch to vmapped shadow stacks 2020-12-01 10:30:28 +00:00
seccomp.c seccomp: Improve performace by optimizing rmb() 2021-02-10 12:40:11 -08:00
signal.c kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals 2021-02-21 17:25:22 -07:00
smp.c locking/csd_lock: Add more data to CSD lock debugging 2021-03-06 12:49:48 +01:00
smpboot.c kthread: Extract KTHREAD_IS_PER_CPU 2021-01-22 15:09:42 +01:00
smpboot.h
softirq.c softirq: Move do_softirq_own_stack() to generic asm header 2021-02-10 23:34:16 +01:00
stackleak.c stackleak: let stack_erasing_sysctl take a kernel pointer buffer 2020-09-19 13:13:39 -07:00
stacktrace.c stacktrace: Remove reliable argument from arch_stack_walk() callback 2020-09-18 14:24:16 +01:00
static_call.c static_call: Fix the module key fixup 2021-03-06 12:49:08 +01:00
stop_machine.c Merge branch 'linus' into sched/core, to resolve semantic conflict 2020-11-27 11:10:50 +01:00
sys.c Kbuild updates for v5.12 2021-02-25 10:17:31 -08:00
sys_ni.c epoll: wire up syscall epoll_pwait2 2020-12-19 11:18:38 -08:00
sysctl-test.c
sysctl.c sysctl.c: fix underflow value setting risk in vm_table 2021-02-26 09:41:03 -08:00
task_work.c task_work: remove legacy TWA_SIGNAL path 2020-12-12 09:17:38 -07:00
taskstats.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
test_kprobes.c
torture.c torture: Maintain torture-specific set of CPUs-online books 2021-01-06 17:17:22 -08:00
tracepoint.c tracepoints: Code clean up 2021-02-09 12:27:29 -05:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-06 10:31:52 -07:00
up.c
user-return-notifier.c
user.c user: Use generic ns_common::count 2020-08-19 14:14:12 +02:00
user_namespace.c fixes-v5.11 2020-12-14 16:40:27 -08:00
usermode_driver.c
utsname.c uts: Use generic ns_common::count 2020-08-19 14:13:20 +02:00
utsname_sysctl.c
watch_queue.c watch_queue: rectify kernel-doc for init_watch() 2021-01-26 11:16:34 +00:00
watchdog.c kernel/watchdog: fix watchdog_allowed_mask not used warning 2020-11-14 11:26:03 -08:00
watchdog_hld.c
workqueue.c Merge branch 'for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2021-02-22 17:06:54 -08:00
workqueue_internal.h