platform_kernel-5.15/kernel
Tejun Heo d1faa010ca FROMGIT: cgroup: Use separate src/dst nodes when preloading css_sets for migration
Each cset (css_set) is pinned by its tasks. When we're moving tasks around
across csets for a migration, we need to hold the source and destination
csets to ensure that they don't go away while we're moving tasks about. This
is done by linking cset->mg_preload_node on either the
mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the
same cset->mg_preload_node for both the src and dst lists was deemed okay as
a cset can't be both the source and destination at the same time.

Unfortunately, this overloading becomes problematic when multiple tasks are
involved in a migration and some of them are identity noop migrations while
others are actually moving across cgroups. For example, this can happen with
the following sequence on cgroup1:

 #1> mkdir -p /sys/fs/cgroup/misc/a/b
 #2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs
 #3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS &
 #4> PID=$!
 #5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks
 #6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs

the process including the group leader back into a. In this final migration,
non-leader threads would be doing identity migration while the group leader
is doing an actual one.

After #3, let's say the whole process was in cset A, and that after #4, the
leader moves to cset B. Then, during #6, the following happens:

 1. cgroup_migrate_add_src() is called on B for the leader.

 2. cgroup_migrate_add_src() is called on A for the other threads.

 3. cgroup_migrate_prepare_dst() is called. It scans the src list.

 4. It notices that B wants to migrate to A, so it tries to A to the dst
    list but realizes that its ->mg_preload_node is already busy.

 5. and then it notices A wants to migrate to A as it's an identity
    migration, it culls it by list_del_init()'ing its ->mg_preload_node and
    putting references accordingly.

 6. The rest of migration takes place with B on the src list but nothing on
    the dst list.

This means that A isn't held while migration is in progress. If all tasks
leave A before the migration finishes and the incoming task pins it, the
cset will be destroyed leading to use-after-free.

This is caused by overloading cset->mg_preload_node for both src and dst
preload lists. We wanted to exclude the cset from the src list but ended up
inadvertently excluding it from the dst list too.

This patch fixes the issue by separating out cset->mg_preload_node into
->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst
preloadings don't interfere with each other.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reported-by: shisiyuan <shisiyuan19870131@gmail.com>
Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com
Link: https://www.spinics.net/lists/cgroups/msg33313.html
Fixes: f817de9851 ("cgroup: prepare migration path for unified hierarchy")
Cc: stable@vger.kernel.org # v3.16+
(cherry picked from commit 07fd5b6cdf3cc30bfde8fe0f644771688be04447
 https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-5.19-fixes)
Bug: 235577024
Change-Id: Ieaf1c0c8fc23753570897fd6e48a54335ab939ce
Signed-off-by: Steve Muckle <smuckle@google.com>
2022-06-18 13:20:30 -07:00
..
bpf UPSTREAM: bpf: Fix crash due to out of bounds access into reg2btf_ids. 2022-06-18 18:28:45 +00:00
cgroup FROMGIT: cgroup: Use separate src/dst nodes when preloading css_sets for migration 2022-06-18 13:20:30 -07:00
configs drivers/char: remove /dev/kmem for good 2021-05-07 00:26:34 -07:00
debug kdb: Fix the putarea helper function 2022-04-08 14:23:51 +02:00
dma FROMLIST: dma-mapping: Add dma_release_coherent_memory to DMA API 2022-05-31 17:48:53 +00:00
entry signal: Replace force_fatal_sig with force_exit_sig when in doubt 2021-11-25 09:49:07 +01:00
events Merge 5.15.36 into android13-5.15 2022-05-18 08:55:59 +02:00
gcov Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR 2021-06-22 11:07:18 -07:00
irq Merge 5.15.39 into android13-5.15 2022-05-18 09:38:58 +02:00
kcsan LKMM updates: 2021-09-02 13:00:15 -07:00
livepatch livepatch: Fix build failure on 32 bits processors 2022-04-08 14:23:29 +02:00
locking ANDROID: mutex: Add vendor hook to init mutex oem data. 2022-06-10 18:14:26 +00:00
power Merge 5.15.33 into android13-5.15 2022-04-20 08:18:54 +02:00
printk UPSTREAM: printk: ringbuffer: Improve prb_next_seq() performance 2022-05-10 23:41:39 +00:00
rcu Merge 5.15.39 into android13-5.15 2022-05-18 09:38:58 +02:00
sched ANDROID: GKI: sched: add Android ABI padding to some structures 2022-06-18 18:44:16 +00:00
time Merge 5.15.39 into android13-5.15 2022-05-18 09:38:58 +02:00
trace UPSTREAM: bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem. 2022-06-18 18:28:45 +00:00
.gitignore ANDROID: add gki_module headers to .gitignore file 2022-04-27 15:34:29 +00:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/rwlock: Provide RT variant 2021-08-17 17:50:51 +02:00
Kconfig.preempt sched/core: Disable CONFIG_SCHED_CORE by default 2021-06-28 22:43:05 +02:00
Makefile Merge 5.15.34 into android13-5.15 2022-04-24 16:57:32 +02:00
acct.c kernel/acct.c: use dedicated helper to access rlimit values 2021-09-08 11:50:26 -07:00
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 12:03:07 +01:00
audit.c audit: improve audit queue handling when "audit=1" on cmdline 2022-02-08 18:34:03 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:23:06 +02:00
audit_fsnotify.c
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-08-24 18:52:36 -04:00
audit_watch.c
auditfilter.c lsm: separate security_task_getsecid() into subjective and objective variants 2021-03-22 15:23:32 -04:00
auditsc.c audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:23:06 +02:00
backtracetest.c
bounds.c FROMLIST: mm: multi-gen LRU: minimal implementation 2022-04-20 17:38:55 +00:00
capability.c
cfi.c FROMGIT: cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle 2022-06-02 20:21:48 +00:00
compat.c arch: remove compat_alloc_user_space 2021-09-08 15:32:35 -07:00
configs.c
context_tracking.c
cpu.c Merge 5.15.35 into android13-5.15 2022-04-24 16:58:59 +02:00
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-08-16 18:55:32 +02:00
crash_core.c kernel/crash_core: suppress unknown crashkernel parameter warning 2021-12-29 12:28:49 +01:00
crash_dump.c
cred.c This is the 5.15.25 stable release 2022-02-23 12:30:26 +01:00
delayacct.c delayacct: Add sysctl to enable at runtime 2021-05-12 11:43:25 +02:00
dma.c
exec_domain.c
exit.c ANDROID: vendor_hooks: Add hooks for memory when debug 2022-06-02 14:37:19 -07:00
extable.c
fail_function.c
fork.c ANDROID: add vma->file_ref_count to synchronize vma->vm_file destruction 2022-06-18 18:28:46 +00:00
freezer.c FROMLIST: sched: Defer wakeup in ttwu() for unschedulable frozen tasks 2022-02-10 09:29:57 +00:00
futex.c Merge tag 'v5.15-rc1' into `android-mainline` 2021-09-16 09:51:19 +02:00
gen_kheaders.sh kbuild: clean up ${quiet} checks in shell scripts 2021-05-27 04:01:50 +09:00
gki_module.c ANDROID: GKI: Add module load time protected symbol lookup 2022-01-05 18:38:02 +00:00
groups.c
hung_task.c FROMLIST: freezer: Add frozen_or_skipped() helper function 2022-02-10 09:29:34 +00:00
iomem.c
irq_work.c Merge 8ecfa36cd4 ("Merge tag 'riscv-for-linus-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux") into android-mainline 2021-06-13 15:19:53 +02:00
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-05 10:46:20 +02:00
kallsyms.c module: add printk formats to add module build ID to stacktraces 2021-07-08 11:48:22 -07:00
kcmp.c
kcov.c
kexec.c kexec: avoid compat_alloc_user_space 2021-09-08 15:32:34 -07:00
kexec_core.c Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
kexec_elf.c
kexec_file.c kernel: kexec_file: fix error return code of kexec_calculate_store_digests() 2021-05-07 00:26:32 -07:00
kexec_internal.h
kheaders.c
kmod.c modules: add CONFIG_MODPROBE_PATH 2021-05-07 00:26:33 -07:00
kprobes.c kprobes: Limit max data_size of the kretprobe instances 2021-12-08 09:04:41 +01:00
ksysfs.c
kthread.c Merge commit c288d9cd71 ("Merge tag 'for-5.14/io_uring-2021-06-30' of git://git.kernel.dk/linux-block") into android-mainline 2021-07-12 10:02:27 +01:00
latencytop.c
module-internal.h ANDROID: GKI: Add module load time protected symbol lookup 2022-01-05 18:38:02 +00:00
module.c This is the 5.15.25 stable release 2022-02-23 12:30:26 +01:00
module_signature.c
module_signing.c
notifier.c notifier: Remove atomic_notifier_call_chain_robust() 2021-08-16 18:55:32 +02:00
nsproxy.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
padata.c padata: Remove repeated verbose license text 2021-08-27 16:30:18 +08:00
panic.c Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
params.c params: lift param_set_uint_minmax to common code 2021-08-16 14:42:22 +02:00
pid.c Merge 4520dcbe0d ("Merge tag 'for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply") into android-mainline 2021-09-02 09:42:36 +02:00
pid_namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
profile.c profiling: fix shift-out-of-bounds bugs 2021-09-08 11:50:26 -07:00
ptrace.c ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE 2022-04-08 14:22:50 +02:00
range.c
reboot.c ANDROID: reboot: Export reboot_mode 2021-09-28 08:49:54 -07:00
regset.c
relay.c
resource.c kernel/resource: fix kfree() of bootmem memory again 2022-04-08 14:23:43 +02:00
resource_kunit.c
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-04-08 14:23:10 +02:00
scftorture.c scftorture: Avoid NULL pointer exception on early exit 2021-07-27 11:39:30 -07:00
scs.c FROMLIST: kasan, scs: support tagged vmalloc mappings 2022-03-21 15:31:19 +00:00
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2022-02-16 12:56:38 +01:00
signal.c ANDROID: Add vendor hooks to signal. 2022-06-16 20:28:30 +00:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:34:21 +02:00
smpboot.c smpboot: Replace deprecated CPU-hotplug functions. 2021-08-10 14:57:42 +02:00
smpboot.h
softirq.c ANDROID: softirq: Export irq_handler_exit tracepoint 2021-11-11 18:55:19 +00:00
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:03:07 +01:00
stacktrace.c Merge 5.15.34 into android13-5.15 2022-04-24 16:57:32 +02:00
static_call.c static_call: Don't make __static_call_return0 static 2022-04-13 20:59:28 +02:00
static_call_inline.c static_call: Don't make __static_call_return0 static 2022-04-13 20:59:28 +02:00
stop_machine.c ANDROID: sched: allow access to critical common code for CPU Pause 2021-11-09 00:42:10 +00:00
sys.c UPSTREAM: mm: refactor vm_area_struct::anon_vma_name usage code 2022-03-24 18:44:39 -07:00
sys_ni.c compat: remove some compat entry points 2021-09-08 15:32:35 -07:00
sysctl-test.c kernel/sysctl-test: Remove some casts which are no-longer required 2021-06-23 16:41:24 -06:00
sysctl.c Merge 5.15.28 into android13-5.15 2022-03-18 07:53:14 +01:00
task_work.c kasan: record task_work_add() call stack 2021-04-30 11:20:42 -07:00
taskstats.c
test_kprobes.c
torture.c torture: Replace deprecated CPU-hotplug functions. 2021-08-10 10:48:07 -07:00
tracepoint.c Merge 996fe06160 ("Merge tag 'kgdb-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux") into android-mainline 2021-09-14 16:16:54 +02:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 11:05:35 +01:00
ucount.c ucounts: Handle wrapping in is_ucounts_overlimit 2022-02-23 12:03:20 +01:00
uid16.c
uid16.h
umh.c Revert "ANDROID: umh: Enable usermode helper for required use cases" 2022-06-15 05:43:05 +00:00
up.c A set of locking related fixes and updates: 2021-05-09 13:07:03 -07:00
user-return-notifier.c
user.c fs/epoll: use a per-cpu counter for user's watches count 2021-09-08 11:50:27 -07:00
user_namespace.c ucounts: Fix systemd LimitNPROC with private users regression 2022-03-08 19:12:42 +01:00
usermode_driver.c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
utsname.c
utsname_sysctl.c
watch_queue.c watch_queue: Free the page array when watch_queue is dismantled 2022-04-08 14:24:11 +02:00
watchdog.c ANDROID: softlockup: add vendor hook for a softlockup task 2022-03-15 11:18:09 +09:00
watchdog_hld.c
workqueue.c UPSTREAM: workqueue, kasan: avoid alloc_pages() when recording stack 2022-02-14 15:50:54 +01:00
workqueue_internal.h workqueue: Assign a color to barrier work items 2021-08-17 07:49:10 -10:00