linux/block
Yu Kuai af25dbc846 blk-cgroup: synchronize blkg creation against policy deactivation
[ Upstream commit 0c9d338c84 ]

Our test reports a null pointer dereference:

[  168.534653] ==================================================================
[  168.535614] Disabling lock debugging due to kernel taint
[  168.536346] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  168.537274] #PF: supervisor read access in kernel mode
[  168.537964] #PF: error_code(0x0000) - not-present page
[  168.538667] PGD 0 P4D 0
[  168.539025] Oops: 0000 [#1] PREEMPT SMP KASAN
[  168.539656] CPU: 13 PID: 759 Comm: bash Tainted: G    B             5.15.0-rc2-next-202100
[  168.540954] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_0738364
[  168.542736] RIP: 0010:bfq_pd_init+0x88/0x1e0
[  168.543318] Code: 98 00 00 00 e8 c9 e4 5b ff 4c 8b 65 00 49 8d 7c 24 08 e8 bb e4 5b ff 4d0
[  168.545803] RSP: 0018:ffff88817095f9c0 EFLAGS: 00010002
[  168.546497] RAX: 0000000000000001 RBX: ffff888101a1c000 RCX: 0000000000000000
[  168.547438] RDX: 0000000000000003 RSI: 0000000000000002 RDI: ffff888106553428
[  168.548402] RBP: ffff888106553400 R08: ffffffff961bcaf4 R09: 0000000000000001
[  168.549365] R10: ffffffffa2e16c27 R11: fffffbfff45c2d84 R12: 0000000000000000
[  168.550291] R13: ffff888101a1c098 R14: ffff88810c7a08c8 R15: ffffffffa55541a0
[  168.551221] FS:  00007fac75227700(0000) GS:ffff88839ba80000(0000) knlGS:0000000000000000
[  168.552278] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  168.553040] CR2: 0000000000000008 CR3: 0000000165ce7000 CR4: 00000000000006e0
[  168.554000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  168.554929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  168.555888] Call Trace:
[  168.556221]  <TASK>
[  168.556510]  blkg_create+0x1c0/0x8c0
[  168.556989]  blkg_conf_prep+0x574/0x650
[  168.557502]  ? stack_trace_save+0x99/0xd0
[  168.558033]  ? blkcg_conf_open_bdev+0x1b0/0x1b0
[  168.558629]  tg_set_conf.constprop.0+0xb9/0x280
[  168.559231]  ? kasan_set_track+0x29/0x40
[  168.559758]  ? kasan_set_free_info+0x30/0x60
[  168.560344]  ? tg_set_limit+0xae0/0xae0
[  168.560853]  ? do_sys_openat2+0x33b/0x640
[  168.561383]  ? do_sys_open+0xa2/0x100
[  168.561877]  ? __x64_sys_open+0x4e/0x60
[  168.562383]  ? __kasan_check_write+0x20/0x30
[  168.562951]  ? copyin+0x48/0x70
[  168.563390]  ? _copy_from_iter+0x234/0x9e0
[  168.563948]  tg_set_conf_u64+0x17/0x20
[  168.564467]  cgroup_file_write+0x1ad/0x380
[  168.565014]  ? cgroup_file_poll+0x80/0x80
[  168.565568]  ? __mutex_lock_slowpath+0x30/0x30
[  168.566165]  ? pgd_free+0x100/0x160
[  168.566649]  kernfs_fop_write_iter+0x21d/0x340
[  168.567246]  ? cgroup_file_poll+0x80/0x80
[  168.567796]  new_sync_write+0x29f/0x3c0
[  168.568314]  ? new_sync_read+0x410/0x410
[  168.568840]  ? __handle_mm_fault+0x1c97/0x2d80
[  168.569425]  ? copy_page_range+0x2b10/0x2b10
[  168.570007]  ? _raw_read_lock_bh+0xa0/0xa0
[  168.570622]  vfs_write+0x46e/0x630
[  168.571091]  ksys_write+0xcd/0x1e0
[  168.571563]  ? __x64_sys_read+0x60/0x60
[  168.572081]  ? __kasan_check_write+0x20/0x30
[  168.572659]  ? do_user_addr_fault+0x446/0xff0
[  168.573264]  __x64_sys_write+0x46/0x60
[  168.573774]  do_syscall_64+0x35/0x80
[  168.574264]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  168.574960] RIP: 0033:0x7fac74915130
[  168.575456] Code: 73 01 c3 48 8b 0d 58 ed 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 444
[  168.577969] RSP: 002b:00007ffc3080e288 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  168.578986] RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007fac74915130
[  168.579937] RDX: 0000000000000009 RSI: 000056007669f080 RDI: 0000000000000001
[  168.580884] RBP: 000056007669f080 R08: 000000000000000a R09: 00007fac75227700
[  168.581841] R10: 000056007655c8f0 R11: 0000000000000246 R12: 0000000000000009
[  168.582796] R13: 0000000000000001 R14: 00007fac74be55e0 R15: 00007fac74be08c0
[  168.583757]  </TASK>
[  168.584063] Modules linked in:
[  168.584494] CR2: 0000000000000008
[  168.584964] ---[ end trace 2475611ad0f77a1a ]---

This is because blkg_alloc() is called from blkg_conf_prep() without
holding 'q->queue_lock', and elevator is exited before blkg_create():

thread 1                            thread 2
blkg_conf_prep
 spin_lock_irq(&q->queue_lock);
 blkg_lookup_check -> return NULL
 spin_unlock_irq(&q->queue_lock);

 blkg_alloc
  blkcg_policy_enabled -> true
  pd = ->pd_alloc_fn
  blkg->pd[i] = pd
                                   blk_mq_exit_sched
                                    bfq_exit_queue
                                     blkcg_deactivate_policy
                                      spin_lock_irq(&q->queue_lock);
                                      __clear_bit(pol->plid, q->blkcg_pols);
                                      spin_unlock_irq(&q->queue_lock);
                                    q->elevator = NULL;
  spin_lock_irq(&q->queue_lock);
   blkg_create
    if (blkg->pd[i])
     ->pd_init_fn -> q->elevator is NULL
  spin_unlock_irq(&q->queue_lock);

Because blkcg_deactivate_policy() requires queue to be frozen, we can
grab q_usage_counter to synchoronize blkg_conf_prep() against
blkcg_deactivate_policy().

Fixes: e21b7a0b98 ("block, bfq: add full hierarchical scheduling and cgroups support")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20211020014036.2141723-1-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:41 +01:00
..
partitions block: fix incorrect references to disk objects 2021-10-18 11:20:38 -06:00
Kconfig SCSI misc on 20210902 2021-09-02 15:09:46 -07:00
Kconfig.iosched Revert "block/mq-deadline: Add cgroup support" 2021-08-11 13:47:26 -06:00
Makefile block-5.15-2021-09-11 2021-09-11 10:19:51 -07:00
badblocks.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
bdev.c block: genhd: fix double kfree() in __alloc_disk_node() 2021-10-02 07:29:20 -06:00
bfq-cgroup.c block, bfq: reset last_bfqq_created on group change 2021-10-17 07:03:02 -06:00
bfq-iosched.c Revert "block, bfq: honor already-setup queue merges" 2021-09-28 06:33:15 -06:00
bfq-iosched.h block, bfq: cleanup the repeated declaration 2021-08-25 06:45:33 -06:00
bfq-wf2q.c block: Introduce IOPRIO_NR_LEVELS 2021-08-18 07:21:12 -06:00
bio-integrity.c block: use bvec_virt in bio_integrity_{process,free} 2021-08-16 10:50:32 -06:00
bio.c block: don't call rq_qos_ops->done_bio if the bio isn't tracked 2021-09-24 11:04:39 -06:00
blk-cgroup-rwstat.c blk-cgroup: Fix the recursive blkg rwstat 2021-03-05 11:32:15 -07:00
blk-cgroup-rwstat.h blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT 2019-11-07 12:28:13 -07:00
blk-cgroup.c blk-cgroup: synchronize blkg creation against policy deactivation 2021-11-18 19:16:41 +01:00
blk-core.c block: drain file system I/O on del_gendisk 2021-10-15 21:02:50 -06:00
blk-crypto-fallback.c block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
blk-crypto-internal.h block: make blk_crypto_rq_bio_prep() able to fail 2020-10-05 10:47:43 -06:00
blk-crypto.c blk-crypto: fix check for too-large dun_bytes 2021-08-25 06:45:00 -06:00
blk-exec.c block: return errors from blk_execute_rq() 2021-06-30 15:35:45 -06:00
blk-flush.c blk-mq: fix is_flush_rq 2021-08-17 20:17:34 -06:00
blk-integrity.c block: flush the integrity workqueue in blk_integrity_unregister 2021-09-14 20:03:30 -06:00
blk-ioc.c block: remove retry loop in ioc_release_fn() 2020-07-16 10:22:15 -06:00
blk-iocost.c for-5.15/block-2021-08-30 2021-08-30 18:52:11 -07:00
blk-iolatency.c for-5.15/block-2021-08-30 2021-08-30 18:52:11 -07:00
blk-ioprio.c block: Introduce the ioprio rq-qos policy 2021-06-21 15:03:40 -06:00
blk-ioprio.h block: Introduce the ioprio rq-qos policy 2021-06-21 15:03:40 -06:00
blk-lib.c block: export blk_next_bio() 2021-06-17 15:51:20 +02:00
blk-map.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
blk-merge.c io_uring-bio-cache.5-2021-08-30 2021-08-30 19:30:30 -07:00
blk-mq-cpumap.c blk-mq: remove the calling of local_memory_node() 2020-10-20 07:08:17 -06:00
blk-mq-debugfs-zoned.c block: Cleanup license notice 2019-01-17 21:21:40 -07:00
blk-mq-debugfs.c block: decode QUEUE_FLAG_HCTX_ACTIVE in debugfs output 2021-10-04 06:58:39 -06:00
blk-mq-debugfs.h blk-mq: no need to check return value of debugfs_create functions 2019-06-13 03:00:30 -06:00
blk-mq-pci.c block: Fix blk_mq_*_map_queues() kernel-doc headers 2019-05-31 15:12:34 -06:00
blk-mq-rdma.c block: Fix blk_mq_*_map_queues() kernel-doc headers 2019-05-31 15:12:34 -06:00
blk-mq-sched.c blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling 2021-07-27 16:44:38 -06:00
blk-mq-sched.h blk: Fix lock inversion between ioc lock and bfqd lock 2021-06-24 18:43:55 -06:00
blk-mq-sysfs.c block: remove blk-mq-sysfs dead code 2021-08-02 13:37:29 -06:00
blk-mq-tag.c blk-mq: avoid to iterate over stale request 2021-09-12 19:32:43 -06:00
blk-mq-tag.h blk-mq: Some tag allocation code refactoring 2021-05-24 06:47:22 -06:00
blk-mq-virtio.c blk-mq: Fix typo in comment 2020-03-17 20:55:21 +01:00
blk-mq.c block: remove inaccurate requeue check 2021-11-18 19:16:17 +01:00
blk-mq.h blk: Fix lock inversion between ioc lock and bfqd lock 2021-06-24 18:43:55 -06:00
blk-pm.c scsi: block: Fix a race in the runtime power management code 2020-12-09 11:41:41 -05:00
blk-pm.h block: Remove unused blk_pm_*() function definitions 2021-02-22 06:33:48 -07:00
blk-rq-qos.c rq-qos: fix missed wake-ups in rq_qos_throttle try two 2021-06-08 15:12:57 -06:00
blk-rq-qos.h block: Introduce the ioprio rq-qos policy 2021-06-21 15:03:40 -06:00
blk-settings.c block: Fix partition check for host-aware zoned block devices 2021-10-27 06:58:01 -06:00
blk-stat.c blk-stat: make q->stats->lock irqsafe 2020-09-01 16:48:46 -06:00
blk-stat.h block: deactivate blk_stat timer in wbt_disable_default() 2018-12-12 06:47:51 -07:00
blk-sysfs.c block: call blk_register_queue earlier in device_add_disk 2021-08-23 12:55:45 -06:00
blk-throttle.c blk-throttle: fix UAF by deleteing timer in blk_throtl_exit() 2021-09-07 08:36:56 -06:00
blk-timeout.c block: blk-timeout: delete duplicated word 2020-07-31 16:29:47 -06:00
blk-wbt.c blk-wbt: prevent NULL pointer dereference in wb_timer_fn 2021-11-18 19:16:34 +01:00
blk-wbt.h blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled() 2021-06-21 15:03:41 -06:00
blk-zoned.c blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN 2021-08-24 10:12:36 -06:00
blk.h block: bump max plugged deferred size from 16 to 32 2021-11-18 19:16:16 +01:00
bounce.c block: use memcpy_from_bvec in __blk_queue_bounce 2021-08-02 13:37:28 -06:00
bsg-lib.c scsi: bsg-lib: Fix commands without data transfer in bsg_transport_sg_io_fn() 2021-08-01 13:21:40 -04:00
bsg.c scsi: bsg: Fix device unregistration 2021-09-14 00:22:15 -04:00
disk-events.c block: return errors from disk_alloc_events 2021-08-23 12:55:45 -06:00
elevator.c block: return ELEVATOR_DISCARD_MERGE if possible 2021-08-09 14:37:47 -06:00
fops.c block: hold ->invalidate_lock in blkdev_fallocate 2021-09-24 11:06:58 -06:00
genhd.c block: drain queue after disk is removed from sysfs 2021-10-26 08:44:38 -06:00
holder.c block: add back the bd_holder_dir reference in bd_link_disk_holder 2021-08-20 21:14:26 -06:00
ioctl.c block: pass a gendisk to bdev_resize_partition 2021-08-12 10:31:36 -06:00
ioprio.c block: fix default IO priority handling 2021-08-18 07:23:15 -06:00
keyslot-manager.c - Fix DM integrity's HMAC support to provide enhanced security of 2021-02-22 10:22:54 -08:00
kyber-iosched.c kyber: avoid q->disk dereferences in trace points 2021-10-15 21:02:57 -06:00
mq-deadline.c block/mq-deadline: Move dd_queued() to fix defined but not used warning 2021-09-02 06:34:45 -06:00
opal_proto.h block: sed-opal: Change the check condition for regular session validity 2020-03-12 08:00:10 -06:00
sed-opal.c block: sed-opal: Change the check condition for regular session validity 2020-03-12 08:00:10 -06:00
t10-pi.c block: use bvec_kmap_local in t10_pi_type1_{prepare,complete} 2021-08-02 13:37:28 -06:00