linux/block
Tejun Heo 07c2bd3735 block: don't call elevator callbacks for plug merges
Plug merge calls two elevator callbacks outside queue lock -
elevator_allow_merge_fn() and elevator_bio_merged_fn().  Although
attempt_plug_merge() suggests that elevator is guaranteed to be there
through the existing request on the plug list, nothing prevents plug
merge from calling into dying or initializing elevator.

For regular merges, bypass ensures elvpriv count to reach zero, which
in turn prevents merges as all !ELVPRIV requests get REQ_SOFTBARRIER
from forced back insertion.  Plug merge doesn't check ELVPRIV, and, as
the requests haven't gone through elevator insertion yet, it doesn't
have SOFTBARRIER set allowing merges on a bypassed queue.

This, for example, leads to the following crash during elevator
switch.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
 IP: [<ffffffff813b34e9>] cfq_allow_merge+0x49/0xa0
 PGD 112cbc067 PUD 115d5c067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP
 CPU 1
 Modules linked in: deadline_iosched

 Pid: 819, comm: dd Not tainted 3.3.0-rc2-work+ #76 Bochs Bochs
 RIP: 0010:[<ffffffff813b34e9>]  [<ffffffff813b34e9>] cfq_allow_merge+0x49/0xa0
 RSP: 0018:ffff8801143a38f8  EFLAGS: 00010297
 RAX: 0000000000000000 RBX: ffff88011817ce28 RCX: ffff880116eb6cc0
 RDX: 0000000000000000 RSI: ffff880118056e20 RDI: ffff8801199512f8
 RBP: ffff8801143a3908 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffff880118195708
 R13: ffff880118052aa0 R14: ffff8801143a3d50 R15: ffff880118195708
 FS:  00007f19f82cb700(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 0000000000000008 CR3: 0000000112c6a000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process dd (pid: 819, threadinfo ffff8801143a2000, task ffff880116eb6cc0)
 Stack:
  ffff88011817ce28 ffff880118195708 ffff8801143a3928 ffffffff81391bba
  ffff88011817ce28 ffff880118195708 ffff8801143a3948 ffffffff81391bf1
  ffff88011817ce28 0000000000000000 ffff8801143a39a8 ffffffff81398e3e
 Call Trace:
  [<ffffffff81391bba>] elv_rq_merge_ok+0x4a/0x60
  [<ffffffff81391bf1>] elv_try_merge+0x21/0x40
  [<ffffffff81398e3e>] blk_queue_bio+0x8e/0x390
  [<ffffffff81396a5a>] generic_make_request+0xca/0x100
  [<ffffffff81396b04>] submit_bio+0x74/0x100
  [<ffffffff811d45c2>] __blockdev_direct_IO+0x1ce2/0x3450
  [<ffffffff811d0dc7>] blkdev_direct_IO+0x57/0x60
  [<ffffffff811460b5>] generic_file_aio_read+0x6d5/0x760
  [<ffffffff811986b2>] do_sync_read+0xe2/0x120
  [<ffffffff81199345>] vfs_read+0xc5/0x180
  [<ffffffff81199501>] sys_read+0x51/0x90
  [<ffffffff81aeac12>] system_call_fastpath+0x16/0x1b

There are multiple ways to fix this including making plug merge check
ELVPRIV; however,

* Calling into elevator outside queue lock is confusing and
  error-prone.

* Requests on plug list aren't known to the elevator.  They aren't on
  the elevator yet, so there's no elevator specific state to update.

* Given the nature of plug merges - collecting bio's for the same
  purpose from the same issuer - elevator specific restrictions aren't
  applicable.

So, simply don't call into elevator methods from plug merge by moving
elv_bio_merged() from bio_attempt_*_merge() to blk_queue_bio(), and
using blk_try_merge() in attempt_plug_merge().

This is based on Jens' patch to skip elevator_allow_merge_fn() from
plug merge.

Note that this makes per-cgroup merged stats skip plug merging.

Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <4F16F3CA.90904@kernel.dk>
Original-patch-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-02-08 09:19:42 +01:00
..
partitions separate partition format handling from generic code 2012-01-03 22:54:06 -05:00
Kconfig move fs/partitions to block/ 2012-01-03 22:54:06 -05:00
Kconfig.iosched blk-cgroup: config options re-arrangement 2010-04-26 19:27:56 +02:00
Makefile separate partition format handling from generic code 2012-01-03 22:54:06 -05:00
blk-cgroup.c block: strip out locking optimization in put_io_context() 2012-02-07 07:51:30 +01:00
blk-cgroup.h block: fix a typo in the blk-cgroup.h file 2011-10-24 16:08:38 +02:00
blk-core.c block: don't call elevator callbacks for plug merges 2012-02-08 09:19:42 +01:00
blk-exec.c block: add missing blk_queue_dead() checks 2011-12-14 00:33:37 +01:00
blk-flush.c blk-flush: move the queue kick into 2011-10-24 16:24:31 +02:00
blk-integrity.c block: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macros 2011-10-31 19:31:12 -04:00
blk-ioc.c block: strip out locking optimization in put_io_context() 2012-02-07 07:51:30 +01:00
blk-iopoll.c tree-wide: fix assorted typos all over the place 2009-12-04 15:39:55 +01:00
blk-lib.c block: fix patch import error in max_discard_sectors check 2011-07-23 20:34:59 +02:00
blk-map.c block: re-use existing 'reading' variable instead of checking direction again 2011-12-21 15:27:24 +01:00
blk-merge.c block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions 2012-02-08 09:19:38 +01:00
blk-settings.c block: Introduce blk_set_stacking_limits function 2012-01-11 16:27:11 +01:00
blk-softirq.c block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request 2011-09-14 09:31:01 +02:00
blk-sysfs.c block, cfq: move io_cq exit/release to blk-ioc.c 2011-12-14 00:33:42 +01:00
blk-tag.c block: fix blk_queue_end_tag() 2011-12-29 09:16:28 +01:00
blk-throttle.c block: add blk_queue_dead() 2011-12-14 00:33:37 +01:00
blk-timeout.c fault-injection: add ability to export fault_attr in arbitrary directory 2011-08-03 14:25:20 -10:00
blk.h block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions 2012-02-08 09:19:38 +01:00
bsg-lib.c block: Change module.h -> export.h in bsg-lib.c 2011-10-31 19:31:13 -04:00
bsg.c Merge branch 'for-3.3/core' of git://git.kernel.dk/linux-block 2012-01-15 12:24:45 -08:00
cfq-iosched.c block: don't call elevator callbacks for plug merges 2012-02-08 09:19:42 +01:00
cfq.h blk-cgroup: Add unaccounted time to timeslice_used. 2011-03-12 16:54:00 +01:00
compat_ioctl.c block: Add BLKROTATIONAL ioctl 2012-01-11 16:29:31 +01:00
deadline-iosched.c block, cfq: move icq cache management to block core 2011-12-14 00:33:42 +01:00
elevator.c block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions 2012-02-08 09:19:38 +01:00
genhd.c Merge branch 'for-3.3/core' of git://git.kernel.dk/linux-block 2012-01-15 12:24:45 -08:00
ioctl.c Merge branch 'for-3.3/core' of git://git.kernel.dk/linux-block 2012-01-15 12:24:45 -08:00
noop-iosched.c block, cfq: move icq cache management to block core 2011-12-14 00:33:42 +01:00
partition-generic.c separate partition format handling from generic code 2012-01-03 22:54:06 -05:00
scsi_ioctl.c block: fail SCSI passthrough ioctls on partition devices 2012-01-14 15:07:24 -08:00