Commit Graph

1413 Commits

Author SHA1 Message Date
Kees Cook 490b94be02 iosched: remove redundant sprintf
After the anticipatory scheduler was dropped, there was no need to
special-case the request_module string. As such, drop the redundant
sprintf and stack variable.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-05 18:02:12 -06:00
Tao Ma addd0a09fc block: Remove 'plug/unplug' comment in blk_execute_rq_nowait
unplug is replaced with blk_run_queue now in blk_execute_rq_nowait,
so change the comment accordingly.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-05 15:10:05 -06:00
Christoph Hellwig 24ecfbe27f block: add blk_run_queue_async
Instead of overloading __blk_run_queue to force an offload to kblockd
add a new blk_run_queue_async helper to do it explicitly.  I've kept
the blk_queue_stopped check for now, but I suspect it's not needed
as the check we do when the workqueue items runs should be enough.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18 11:41:33 +02:00
Jens Axboe 4521cc4ed5 block: blk_delay_queue() should use kblockd workqueue
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18 11:36:39 +02:00
Jens Axboe 99e22598e9 block: drop queue lock before calling __blk_run_queue() for kblockd punt
If we know we are going to punt to kblockd, we can drop the queue
lock before calling into __blk_run_queue() since it only does a
safe bit test and a workqueue call. Since kblockd needs to grab
this very lock as one of the first things it does, it's a good
optimization to drop the lock before waking kblockd.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18 09:59:55 +02:00
Jens Axboe b4cb290e0a Revert "block: add callback function for unplug notification"
MD can't use this since it really requires us to be able to
keep more than a single piece of state for the unplug. Commit
048c9374 added the required support for MD, so get rid of this
now unused code.

This reverts commit f75664570d.

Conflicts:

	block/blk-core.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18 09:54:05 +02:00
NeilBrown 048c9374a7 block: Enhance new plugging support to support general callbacks
md/raid requires an unplug callback, but as it does not uses
requests the current code cannot provide one.

So allow arbitrary callbacks to be attached to the blk_plug.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18 09:52:22 +02:00
Jens Axboe 49cac01e1f block: make unplug timer trace event correspond to the schedule() unplug
It's a pretty close match to what we had before - the timer triggering
would mean that nobody unplugged the plug in due time, in the new
scheme this matches very closely what the schedule() unplug now is.
It's essentially the difference between an explicit unplug (IO unplug)
or an implicit unplug (timer unplug, we scheduled with pending IO
queued).

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-16 13:51:05 +02:00
Jens Axboe f6603783f9 block: only force kblockd unplugging from the schedule() path
For the explicit unplugging, we'd prefer to kick things off
immediately and not pay the penalty of the latency to switch
to kblockd. So let blk_finish_plug() do the run inline, while
the implicit-on-schedule-out unplug will punt to kblockd.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-15 15:49:07 +02:00
Christoph Hellwig 88b996cd06 block: cleanup the block plug helper functions
It's a bit of a mess currently. task->plug is being cleared
and reset in __blk_finish_plug(), and blk_finish_plug() is
testing for a NULL plug which cannot happen even from schedule()
anymore since it uses blk_needs_flush_plug() to determine
whether to call into this function at all.

So get rid of some of the cruft.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-15 15:20:10 +02:00
Liu Yuan 80656b67b3 block, blk-sysfs: Use the variable directly instead of a function call
In the function blk_register_queue(), var _dev_ is already assigned by
disk_to_dev().So use it directly instead of calling disk_to_dev() again.

Signed-off-by: Liu Yuan <tailai.ly@taobao.com>

Modified by me to delete an empty line in the same function while
in there anyway.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-13 22:14:54 +02:00
Jens Axboe f4af3c3d07 block: move queue run on unplug to kblockd
There are worries that we are now consuming a lot more stack in
some cases, since we potentially call into IO dispatch from
schedule() or io_schedule(). We can reduce this problem by moving
the running of the queue to kblockd, like the old plugging scheme
did as well.

This may or may not be a good idea from a performance perspective,
depending on how many tasks have queue plugs running at the same
time. For even the slightly contended case, doing just a single
queue run from kblockd instead of multiple runs directly from the
unpluggers will be faster.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12 14:58:51 +02:00
Jens Axboe cf82c79839 block: kill queue_sync_plugs()
The original use for this dates back to when we had to track write
requests for serializing around barriers. That's not needed anymore,
so kill it.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12 10:30:53 +02:00
Jens Axboe dc6d36c971 block: readd plug trace event
This was removed with the queue plug state. But we can easily readd
by checking if this is the first request going to this queue. It's
good information to have when tracing to see how effective the
plugging is.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12 10:28:28 +02:00
Jens Axboe f75664570d block: add callback function for unplug notification
MD would like to know when a queue is unplugged, so it can flush
it's bitmap writes. Add such a callback.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12 10:17:31 +02:00
Jens Axboe 188112722c block: add comment on why we save and disable interrupts in flush_plug_list()
It's done at the top to avoid doing it for every queue we unplug.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12 10:12:29 +02:00
Jens Axboe 94b5eb28b4 block: fixup block IO unplug trace call
It was removed with the on-stack plugging, readd it and track the
depth of requests added when flushing the plug.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12 10:12:19 +02:00
NeilBrown 109b81296c block: splice plug list to local context
If the request_fn ends up blocking, we could be re-entering
the plug flush. Since the list is protected by explicitly
not allowing schedule events, this isn't a terribly good idea.

Additionally, it can cause us to recurse. As request_fn called by
__blk_run_queue is allowed to 'schedule()' (after dropping the queue
lock of course), it is possible to get a recursive call:

 schedule -> blk_flush_plug -> __blk_finish_plug -> flush_plug_list
      -> __blk_run_queue -> request_fn -> schedule

We must make sure that the second schedule does not call into
blk_flush_plug again.  So instead of leaving the list of requests on
blk_plug->list, move them to a separate list leaving blk_plug->list
empty.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-11 14:13:10 +02:00
Linus Torvalds 42933bac11 Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
  Fix common misspellings
2011-04-07 11:14:49 -07:00
Konstantin Khlebnikov f83e826181 block: fix request sorting at unplug
Comparison function for list_sort() must be anticommutative,
otherwise it is not sorting in ordinary meaning.

But fortunately list_sort() always check ((*cmp)(priv, a, b) <= 0)
it not distinguish negative and zero, so comparison function can
implement only less-or-equal instead of full three-way comparison.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-05 23:52:49 +02:00
Mike Snitzer a63a5cf84d dm: improve block integrity support
The current block integrity (DIF/DIX) support in DM is verifying that
all devices' integrity profiles match during DM device resume (which
is past the point of no return).  To some degree that is unavoidable
(stacked DM devices force this late checking).  But for most DM
devices (which aren't stacking on other DM devices) the ideal time to
verify all integrity profiles match is during table load.

Introduce the notion of an "initialized" integrity profile: a profile
that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
template.  Add blk_integrity_is_initialized() to allow checking if a
profile was initialized.

Update DM integrity support to:
- check all devices with _initialized_ integrity profiles match
  during table load; uninitialized profiles (e.g. for underlying DM
  device(s) of a stacked DM device) are ignored.
- disallow a table load that would result in an integrity profile that
  conflicts with a DM device's existing (in-use) integrity profile
- avoid clearing an existing integrity profile
- validate all integrity profiles match during resume; but if they
  don't all we can do is report the mismatch (during resume we're past
  the point of no return)

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-05 23:52:43 +02:00
Andreas Schwab 6f03793770 blk-throttle: don't call xchg on bool
xchg does not work portably with smaller than 32bit types.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-05 23:51:37 +02:00
Jens Axboe 53d63e6b0d block: make the flush insertion use the tail of the dispatch list
It's not a preempt type request, in fact we have to insert it
behind requests that do specify INSERT_FRONT.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-05 23:51:37 +02:00
Jens Axboe b710a48055 block: get rid of elv_insert() interface
Merge it with __elv_add_request(), it's pretty pointless to
have a function with only two callers. The main interface
is elv_add_request()/__elv_add_request().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-05 23:51:37 +02:00
Jens Axboe 8182924bc5 block: dump request state on seeing a corrupted request completion
Currently we just dump a non-informative 'request botched' message.
Lets actually try and print something sane to help debug issues
around this.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-05 23:51:37 +02:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Jens Axboe ad3d9d7ede block: fix issue with calling blk_stop_queue() from the request_fn handler
When the queue work handler was converted to delayed work, the
stopping was inadvertently made sync as well. Change this back
to being async stop, using __cancel_delayed_work() instead of
cancel_delayed_work().

Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reported-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-25 17:04:08 +01:00
Jens Axboe 401a18e92c block: fix bug with inserting flush requests as sort/merge
With the introduction of the on-stack plugging, we would assume
that any request being inserted was a normal file system request.
As flush/fua requires a special insert mode, this caused problems.

Fix this up by checking for this in flush_plug_list() and use
the appropriate insert mechanism.

Big thanks goes to Markus Tripplesdorf for tirelessly testing
patches, and to Sergey Senozhatsky for helping find the real
issue.

Reported-by: Markus Tripplesdorf <markus@trippelsdorf.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-25 17:04:08 +01:00
Linus Torvalds 6c51038900 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
  Documentation/iostats.txt: bit-size reference etc.
  cfq-iosched: removing unnecessary think time checking
  cfq-iosched: Don't clear queue stats when preempt.
  blk-throttle: Reset group slice when limits are changed
  blk-cgroup: Only give unaccounted_time under debug
  cfq-iosched: Don't set active queue in preempt
  block: fix non-atomic access to genhd inflight structures
  block: attempt to merge with existing requests on plug flush
  block: NULL dereference on error path in __blkdev_get()
  cfq-iosched: Don't update group weights when on service tree
  fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
  block: Require subsystems to explicitly allocate bio_set integrity mempool
  jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  fs: make fsync_buffers_list() plug
  mm: make generic_writepages() use plugging
  blk-cgroup: Add unaccounted time to timeslice_used.
  block: fixup plugging stubs for !CONFIG_BLOCK
  block: remove obsolete comments for blkdev_issue_zeroout.
  blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
  ...

Fix up conflicts in fs/{aio.c,super.c}
2011-03-24 10:16:26 -07:00
Li, Shaohua c4ade94fc0 cfq-iosched: removing unnecessary think time checking
Removing think time checking. A high thinktime queue might means the queue
dispatches several requests and then do away. Limitting such queue seems
meaningless. And also this can simplify code. This is suggested by Vivek.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-23 08:30:34 +01:00
Justin TerAvest 62a37f6bad cfq-iosched: Don't clear queue stats when preempt.
For v2, I added back lines to cfq_preempt_queue() that were removed
during updates for accounting unaccounted_time. Thanks for pointing out
that I'd missed these, Vivek.

Previous commit "cfq-iosched: Don't set active queue in preempt" wrongly
cleared stats for preempting queues when it shouldn't have, because when
we choose a queue to preempt, it still isn't necessarily scheduled next.

Thanks to Vivek Goyal for figuring this out and understanding how the
preemption code works.

Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-23 08:25:44 +01:00
Vivek Goyal 04521db04e blk-throttle: Reset group slice when limits are changed
Lina reported that if throttle limits are initially very high and then
dropped, then no new bio might be dispatched for a long time. And the
reason being that after dropping the limits we don't reset the existing
slice and do the rate calculation with new low rate and account the bios
dispatched at high rate. To fix it, reset the slice upon rate change.

https://lkml.org/lkml/2011/3/10/298

Another problem with very high limit is that we never queued the
bio on throtl service tree. That means we kept on extending the
group slice but never trimmed it. Fix that also by regulary
trimming the slice even if bio is not being queued up.

Reported-by: Lina Lu <lulina_nuaa@foxmail.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-22 21:55:00 +01:00
Justin TerAvest 9026e521c0 blk-cgroup: Only give unaccounted_time under debug
This change moves unaccounted_time to only be reported when
CONFIG_DEBUG_BLK_CGROUP is true.

Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-22 21:26:54 +01:00
Justin TerAvest eda5e0c91f cfq-iosched: Don't set active queue in preempt
Commit "Add unaccounted time to timeslice_used" changed the behavior of
cfq_preempt_queue to set cfqq active. Vivek pointed out that other
preemption rules might get involved, so we shouldn't manually set which
queue is active.

This cleans up the code to just clear the queue stats at preemption
time.

Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-22 21:26:49 +01:00
Jens Axboe 5e84ea3a9c block: attempt to merge with existing requests on plug flush
One of the disadvantages of on-stack plugging is that we potentially
lose out on merging since all pending IO isn't always visible to
everybody. When we flush the on-stack plugs, right now we don't do
any checks to see if potential merge candidates could be utilized.

Correct this by adding a new insert variant, ELEVATOR_INSERT_SORT_MERGE.
It works just ELEVATOR_INSERT_SORT, but first checks whether we can
merge with an existing request before doing the insertion (if we fail
merging).

This fixes a regression with multiple processes issuing IO that
can be merged.

Thanks to Shaohua Li <shaohua.li@intel.com> for testing and fixing
an accounting bug.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-21 10:14:27 +01:00
Linus Torvalds c55d267de2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits)
  [SCSI] scsi_dh_rdac: Add MD36xxf into device list
  [SCSI] scsi_debug: add consecutive medium errors
  [SCSI] libsas: fix ata list corruption issue
  [SCSI] hpsa: export resettable host attribute
  [SCSI] hpsa: move device attributes to avoid forward declarations
  [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26)
  [SCSI] sd: Logical Block Provisioning update
  [SCSI] Include protection operation in SCSI command trace
  [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try)
  [SCSI] target: Fix volume size misreporting for volumes > 2TB
  [SCSI] bnx2fc: Broadcom FCoE offload driver
  [SCSI] fcoe: fix broken fcoe interface reset
  [SCSI] fcoe: precedence bug in fcoe_filter_frames()
  [SCSI] libfcoe: Remove stale fcoe-netdev entries
  [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h
  [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument
  [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs
  [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out"
  [SCSI] libfc: Fixing a memory leak when destroying an interface
  [SCSI] megaraid_sas: Version and Changelog update
  ...

Fix up trivial conflicts due to whitespace differences in
drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}
2011-03-17 17:54:40 -07:00
Justin TerAvest 8184f93ece cfq-iosched: Don't update group weights when on service tree
Version 3 is updated to apply to for-2.6.39/core.

For version 2, I took Vivek's advice and made sure we update the group
weight from cfq_group_service_tree_add().

If a weight was updated while a group is on the service tree, the
calculation for the total weight of the service tree can be adjusted
improperly, which either leads to bad service tree weights, or
potentially crashes (if total_weight becomes 0).

This patch defers updates to the weight until a group is off the service
tree.

Signed-off-by: Justin TerAvest <teravest@google.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-17 16:12:36 +01:00
Justin TerAvest 167400d340 blk-cgroup: Add unaccounted time to timeslice_used.
There are two kind of times that tasks are not charged for: the first
seek and the extra time slice used over the allocated timeslice. Both
of these exported as a new unaccounted_time stat.

I think it would be good to have this reported in 'time' as well, but
that is probably a separate discussion.

Signed-off-by: Justin TerAvest <teravest@google.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-12 16:54:00 +01:00
Tao Ma eba2ed9c96 block: remove obsolete comments for blkdev_issue_zeroout.
barrier is already removed, so remove the obsolete comments
in blkdev_issue_zeroout.

Cc: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-11 20:13:54 +01:00
Lukas Czerner 0aeea18964 block: fix mis-synchronisation in blkdev_issue_zeroout()
BZ29402
https://bugzilla.kernel.org/show_bug.cgi?id=29402

We can hit serious mis-synchronization in bio completion path of
blkdev_issue_zeroout() leading to a panic.

The problem is that when we are going to wait_for_completion() in
blkdev_issue_zeroout() we check if the bb.done equals issued (number of
submitted bios). If it does, we can skip the wait_for_completition()
and just out of the function since there is nothing to wait for.
However, there is a ordering problem because bio_batch_end_io() is
calling atomic_inc(&bb->done) before complete(), hence it might seem to
blkdev_issue_zeroout() that all bios has been completed and exit. At
this point when bio_batch_end_io() is going to call complete(bb->wait),
bb and wait does not longer exist since it was allocated on stack in
blkdev_issue_zeroout() ==> panic!

(thread 1)                      (thread 2)
bio_batch_end_io()              blkdev_issue_zeroout()
  if(bb) {                      ...
    if (bb->end_io)             ...
      bb->end_io(bio, err);     ...
    atomic_inc(&bb->done);      ...
    ...                         while (issued != atomic_read(&bb.done))
    ...                         (let issued == bb.done)
    ...                         (do the rest of the function)
    ...                         return ret;
    complete(bb->wait);
    ^^^^^^^^
    panic

We can fix this easily by simplifying bio_batch and completion counting.

Also remove bio_end_io_t *end_io since it is not used.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Eric Whitney <eric.whitney@hp.com>
Tested-by: Eric Whitney <eric.whitney@hp.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
CC: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-11 15:36:08 +01:00
Jens Axboe 4c63f5646e Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core
Conflicts:
	block/blk-core.c
	block/blk-flush.c
	drivers/md/raid1.c
	drivers/md/raid10.c
	drivers/md/raid5.c
	fs/nilfs2/btnode.c
	fs/nilfs2/mdt.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:58:35 +01:00
Vivek Goyal 69d60eb96a blk-throttle: Use blk_plug in throttle dispatch
Use plug in throttle dispatch also as we are dispatching a bunch of
bios in throttle context and some of them might merge.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:27 +01:00
Jens Axboe 721a9602e6 block: kill off REQ_UNPLUG
With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:27 +01:00
Jens Axboe 7eaceaccab block: remove per-queue plugging
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:52:07 +01:00
Jens Axboe 73c1010119 block: initial patch for on-stack per-task plugging
This patch adds support for creating a queuing context outside
of the queue itself. This enables us to batch up pieces of IO
before grabbing the block device queue lock and submitting them to
the IO scheduler.

The context is created on the stack of the process and assigned in
the task structure, so that we can auto-unplug it if we hit a schedule
event.

The current queue plugging happens implicitly if IO is submitted to
an empty device, yet callers have to remember to unplug that IO when
they are going to wait for it. This is an ugly API and has caused bugs
in the past. Additionally, it requires hacks in the vm (->sync_page()
callback) to handle that logic. By switching to an explicit plugging
scheme we make the API a lot nicer and can get rid of the ->sync_page()
hack in the vm.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:45:54 +01:00
Jens Axboe 3cca6dc1c8 block: add API for delaying work/request_fn a little bit
Currently we use plugging for that, but as plugging is going away,
we need an alternative mechanism.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:45:54 +01:00
Tejun Heo facc31ddc3 block: Don't implicitly trigger event check on disk_unblock_events()
Currently, disk_unblock_events() implicitly kick event check if the
block count reaches zero.  This behavior is not described in the
comment and hinders with future changes.  Make the unblocker
explicitly check events by calling disk_check_events() as necessary.

This patch doesn't cause any behavior difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
2011-03-09 19:54:27 +01:00
Justin TerAvest df457f845e blk-cgroup: Lower minimum weight from 100 to 10.
We've found that we still get good, useful isolation at weights this
low. I'd like to adjust the minimum so that any other changes can take
these values into account.

Signed-off-by: Justin TerAvest <teravest@google.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-08 19:45:00 +01:00
Vivek Goyal de701c74a3 blk-throttle: Some cleanups and race fixes in limit update code
When throttle group limits are updated through cgroups, a thread is
woken up to process these updates. While reviewing that code, oleg noted
couple of race conditions existed in the code and he also suggested that
code can be simplified.

This patch fixes the races simplifies the code based on Oleg's suggestions:

	- Use xchg().
	- Introduced a common function throtl_update_blkio_group_common()
          which is shared now by all iops/bps update functions.

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>

Fixed a merge issue, throtl_schedule_delayed_work() takes throtl_data
as the argument now, not the queue.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-07 21:09:32 +01:00
Vivek Goyal 231d704b4a blk-throttle: process limit change only through one function
With the help of cgroup interface one can go and upate the bps/iops
limits of existing group. Once the limits are udpated, a thread is
woken up to see if some blocked group needs recalculation based on new
limits and needs to be requeued.

There was also a piece of code where I was checking for group limit
update when a fresh bio comes in. This patch gets rid of that piece of
code and keeps processing the limit change at one place
throtl_process_limit_change().  It just keeps the code simple and easy
to understand.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-07 21:05:14 +01:00