Commit Graph

164 Commits

Author SHA1 Message Date
Ryan Harper a5eb9e4ff1 virtio_blk: Add 'serial' attribute to virtio-blk devices (v2)
Create a new attribute for virtio-blk devices that will fetch the serial number
of the block device.  This attribute can be used by udev to create disk/by-id
symlinks for devices that don't have a UUID (filesystem) associated with them.

ATA_IDENTIFY strings are special in that they can be up to 20 chars long
and aren't required to be nul-terminated.  The buffer is also zero-padded,
meaning that if the serial is 19 chars or fewer we get a nul-terminated
string.  When copying this value into a string buffer, we must be careful to
copy up to the nul (if it is present), or only 20 chars if the serial is that
long, and not to attempt to nul-terminate; this isn't needed.
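
The copy rule above can be sketched in userspace C.  This is only an
illustration, not the driver code: copy_serial() and SERIAL_ID_BYTES are
hypothetical names, and unlike the driver this sketch writes its own
terminator so the result is safe to print.

```c
#include <assert.h>
#include <stddef.h>

#define SERIAL_ID_BYTES 20 /* assumed maximum serial length, per the text */

/*
 * Copy a serial that may or may not be nul-terminated.
 * dst must have room for SERIAL_ID_BYTES + 1 bytes.
 * Returns the number of serial bytes copied (0..SERIAL_ID_BYTES).
 */
static size_t copy_serial(char *dst, const char *id)
{
    size_t len;

    assert(dst != NULL && id != NULL);

    /* Stop at the first nul if there is one, otherwise take all 20 bytes. */
    for (len = 0; len < SERIAL_ID_BYTES && id[len] != '\0'; len++)
        dst[len] = id[len];

    /* Sketch-only choice: terminate explicitly instead of relying on a
     * pre-zeroed destination buffer. */
    dst[len] = '\0';
    return len;
}
```

A short, zero-padded serial stops at its first nul, while a full 20-char
serial is copied whole without reading past the source buffer.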

Changes since v1:
- Added BUILD_BUG_ON() for PAGE_SIZE check
- Removed min() since BUILD_BUG_ON() handles the check
- Replaced serial_sysfs() by copying id directly to buffer

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-08-05 13:05:30 +09:30
Christoph Hellwig 10bc310c27 virtio_blk: support barriers without FLUSH feature
If we want to support barriers with the cache=writethrough mode in qemu
we need to tell the block layer that we only need queue drains to
implement a barrier.  Follow the model set by SCSI and IDE and assume
that there is no volatile write cache if the host doesn't advertise it.
While this might imply working barriers on old qemu versions or other
hypervisors that actually have a volatile write cache this is only a
cosmetic issue - these hypervisors don't guarantee any data integrity
with or without this patch, but with the patch we at least provide
data ordering.
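
The decision described above can be sketched as a small userspace model.
The enum and function names are hypothetical stand-ins, not the block
layer's API; the sketch only shows that the barrier mode depends on whether
the host advertises a volatile write cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the block layer's barrier modes. */
enum barrier_mode {
    BARRIER_DRAIN,       /* drain the queue only; no cache to flush */
    BARRIER_DRAIN_FLUSH, /* drain, then flush the volatile write cache */
};

/*
 * Assume no volatile write cache unless the host advertises one
 * (the flush feature), following the model set by SCSI and IDE.
 */
static enum barrier_mode pick_barrier_mode(bool host_advertises_flush)
{
    enum barrier_mode mode = host_advertises_flush ? BARRIER_DRAIN_FLUSH
                                                   : BARRIER_DRAIN;

    /* Either way a barrier at least drains the queue. */
    assert(mode == BARRIER_DRAIN || mode == BARRIER_DRAIN_FLUSH);
    return mode;
}
```

With this shape, a host that stays silent about its cache still gets
ordering through queue drains, which is the behaviour the patch adds.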

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-08-05 13:05:29 +09:30
Christoph Hellwig a5b365a652 virtio-blk: fix minimum number of S/G elements
We need at least one S/G element to operate properly, as does the block
layer which increments it to one anyway.  We hit this due to a qemu
bug which advertises a sg_elements of 0 under some circumstances.
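
A minimal sketch of the clamp, assuming a hypothetical helper name
(effective_sg_elems) and a plain integer error flag in place of the real
config-space read:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose the number of S/G elements to use.
 * read_err is nonzero when the host did not provide a value at all;
 * advertised is the value it provided otherwise.
 */
static uint32_t effective_sg_elems(int read_err, uint32_t advertised)
{
    uint32_t n = (read_err || advertised == 0) ? 1 : advertised;

    assert(n >= 1); /* the invariant the fix restores */
    return n;
}
```

A host that advertises 0 is treated the same as one that advertises
nothing, so the driver never allocates an empty scatterlist.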

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (tweaked logic)
2010-06-03 22:39:18 +09:30
Michael S. Tsirkin 09ec6b69d2 virtio_blk: use virtqueue_xxx wrappers
Switch virtio_blk to new virtqueue_xxx wrappers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-05-19 22:15:42 +09:30
Rusty Russell bdb4a13057 virtio_blk: remove multichar constant.
drivers/block/virtio_blk.c:228:13: warning: multi-character character constant

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: john cooper <john.cooper@redhat.com>
2010-05-19 22:15:41 +09:30
john cooper 234f2725a5 Add virtio disk identification ioctl
Return serial string to the guest application via
ioctl driver call.

Note this form of interface to the guest userland
was the consensus when the prior version using
the ATA_IDENTIFY came under dispute.

Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-05-19 22:15:40 +09:30
john cooper 4cb2ea28c5 Add virtio disk identification support
Add virtio-blk device id (s/n) support via virtio request.

Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-05-19 22:15:40 +09:30
Linus Torvalds 2f4084209a Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: (34 commits)
  cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch
  loop: Update mtime when writing using aops
  block: expose the statistics in blkio.time and blkio.sectors for the root cgroup
  backing-dev: Handle class_create() failure
  Block: Fix block/elevator.c elevator_get() off-by-one error
  drbd: lc_element_by_index() never returns NULL
  cciss: unlock on error path
  cfq-iosched: Do not merge queues of BE and IDLE classes
  cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging
  i2o: Remove the dangerous kobj_to_i2o_device macro
  block: remove 16 bytes of padding from struct request on 64bits
  cfq-iosched: fix a kbuild regression
  block: make CONFIG_BLK_CGROUP visible
  Remove GENHD_FL_DRIVERFS
  block: Export max number of segments and max segment size in sysfs
  block: Finalize conversion of block limits functions
  block: Fix overrun in lcm() and move it to lib
  vfs: improve writeback_inodes_wb()
  paride: fix off-by-one test
  drbd: fix al-to-on-disk-bitmap for 4k logical_block_size
  ...
2010-04-09 11:50:29 -07:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Martin K. Petersen ee714f2dd3 block: Finalize conversion of block limits functions
Remove compatibility wrappers and update remaining drivers.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-03-15 12:47:59 +01:00
Linus Torvalds b1bf936840 Merge branch 'for-2.6.34' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.34' of git://git.kernel.dk/linux-2.6-block: (38 commits)
  block: don't access jiffies when initialising io_context
  cfq: remove 8 bytes of padding from cfq_rb_root on 64 bit builds
  block: fix for "Consolidate phys_segment and hw_segment limits"
  cfq-iosched: quantum check tweak
  blktrace: perform cleanup after setup error
  blkdev: fix merge_bvec_fn return value checks
  cfq-iosched: requests "in flight" vs "in driver" clarification
  cciss: Fix problem with scatter gather elements in the scsi half of the driver
  cciss: eliminate unnecessary pointer use in cciss scsi code
  cciss: do not use void pointer for scsi hba data
  cciss: factor out scatter gather chain block mapping code
  cciss: fix scatter gather chain block dma direction kludge
  cciss: simplify scatter gather code
  cciss: factor out scatter gather chain block allocation and freeing
  cciss: detect bad alignment of scsi commands at build time
  cciss: clarify command list padding calculation
  cfq-iosched: rethink seeky detection for SSDs
  cfq-iosched: rework seeky detection
  block: remove padding from io_context on 64bit builds
  block: Consolidate phys_segment and hw_segment limits
  ...
2010-03-01 09:00:29 -08:00
Christoph Hellwig 69740c8ba8 virtio_blk: add block topology support
Allow reading various alignment values from the config page.  This
allows the guest to much better align I/O requests depending on the
storage topology.

Note that the formats for the config values appear a bit messed up,
but we follow the formats used by ATA and SCSI so they are expected in
the storage world.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-02-24 14:22:26 +10:30
Márton Németh 47483e2520 block: make virtio device id constant
The id_table field of the struct virtio_driver is constant in <linux/virtio.h>
so it is worth making id_table constant as well.

The semantic match that finds this kind of pattern is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
disable decl_init,const_decl_init;
identifier I1, I2, x;
@@
	struct I1 {
	  ...
	  const struct I2 *x;
	  ...
	};
@s@
identifier r.I1, y;
identifier r.x, E;
@@
	struct I1 y = {
	  .x = E,
	};
@c@
identifier r.I2;
identifier s.E;
@@
	const struct I2 E[] = ... ;
@depends on !c@
identifier r.I2;
identifier s.E;
@@
+	const
	struct I2 E[] = ...;
// </smpl>

Signed-off-by: Márton Németh <nm127@freemail.hu>
Cc: Julia Lawall <julia@diku.dk>
Cc: cocci@diku.dk
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-01-11 14:31:27 +01:00
Rusty Russell 3225beaba0 virtio_blk: Revert serial number support
This reverts "Add serial number support for virtio_blk, V4a".

Turns out that virtio_pci, lguest and s/390 all have an 8 bit limit
on virtio config space, so no one could ever use this.

This is coming back later in a cleaner form.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: john cooper <john.cooper@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
2009-10-22 16:39:30 +10:30
Christian Borntraeger e95646c3ec virtio: let header files include virtio_ids.h
Rusty,

commit 3ca4f5ca73
    virtio: add virtio IDs file
moved all device IDs into a single file. While the change itself is
a very good one, it can break userspace applications. For example
if a userspace tool wanted to get the ID of virtio_net it used to
include virtio_net.h. This does no longer work, since virtio_net.h
does not include virtio_ids.h.
This patch moves all "#include <linux/virtio_ids.h>" from the C
files into the header files, making the header files compatible with
the old ones.

In addition, this patch exports virtio_ids.h to userspace.

CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 16:39:28 +10:30
Christoph Hellwig f8b12e513b virtio_blk: revert QUEUE_FLAG_VIRT addition
It seems like the addition of QUEUE_FLAG_VIRT causes major performance
regressions for Fedora users:

	https://bugzilla.redhat.com/show_bug.cgi?id=509383
	https://bugzilla.redhat.com/show_bug.cgi?id=505695

while I can't reproduce those extreme regressions myself I think the flag
is wrong.

Rationale:

  QUEUE_FLAG_VIRT expands to QUEUE_FLAG_NONROT which causes the queue to be
  unplugged immediately.  This is not a good behaviour for at least
  qemu and kvm where we do have significant overhead for every
  I/O operation.  Even with all the latest speedups (native AIO,
  MSI support, zero copy) we can only get native speed for up to 128kb
  I/O requests; we are already down to 66% of native performance for 4kb
  requests even on my laptop running the Intel X25-M SSD for which the
  QUEUE_FLAG_NONROT was designed.
  If we ever get virtio-blk overhead low enough that this flag makes
  sense it should only be set based on a feature flag set by the host.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 16:39:26 +10:30
Linus Torvalds 1f0918d03f Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  lguest: don't force VIRTIO_F_NOTIFY_ON_EMPTY
  lguest: cleanup for map_switcher()
  lguest: use PGDIR_SHIFT for PAE code to allow different PAGE_OFFSET
  lguest: use set_pte/set_pmd uniformly for real page table entries
  lguest: move panic notifier registration to its expected place.
  virtio_blk: add support for cache flush
  virtio: add virtio IDs file
  virtio: get rid of redundant VIRTIO_ID_9P definition
  virtio: make add_buf return capacity remaining
  virtio_pci: minor MSI-X cleanups
2009-09-23 09:23:45 -07:00
Christoph Hellwig f1b0ef0626 virtio_blk: add support for cache flush
Recent qemu has added a VIRTIO_BLK_F_FLUSH flag to advertise that the
virtual disk has a volatile write cache that needs to be flushed.  If
we see this feature, tell the Linux block layer about it
and use the new VIRTIO_BLK_T_FLUSH to flush the cache when required.  This
allows for a correct and simple implementation of write barriers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-23 22:26:36 +09:30
Fernando Luis Vazquez Cao 3ca4f5ca73 virtio: add virtio IDs file
Virtio IDs are spread all over the tree which makes assigning new IDs
bothersome. Putting them together should make the process less error-prone.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-23 22:26:32 +09:30
Rusty Russell 3c1b27d504 virtio: make add_buf return capacity remaining
This API change means that virtio_net can tell how much capacity
remains for buffers.  It's necessarily fuzzy, since
VIRTIO_RING_F_INDIRECT_DESC means we can fit any number of descriptors
in one, *if* we can kmalloc.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
2009-09-23 22:26:31 +09:30
Alexey Dobriyan 83d5cde47d const: make block_device_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:25 -07:00
Linus Torvalds bb184d11ff Merge branch 'tj-block-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc
* 'tj-block-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc:
  virtio_blk: mark virtio_blk with __refdata to kill spurious section mismatch
  block: sysfs fix mismatched queue_var_{store,show} in 64bit kernel
  ataflop: adjust NULL test
  block: fix failfast merge testing in elv_rq_merge_ok()
  z2ram: Small cleanup for z2ram.c
2009-07-22 10:06:33 -07:00
Rakib Mullick 4fbfff7607 virtio_blk: mark virtio_blk with __refdata to kill spurious section mismatch
The variable virtio_blk references the function virtblk_probe() (which
is in .devinit section) and also references the function
virtblk_remove() (which is in .devexit section). So, virtio_blk
refers to both the .devinit and .devexit sections. To avoid this
mess-up, we mark virtio_blk as __refdata.

We were warned by the following warning:

  LD      drivers/block/built-in.o
  WARNING: drivers/block/built-in.o(.data+0xc8dc): Section mismatch in
  reference from the variable virtio_blk to the function
  .devinit.text:virtblk_probe()
  The variable virtio_blk references
  the function __devinit virtblk_probe()
  If the reference is valid then annotate the
  variable with __init* or __refdata (see linux/init.h) or name the variable:
  *driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

  WARNING: drivers/block/built-in.o(.data+0xc8e0): Section mismatch in
  reference from the variable virtio_blk to the function
  .devexit.text:virtblk_remove()
  The variable virtio_blk references
  the function __devexit virtblk_remove()
  If the reference is valid then annotate the
  variable with __exit* (see linux/init.h) or name the variable:
  *driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-07-19 10:46:48 +09:00
Christoph Hellwig d9ecdea7ed virtio_blk: ioctl return value fix
Block driver ioctl methods must return -ENOTTY and not -ENOIOCTLCMD if
they expect the block layer to handle generic ioctls.

This triggered a BLKROSET failure in xfsqa #200.
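
The fallback this relies on can be modelled in userspace roughly as
follows.  The command numbers and function names are hypothetical, and the
real block layer dispatch differs in detail; the sketch only shows why the
driver's "not mine" answer must be -ENOTTY for the generic handler to get
its turn.

```c
#include <assert.h>
#include <errno.h>

#define CMD_DRIVER_PRIVATE 1u /* hypothetical driver-specific command */
#define CMD_GENERIC_ROSET  2u /* hypothetical stand-in for BLKROSET */

/* Driver method: handle private commands, decline everything else. */
static int driver_ioctl(unsigned int cmd)
{
    if (cmd == CMD_DRIVER_PRIVATE)
        return 0;
    return -ENOTTY; /* "not mine": let the caller try generic handlers */
}

/* Generic handler for commands the layer itself implements. */
static int generic_ioctl(unsigned int cmd)
{
    return cmd == CMD_GENERIC_ROSET ? 0 : -ENOTTY;
}

/* Layer entry point: driver first, generic code only on -ENOTTY. */
static int block_ioctl(unsigned int cmd)
{
    int ret = driver_ioctl(cmd);

    assert(ret <= 0); /* 0 on success, negative errno otherwise */
    if (ret != -ENOTTY)
        return ret; /* handled, or a real error to pass up */
    return generic_ioctl(cmd);
}
```

If driver_ioctl() returned some other error for unknown commands,
block_ioctl() would pass that error up and the generic command would
never be reached.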

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-07-17 21:47:46 +09:30
Christoph Hellwig 4eff3cae9c virtio_blk: don't bounce highmem requests
By default a block driver bounces highmem requests, but virtio-blk is
perfectly fine with any request that fits into its 64 bit addressing scheme,
mapped in the kernel virtual space or not.

Besides improving performance on highmem systems this also makes the
reproducible oops in __bounce_end_io go away (while hiding the real cause).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-07-17 21:47:46 +09:30
Mike Frysinger 98e9444474 virtio_blk: add missing __dev{init,exit} markings
The remove member of the virtio_driver structure uses __devexit_p(), so
the remove function itself should be marked with __devexit.  And where
there be __devexit on the remove, so is there __devinit on the probe.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:16:39 +09:30
Michael S. Tsirkin d2a7ddda9f virtio: find_vqs/del_vqs virtio operations
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)
2009-06-12 22:16:36 +09:30
Rusty Russell 9499f5e7ed virtio: add names to virtqueue struct, mapping from devices to queues.
Add a linked list of all virtqueues for a virtio device: this helps for
debugging and is also needed for upcoming interface change.

Also, add a "name" field for clearer debug messages.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12 22:16:36 +09:30
john cooper 1d589bb16b Add serial number support for virtio_blk, V4a
This patch extracts the opaque data from pci i/o
region 0 via the added VIRTIO_BLK_F_IDENTIFY
field.  By convention this data takes the form of
that returned by an ATA IDENTIFY DEVICE command,
however the driver (except for structure size)
makes no interpretation of the data.  The structure
data is copied wholesale to userspace via a
HDIO_GET_IDENTITY ioctl command (eg: hdparm -i <dev>).

Signed-off-by: john cooper <john.cooper@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-09 14:41:40 +02:00
Martin K. Petersen e1defc4ff0 block: Do away with the notion of hardsect_size
Until now we have had a 1:1 mapping between storage device physical
block size and the logical block size used when addressing the device.
With SATA 4KB drives coming out that will no longer be the case.  The
sector size will be 4KB but the logical block size will remain
512-bytes.  Hence we need to distinguish between the physical block size
and the logical ditto.

This patch renames hardsect_size to logical_block_size.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-22 23:22:54 +02:00
Jens Axboe f831cc0349 virtio_blk: get rid of unused variable
drivers/block/virtio_blk.c: In function 'blk_done':
drivers/block/virtio_blk.c:53: warning: unused variable 'nr_bytes'

Leftover from commit 1cde26f928

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-18 14:44:45 +02:00
Hannes Reinecke 1cde26f928 virtio_blk: SG_IO passthru support
Add support for SG_IO passthru to virtio_blk.  We add the scsi command
block after the normal outhdr, and the scsi inhdr with full status
information as well as the sense buffer before the regular inhdr.

[hch: forward ported, added the VIRTIO_BLK_F_SCSI flags, some comments
 and tested the whole beast]
[axboe: updated to use ->resid and not dual-path the byte count]

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ checkpatch.pl tweak)
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-18 14:41:30 +02:00
Christoph Hellwig 6c3b46f745 virtio_blk: don't blindly derefence req->rq_disk
request->rq_disk is only set for FS requests or BLOCK_PC requests
originating from the generic block layer scsi ioctls.  It's not set
for requests originating from other sources or internal cache flush
commands implemented by the patch I'll send after this.

So instead of using it to get at the private data in do_virtblk_request
set up queue->queuedata and use it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-18 14:38:28 +02:00
Tejun Heo 9934c8c045 block: implement and enforce request peek/start/fetch
Till now block layer allowed two separate modes of request execution.
A request is always acquired from the request queue via
elv_next_request().  After that, drivers are free to either dequeue it
or process it without dequeueing.  Dequeue allows elv_next_request()
to return the next request so that multiple requests can be in flight.

Executing requests without dequeueing has its merits mostly in
allowing drivers for simpler devices which can't do sg to deal with
segments only without considering request boundary.  However, the
benefit this brings is dubious and declining while the cost of the API
ambiguity is increasing.  Segment based drivers are usually for very
old or limited devices and as converting to dequeueing model isn't
difficult, it doesn't justify the API overhead it puts on block layer
and its more modern users.

Previous patches converted all block low level drivers to dequeueing
model.  This patch completes the API transition by...

* renaming elv_next_request() to blk_peek_request()

* renaming blkdev_dequeue_request() to blk_start_request()

* adding blk_fetch_request() which is combination of peek and start

* disallowing completion of queued (not started) requests

* applying new API to all LLDs

Renamings are for consistency and to break out of tree code so that
it's apparent that out of tree drivers need updating.
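
The peek/start/fetch split can be illustrated with a toy userspace queue.
All names here (toy_queue, toy_peek and so on) are hypothetical and none
of this is block layer code; it only models the rule that a request must
be started (dequeued) before it may be completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8

struct toy_request {
    int id;
    bool started; /* true once dequeued and handed to the driver */
};

struct toy_queue {
    struct toy_request *slots[QUEUE_CAP];
    size_t head, count;
};

static bool toy_enqueue(struct toy_queue *q, struct toy_request *rq)
{
    if (q->count == QUEUE_CAP)
        return false;
    rq->started = false;
    q->slots[(q->head + q->count) % QUEUE_CAP] = rq;
    q->count++;
    return true;
}

/* Peek: look at the next request without removing it. */
static struct toy_request *toy_peek(const struct toy_queue *q)
{
    return q->count ? q->slots[q->head] : NULL;
}

/* Start: dequeue the request at the head so the next one becomes visible. */
static void toy_start(struct toy_queue *q, struct toy_request *rq)
{
    assert(q->count > 0 && q->slots[q->head] == rq);
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    rq->started = true;
}

/* Fetch: the combination of peek and start. */
static struct toy_request *toy_fetch(struct toy_queue *q)
{
    struct toy_request *rq = toy_peek(q);

    if (rq)
        toy_start(q, rq);
    return rq;
}

/* Complete: refused (-1) for requests that are still queued. */
static int toy_complete(const struct toy_request *rq)
{
    return rq->started ? 0 : -1;
}
```

Peeking twice returns the same request, while fetching moves on to the
next one, so several started requests can be in flight at once.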

[ Impact: block request issue API cleanup, no functional change ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: unsik Kim <donari75@gmail.com>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Laurent Vivier <Laurent@lvivier.info>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-11 09:52:18 +02:00
Tejun Heo 83096ebf12 block: convert to pos and nr_sectors accessors
With recent cleanups, there is no place where low level driver
directly manipulates request fields.  This means that the 'hard'
request fields always equal the !hard fields.  Convert all
rq->sectors, nr_sectors and current_nr_sectors references to
accessors.

While at it, drop superfluous blk_rq_pos() < 0 test in swim.c.

[ Impact: use pos and nr_sectors accessors ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Tested-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Acked-by: Mike Miller <mike.miller@hp.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Eric Moore <Eric.Moore@lsi.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Dario Ballabio <ballabio_dario@emc.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: unsik Kim <donari75@gmail.com>
Cc: Laurent Vivier <Laurent@lvivier.info>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-11 09:50:54 +02:00
Tejun Heo 40cbbb781d block: implement and use [__]blk_end_request_all()
There are many [__]blk_end_request() call sites which call it with
full request length and expect full completion.  Many of them ensure
that the request actually completes by doing BUG_ON() the return
value, which is awkward and error-prone.

This patch adds [__]blk_end_request_all() which takes @rq and @error
and fully completes the request.  BUG_ON() is added to ensure that
this actually happens.
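
As a rough userspace model of the idea (hypothetical names, with assert()
standing in for BUG_ON()): a partial-completion call reports whether bytes
remain, and the "_all" variant completes everything left and checks that
nothing is pending.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_rq {
    unsigned int bytes_left; /* bytes not yet completed */
    int error;               /* first error recorded, 0 if none */
};

/* Complete nr_bytes of the request; returns true while bytes remain. */
static bool toy_end_request(struct toy_rq *rq, int error,
                            unsigned int nr_bytes)
{
    unsigned int done = nr_bytes < rq->bytes_left ? nr_bytes
                                                  : rq->bytes_left;

    if (error && !rq->error)
        rq->error = error;
    rq->bytes_left -= done;
    return rq->bytes_left != 0;
}

/* Complete whatever is left and insist that the request is finished. */
static void toy_end_request_all(struct toy_rq *rq, int error)
{
    bool pending = toy_end_request(rq, error, rq->bytes_left);

    assert(!pending); /* stand-in for the BUG_ON() described above */
    (void)pending;    /* keep builds with NDEBUG warning-free */
}
```

Callers no longer pass a length or check a return value themselves, which
is the awkward pattern the patch removes.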

Most conversions are simple but there are a few noteworthy ones.

* cdrom/viocd: viocd_end_request() replaced with direct calls to
  __blk_end_request_all().

* s390/block/dasd: dasd_end_request() replaced with direct calls to
  __blk_end_request_all().

* s390/char/tape_block: tapeblock_end_request() replaced with direct
  calls to blk_end_request_all().

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mike Miller <mike.miller@hp.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-04-28 07:37:35 +02:00
Linus Torvalds ab70537c32 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  lguest: struct device - replace bus_id with dev_name()
  lguest: move the initial guest page table creation code to the host
  kvm-s390: implement config_changed for virtio on s390
  virtio_console: support console resizing
  virtio: add PCI device release() function
  virtio_blk: fix type warning
  virtio: block: dynamic maximum segments
  virtio: set max_segment_size and max_sectors to infinite.
  virtio: avoid implicit use of Linux page size in balloon interface
  virtio: hand virtio ring alignment as argument to vring_new_virtqueue
  virtio: use KVM_S390_VIRTIO_RING_ALIGN instead of relying on pagesize
  virtio: use LGUEST_VRING_ALIGN instead of relying on pagesize
  virtio: Don't use PAGE_SIZE for vring alignment in virtio_pci.
  virtio: rename 'pagesize' arg to vring_init/vring_size
  virtio: Don't use PAGE_SIZE in virtio_pci.c
  virtio: struct device - replace bus_id with dev_name(), dev_set_name()
  virtio-pci queue allocation not page-aligned
2008-12-30 17:37:25 -08:00
Randy Dunlap b194aee956 virtio_blk: fix type warning
Fix parameter type warning:

linux-next-20081126/drivers/block/virtio_blk.c:307: warning: large integer implicitly truncated to unsigned type

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-30 09:26:06 +10:30
Rusty Russell 0864b79a15 virtio: block: dynamic maximum segments
Enhance the driver to handle whatever maximum segment number the host
tells us to handle.  To do this, we need to allocate the scatterlist
dynamically.

We set max_phys_segments and max_hw_segments to the same value (1 if
the host doesn't tell us, since that's safest and all known hosts do
tell us).

Note that kmalloc'ing the structure for large sg_elems might be
problematic: the fix for this is sg_table, but that requires more
work.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-30 09:26:05 +10:30
Rusty Russell 4b7f7e2049 virtio: set max_segment_size and max_sectors to infinite.
Setting max_segment_size allows more than 64k per sg element, unless
the host specified a limit.  Setting max_sectors indicates that our
max_hw_segments is the only limit.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-30 09:26:05 +10:30
Fernando Luis Vázquez Cao 7d116b626b virtio_blk: set queue paravirt flag
As a paravirt front-end driver, virtio_blk is not a rotational device so
we want to avoid idling in AS/CFQ. Tell the block layer about this.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-12-29 08:28:41 +01:00
Al Viro 4e10985298 [PATCH] switch virtio_blk
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-21 07:48:09 -04:00
Al Viro d4430d62fa [PATCH] beginning of methods conversion
To keep the size of changesets sane we split the switch by drivers;
to keep the damn thing bisectable we do the following:
	1) rename the affected methods, add ones with correct
prototypes, make (few) callers handle both.  That's this changeset.
	2) for each driver convert to new methods.  *ALL* drivers
are converted in this series.
	3) kill the old (renamed) methods.

Note that it _is_ a flagday; all in-tree drivers are converted and by the
end of this series no trace of old methods remains.  The only reason why
we do that this way is to keep the damn thing bisectable and allow per-driver
debugging if anything goes wrong.

New methods:
	open(bdev, mode)
	release(disk, mode)
	ioctl(bdev, mode, cmd, arg)		/* Called without BKL */
	compat_ioctl(bdev, mode, cmd, arg)
	locked_ioctl(bdev, mode, cmd, arg)	/* Called with BKL, legacy */

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-21 07:47:32 -04:00
Al Viro 74f3c8aff3 [PATCH] switch scsi_cmd_ioctl() to passing fmode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-21 07:47:14 -04:00
Kiyoshi Ueda 8316982ac0 virtio_blk: change to use __blk_end_request()
This patch converts virtio_blk to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.
Related 'uptodate' argument is converted to 'error'.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-10-09 08:56:20 +02:00
Fernando Luis Vázquez Cao 766ca4428d virtio_blk: use a wrapper function to access io context information of IO requests
struct request has an ioprio member but it is never updated because
currently bios do not hold io context information. The implication of
this is that virtio_blk ends up passing useless information to the
backend driver.

That said, some IO schedulers such as CFQ do store io context
information in struct request, but use private members for that, which
means that that information cannot be directly accessed in an IO
scheduler-independent way.

This patch adds a function to obtain the ioprio of a request. We should
avoid accessing ioprio directly and use this function instead, so that
its users do not have to care about future changes in block layer
structures or what the currently active IO controller is.

This patch does not introduce any functional changes but paves the way
for future clean-ups and enhancements.

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-10-09 08:56:02 +02:00
Christian Borntraeger 066f4d82a6 virtio_blk: check for hardsector size from host
Currently virtio_blk assumes a 512 byte hard sector size. This can cause
trouble / performance issues if the backing has a different block size
(like a file on an ext3 file system formatted with 4k block size or a dasd).

Let's add a feature flag that tells the guest to use a different hard sector
size than 512 byte.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-25 12:06:05 +10:00
Christian Borntraeger 3ef5360954 virtio_blk: allow read-only disks
Hello Rusty,

sometimes it is useful to share a disk (e.g. usr). To avoid file system
corruption, the disk should be mounted read-only in that case. This patch
adds a new feature flag that allows the host to specify whether the disk should
be considered read-only.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-30 15:09:44 +10:00
Chris Lalancette ac9d463afb Fix crash in virtio_blk during modprobe ; rmmod ; modprobe
Fix a modprobe virtio_blk ; rmmod virtio_blk ; modprobe virtio_blk crash; this
was basically because we weren't doing "del_gendisk()" in the remove path.

Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (moved del_gendisk up)
2008-05-30 15:09:41 +10:00
Ryan Harper 48e4043d45 virtio: add virtio disk geometry feature
Rather than faking up some geometry, allow the backend to push the disk
geometry via virtio pci config option.  Keep the old geo code around for
compatibility.

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (modified to single struct)
2008-05-02 21:50:51 +10:00
Rusty Russell c45a6816c1 virtio: explicit advertisement of driver features
A recent proposed feature addition to the virtio block driver revealed
some flaws in the API: in particular, we assume that feature
negotiation is complete once a driver's probe function returns.

There is nothing in the API to require this, however, and even I
didn't notice when it was violated.

So instead, we require the driver to specify what features it supports
in a table; we can then move the feature negotiation into the virtio
core.  The intersection of device and driver features is presented in
a new 'features' bitmap in the struct virtio_device.

Note that this highlights the difference between Linux unsigned-long
bitmaps where each unsigned long is in native endian, and a
straight-forward little-endian array of bytes.

Drivers can still remove feature bits in their probe routine if they
really have to.

API changes:
- dev->config->feature() no longer gets and acks a feature.
- drivers should advertise their features in the 'feature_table' field
- use virtio_has_feature() for extra sanity when checking feature bits

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:50 +10:00
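
Note on c45a6816c1: the negotiation described above amounts to a bitwise
intersection of what the device offers and what the driver lists in its
table.  The following is a minimal user-space sketch of that idea only; the
names negotiate_features() and has_feature() and the 32-bit width are
assumptions made for illustration, not the kernel's API.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: build a mask from the driver's feature table
 * (a list of bit numbers) and intersect it with the device's mask.
 * Bit numbers >= 32 are ignored in this simplified 32-bit model.
 */
uint32_t negotiate_features(uint32_t device_features,
                            const unsigned int *feature_table,
                            size_t table_len)
{
    uint32_t driver_features = 0;
    size_t i;

    for (i = 0; i < table_len; i++) {
        if (feature_table[i] < 32)
            driver_features |= UINT32_C(1) << feature_table[i];
    }
    /* Only features known to both sides survive. */
    return device_features & driver_features;
}

/* Hypothetical check, loosely analogous to a "has feature" test. */
int has_feature(uint32_t negotiated, unsigned int bit)
{
    return bit < 32 && (negotiated & (UINT32_C(1) << bit)) != 0;
}
```

With a device offering bits 2, 4, 5 and 6 and a driver table of {2, 4, 5},
only bits 2, 4 and 5 end up in the negotiated set.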
Rusty Russell 72e61eb40b virtio: change config to guest endian.
A recent proposed feature addition to the virtio block driver revealed
some flaws in the API, in particular how easy it is to break big
endian machines.

The virtio config space was originally chosen to be little-endian,
because we thought the config might be part of the PCI config space
for virtio_pci.  It's actually a separate mmio region, so that
argument holds little water; as only x86 is currently using the virtio
mechanism, we can change this (but must do so now, before the
impending s390 merge).

API changes:
- __virtio_config_val() just becomes a straight vdev->config_get() call.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:50 +10:00
Marcelo Tosatti 2e895e4c23 virtio-blk: fix remove oops
Do not unregister the major at device remove, since there might be
other device instances around.

(qemu) pci_del 0 11
(qemu) ACPI: PCI interrupt for device 0000:00:0b.0 disabled
(qemu) pci_del 0 10
(qemu) ------------[ cut here ]------------
WARNING: at block/genhd.c:126 unregister_blkdev+0x74/0x9e()
ACPI: PCI interrupt for device 0000:00:0a.0 disabled

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:46 +10:00
Rusty Russell cb38fa23c1 virtio: de-structify virtio_block status byte
Ron Minnich points out that a struct containing a char is not always
sizeof(char); simplest to remove the structure to avoid confusion.

Cc: "ron minnich" <rminnich@gmail.com>

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-05-02 21:50:45 +10:00
Jeremy Katz c483934670 virtio: Fix sysfs bits to have proper block symlink
Fix up so that the virtio_blk devices in sysfs link correctly to their
block device.  This then allows them to be detected by hal, etc

Signed-off-by: Jeremy Katz <katzj@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-17 22:58:15 +11:00
Christian Borntraeger d50ed907dc virtio_blk: implement naming for vda-vdz,vdaa-vdzz,vdaaa-vdzzz
On Friday, 1 February 2008, Christian Borntraeger wrote:
> Right. I will fix that with an additional patch.

This patch goes on top of the minor number patch. Please let me know if
you want a merged patch:

Currently virtio_blk creates the disk name by combining "vd" with 'a'++.
This will give strange names after vdz. I have implemented names up to
vdzzz - inspired by the sd.c code. That should be sufficient for now.

There is one driver in the kernel (driver/s390/block/dasd_genhd.c) that
implements names from dasda-dasdzzzz allowing even more disks. Maybe
a janitor can come up with a common implementation usable for all kind
of block device drivers.

I have tested this patch with 100 disks - seems to work.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:11 +11:00
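
Note on d50ed907dc: the vda-vdz, vdaa-vdzz, vdaaa-vdzzz scheme is a
bijective base-26 encoding of the disk index.  The sketch below is a generic
loop that produces the same names; it is not the code from the patch, and
vd_disk_name() is a hypothetical helper name chosen for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper: write "vd" plus a letter suffix for the given
 * zero-based disk index into buf.  Index 0 -> "vda", 25 -> "vdz",
 * 26 -> "vdaa", 701 -> "vdzz", 702 -> "vdaaa".
 * Returns 0 on success, -1 if buf is too small.
 */
int vd_disk_name(unsigned int index, char *buf, size_t len)
{
    char suffix[8];   /* 7 letters cover any 32-bit index */
    size_t n = 0;
    size_t k;

    /* Generate letters least-significant first. */
    for (;;) {
        suffix[n++] = (char)('a' + index % 26);
        if (index < 26)
            break;
        index = index / 26 - 1;
    }

    if (len < n + 3)  /* "vd" + letters + terminating NUL */
        return -1;

    buf[0] = 'v';
    buf[1] = 'd';
    for (k = 0; k < n; k++)
        buf[2 + k] = suffix[n - 1 - k];
    buf[2 + n] = '\0';
    return 0;
}
```

Unlike the patch, which stops at vdzzz, this loop keeps going for larger
indexes; capping the index would be a one-line check before the loop.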
Christian Borntraeger 4f3bf19c6e virtio_blk: Dont waste major numbers
Rusty,

currently virtio_blk uses one major number per device. While this works
quite well on most systems it is wasteful and will exhaust major numbers
on larger installations.

This patch allocates a major number on init and will use 16 minor numbers
for each disk. That will allow ~64k virtio_blk disks.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:10 +11:00
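
Note on 4f3bf19c6e: the "~64k disks" figure follows from dividing the minor
number space by 16 minors per disk.  The sketch below works through that
arithmetic, assuming a 20-bit minor space (an assumption about Linux's dev_t
layout that the message does not state); all names are hypothetical.

```c
/* Assumption: 20 bits of minor number are available under one major. */
#define EX_MINOR_BITS      20
/* From the commit message: 16 minors (whole disk + partitions) per disk. */
#define EX_MINORS_PER_DISK 16

/* First minor number used by the disk with the given index. */
unsigned int ex_index_to_minor(unsigned int index)
{
    return index * EX_MINORS_PER_DISK;
}

/* Disk index that a given minor number belongs to. */
unsigned int ex_minor_to_index(unsigned int minor)
{
    return minor / EX_MINORS_PER_DISK;
}

/* Number of disks that fit under one major: 2^20 / 16 = 65536. */
unsigned int ex_max_disks(void)
{
    return (1u << EX_MINOR_BITS) / EX_MINORS_PER_DISK;
}
```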
Christian Borntraeger 135da0b037 virtio_blk: provide getgeo
Rusty,

I am currently trying to make my guest boot from a virtio root device
without having an external kernel. Some of the tools that I tried
expect HDIO_GETGEO to work. The most interesting value is likely
the geo.start value to get the offset of a partition. This value
is filled by block/ioctl.c if fops->getgeo is set. This patch also
fills in some standard values for heads, sectors and cylinders.

Makes sense?

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:09 +11:00
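
Note on 135da0b037: the message does not say which "standard values" are
filled in.  The sketch below assumes 64 heads and 32 sectors per track, so
that one cylinder is 2048 sectors, and derives the cylinder count from the
capacity.  Those two values, the clamping, and all names are assumptions for
illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the geometry fields a getgeo hook fills in. */
struct ex_geometry {
    unsigned char  heads;
    unsigned char  sectors;
    unsigned short cylinders;
};

/*
 * Assumed fake geometry: 64 heads * 32 sectors = 2048 sectors per
 * cylinder.  capacity is given in 512-byte sectors.  The cylinder
 * count is clamped to 16 bits here; that choice is this sketch's own.
 */
struct ex_geometry ex_fake_geometry(uint64_t capacity)
{
    struct ex_geometry geo;
    uint64_t cylinders = capacity / (64u * 32u);

    geo.heads = 64;
    geo.sectors = 32;
    geo.cylinders = (unsigned short)(cylinders > 0xFFFFu ? 0xFFFFu : cylinders);
    return geo;
}
```

For a 1 GiB disk (2097152 sectors) this yields 1024 cylinders.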
Rusty Russell 6e5aa7efb2 virtio: reset function
A reset function solves three problems:

1) It allows us to renegotiate features, eg. if we want to upgrade a
   guest driver without rebooting the guest.

2) It gives us a clean way of shutting down virtqueues: after a reset,
   we know that the buffers won't be used by the host, and

3) It helps the guest recover from messed-up drivers.

So we remove the ->shutdown hook, and the only way we now remove
feature bits is via reset.

We leave it to the driver to do the reset before it deletes queues:
the balloon driver, for example, needs to chat to the host in its
remove function.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:03 +11:00
Rusty Russell 18445c4d50 virtio: explicit enable_cb/disable_cb rather than callback return.
It seems that virtio_net wants to disable callbacks (interrupts) before
calling netif_rx_schedule(), so we can't use the return value to do so.

Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
now returns void, rather than a boolean.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:58 +11:00
Rusty Russell a586d4f601 virtio: simplify config mechanism.
Previously we used a type/len pair within the config space, but this
seems overkill.  We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.

The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:57 +11:00
Rusty Russell 74b2553f1d virtio: fix module/device unloading
The virtio code never hooked through the ->remove callback.  Although
no one supports device removal at the moment, this code is already
needed for module unloading.

This of course also revealed bugs in virtio_blk, virtio_net and lguest
unloading paths.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-11-19 11:20:42 +11:00
Jens Axboe 3d1266c704 SG: audit of drivers that use blk_rq_map_sg()
They need to properly init the sg table, or blk_rq_map_sg() will
complain if CONFIG_DEBUG_SG is set.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-24 13:21:21 +02:00
Rusty Russell e467cde238 Block driver using virtio.
The block driver uses scatter-gather lists with sg[0] being the
request information (struct virtio_blk_outhdr) with the type, sector
and inbuf id.  The next N sg entries are the bio itself, then the last
sg is the status byte.  Whether the N entries are in or out depends on
whether it's a read or a write.

We accept the normal (SCSI) ioctls: they get handed through to the other
side which can then handle it or reply that it's unsupported.  It's
not clear that this actually works in general, since I don't know
if blk_pc_request() requests have an accurate rq_data_dir().

Although we try to reply -ENOTTY on unsupported commands, ioctl(fd,
CDROMEJECT) returns success to userspace.  This needs a separate
patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <jens.axboe@oracle.com>
2007-10-23 15:49:54 +10:00
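
Note on e467cde238: the scatter-gather layout described above (header, then
N data entries, then the status byte) can be sketched as a count of
driver-to-host ("out") and host-to-driver ("in") entries.  The struct and
function names below are hypothetical, and the header fields are only those
named in the message; the actual field order and widths are assumptions.

```c
#include <stdint.h>

/* Assumed shape of the request header placed in sg[0]. */
struct ex_blk_outhdr {
    uint32_t type;    /* read or write */
    uint64_t sector;  /* starting sector of the request */
    uint64_t id;      /* "inbuf id" from the message */
};

struct ex_sg_counts {
    unsigned int out; /* entries the host reads */
    unsigned int in;  /* entries the host writes */
};

/*
 * sg[0] is always out (the header) and the final status byte is
 * always in.  The N data entries are out for a write and in for a
 * read, as the message describes.
 */
struct ex_sg_counts ex_count_sg(unsigned int data_segments, int is_write)
{
    struct ex_sg_counts c;

    c.out = 1;  /* header */
    c.in = 1;   /* status byte */
    if (is_write)
        c.out += data_segments;
    else
        c.in += data_segments;
    return c;
}
```

A four-segment write therefore uses 5 out entries and 1 in entry, while the
same read uses 1 out and 5 in.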