openkylin/qemu - qemu - 红山开源项目托管

Commit Graph

Author	SHA1	Message	Date
Kevin Wolf	ad02b7af0c	file-posix: Remove unnecessary includes Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-04-27 15:39:49 +02:00
Markus Armbruster	129c7d1c53	block: Document -drive problematic code and bugs -blockdev and blockdev_add convert their arguments via QObject to BlockdevOptions for qmp_blockdev_add(), which converts them back to QObject, then to a flattened QDict. The QDict's members are typed according to the QAPI schema. -drive converts its argument via QemuOpts to a (flat) QDict. This QDict's members are all QString. Thus, the QType of a flat QDict member depends on whether it comes from -drive or -blockdev/blockdev_add, except when the QAPI type maps to QString, which is the case for 'str' and enumeration types. The block layer core extracts generic configuration from the flat QDict, and the block driver extracts driver-specific configuration. Both commonly do so by converting (parts of) the flat QDict to QemuOpts, which turns all values into strings. Not exactly elegant, but correct. However, A few places access the flat QDict directly: * Most of them access members that are always QString. Correct. * bdrv_open_inherit() accesses a boolean, carefully. Correct. * nfs_config() uses a QObject input visitor. Correct only because the visited type contains nothing but QStrings. * nbd_config() and ssh_config() use a QObject input visitor, and the visited types contain non-QStrings: InetSocketAddress members @numeric, @to, @ipv4, @ipv6. -drive works as long as you don't try to use them (they're all optional). @to is ignored anyway. Reproducer: -drive driver=ssh,server.host=h,server.port=22,server.ipv4,path=p -drive driver=nbd,server.type=inet,server.data.host=h,server.data.port=22,server.data.ipv4 both fail with "Invalid parameter type for 'data.ipv4', expected: boolean" Add suitable comments to all these places. Mark the buggy ones FIXME. "Fortunately", -drive's driver-specific options are entirely undocumented. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 1490895797-29094-5-git-send-email-armbru@redhat.com [mreitz: Fixed two typos] Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-04-03 17:11:39 +02:00
Peter Maydell	700f9ce0f9	block/file-posix.c: Fix unused variable warning on OpenBSD On OpenBSD none of the ioctls probe_logical_blocksize() tries exist, so the variable sector_size is unused. Refactor the code to avoid this (and reduce the duplicated code). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1490279788-12995-1-git-send-email-peter.maydell@linaro.org Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-03-27 17:28:34 +02:00
Kevin Wolf	e5bcf967fb	file-posix: Make bdrv_flush() failure permanent without O_DIRECT Success for bdrv_flush() means that all previously written data is safe on disk. For fdatasync(), the best semantics we can hope for on Linux (without O_DIRECT) is that all data that was written since the last call was successfully written back. Therefore, and because we can't redo all writes after a flush failure, we have to give up after a single fdatasync() failure. After this failure, we would never be able to make the promise that a successful bdrv_flush() makes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20170322210005.16533-1-kwolf@redhat.com Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-03-27 16:53:42 +02:00
Fam Zheng	fed414df9d	file-posix: Don't leak fd in hdev_get_max_segments This fixes a leaked fd introduced in commit `9103f1ce`. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-17 12:54:06 +01:00
Stefan Hajnoczi	6958349085	file-posix: clean up max_segments buffer termination The following pattern is unsafe: char buf[32]; ret = read(fd, buf, sizeof(buf)); ... buf[ret] = 0; If read(2) returns 32 then a byte beyond the end of the buffer is zeroed. In practice this buffer overflow does not occur because the sysfs max_segments file only contains an unsigned short + '\n'. The string is always shorter than 32 bytes. Regardless, avoid this pattern because static analysis tools might complain and it could lead to real buffer overflows if copy-pasted elsewhere in the codebase. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-17 12:54:06 +01:00
Fam Zheng	9103f1ceb4	file-posix: Consider max_segments for BlockLimits.max_transfer BlockLimits.max_transfer can be too high without this fix, guest will encounter I/O error or even get paused with werror=stop or rerror=stop. The cause is explained below. Linux has a separate limit, /sys/block/.../queue/max_segments, which in the worst case can be more restrictive than the BLKSECTGET which we already consider (note that they are two different things). So, the failure scenario before this patch is: 1) host device has max_sectors_kb = 4096 and max_segments = 64; 2) guest learns max_sectors_kb limit from QEMU, but doesn't know max_segments; 3) guest issues e.g. a 512KB request thinking it's okay, but actually it's not, because it will be passed through to host device as an SG_IO req that has niov > 64; 4) host kernel doesn't like the segmenting of the request, and returns -EINVAL; This patch checks the max_segments sysfs entry for the host device and calculates a "conservative" bytes limit using the page size, which is then merged into the existing max_transfer limit. Guest will discover this from the usual virtual block device interfaces. (In the case of scsi-generic, it will be done in the INQUIRY reply interception in device model.) The other possibility is to actually propagate it as a separate limit, but it's not better. On the one hand, there is a big complication: the limit is per-LUN in QEMU PoV (because we can attach LUNs from different host HBAs to the same virtio-scsi bus), but the channel to communicate it in a per-LUN manner is missing down the stack; on the other hand, two limits versus one doesn't change much about the valid size of I/O (because guest has no control over host segmenting). Also, the idea to fall back to bounce buffering in QEMU, upon -EINVAL, was explored. Unfortunately there is no neat way to ensure the bounce buffer is less segmented (in terms of DMA addr) than the guest buffer. Practically, this bug is not very common. It is only reported on a Emulex (lpfc), so it's okay to get it fixed in the easier way. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-13 12:49:33 +01:00
Nir Soffer	c6ccc2c5e6	qemu-img: Improve documentation for PREALLOC_MODE_FALLOC Now that we are truncating the file in both PREALLOC_MODE_FULL and PREALLOC_MODE_OFF, not truncating in PREALLOC_MODE_FALLOC looks odd. Add a comment explaining why we do not truncate in this case. Signed-off-by: Nir Soffer <nirsof@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-02-24 16:09:23 +01:00
Nir Soffer	5a1dad9d5a	qemu-img: Truncate before full preallocation In a previous commit (qemu-img: Do not truncate before preallocation) we moved truncate to the PREALLOC_MODE_OFF branch to avoid slowdown in posix_fallocate(). However this change is not optimal when using PREALLOC_MODE_FULL, since knowing the final size from the beginning could allow the file system driver to do less allocations and possibly avoid fragmentation of the file. Now we truncate also before doing full preallocation. Signed-off-by: Nir Soffer <nirsof@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-02-24 16:09:23 +01:00
Nir Soffer	f6a7240442	qemu-img: Do not truncate before preallocation When using file system that does not support fallocate() (e.g. NFS < 4.2), truncating the file only when preallocation=OFF speeds up creating raw file. Here is example run, tested on Fedora 24 machine, creating raw file on NFS version 3 server. $ time ./qemu-img-master create -f raw -o preallocation=falloc mnt/test 1g Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc real 0m21.185s user 0m0.022s sys 0m0.574s $ time ./qemu-img-fix create -f raw -o preallocation=falloc mnt/test 1g Formatting 'mnt/test', fmt=raw size=1073741824 preallocation=falloc real 0m11.601s user 0m0.016s sys 0m0.525s $ time dd if=/dev/zero of=mnt/test bs=1M count=1024 oflag=direct 1024+0 records in 1024+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.6627 s, 68.6 MB/s real 0m16.104s user 0m0.009s sys 0m0.220s Running with strace we can see that without this change we do one pread() and one pwrite() for each block. With this change, we do only one pwrite() per block. $ strace ./qemu-img-master create -f raw -o preallocation=falloc mnt/test 8192 ... pread64(9, "\0", 1, 4095) = 1 pwrite64(9, "\0", 1, 4095) = 1 pread64(9, "\0", 1, 8191) = 1 pwrite64(9, "\0", 1, 8191) = 1 $ strace ./qemu-img-fix create -f raw -o preallocation=falloc mnt/test 8192 ... pwrite64(9, "\0", 1, 4095) = 1 pwrite64(9, "\0", 1, 8191) = 1 This happens because posix_fallocate is checking if each block is allocated before writing a byte to the block, and when truncating the file before preallocation, all blocks are unallocated. Signed-off-by: Nir Soffer <nirsof@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-02-24 16:09:22 +01:00
Eric Farman	c4c41a0a65	block: get max_transfer limit for char (scsi-generic) devices We can get the maximum number of bytes for a single I/O transfer from the BLKSECTGET ioctl, but we only perform this for block devices. scsi-generic devices are represented as character devices, and so do not issue this today. Update this, so that virtio-scsi devices using the scsi-generic interface can return the same data. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Message-Id: <20170120162527.66075-4-farman@linux.vnet.ibm.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-27 18:07:31 +01:00
Eric Farman	482652502e	block: Fix target variable of BLKSECTGET ioctl Commit `6f6071745b` ("raw-posix: Fetch max sectors for host block device") introduced a routine to call the kernel BLKSECTGET ioctl, which stores the result back to user space. However, the size of the data returned depends on the routine handling the ioctl. The (compat_)blkdev_ioctl returns a short, while sg_ioctl returns an int. Thus, on big-endian systems, we can find ourselves accidentally shifting the result to a much larger value. (On s390x, a short is 16 bits while an int is 32 bits.) Also, the two ioctl handlers return values in different scales (block returns sectors, while sg returns bytes), so some tweaking of the outputs is required such that hdev_get_max_transfer_length returns a value in a consistent set of units. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Message-Id: <20170120162527.66075-3-farman@linux.vnet.ibm.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-27 18:07:31 +01:00
Eric Blake	c1bb86cd8a	block: Rename raw-{posix,win32} to file-*.c These files deal with the file protocol, not the raw format (the file protocol is often used with other formats, and the raw format is not forced to use the file protocol). Rename things to make it a bit easier to follow. Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-01-09 13:30:53 +01:00

1 2

63 Commits