openkylin/qemu - qemu - Hongshan Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Alberto Garcia	03019d7314	qcow2: Add table size field to Qcow2Cache The table size in the qcow2 cache is currently equal to the cluster size. This doesn't allow us to use the cache memory efficiently, particularly with large cluster sizes, so we need to be able to have smaller cache tables that are independent from the cluster size. This patch adds a new field to Qcow2Cache that we can use instead of the cluster size. The current table size is still being initialized to the cluster size, so there are no semantic changes yet, but this patch will allow us to prepare the rest of the code and simplify a few function calls. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 67a1bf9e55f417005c567bead95a018dc34bc687.1517840876.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2018-02-13 16:59:58 +01:00
Max Reitz	08546bcfb2	qcow2: Fix overly broad madvise() @mem_size and @offset are both size_t, thus subtracting them from one another will just return a big size_t if mem_size < offset -- even more obvious here because the result is stored in another size_t. Checking that result to be positive is therefore not sufficient to exclude the case that offset > mem_size. Thus, we currently sometimes issue an madvise() over a very large address range. This is triggered by iotest 163, but with -m64, this does not result in tangible problems. But with -m32, this test produces three segfaults, all of which are fixed by this patch. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171114184127.24238-1-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:31 +01:00
Max Reitz	4efb1f7c61	qcow2: Refuse to get unaligned offsets from cache Instead of using an assertion, it is better to emit a corruption event here. Checking all offsets for correct alignment can be tedious and it is easily possible to forget to do so. qcow2_cache_do_get() is a function every L2 and refblock access has to go through, so this is a good central point to add such a check. And for good measure, let us also add an assertion that the offset is non-zero. Making this a corruption event is not feasible, because a zero offset usually means something special (such as the cluster is unused), so all callers should be checking this anyway. If they do not, it is their fault, hence the assertion here. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110203111.7666-6-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:31 +01:00
Pavel Butsykin	f71c08ea8e	qcow2: add qcow2_cache_discard Whenever L2/refcount table clusters are discarded from the file, we can automatically drop the now-unnecessary content of the cache tables. This reduces the chance of evicting useful cache data and eliminates inconsistencies between the data in the cache and the data in the file. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20170918124230.8152-3-pbutsykin@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-09-26 15:00:32 +02:00
Alberto Garcia	a8b99dd516	qcow2: Remove stale comment We haven't been using CONFIG_MADVISE since `02d0e09503` Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-25 13:51:30 +01:00
Alberto Garcia	2f2c8d6b37	qcow2: Make qcow2_cache_table_release() work only in Linux We are using QEMU_MADV_DONTNEED to discard the memory of individual L2 cache tables. The problem with this is that those semantics are specific to the Linux madvise() system call. Other implementations of madvise() (including the very Linux implementation of posix_madvise()) don't do that, so we cannot use them for the same purpose. This patch makes the code Linux-specific and uses madvise() directly since there's no point in going through qemu_madvise() for this. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-11-25 13:51:30 +01:00
Kevin Wolf	d9ca2ea2e2	block: Convert bdrv_pwrite(v/_sync) to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Kevin Wolf	cf2ab8fc34	block: Convert bdrv_pread(v) to BdrvChild Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-07-05 16:46:27 +02:00
Paolo Bonzini	02d0e09503	os-posix: include sys/mman.h qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check is bogus without a previous inclusion of sys/mman.h. Include it in sysemu/os-posix.h and remove it from everywhere else. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 18:39:03 +02:00
Denis V. Lunev	f3c3b87dae	qcow2: avoid extra flushes in qcow2 The problem with excessive flushing was found by a couple of performance tests: - parallel directory tree creation (from 2 processes) - 32 cached writes + fsync at the end in a loop For the first one results improved from 2.6 loops/sec to 3.5 loops/sec. Each loop creates 10^3 directories with 10 files in each. For the second one results improved from ~600 fsync/sec to ~1100 fsync/sec. Though, it was run on SSD so it probably won't show such performance gain on rotational media. qcow2_cache_flush() calls bdrv_flush() unconditionally after writing cache entries of a particular cache. This can lead to as many as 2 additional fdatasyncs inside bdrv_flush. We can simply skip all fdatasync calls inside qcow2_co_flush_to_os as bdrv_flush for sure will do the job. These flushes are necessary to keep the right order of writes to the different caches. Though this is not necessary in the current code base as this ordering is ensured through the flush in qcow2_cache_flush_dependency(). Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Pavel Borzenkov <pborzenkov@virtuozzo.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:09 +02:00
Peter Maydell	80c71a241a	block: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-01-20 13:36:23 +01:00
Kevin Wolf	9a4f4c3156	block: Convert bs->file to BdrvChild This patch removes the temporary duplication between bs->file and bs->file_child by converting everything to BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-10-16 15:34:29 +02:00
Kevin Wolf	ff99129ab8	qcow2: Rename BDRVQcowState to BDRVQcow2State BDRVQcowState is already used by qcow1, and gdb is always confused which one to use. Rename the qcow2 one so they can be distinguished. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>	2015-09-14 16:51:36 +02:00
Alberto Garcia	909c260c71	qcow2: reorder fields in Qcow2CachedTable to reduce padding Changing the current ordering saves 8 bytes per cache entry in x86_64. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 0bd55291211df3dfb514d0e7d2031dd5c4f9f807.1438690126.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-09-04 21:00:32 +02:00
Alberto Garcia	279621c046	qcow2: add option to clean unused cache entries after some time This adds a new 'cache-clean-interval' option that cleans all qcow2 cache entries that haven't been used in a certain interval, given in seconds. This allows setting a large L2 cache size so it can handle scenarios with lots of I/O and at the same time use little memory during periods of inactivity. This feature currently relies on MADV_DONTNEED to free that memory, so it is not useful in systems that don't follow that behavior. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: a70d12da60433df9360ada648b3f34b8f6f354ce.1438690126.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-09-04 21:00:32 +02:00
Alberto Garcia	355ee2d0e8	qcow2: mark the memory as no longer needed after qcow2_cache_empty() After having emptied the cache, the data in the cache tables is no longer useful, so we can tell the kernel that we are done with it. In Linux this frees the resources associated with it. The effect of this can be seen in the HMP commit operation: it moves data from the top to the base image (and fills both caches), then it empties the top image. At this point the data in that cache is no longer needed so it's just wasting memory. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 08538b098e1faf6c92496477cf9b47a20e5aacea.1438690126.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2015-09-04 21:00:32 +02:00
Alberto Garcia	1bd84ee717	qcow2: remove unnecessary check The value of 'i' is guaranteed to be >= 0 Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 1435824371-2660-1-git-send-email-berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2015-07-07 10:31:04 +01:00
Alberto Garcia	d1b4efe5c4	qcow2: style fixes in qcow2-cache.c Fix pointer declaration to make it consistent with the rest of the code. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	a3f1afb43a	qcow2: make qcow2_cache_put() a void function This function never receives an invalid table pointer, so we can make it void and remove all the error checking code. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	812e4082ca	qcow2: use a hash to look for entries in the L2 cache The current cache algorithm traverses the array starting always from the beginning, so the average number of comparisons needed to perform a lookup is proportional to the size of the array. By using a hash of the offset as the starting point, lookups are faster and independent from the array size. The hash is computed using the cluster number of the table, multiplied by 4 to make it perform better when there are collisions. In my tests, using a cache with 2048 entries, this reduces the average number of comparisons per lookup from 430 to 2.5. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	fdfbca82a0	qcow2: remove qcow2_cache_find_entry_to_replace() A cache miss means that the whole array was traversed and the entry we were looking for was not found, so there's no need to traverse it again in order to select an entry to replace. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	2693310ecc	qcow2: use an LRU algorithm to replace entries from the L2 cache The current algorithm to evict entries from the cache gives always preference to those in the lowest positions. As the size of the cache increases, the chances of the later elements of being removed decrease exponentially. In a scenario with random I/O and lots of cache misses, entries in positions 8 and higher are rarely (if ever) evicted. This can be seen even with the default cache size, but with larger caches the problem becomes more obvious. Using an LRU algorithm makes the chances of being removed from the cache independent from the position. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	baf07d60f5	qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty() Since all tables are now stored together, it is possible to obtain the position of a particular table directly from its address, so the operation becomes O(1). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	72e80b8901	qcow2: use one single memory block for the L2/refcount cache tables The qcow2 L2/refcount cache contains one separate table for each cache entry. Doing one allocation per table adds unnecessary overhead and it also requires us to store the address of each table separately. Since the size of the cache is constant during its lifetime, it's better to have an array that contains all the tables using one single allocation. In my tests measuring freshly created caches with sizes 128MB (L2) and 32MB (refcount) this uses around 10MB of RAM less. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-05-22 17:08:01 +02:00
Alberto Garcia	8e8cb375e0	block: Give always priority to unused entries in the qcow2 L2 cache The current algorithm to replace entries from the L2 cache gives priority to newer hits by dividing the hit count of all existing entries by two every time there is a cache miss. However, if there are several cache misses the hit count of the existing entries can easily go down to 0. This will result in those entries being replaced even when there are others that have never been used. This problem is more noticeable with larger disk images and cache sizes, since the chances of having several misses before the cache is full are higher. If we make sure that the hit count can never go down to 0 again, unused entries will always have priority. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2015-02-06 17:24:22 +01:00
Max Reitz	02004bd4ba	qcow2: Use g_try_new0() for cache array With a variable cache size, the number given to qcow2_cache_create() may be huge. Therefore, use g_try_new0(). While at it, use g_new0() instead of g_malloc0() for allocating the Qcow2Cache object. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 11:51:28 +02:00
Markus Armbruster	02c4f26b15	block: Use g_new() & friends to avoid multiplying sizes g_new(T, n) is safer than g_malloc(sizeof(*v) * n) for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. Perhaps a conversion to g_malloc_n() would be neater in places, but that's merely four years old, and we can't use such newfangled stuff. This commit only touches allocations with size arguments of the form sizeof(T), plus two that use 4 instead of sizeof(uint32_t). We can make the others safe by converting to g_malloc_n() when it becomes available to us in a couple of years. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2014-08-20 11:51:28 +02:00
Kevin Wolf	de82815db1	qcow2: Handle failure for potentially large allocations Some code in the block layer makes potentially huge allocations. Failure is not completely unexpected there, so avoid aborting qemu and handle out-of-memory situations gracefully. This patch addresses the allocations in the qcow2 block driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2014-08-15 15:07:15 +02:00
Max Reitz	231bb26764	qcow2: Use negated overflow check mask In qcow2_check_metadata_overlap and qcow2_pre_write_overlap_check, change the parameter signifying the checks to perform from its current positive form to a negative one, i.e., it will no longer explicitly specify every check to perform but rather a mask of checks not to perform. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-10-11 16:50:00 +02:00
Max Reitz	e7108feaac	qcow2-cache: Empty cache Add a function for emptying a cache, i.e., flushing it and marking all elements invalid. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-09-12 10:12:46 +02:00
Max Reitz	cf93980e77	qcow2: Employ metadata overlap checks The pre-write overlap check function is now called before most of the qcow2 writes (aborting it on collision or other error). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2013-08-30 15:48:43 +02:00
Paolo Bonzini	737e150e89	block: move include files to include/block/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:31 +01:00
Paolo Bonzini	6af4e9ead4	qcow2: always operate caches in writeback mode Writethrough does not need special-casing anymore in the qcow2 caches. The block layer adds flushes after every guest-initiated data write, and these will also flush the qcow2 caches to the OS. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2012-06-15 14:03:43 +02:00
Kevin Wolf	3cce16f44d	qcow2: Add some tracing Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>	2012-03-12 15:14:06 +01:00
Anthony Liguori	7267c0947d	Use glib memory allocation and free functions qemu_malloc/qemu_free no longer exist after this commit. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2011-08-20 23:01:08 -05:00
Kevin Wolf	93913dfd8a	qcow2: Use Qcow2Cache in writeback mode during loadvm/savevm In snapshotting there is no guest involved, so we can safely use a writeback mode and do the flushes in the right place (i.e. at the very end). This improves the time that creating/restoring an internal snapshot takes with an image in writethrough mode. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-07-19 15:39:22 +02:00
Jes Sorensen	bf595021c7	Reorganize struct Qcow2Cache for better struct packing Move size after the two pointers in struct Qcow2Cache to get better packing of struct elements on 64 bit architectures. Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-01-31 10:03:00 +01:00
Kevin Wolf	3de0a2944b	qcow2: Batch flushes for COW qcow2 calls bdrv_flush() after performing COW in order to ensure that the L2 table change is never written before the copy is safe on disk. Now that the L2 table is cached, we can wait with flushing until we write out the next L2 table. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-01-24 16:41:49 +01:00
Kevin Wolf	29c1a7301a	qcow2: Use QcowCache Use the new functions of qcow2-cache.c for everything that works on refcount block and L2 tables. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-01-24 16:41:49 +01:00
Kevin Wolf	493810940b	qcow2: Add QcowCache This adds some new cache functions to qcow2 which can be used for caching refcount blocks and L2 tables. When used with cache=writethrough they work like the old caching code which is spread all over qcow2, so for this case we have merely a cleanup. The interesting case is with writeback caching (this includes cache=none) where data isn't written to disk immediately but only kept in cache initially. This leads to some form of metadata write batching which avoids the current "write to refcount block, flush, write to L2 table" pattern for each single request when a lot of cluster allocations happen. Instead, cache entries are only written out if it's required to maintain the right order. In the pure cluster allocation case this means that all metadata updates for requests are done in memory initially and on sync, first the refcount blocks are written to disk, then fsync, then L2 tables. This improves performance of scenarios with lots of cluster allocations noticeably (e.g. installation or after taking a snapshot). Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2011-01-24 11:08:51 +01:00
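
Commit 08546bcfb2 in the table above describes an unsigned subtraction pitfall: when mem_size and offset are both size_t and mem_size < offset, the difference wraps around to a very large value, so testing it for "positive" proves nothing. The following is a minimal sketch of that pitfall and one possible guard. It is not the code from the QEMU patch; the function names naive_span and safe_span are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: the subtraction the commit message warns
 * about. If offset > mem_size the result wraps around modulo SIZE_MAX + 1
 * instead of becoming negative. */
static size_t naive_span(size_t mem_size, size_t offset)
{
    return mem_size - offset;
}

/* Hypothetical guard: compare the operands before subtracting, and report
 * an empty range when offset lies at or beyond the end of the region. */
static size_t safe_span(size_t mem_size, size_t offset)
{
    size_t span = mem_size > offset ? mem_size - offset : 0;

    /* Postcondition: the span can never exceed the region itself. */
    assert(span <= mem_size);
    return span;
}
```

With mem_size = 4096 and offset = 8192, naive_span() yields a value close to SIZE_MAX, which still passes a "greater than zero" check, while safe_span() yields 0. A length computed the naive way and handed to madvise() would cover a far larger address range than intended, which matches the symptom the commit message reports.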

40 Commits
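
Commit 812e4082ca in the table above replaces a lookup that always starts at index 0 with one that starts at a hash of the table offset, computed from the cluster number multiplied by 4. The sketch below illustrates that idea under stated assumptions: the Entry type, the start_index() and lookup() names, the use of offset 0 to mark an empty slot, and the comparison counter are all hypothetical and are not taken from qcow2-cache.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: only the on-disk offset of the cached table.
 * An offset of 0 marks an unused slot in this sketch. */
typedef struct {
    uint64_t offset;
} Entry;

/* Starting slot for a lookup: cluster number times 4, reduced modulo the
 * number of entries, as the commit message describes. */
static size_t start_index(uint64_t offset, uint64_t cluster_size, size_t size)
{
    return (size_t)((offset / cluster_size * 4) % size);
}

/* Probe linearly from the hashed slot, wrapping around the array.
 * Returns the slot index on a hit, or -1 after visiting every slot.
 * *comparisons is incremented once per slot examined. */
static int lookup(const Entry *entries, size_t size, uint64_t cluster_size,
                  uint64_t offset, size_t *comparisons)
{
    /* Callers are expected to filter out offset 0 themselves. */
    assert(offset != 0);

    size_t i = start_index(offset, cluster_size, size);
    for (size_t n = 0; n < size; n++) {
        (*comparisons)++;
        if (entries[i].offset == offset) {
            return (int)i;
        }
        i = (i + 1) % size;
    }
    return -1;
}
```

In this sketch a table that sits in its hashed slot is found with a single comparison, whereas a miss still visits every slot once. That is consistent with the observation in commit fdfbca82a0 that a cache miss implies the whole array has already been traversed.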