aboutsummaryrefslogtreecommitdiff
path: root/block/qcow2-cache.c
AgeCommit message (Collapse)Author
2015-09-14qcow2: Rename BDRVQcowState to BDRVQcow2StateKevin Wolf
BDRVQcowState is already used by qcow1, and gdb is always confused which one to use. Rename the qcow2 one so they can be distinguished. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>
2015-09-04qcow2: reorder fields in Qcow2CachedTable to reduce paddingAlberto Garcia
Changing the current ordering saves 8 bytes per cache entry in x86_64. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 0bd55291211df3dfb514d0e7d2031dd5c4f9f807.1438690126.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-09-04qcow2: add option to clean unused cache entries after some timeAlberto Garcia
This adds a new 'cache-clean-interval' option that cleans all qcow2 cache entries that haven't been used in a certain interval, given in seconds. This allows setting a large L2 cache size so it can handle scenarios with lots of I/O and at the same time use little memory during periods of inactivity. This feature currently relies on MADV_DONTNEED to free that memory, so it is not useful in systems that don't follow that behavior. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: a70d12da60433df9360ada648b3f34b8f6f354ce.1438690126.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-09-04qcow2: mark the memory as no longer needed after qcow2_cache_empty()Alberto Garcia
After having emptied the cache, the data in the cache tables is no longer useful, so we can tell the kernel that we are done with it. In Linux this frees the resources associated with it. The effect of this can be seen in the HMP commit operation: it moves data from the top to the base image (and fills both caches), then it empties the top image. At this point the data in that cache is no longer needed so it's just wasting memory. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 08538b098e1faf6c92496477cf9b47a20e5aacea.1438690126.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-07-07qcow2: remove unnecessary checkAlberto Garcia
The value of 'i' is guaranteed to be >= 0 Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 1435824371-2660-1-git-send-email-berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-05-22qcow2: style fixes in qcow2-cache.cAlberto Garcia
Fix pointer declaration to make it consistent with the rest of the code. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-05-22qcow2: make qcow2_cache_put() a void functionAlberto Garcia
This function never receives an invalid table pointer, so we can make it void and remove all the error checking code. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-05-22qcow2: use a hash to look for entries in the L2 cacheAlberto Garcia
The current cache algorithm traverses the array starting always from the beginning, so the average number of comparisons needed to perform a lookup is proportional to the size of the array. By using a hash of the offset as the starting point, lookups are faster and independent from the array size. The hash is computed using the cluster number of the table, multiplied by 4 to make it perform better when there are collisions. In my tests, using a cache with 2048 entries, this reduces the average number of comparisons per lookup from 430 to 2.5. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-05-22qcow2: remove qcow2_cache_find_entry_to_replace()Alberto Garcia
A cache miss means that the whole array was traversed and the entry we were looking for was not found, so there's no need to traverse it again in order to select an entry to replace. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-05-22qcow2: use an LRU algorithm to replace entries from the L2 cacheAlberto Garcia
The current algorithm to evict entries from the cache gives always preference to those in the lowest positions. As the size of the cache increases, the chances of the later elements of being removed decrease exponentially. In a scenario with random I/O and lots of cache misses, entries in positions 8 and higher are rarely (if ever) evicted. This can be seen even with the default cache size, but with larger caches the problem becomes more obvious. Using an LRU algorithm makes the chances of being removed from the cache independent from the position. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-05-22qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty()Alberto Garcia
Since all tables are now stored together, it is possible to obtain the position of a particular table directly from its address, so the operation becomes O(1). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-05-22qcow2: use one single memory block for the L2/refcount cache tablesAlberto Garcia
The qcow2 L2/refcount cache contains one separate table for each cache entry. Doing one allocation per table adds unnecessary overhead and it also requires us to store the address of each table separately. Since the size of the cache is constant during its lifetime, it's better to have an array that contains all the tables using one single allocation. In my tests measuring freshly created caches with sizes 128MB (L2) and 32MB (refcount) this uses around 10MB of RAM less. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06block: Give always priority to unused entries in the qcow2 L2 cacheAlberto Garcia
The current algorithm to replace entries from the L2 cache gives priority to newer hits by dividing the hit count of all existing entries by two everytime there is a cache miss. However, if there are several cache misses the hit count of the existing entries can easily go down to 0. This will result in those entries being replaced even when there are others that have never been used. This problem is more noticeable with larger disk images and cache sizes, since the chances of having several misses before the cache is full are higher. If we make sure that the hit count can never go down to 0 again, unused entries will always have priority. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20qcow2: Use g_try_new0() for cache arrayMax Reitz
With a variable cache size, the number given to qcow2_cache_create() may be huge. Therefore, use g_try_new0(). While at it, use g_new0() instead of g_malloc0() for allocating the Qcow2Cache object. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-20block: Use g_new() & friends to avoid multiplying sizesMarkus Armbruster
g_new(T, n) is safer than g_malloc(sizeof(*v) * n) for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. Perhaps a conversion to g_malloc_n() would be neater in places, but that's merely four years old, and we can't use such newfangled stuff. This commit only touches allocations with size arguments of the form sizeof(T), plus two that use 4 instead of sizeof(uint32_t). We can make the others safe by converting to g_malloc_n() when it becomes available to us in a couple of years. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-08-15qcow2: Handle failure for potentially large allocationsKevin Wolf
Some code in the block layer makes potentially huge allocations. Failure is not completely unexpected there, so avoid aborting qemu and handle out-of-memory situations gracefully. This patch addresses the allocations in the qcow2 block driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-10-11qcow2: Use negated overflow check maskMax Reitz
In qcow2_check_metadata_overlap and qcow2_pre_write_overlap_check, change the parameter signifying the checks to perform from its current positive form to a negative one, i.e., it will no longer explicitly specify every check to perform but rather a mask of checks not to perform. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12qcow2-cache: Empty cacheMax Reitz
Add a function for emptying a cache, i.e., flushing it and marking all elements invalid. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30qcow2: Employ metadata overlap checksMax Reitz
The pre-write overlap check function is now called before most of the qcow2 writes (aborting it on collision or other error). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-12-19block: move include files to include/block/Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-06-15qcow2: always operate caches in writeback modePaolo Bonzini
Writethrough does not need special-casing anymore in the qcow2 caches. The block layer adds flushes after every guest-initiated data write, and these will also flush the qcow2 caches to the OS. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-03-12qcow2: Add some tracingKevin Wolf
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2011-08-20Use glib memory allocation and free functionsAnthony Liguori
qemu_malloc/qemu_free no longer exist after this commit. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-07-19qcow2: Use Qcow2Cache in writeback mode during loadvm/savevmKevin Wolf
In snapshotting there is no guest involved, so we can safely use a writeback mode and do the flushes in the right place (i.e. at the very end). This improves the time that creating/restoring an internal snapshot takes with an image in writethrough mode. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-31Reorganize struct Qcow2Cache for better struct packingJes Sorensen
Move size after the two pointers in struct Qcow2Cache to get better packing of struct elements on 64 bit architectures. Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24qcow2: Batch flushes for COWKevin Wolf
qcow2 calls bdrv_flush() after performing COW in order to ensure that the L2 table change is never written before the copy is safe on disk. Now that the L2 table is cached, we can wait with flushing until we write out the next L2 table. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24qcow2: Use QcowCacheKevin Wolf
Use the new functions of qcow2-cache.c for everything that works on refcount block and L2 tables. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-01-24qcow2: Add QcowCacheKevin Wolf
This adds some new cache functions to qcow2 which can be used for caching refcount blocks and L2 tables. When used with cache=writethrough they work like the old caching code which is spread all over qcow2, so for this case we have merely a cleanup. The interesting case is with writeback caching (this includes cache=none) where data isn't written to disk immediately but only kept in cache initially. This leads to some form of metadata write batching which avoids the current "write to refcount block, flush, write to L2 table" pattern for each single request when a lot of cluster allocations happen. Instead, cache entries are only written out if its required to maintain the right order. In the pure cluster allocation case this means that all metadata updates for requests are done in memory initially and on sync, first the refcount blocks are written to disk, then fsync, then L2 tables. This improves performance of scenarios with lots of cluster allocations noticably (e.g. installation or after taking a snapshot). Signed-off-by: Kevin Wolf <kwolf@redhat.com>