KVM: Dynamic sized kvm memslots array

Zhiyi reported an infinite loop issue in VFIO use case. The cause of that was a separate discussion, however during that I found a regression of dirty sync slowness when profiling. Each KVMMemoryListerner maintains an array of kvm memslots. Currently it's statically allocated to be the max supported by the kernel. However after Linux commit 4fc096a99e ("KVM: Raise the maximum number of user memslots"), the max supported memslots reported now grows to some number large enough so that it may not be wise to always statically allocate with the max reported. What's worse, QEMU kvm code still walks all the allocated memslots entries to do any form of lookups. It can drastically slow down all memslot operations because each of such loop can run over 32K times on the new kernels. Fix this issue by making the memslots to be allocated dynamically. Here the initial size was set to 16 because it should cover the basic VM usages, so that the hope is the majority VM use case may not even need to grow at all (e.g. if one starts a VM with ./qemu-system-x86_64 by default it'll consume 9 memslots), however not too large to waste memory. There can also be even better way to address this, but so far this is the simplest and should be already better even than before we grow the max supported memslots. For example, in the case of above issue when VFIO was attached on a 32GB system, there are only ~10 memslots used. So it could be good enough as of now. In the above VFIO context, measurement shows that the precopy dirty sync shrinked from ~86ms to ~3ms after this patch applied. It should also apply to any KVM enabled VM even without VFIO. NOTE: we don't have a FIXES tag for this patch because there's no real commit that regressed this in QEMU. Such behavior existed for a long time, but only start to be a problem when the kernel reports very large nr_slots_max value. However that's pretty common now (the kernel change was merged in 2021) so we attached cc:stable because we'll want this change to be backported to stable branches. Cc: qemu-stable <qemu-stable@nongnu.org> Reported-by: Zhiyi Guo <zhguo@redhat.com> Tested-by: Zhiyi Guo <zhguo@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240917163835.194664-2-peterx@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
author: Peter Xu <peterx@redhat.com> 2024-09-17 12:38:32 -0400
committer: Paolo Bonzini <pbonzini@redhat.com> 2024-10-17 19:41:30 +0200
commit: 5504a8126115d173687b37e657312a8ffe29fc0c (patch)
tree: af99d54cad95398d785fae01578b3ae5da592c24 /accel/kvm/trace-events
parent: ac92afd19e4017b6973f06a760b9c61ff9fc63c4 (diff)
1 files changed, 1 insertions, 0 deletions
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 82c65fd2ab..e43d18a869 100644
--- a/accel/kvm/trace-events
+++ b/accel/kvm/trace-events
@@ -36,3 +36,4 @@ kvm_io_window_exit(void) ""
 kvm_run_exit_system_event(int cpu_index, uint32_t event_type) "cpu_index %d, system_even_type %"PRIu32
 kvm_convert_memory(uint64_t start, uint64_t size, const char *msg) "start 0x%" PRIx64 " size 0x%" PRIx64 " %s"
 kvm_memory_fault(uint64_t start, uint64_t size, uint64_t flags) "start 0x%" PRIx64 " size 0x%" PRIx64 " flags 0x%" PRIx64
+kvm_slots_grow(unsigned int old, unsigned int new) "%u -> %u"
author	Peter Xu <peterx@redhat.com>	2024-09-17 12:38:32 -0400
committer	Paolo Bonzini <pbonzini@redhat.com>	2024-10-17 19:41:30 +0200
commit	5504a8126115d173687b37e657312a8ffe29fc0c (patch)
tree	af99d54cad95398d785fae01578b3ae5da592c24 /accel/kvm/trace-events
parent	ac92afd19e4017b6973f06a760b9c61ff9fc63c4 (diff)