aboutsummaryrefslogtreecommitdiff
path: root/accel/kvm
AgeCommit message (Collapse)Author
2023-08-24kvm: Introduce kvm_arch_get_default_type hookAkihiko Odaki
kvm_arch_get_default_type() returns the default KVM type. This hook is particularly useful to derive a KVM type that is valid for "none" machine model, which is used by libvirt to probe the availability of KVM. For MIPS, the existing mips_kvm_type() is reused. This function ensures the availability of VZ which is mandatory to use KVM on the current QEMU. Cc: qemu-stable@nongnu.org Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-2-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> [PMM: added doc comment for new function] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> (cherry picked from commit 5e0d65909c6f335d578b90491e165440c99adf81) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-07-31kvm: Fix crash due to access uninitialized kvm_stateGavin Shan
Runs into core dump on arm64 and the backtrace extracted from the core dump is shown as below. It's caused by accessing uninitialized @kvm_state in kvm_flush_coalesced_mmio_buffer() due to commit 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()"), where the machine's memory region is added earlier than before. main qemu_init configure_accelerators qemu_opts_foreach do_configure_accelerator accel_init_machine kvm_init virt_kvm_type virt_set_memmap machine_memory_devices_init memory_region_add_subregion memory_region_add_subregion_common memory_region_update_container_subregions memory_region_transaction_begin qemu_flush_coalesced_mmio_buffer kvm_flush_coalesced_mmio_buffer Fix it by bailing early in kvm_flush_coalesced_mmio_buffer() on the uninitialized @kvm_state. With this applied, no crash is observed on arm64. Fixes: 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()") Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20230731125946.2038742-1-gshan@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-06-28exec/memory: Add symbol for the min value of memory listener priorityIsaku Yamahata
Add MEMORY_LISTNER_PRIORITY_MIN for the symbolic value for the min value of the memory listener instead of the hard-coded magic value 0. Add explicit initialization. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <29f88477fe82eb774bcfcae7f65ea21995f865f2.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-28exec/memory: Add symbol for memory listener priority for device backendIsaku Yamahata
Add MEMORY_LISTENER_PRIORITY_DEV_BACKEND for the symbolic value for memory listener to replace the hard-coded value 10 for the device backend. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <8314d91688030d7004e96958f12e2c83fb889245.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-28exec/memory: Add symbolic value for memory listener priority for accelIsaku Yamahata
Add MEMORY_LISTNER_PRIORITY_ACCEL for the symbolic value for the memory listener to replace the hard-coded value 10 for accel. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <feebe423becc6e2aa375f59f6abce9a85bc15abb.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-26kvm: reuse per-vcpu stats fd to avoid vcpu interruptionMarcelo Tosatti
A regression has been detected in latency testing of KVM guests. More specifically, it was observed that the cyclictest numbers inside of an isolated vcpu (running on isolated pcpu) are: Where a maximum of 50us is acceptable. The implementation of KVM_GET_STATS_FD uses run_on_cpu to query per vcpu statistics, which interrupts the vcpu (and is unnecessary). To fix this, open the per vcpu stats fd on vcpu initialization, and read from that fd from QEMU's main thread. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18kvm: Enable dirty ring for arm64Gavin Shan
arm64 has different capability from x86 to enable the dirty ring, which is KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Besides, arm64 also needs the backup bitmap extension (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) when 'kvm-arm-gicv3' or 'arm-its-kvm' device is enabled. Here the extension is always enabled and the unnecessary overhead to do the last stage of dirty log synchronization when those two devices aren't used is introduced, but the overhead should be very small and acceptable. The benefit is cover future cases where those two devices are used without modifying the code. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20230509022122.20888-5-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18kvm: Add helper kvm_dirty_ring_init()Gavin Shan
Due to multiple capabilities associated with the dirty ring for different architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for x86 and arm64 separately. There will be more to be done in order to support the dirty ring for arm64. Lets add helper kvm_dirty_ring_init() to enable the dirty ring. With this, the code looks a bit clean. No functional change intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Message-Id: <20230509022122.20888-4-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18kvm: Synchronize the backup bitmap in the last stageGavin Shan
In the last stage of live migration or memory slot removal, the backup bitmap needs to be synchronized when it has been enabled. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Message-Id: <20230509022122.20888-3-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18migration: Add last stage indicator to global dirty logGavin Shan
The global dirty log synchronization is used when KVM and dirty ring are enabled. There is a particularity for ARM64 where the backup bitmap is used to track dirty pages in non-running-vcpu situations. It means the dirty ring works with the combination of ring buffer and backup bitmap. The dirty bits in the backup bitmap needs to collected in the last stage of live migration. In order to identify the last stage of live migration and pass it down, an extra parameter is added to the relevant functions and callbacks. This last stage indicator isn't used until the dirty ring is enabled in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Message-Id: <20230509022122.20888-2-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-04kvm: dirty-ring: Fix race with vcpu creationPeter Xu
It's possible that we want to reap a dirty ring on a vcpu that is during creation, because the vcpu is put onto list (CPU_FOREACH visible) before initialization of the structures. In this case: qemu_init_vcpu x86_cpu_realizefn cpu_exec_realizefn cpu_list_add <---- can be probed by CPU_FOREACH qemu_init_vcpu cpus_accel->create_vcpu_thread(cpu); kvm_init_vcpu map kvm_dirty_gfns <--- kvm_dirty_gfns valid Don't try to reap dirty ring on vcpus during creation or it'll crash. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2124756 Reported-by: Xiaohui Li <xiaohli@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1d14deb6684bcb7de1c9633c5bd21113988cc698.1676563222.git.huangy81@chinatelecom.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-07gdbstub: move update guest debug to accel opsMads Ynddal
Continuing the refactor of a48e7d9e52 (gdbstub: move guest debug support check to ops) by removing hardcoded kvm_enabled() from generic cpu.c code, and replace it with a property of AccelOpsClass. Signed-off-by: Mads Ynddal <m.ynddal@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230207131721.49233-1-mads@ynddal.dk> [AJB: add ifdef around update_guest_debug_ops, fix brace] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20230302190846.2593720-27-alex.bennee@linaro.org> Message-Id: <20230303025805.625589-30-richard.henderson@linaro.org>
2023-03-01kvm/i386: Add xen-evtchn-max-pirq propertyDavid Woodhouse
The default number of PIRQs is set to 256 to avoid issues with 32-bit MSI devices. Allow it to be increased if the user desires. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01kvm/i386: Add xen-gnttab-max-frames propertyDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01i386/kvm: Add xen-version KVM accelerator property and init KVM Xen supportDavid Woodhouse
This just initializes the basic Xen support in KVM for now. Only permitted on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap overlay, event channel, grant tables and other stuff will exist. There's no point having the basic hypercall support if nothing else works. Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used later by support devices. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-02-27accel/kvm: Silent -Wmissing-field-initializers warningPhilippe Mathieu-Daudé
Silent when compiling with -Wextra: ../accel/kvm/kvm-all.c:2291:17: warning: missing field 'num' initializer [-Wmissing-field-initializers] { NULL, } ^ Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20221220143532.24958-3-philmd@linaro.org>
2023-02-27gdbstub: Use vaddr type for generic insert/remove_breakpoint() APIPhilippe Mathieu-Daudé
Both insert/remove_breakpoint() handlers are used in system and user emulation. We can not use the 'hwaddr' type on user emulation, we have to use 'vaddr' which is defined as "wide enough to contain any #target_ulong virtual address". gdbstub.c doesn't require to include "exec/hwaddr.h" anymore. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221216215519.5522-4-philmd@linaro.org>
2023-02-04stats: Move QMP commands from monitor/ to stats/Markus Armbruster
This moves these commands from MAINTAINERS section "QMP" to new section "Stats". Status is Orphan. Volunteers welcome! Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230124121946.1139465-23-armbru@redhat.com>
2023-01-11kvm: Atomic memslot updatesDavid Hildenbrand
If we update an existing memslot (e.g., resize, split), we temporarily remove the memslot to re-add it immediately afterwards. These updates are not atomic, especially not for KVM VCPU threads, such that we can get spurious faults. Let's inhibit most KVM ioctls while performing relevant updates, such that we can perform the update just as if it would happen atomically without additional kernel support. We capture the add/del changes and apply them in the notifier commit stage instead. There, we can check for overlaps and perform the ioctl inhibiting only if really required (-> overlap). To keep things simple we don't perform additional checks that wouldn't actually result in an overlap -- such as !RAM memory regions in some cases (see kvm_set_phys_mem()). To minimize cache-line bouncing, use a separate indicator (in_ioctl_lock) per CPU. Also, make sure to hold the kvm_slots_lock while performing both actions (removing+re-adding). We have to wait until all IOCTLs were exited and block new ones from getting executed. This approach cannot result in a deadlock as long as the inhibitor does not hold any locks that might hinder an IOCTL from getting finished and exited - something fairly unusual. The inhibitor will always hold the BQL. AFAIKs, one possible candidate would be userfaultfd. If a page cannot be placed (e.g., during postcopy), because we're waiting for a lock, or if the userfaultfd thread cannot process a fault, because it is waiting for a lock, there could be a deadlock. However, the BQL is not applicable here, because any other guest memory access while holding the BQL would already result in a deadlock. Nothing else in the kernel should block forever and wait for userspace intervention. Note: pause_all_vcpus()/resume_all_vcpus() or start_exclusive()/end_exclusive() cannot be used, as they either drop the BQL or require to be called without the BQL - something inhibitors cannot handle. We need a low-level locking mechanism that is deadlock-free even when not releasing the BQL. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Tested-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221111154758.1372674-4-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11KVM: keep track of running ioctlsEmanuele Giuseppe Esposito
Using the new accel-blocker API, mark where ioctls are being called in KVM. Next, we will implement the critical section that will take care of performing memslots modifications atomically, therefore preventing any new ioctl from running and allowing the running ones to finish. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221111154758.1372674-3-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-14qapi: Use returned bool to check for failure (again)Markus Armbruster
Commit 012d4c96e2 changed the visitor functions taking Error ** to return bool instead of void, and the commits following it used the new return value to simplify error checking. Since then a few more uses in need of the same treatment crept in. Do that. All pretty mechanical except for * balloon_stats_get_all() This is basically the same transformation commit 012d4c96e2 applied to the virtual walk example in include/qapi/visitor.h. * set_max_queue_size() Additionally replace "goto end of function" by return. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221121085054.683122-10-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2022-10-11i386: add notify VM exit supportChenyi Qiang
There are cases that malicious virtual machine can cause CPU stuck (due to event windows don't open up), e.g., infinite loop in microcode when nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and IRQ) can be delivered. It leads the CPU to be unavailable to host or other VMs. Notify VM exit is introduced to mitigate such kind of attacks, which will generate a VM exit if no event window occurs in VM non-root mode for a specified amount of time (notify window). A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space so that the user can query the capability and set the expected notify window when creating VMs. The format of the argument when enabling this capability is as follows: Bit 63:32 - notify window specified in qemu command Bit 31:0 - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to enable the feature.) Users can configure the feature by a new (x86 only) accel property: qemu -accel kvm,notify-vmexit=run|internal-error|disable,notify-window=n The default option of notify-vmexit is run, which will enable the capability and do nothing if the exit happens. The internal-error option raises a KVM internal error if it happens. The disable option does not enable the capability. The default value of notify-window is 0. It is valid only when notify-vmexit is not disabled. The valid range of notify-window is non-negative. It is even safe to set it to zero since there's an internal hardware threshold to be added to ensure no false positive. Because a notify VM exit may happen with VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated that would set this bit), which means VM context is corrupted. It would be reflected in the flags of KVM_EXIT_NOTIFY exit. If KVM_NOTIFY_CONTEXT_INVALID bit is set, raise a KVM internal error unconditionally. Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220929072014.20705-5-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-11kvm: expose struct KVMStateChenyi Qiang
Expose struct KVMState out of kvm-all.c so that the field of struct KVMState can be accessed when defining target-specific accelerator properties. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220929072014.20705-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-10kvm: allow target-specific accelerator propertiesPaolo Bonzini
Several hypervisor capabilities in KVM are target-specific. When exposed to QEMU users as accelerator properties (i.e. -accel kvm,prop=value), they should not be available for all targets. Add a hook for targets to add their own properties to -accel kvm, for now no such property is defined. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220929072014.20705-3-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-06accel/kvm: move kvm_update_guest_debug to inline stubAlex Bennée
Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20220929114231.583801-47-alex.bennee@linaro.org>
2022-10-06gdbstub: move guest debug support check to opsAlex Bennée
This removes the final hard coding of kvm_enabled() in gdbstub and moves the check to an AccelOps. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Mads Ynddal <mads@ynddal.dk> Message-Id: <20220929114231.583801-46-alex.bennee@linaro.org>
2022-10-06gdbstub: move breakpoint logic to accel opsAlex Bennée
As HW virtualization requires specific support to handle breakpoints lets push out special casing out of the core gdbstub code and into AccelOpsClass. This will make it easier to add other accelerator support and reduces some of the stub shenanigans. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Mads Ynddal <mads@ynddal.dk> Message-Id: <20220929114231.583801-45-alex.bennee@linaro.org>
2022-10-06gdbstub: move sstep flags probing into AccelClassAlex Bennée
The support of single-stepping is very much dependent on support from the accelerator we are using. To avoid special casing in gdbstub move the probing out to an AccelClass function so future accelerators can put their code there. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Mads Ynddal <mads@ynddal.dk> Message-Id: <20220929114231.583801-44-alex.bennee@linaro.org>
2022-09-18kvm: fix memory leak on failure to read stats descriptorsPaolo Bonzini
Reported by Coverity as CID 1490142. Since the size is constant and the lifetime is the same as the StatsDescriptors struct, embed the struct directly instead of using a separate allocation. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-18KVM: use store-release to mark dirty pages as harvestedPaolo Bonzini
The following scenario can happen if QEMU sets more RESET flags while the KVM_RESET_DIRTY_RINGS ioctl is ongoing on another host CPU: CPU0 CPU1 CPU2 ------------------------ ------------------ ------------------------ fill gfn0 store-rel flags for gfn0 fill gfn1 store-rel flags for gfn1 load-acq flags for gfn0 set RESET for gfn0 load-acq flags for gfn1 set RESET for gfn1 do ioctl! -----------> ioctl(RESET_RINGS) fill gfn2 store-rel flags for gfn2 load-acq flags for gfn2 set RESET for gfn2 process gfn0 process gfn1 process gfn2 do ioctl! etc. The three load-acquire in CPU0 synchronize with the three store-release in CPU2, but CPU0 and CPU1 are only synchronized up to gfn1 and CPU1 may miss gfn2's fields other than flags. The kernel must be able to cope with invalid values of the fields, and userspace *will* invoke the ioctl once more. However, once the RESET flag is cleared on gfn2, it is lost forever, therefore in the above scenario CPU1 must read the correct value of gfn2's fields. Therefore RESET must be set with a store-release, that will synchronize with KVM's load-acquire in CPU1. Cc: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-01KVM: dirty ring: add missing memory barrierPaolo Bonzini
The KVM_DIRTY_GFN_F_DIRTY flag ensures that the entry is valid. If the read of the fields are not ordered after the read of the flag, QEMU might see stale values. Cc: Gavin Shan <gshan@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-08-18kvm: fix segfault with query-stats-schemas and -M nonePaolo Bonzini
-M none creates a guest without a vCPU, causing the following error: $ ./qemu-system-x86_64 -qmp stdio -M none -accel kvm {execute:qmp_capabilities} {"return": {}} {execute: query-stats-schemas} Segmentation fault (core dumped) Fix it by not querying the vCPU stats if first_cpu is NULL. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-29kvm: don't use perror() without useful errnoCornelia Huck
perror() is designed to append the decoded errno value to a string. This, however, only makes sense if we called something that actually sets errno prior to that. For the callers that check for split irqchip support that is not the case, and we end up with confusing error messages that end in "success". Use error_report() instead. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20220728142446.438177-1-cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-22accel/kvm: Avoid Coverity warning in query_stats()Peter Maydell
Coverity complains that there is a codepath in the query_stats() function where it can leak the memory pointed to by stats_list. This can only happen if the caller passes something other than STATS_TARGET_VM or STATS_TARGET_VCPU as the 'target', which no callsite does. Enforce this assumption using g_assert_not_reached(), so that if we have a future bug we hit the assert rather than silently leaking memory. Resolves: Coverity CID 1490140 Fixes: cc01a3f4cadd91e6 ("kvm: Support for querying fd-based stats") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20220719134853.327059-1-peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-21Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into stagingPeter Maydell
* Boolean statistics for KVM * Fix build on Haiku # gpg: Signature made Tue 19 Jul 2022 10:32:34 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: util: Fix broken build on Haiku kvm: add support for boolean statistics monitor: add support for boolean statistics Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-07-20softmmu/dirtylimit: Implement virtual CPU throttleHyman Huang(黄勇)
Setup a negative feedback system when vCPU thread handling KVM_EXIT_DIRTY_RING_FULL exit by introducing throttle_us_per_full field in struct CPUState. Sleep throttle_us_per_full microseconds to throttle vCPU if dirtylimit is in service. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <977e808e03a1cef5151cae75984658b6821be618.1656177590.git.huangy81@chinatelecom.cn> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20accel/kvm/kvm-all: Introduce kvm_dirty_ring_size functionHyman Huang(黄勇)
Introduce kvm_dirty_ring_size util function to help calculate dirty ring ful time. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Acked-by: Peter Xu <peterx@redhat.com> Message-Id: <f9ce1f550bfc0e3a1f711e17b1dbc8f701700e56.1656177590.git.huangy81@chinatelecom.cn> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20accel/kvm/kvm-all: Refactor per-vcpu dirty ring reapingHyman Huang(黄勇)
Add a non-required argument 'CPUState' to kvm_dirty_ring_reap so that it can cover single vcpu dirty-ring-reaping scenario. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <c32001242875e83b0d9f78f396fe2dcd380ba9e8.1656177590.git.huangy81@chinatelecom.cn> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-18kvm: add support for boolean statisticsPaolo Bonzini
The next version of Linux will introduce boolean statistics, which can only have 0 or 1 values. Convert them to the new QAPI fields added in the previous commit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-08accel: kvm: Fix memory leak in find_stats_descriptorsMiaoqian Lin
This function doesn't release descriptors in one error path, result in memory leak. Call g_free() to release it. Fixes: cc01a3f4cadd ("kvm: Support for querying fd-based stats") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Message-Id: <20220624063159.57411-1-linmq006@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-14qmp: add filtering of statistics by namePaolo Bonzini
Allow retrieving only a subset of statistics. This can be useful for example in order to plot a subset of the statistics many times a second: KVM publishes ~40 statistics for each vCPU on x86; retrieving and serializing all of them would be useless. Another use will be in HMP in the following patch; implementing the filter in the backend is easy enough that it was deemed okay to make this a public interface. Example: { "execute": "query-stats", "arguments": { "target": "vcpu", "vcpus": [ "/machine/unattached/device[2]", "/machine/unattached/device[4]" ], "providers": [ { "provider": "kvm", "names": [ "l1d_flush", "exits" ] } } } { "return": { "vcpus": [ { "path": "/machine/unattached/device[2]" "providers": [ { "provider": "kvm", "stats": [ { "name": "l1d_flush", "value": 41213 }, { "name": "exits", "value": 74291 } ] } ] }, { "path": "/machine/unattached/device[4]" "providers": [ { "provider": "kvm", "stats": [ { "name": "l1d_flush", "value": 16132 }, { "name": "exits", "value": 57922 } ] } ] } ] } } Extracted from a patch by Mark Kanda. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-14qmp: add filtering of statistics by providerPaolo Bonzini
Allow retrieving the statistics from a specific provider only. This can be used in the future by HMP commands such as "info sync-profile" or "info profile". The next patch also adds filter-by-provider capabilities to the HMP equivalent of query-stats, "info stats". Example: { "execute": "query-stats", "arguments": { "target": "vm", "providers": [ { "provider": "kvm" } ] } } The QAPI is a bit more verbose than just a list of StatsProvider, so that it can be subsequently extended with filtering of statistics by name. If a provider is specified more than once in the filter, each request will be included separately in the output. Extracted from a patch by Mark Kanda. Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-14qmp: add filtering of statistics by target vCPUPaolo Bonzini
Introduce a simple filtering of statistics, that allows to retrieve statistics for a subset of the guest vCPUs. This will be used for example by the HMP monitor, in order to retrieve the statistics for the currently selected CPU. Example: { "execute": "query-stats", "arguments": { "target": "vcpu", "vcpus": [ "/machine/unattached/device[2]", "/machine/unattached/device[4]" ] } } Extracted from a patch by Mark Kanda. Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-14kvm: Support for querying fd-based statsMark Kanda
Add support for querying fd-based KVM stats - as introduced by Linux kernel commit: cb082bfab59a ("KVM: stats: Add fd-based API to read binary stats data") This allows the user to analyze the behavior of the VM without access to debugfs. Signed-off-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08Fix 'writeable' typosPeter Maydell
We have about 30 instances of the typo/variant spelling 'writeable', and over 500 of the more common 'writable'. Standardize on the latter. Change produced with: sed -i -e 's/\([Ww][Rr][Ii][Tt]\)[Ee]\([Aa][Bb][Ll][Ee]\)/\1\2/g' $(git grep -il writeable) and then hand-undoing the instance in linux-headers/linux/kvm.h. Most of these changes are in comments or documentation; the exceptions are: * a local variable in accel/hvf/hvf-accel-ops.c * a local variable in accel/kvm/kvm-all.c * the PMCR_WRITABLE_MASK macro in target/arm/internals.h * the EPT_VIOLATION_GPA_WRITABLE macro in target/i386/hvf/vmcs.h (which is never used anywhere) * the AR_TYPE_WRITABLE_MASK macro in target/i386/hvf/vmx.h (which is never used anywhere) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Weil <sw@weilnetz.de> Message-id: 20220505095015.2714666-1-peter.maydell@linaro.org
2022-04-06Replace qemu_real_host_page variables with inlined functionsMarc-André Lureau
Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06Replace TARGET_WORDS_BIGENDIANMarc-André Lureau
Convert the TARGET_WORDS_BIGENDIAN macro, similarly to what was done with HOST_BIG_ENDIAN. The new TARGET_BIG_ENDIAN macro is either 0 or 1, and thus should always be defined to prevent misuse. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Suggested-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220323155743.1585078-8-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06Replace config-time define HOST_WORDS_BIGENDIANMarc-André Lureau
Replace a config-time define with a compile time condition define (compatible with clang and gcc) that must be declared prior to its usage. This avoids having a global configure time define, but also prevents from bad usage, if the config header wasn't included before. This can help to make some code independent from qemu too. gcc supports __BYTE_ORDER__ from about 4.6 and clang from 3.2. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> [ For the s390x parts I'm involved in ] Acked-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220323155743.1585078-7-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21Use g_new() & friends where that makes obvious senseMarkus Armbruster
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. This commit only touches allocations with size arguments of the form sizeof(T). Patch created mechanically with: $ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \ --macro-file scripts/cocci-macro-file.h FILES... Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20220315144156.1595462-4-armbru@redhat.com> Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
2022-03-15kvm/msi: do explicit commit when adding msi routesLongpeng(Mike)
We invoke the kvm_irqchip_commit_routes() for each addition to MSI route table, which is not efficient if we are adding lots of routes in some cases. This patch lets callers invoke the kvm_irqchip_commit_routes(), so the callers can decide how to optimize. [1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00967.html Signed-off-by: Longpeng <longpeng2@huawei.com> Message-Id: <20220222141116.2091-3-longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>