aboutsummaryrefslogtreecommitdiff
path: root/accel/kvm
AgeCommit message (Collapse)Author
2024-04-08kvm: error out of kvm_irqchip_add_msi_route() in case of full route tableIgor Mammedov
subj is calling kvm_add_routing_entry() which simply extends KVMState::irq_routes::entries[] but doesn't check if number of routes goes beyond limit the kernel is willing to accept. Which later leads toi the assert qemu-kvm: ../accel/kvm/kvm-all.c:1833: kvm_irqchip_commit_routes: Assertion `ret == 0' failed typically it happens during guest boot for large enough guest Reproduced with: ./qemu --enable-kvm -m 8G -smp 64 -machine pc \ `for b in {1..2}; do echo -n "-device pci-bridge,id=pci$b,chassis_nr=$b "; for i in {0..31}; do touch /tmp/vblk$b$i; echo -n "-drive file=/tmp/vblk$b$i,if=none,id=drive$b$i,format=raw -device virtio-blk-pci,drive=drive$b$i,bus=pci$b "; done; done` While crash at boot time is bad, the same might happen at hotplug time which is unacceptable. So instead calling kvm_add_routing_entry() unconditionally, check first that number of routes won't exceed KVM_CAP_IRQ_ROUTING. This way virtio device insteads killin qemu, will gracefully fail to initialize device as expected with following warnings on console: virtio-blk failed to set guest notifier (-28), ensure -accel kvm is set. virtio_bus_start_ioeventfd: failed. Fallback to userspace (slower). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-ID: <20240408110956.451558-1-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-05migration: prevent migration when VM has poisoned memoryWilliam Roche
A memory page poisoned from the hypervisor level is no longer readable. The migration of a VM will crash Qemu when it tries to read the memory address space and stumbles on the poisoned page with a similar stack trace: Program terminated with signal SIGBUS, Bus error. #0 _mm256_loadu_si256 #1 buffer_zero_avx2 #2 select_accel_fn #3 buffer_is_zero #4 save_zero_page #5 ram_save_target_page_legacy #6 ram_save_host_page #7 ram_find_and_save_block #8 ram_save_iterate #9 qemu_savevm_state_iterate #10 migration_iteration_run #11 migration_thread #12 qemu_thread_start To avoid this VM crash during the migration, prevent the migration when a known hardware poison exists on the VM. Signed-off-by: William Roche <william.roche@oracle.com> Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-19Merge tag 'hw-cpus-20240119' of https://github.com/philmd/qemu into stagingPeter Maydell
HW core patch queue . Deprecate unmaintained SH-4 models (Samuel) . HPET: Convert DPRINTF calls to trace events (Daniel) . Implement buffered block writes in Intel PFlash (Gerd) . Ignore ELF loadable segments with zero size (Bin) . ESP/NCR53C9x: PCI DMA fixes (Mark) . PIIX: Simplify Xen PCI IRQ routing (Bernhard) . Restrict CPU 'start-powered-off' property to sysemu (Phil) . target/alpha: Only build sys_helper.c on system emulation (Phil) . target/xtensa: Use generic instruction breakpoint API & add test (Max) . Restrict icount to system emulation (Phil) . Do not set CPUState TCG-specific flags in non-TCG accels (Phil) . Cleanup TCG tb_invalidate API (Phil) . Correct LoongArch/KVM include path (Bibo) . Do not ignore throttle errors in crypto backends (Phil) . MAINTAINERS updates (Raphael, Zhao) # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmWqXbkACgkQ4+MsLN6t # wN6VVBAAkP/Bs2JfQYobPZVV868wceM97KeUJMXP2YWf6dSLpHRCQN5KtuJcACM9 # y3k3R7nMeVJSGmzl/1gF1G9JhjoCLoVLX/ejeBppv4Wq//9sEdggaQfdCwkhWw2o # IK/gPjTZpimE7Er4hPlxmuhSRuM1MX4duKFRRfuZpE7XY14Y7/Hk12VIG7LooO0x # 2Sl8CaU0DN7CWmRVDoUkwVx7JBy28UVarRDsgpBim7oKmjjBFnCJkH6B6NJXEiYr # z1BmIcHa87S09kG1ek+y8aZpG9iPC7nUWjPIQyJGhnfrnBuO7hQHwCLIjHHp5QBR # BoMr8YQNTI34/M/D8pBfg96LrGDjkQOfwRyRddkMP/jJcNPMAPMNGbfVaIrfij1e # T+jFF4gQenOvy1XKCY3Uk/a11P3tIRFBEeOlzzQg4Aje9W2MhUNwK2HTlRfBbrRr # V30R764FDmHlsyOu6/E3jqp4GVCgryF1bglPOBjVEU5uytbQTP8jshIpGVnxBbF+ # OpFwtsoDbsousNKVcO5+B0mlHcB9Ru9h11M5/YD/jfLMk95Ga90JGdgYpqQ5tO5Y # aqQhKfCKbfgKuKhysxpsdWAwHZzVrlSf+UrObF0rl2lMXXfcppjCqNaw4QJ0oedc # DNBxTPcCE2vWhUzP3A60VH7jLh4nLaqSTrxxQKkbx+Je1ERGrxs= # =KmQh # -----END PGP SIGNATURE----- # gpg: Signature made Fri 19 Jan 2024 11:32:09 GMT # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * tag 'hw-cpus-20240119' of https://github.com/philmd/qemu: (36 commits) configure: Add linux header compile support for LoongArch MAINTAINERS: Update hw/core/cpu.c entry MAINTAINERS: Update Raphael Norwitz email hw/elf_ops: Ignore loadable segments with zero size hw/scsi/esp-pci: set DMA_STAT_BCMBLT when BLAST command issued hw/scsi/esp-pci: synchronise setting of DMA_STAT_DONE with ESP completion interrupt hw/scsi/esp-pci: generate PCI interrupt from separate ESP and PCI sources hw/scsi/esp-pci: use correct address register for PCI DMA transfers target/riscv: Rename tcg_cpu_FOO() to include 'riscv' target/i386: Rename tcg_cpu_FOO() to include 'x86' hw/s390x: Rename cpu_class_init() to include 'sclp' hw/core/cpu: Rename cpu_class_init() to include 'common' accel: Rename accel_init_ops_interfaces() to include 'system' cpus: Restrict 'start-powered-off' property to system emulation system/watchpoint: Move TCG specific code to accel/tcg/ system/replay: Restrict icount to system emulation hw/pflash: implement update buffer for block writes hw/pflash: use ldn_{be,le}_p and stn_{be,le}_p hw/pflash: refactor pflash_data_write() hw/i386/pc_piix: Make piix_intx_routing_notifier_xen() more device independent ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-01-19accel: Do not set CPUState::can_do_io in non-TCG accelsPhilippe Mathieu-Daudé
'can_do_io' is specific to TCG. It was added to other accelerators in 626cf8f4c6 ("icount: set can_do_io outside TB execution"), then likely copy/pasted in commit c97d6d2cdf ("i386: hvf: add code base from Google's QEMU repository"). Having it set in non-TCG code is confusing, so remove it from QTest / HVF / KVM. Fixes: 626cf8f4c6 ("icount: set can_do_io outside TB execution") Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20231129205037.16849-1-philmd@linaro.org>
2024-01-18Add class property to configure KVM device node to useDaan De Meyer
This allows passing the KVM device node to use as a file descriptor via /dev/fdset/XX. Passing the device node to use as a file descriptor allows running qemu unprivileged even when the user running qemu is not in the kvm group on distributions where access to /dev/kvm is gated behind membership of the kvm group (as long as the process invoking qemu is able to open /dev/kvm and passes the file descriptor to qemu). Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Message-ID: <20231021134015.1119597-1-daan.j.demeyer@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-08system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()Stefan Hajnoczi
The Big QEMU Lock (BQL) has many names and they are confusing. The actual QemuMutex variable is called qemu_global_mutex but it's commonly referred to as the BQL in discussions and some code comments. The locking APIs, however, are called qemu_mutex_lock_iothread() and qemu_mutex_unlock_iothread(). The "iothread" name is historic and comes from when the main thread was split into into KVM vcpu threads and the "iothread" (now called the main loop thread). I have contributed to the confusion myself by introducing a separate --object iothread, a separate concept unrelated to the BQL. The "iothread" name is no longer appropriate for the BQL. Rename the locking APIs to: - void bql_lock(void) - void bql_unlock(void) - bool bql_locked(void) There are more APIs with "iothread" in their names. Subsequent patches will rename them. There are also comments and documentation that will be updated in later patches. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Fabiano Rosas <farosas@suse.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Cédric Le Goater <clg@kaod.org> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20240102153529.486531-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-12-23accel/kvm: Turn DPRINTF macro use into tracepointsJai Arora
Patch removes DPRINTF macro and adds multiple tracepoints to capture different kvm events. We also drop the DPRINTFs that don't add any additional information than trace_kvm_run_exit already does. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1827 Signed-off-by: Jai Arora <arorajai2798@gmail.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-12-19accel/kvm: Make kvm_has_guest_debug staticRichard Henderson
This variable is not used or declared outside kvm-all.c. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-10-25kvm: i386: require KVM_CAP_SET_VCPU_EVENTS and KVM_CAP_X86_ROBUST_SINGLESTEPPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: i386: require KVM_CAP_DEBUGREGSPaolo Bonzini
This was introduced in KVM in Linux 2.6.35, we can require it unconditionally. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: unify listeners for PIO address spacePaolo Bonzini
Since we now assume that ioeventfds are present, kvm_io_listener is always registered. Merge it with kvm_coalesced_pio_listener in a single listener. Since PIO space does not have KVM memslots attached to it, the priority is irrelevant. Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: require KVM_CAP_IOEVENTFD and KVM_CAP_IOEVENTFD_ANY_LENGTHPaolo Bonzini
KVM_CAP_IOEVENTFD_ANY_LENGTH was added in Linux 4.4, released in 2016. Assume that it is present. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: assume that many ioeventfds can be createdPaolo Bonzini
NR_IOBUS_DEVS was increased to 200 in Linux 2.6.34. By Linux 3.5 it had increased to 1000 and later ioeventfds were changed to not count against the limit. But the earlier limit of 200 would already be enough for kvm_check_many_ioeventfds() to be true, so remove the check. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: drop reference to KVM_CAP_PCI_2_3Paolo Bonzini
This is a remnant of pre-VFIO device assignment; it is not defined anymore by Linux and not used by QEMU. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: require KVM_IRQFD for kernel irqchipPaolo Bonzini
KVM_IRQFD was introduced in Linux 2.6.32, and since then it has always been available on architectures that support an in-kernel interrupt controller. We can require it unconditionally. Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: require KVM_CAP_SIGNAL_MSIPaolo Bonzini
This was introduced in KVM in Linux 3.5, we can require it unconditionally in kvm_irqchip_send_msi(). However, not all architectures have to implement it so check it only in x86, the only architecture that ever had MSI injection but not KVM_CAP_SIGNAL_MSI. ARM uses it to detect the presence of the ITS emulation in the kernel, introduced in Linux 4.8. Assume that it's there and possibly fail when realizing the arm-its-kvm device. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: require KVM_CAP_INTERNAL_ERROR_DATAPaolo Bonzini
This was introduced in KVM in Linux 2.6.33, we can require it unconditionally. Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12kvm: Add stub for kvm_get_max_memslots()David Hildenbrand
We'll need the stub soon from memory device context. While at it, use "unsigned int" as return value and place the declaration next to kvm_get_free_memslots(). Message-ID: <20230926185738.277351-11-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12kvm: Return number of free memslotsDavid Hildenbrand
Let's return the number of free slots instead of only checking if there is a free slot. While at it, check all address spaces, which will also consider SMM under x86 correctly. This is a preparation for memory devices that consume multiple memslots. Message-ID: <20230926185738.277351-5-david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-03accel/tcg: Move can_do_io to CPUNegativeOffsetStateRichard Henderson
Minimize the displacement to can_do_io, since it may be touched at the start of each TranslationBlock. It fits into other padding within the substructure. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-09-29accel/kvm/kvm-all: Handle register access errorsAkihiko Odaki
A register access error typically means something seriously wrong happened so that anything bad can happen after that and recovery is impossible. Even failing one register access is catastorophic as architecture-specific code are not written so that it torelates such failures. Make sure the VM stop and nothing worse happens if such an error occurs. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-ID: <20221201102728.69751-1-akihiko.odaki@daynix.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-08arm/kvm: Enable support for KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZEShameer Kolothum
Now that we have Eager Page Split support added for ARM in the kernel, enable it in Qemu. This adds, -eager-split-size to -accel sub-options to set the eager page split chunk size. -enable KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE. The chunk size specifies how many pages to break at a time, using a single allocation. Bigger the chunk size, more pages need to be allocated ahead of time. Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Message-id: 20230905091246.1931-1-shameerali.kolothum.thodi@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-08-24accel/kvm: Widen pc/saved_insn for kvm_sw_breakpointAnton Johansson
Widens the pc and saved_insn fields of kvm_sw_breakpoint from target_ulong to vaddr. The pc argument of kvm_find_sw_breakpoint is also widened to match. Signed-off-by: Anton Johansson <anjo@rev.ng> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230807155706.9580-2-anjo@rev.ng> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-22accel/kvm: Make kvm_dirty_ring_reaper_init() voidAkihiko Odaki
The returned value was always zero and had no meaning. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-7-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-08-22accel/kvm: Free as when an error occurredAkihiko Odaki
An error may occur after s->as is allocated, for example if the KVM_CREATE_VM ioctl call fails. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-6-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> [PMM: tweaked commit message] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-08-22accel/kvm: Use negative KVM type for error propagationAkihiko Odaki
On MIPS, kvm_arch_get_default_type() returns a negative value when an error occurred so handle the case. Also, let other machines return negative values when errors occur and declare returning a negative value as the correct way to propagate an error that happened when determining KVM type. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-5-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-08-22kvm: Introduce kvm_arch_get_default_type hookAkihiko Odaki
kvm_arch_get_default_type() returns the default KVM type. This hook is particularly useful to derive a KVM type that is valid for "none" machine model, which is used by libvirt to probe the availability of KVM. For MIPS, the existing mips_kvm_type() is reused. This function ensures the availability of VZ which is mandatory to use KVM on the current QEMU. Cc: qemu-stable@nongnu.org Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-2-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> [PMM: added doc comment for new function] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-07-31kvm: Fix crash due to access uninitialized kvm_stateGavin Shan
Runs into core dump on arm64 and the backtrace extracted from the core dump is shown as below. It's caused by accessing uninitialized @kvm_state in kvm_flush_coalesced_mmio_buffer() due to commit 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()"), where the machine's memory region is added earlier than before. main qemu_init configure_accelerators qemu_opts_foreach do_configure_accelerator accel_init_machine kvm_init virt_kvm_type virt_set_memmap machine_memory_devices_init memory_region_add_subregion memory_region_add_subregion_common memory_region_update_container_subregions memory_region_transaction_begin qemu_flush_coalesced_mmio_buffer kvm_flush_coalesced_mmio_buffer Fix it by bailing early in kvm_flush_coalesced_mmio_buffer() on the uninitialized @kvm_state. With this applied, no crash is observed on arm64. Fixes: 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()") Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20230731125946.2038742-1-gshan@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-06-28exec/memory: Add symbol for the min value of memory listener priorityIsaku Yamahata
Add MEMORY_LISTNER_PRIORITY_MIN for the symbolic value for the min value of the memory listener instead of the hard-coded magic value 0. Add explicit initialization. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <29f88477fe82eb774bcfcae7f65ea21995f865f2.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-28exec/memory: Add symbol for memory listener priority for device backendIsaku Yamahata
Add MEMORY_LISTENER_PRIORITY_DEV_BACKEND for the symbolic value for memory listener to replace the hard-coded value 10 for the device backend. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <8314d91688030d7004e96958f12e2c83fb889245.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-28exec/memory: Add symbolic value for memory listener priority for accelIsaku Yamahata
Add MEMORY_LISTNER_PRIORITY_ACCEL for the symbolic value for the memory listener to replace the hard-coded value 10 for accel. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <feebe423becc6e2aa375f59f6abce9a85bc15abb.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-26kvm: reuse per-vcpu stats fd to avoid vcpu interruptionMarcelo Tosatti
A regression has been detected in latency testing of KVM guests. More specifically, it was observed that the cyclictest numbers inside of an isolated vcpu (running on isolated pcpu) are: Where a maximum of 50us is acceptable. The implementation of KVM_GET_STATS_FD uses run_on_cpu to query per vcpu statistics, which interrupts the vcpu (and is unnecessary). To fix this, open the per vcpu stats fd on vcpu initialization, and read from that fd from QEMU's main thread. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18kvm: Enable dirty ring for arm64Gavin Shan
arm64 has different capability from x86 to enable the dirty ring, which is KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Besides, arm64 also needs the backup bitmap extension (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) when 'kvm-arm-gicv3' or 'arm-its-kvm' device is enabled. Here the extension is always enabled and the unnecessary overhead to do the last stage of dirty log synchronization when those two devices aren't used is introduced, but the overhead should be very small and acceptable. The benefit is cover future cases where those two devices are used without modifying the code. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20230509022122.20888-5-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18kvm: Add helper kvm_dirty_ring_init()Gavin Shan
Due to multiple capabilities associated with the dirty ring for different architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for x86 and arm64 separately. There will be more to be done in order to support the dirty ring for arm64. Lets add helper kvm_dirty_ring_init() to enable the dirty ring. With this, the code looks a bit clean. No functional change intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Message-Id: <20230509022122.20888-4-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18kvm: Synchronize the backup bitmap in the last stageGavin Shan
In the last stage of live migration or memory slot removal, the backup bitmap needs to be synchronized when it has been enabled. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Message-Id: <20230509022122.20888-3-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18migration: Add last stage indicator to global dirty logGavin Shan
The global dirty log synchronization is used when KVM and dirty ring are enabled. There is a particularity for ARM64 where the backup bitmap is used to track dirty pages in non-running-vcpu situations. It means the dirty ring works with the combination of ring buffer and backup bitmap. The dirty bits in the backup bitmap needs to collected in the last stage of live migration. In order to identify the last stage of live migration and pass it down, an extra parameter is added to the relevant functions and callbacks. This last stage indicator isn't used until the dirty ring is enabled in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Message-Id: <20230509022122.20888-2-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-04kvm: dirty-ring: Fix race with vcpu creationPeter Xu
It's possible that we want to reap a dirty ring on a vcpu that is during creation, because the vcpu is put onto list (CPU_FOREACH visible) before initialization of the structures. In this case: qemu_init_vcpu x86_cpu_realizefn cpu_exec_realizefn cpu_list_add <---- can be probed by CPU_FOREACH qemu_init_vcpu cpus_accel->create_vcpu_thread(cpu); kvm_init_vcpu map kvm_dirty_gfns <--- kvm_dirty_gfns valid Don't try to reap dirty ring on vcpus during creation or it'll crash. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2124756 Reported-by: Xiaohui Li <xiaohli@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1d14deb6684bcb7de1c9633c5bd21113988cc698.1676563222.git.huangy81@chinatelecom.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-07gdbstub: move update guest debug to accel opsMads Ynddal
Continuing the refactor of a48e7d9e52 (gdbstub: move guest debug support check to ops) by removing hardcoded kvm_enabled() from generic cpu.c code, and replace it with a property of AccelOpsClass. Signed-off-by: Mads Ynddal <m.ynddal@samsung.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230207131721.49233-1-mads@ynddal.dk> [AJB: add ifdef around update_guest_debug_ops, fix brace] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20230302190846.2593720-27-alex.bennee@linaro.org> Message-Id: <20230303025805.625589-30-richard.henderson@linaro.org>
2023-03-01kvm/i386: Add xen-evtchn-max-pirq propertyDavid Woodhouse
The default number of PIRQs is set to 256 to avoid issues with 32-bit MSI devices. Allow it to be increased if the user desires. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01kvm/i386: Add xen-gnttab-max-frames propertyDavid Woodhouse
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01i386/kvm: Add xen-version KVM accelerator property and init KVM Xen supportDavid Woodhouse
This just initializes the basic Xen support in KVM for now. Only permitted on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap overlay, event channel, grant tables and other stuff will exist. There's no point having the basic hypercall support if nothing else works. Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used later by support devices. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-02-27accel/kvm: Silent -Wmissing-field-initializers warningPhilippe Mathieu-Daudé
Silent when compiling with -Wextra: ../accel/kvm/kvm-all.c:2291:17: warning: missing field 'num' initializer [-Wmissing-field-initializers] { NULL, } ^ Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20221220143532.24958-3-philmd@linaro.org>
2023-02-27gdbstub: Use vaddr type for generic insert/remove_breakpoint() APIPhilippe Mathieu-Daudé
Both insert/remove_breakpoint() handlers are used in system and user emulation. We can not use the 'hwaddr' type on user emulation, we have to use 'vaddr' which is defined as "wide enough to contain any #target_ulong virtual address". gdbstub.c doesn't require to include "exec/hwaddr.h" anymore. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221216215519.5522-4-philmd@linaro.org>
2023-02-04stats: Move QMP commands from monitor/ to stats/Markus Armbruster
This moves these commands from MAINTAINERS section "QMP" to new section "Stats". Status is Orphan. Volunteers welcome! Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230124121946.1139465-23-armbru@redhat.com>
2023-01-11kvm: Atomic memslot updatesDavid Hildenbrand
If we update an existing memslot (e.g., resize, split), we temporarily remove the memslot to re-add it immediately afterwards. These updates are not atomic, especially not for KVM VCPU threads, such that we can get spurious faults. Let's inhibit most KVM ioctls while performing relevant updates, such that we can perform the update just as if it would happen atomically without additional kernel support. We capture the add/del changes and apply them in the notifier commit stage instead. There, we can check for overlaps and perform the ioctl inhibiting only if really required (-> overlap). To keep things simple we don't perform additional checks that wouldn't actually result in an overlap -- such as !RAM memory regions in some cases (see kvm_set_phys_mem()). To minimize cache-line bouncing, use a separate indicator (in_ioctl_lock) per CPU. Also, make sure to hold the kvm_slots_lock while performing both actions (removing+re-adding). We have to wait until all IOCTLs were exited and block new ones from getting executed. This approach cannot result in a deadlock as long as the inhibitor does not hold any locks that might hinder an IOCTL from getting finished and exited - something fairly unusual. The inhibitor will always hold the BQL. AFAIKs, one possible candidate would be userfaultfd. If a page cannot be placed (e.g., during postcopy), because we're waiting for a lock, or if the userfaultfd thread cannot process a fault, because it is waiting for a lock, there could be a deadlock. However, the BQL is not applicable here, because any other guest memory access while holding the BQL would already result in a deadlock. Nothing else in the kernel should block forever and wait for userspace intervention. Note: pause_all_vcpus()/resume_all_vcpus() or start_exclusive()/end_exclusive() cannot be used, as they either drop the BQL or require to be called without the BQL - something inhibitors cannot handle. We need a low-level locking mechanism that is deadlock-free even when not releasing the BQL. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Tested-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221111154758.1372674-4-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11KVM: keep track of running ioctlsEmanuele Giuseppe Esposito
Using the new accel-blocker API, mark where ioctls are being called in KVM. Next, we will implement the critical section that will take care of performing memslots modifications atomically, therefore preventing any new ioctl from running and allowing the running ones to finish. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221111154758.1372674-3-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-14qapi: Use returned bool to check for failure (again)Markus Armbruster
Commit 012d4c96e2 changed the visitor functions taking Error ** to return bool instead of void, and the commits following it used the new return value to simplify error checking. Since then a few more uses in need of the same treatment crept in. Do that. All pretty mechanical except for * balloon_stats_get_all() This is basically the same transformation commit 012d4c96e2 applied to the virtual walk example in include/qapi/visitor.h. * set_max_queue_size() Additionally replace "goto end of function" by return. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221121085054.683122-10-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2022-10-11i386: add notify VM exit supportChenyi Qiang
There are cases that malicious virtual machine can cause CPU stuck (due to event windows don't open up), e.g., infinite loop in microcode when nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and IRQ) can be delivered. It leads the CPU to be unavailable to host or other VMs. Notify VM exit is introduced to mitigate such kind of attacks, which will generate a VM exit if no event window occurs in VM non-root mode for a specified amount of time (notify window). A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space so that the user can query the capability and set the expected notify window when creating VMs. The format of the argument when enabling this capability is as follows: Bit 63:32 - notify window specified in qemu command Bit 31:0 - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to enable the feature.) Users can configure the feature by a new (x86 only) accel property: qemu -accel kvm,notify-vmexit=run|internal-error|disable,notify-window=n The default option of notify-vmexit is run, which will enable the capability and do nothing if the exit happens. The internal-error option raises a KVM internal error if it happens. The disable option does not enable the capability. The default value of notify-window is 0. It is valid only when notify-vmexit is not disabled. The valid range of notify-window is non-negative. It is even safe to set it to zero since there's an internal hardware threshold to be added to ensure no false positive. Because a notify VM exit may happen with VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated that would set this bit), which means VM context is corrupted. It would be reflected in the flags of KVM_EXIT_NOTIFY exit. If KVM_NOTIFY_CONTEXT_INVALID bit is set, raise a KVM internal error unconditionally. Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220929072014.20705-5-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-11kvm: expose struct KVMStateChenyi Qiang
Expose struct KVMState out of kvm-all.c so that the field of struct KVMState can be accessed when defining target-specific accelerator properties. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220929072014.20705-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-10kvm: allow target-specific accelerator propertiesPaolo Bonzini
Several hypervisor capabilities in KVM are target-specific. When exposed to QEMU users as accelerator properties (i.e. -accel kvm,prop=value), they should not be available for all targets. Add a hook for targets to add their own properties to -accel kvm, for now no such property is defined. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220929072014.20705-3-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>