aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-11-02target/microblaze: Make mb_cpu_tlb_fill sysemu onlyRichard Henderson
The fallback code in cpu_loop_exit_sigsegv is sufficient for microblaze linux-user. Remove the code from cpu_loop that handled the unnamed 0xaa exception. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02target/m68k: Make m68k_cpu_tlb_fill sysemu onlyRichard Henderson
The fallback code in cpu_loop_exit_sigsegv is sufficient for m68k linux-user. Remove the code from cpu_loop that handled EXCP_ACCESS. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02target/i386: Implement x86_cpu_record_sigsegvRichard Henderson
Record cr2, error_code, and exception_index. That last means that we must exit to cpu_loop ourselves, instead of letting exception_index being overwritten. Use the maperr parameter to properly set PG_ERROR_P_MASK. Reviewed by: Warner Losh <imp@bsdimp.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02target/hppa: Make hppa_cpu_tlb_fill sysemu onlyRichard Henderson
The fallback code in cpu_loop_exit_sigsegv is sufficient for hppa linux-user. Remove the code from cpu_loop that raised SIGSEGV. This makes all of the code in mem_helper.c sysemu only, so remove the ifdefs and move the file to hppa_softmmu_ss. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02target/hexagon: Remove hexagon_cpu_tlb_fillRichard Henderson
The fallback code in cpu_loop_exit_sigsegv is sufficient for hexagon linux-user. Remove the code from cpu_loop that raises SIGSEGV. Reviewed-by: Taylor Simpson <tsimpson@quicinc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02target/cris: Make cris_cpu_tlb_fill sysemu onlyRichard Henderson
The fallback code in cpu_loop_exit_sigsegv is sufficient for cris linux-user. Remove the code from cpu_loop that handled the unnamed 0xaa exception. This makes all of the code in helper.c sysemu only, so remove the ifdefs and move the file to cris_softmmu_ss. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02target/arm: Implement arm_cpu_record_sigsegvRichard Henderson
Because of the complexity of setting ESR, continue to use arm_deliver_fault. This means we cannot remove the code within cpu_loop that decodes EXCP_DATA_ABORT and EXCP_PREFETCH_ABORT. But using the new hook means that we don't have to do the page_get_flags check manually, and we'll be able to restrict the tlb_fill hook to sysemu later. Reviewed-by: Warner Losh <imp@bsdimp.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02target/arm: Use cpu_loop_exit_sigsegv for mte tag lookupRichard Henderson
Use the new os interface for raising the exception, rather than calling arm_cpu_tlb_fill directly. Reviewed-by: Warner Losh <imp@bsdimp.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02target/alpha: Implement alpha_cpu_record_sigsegvRichard Henderson
Record trap_arg{0,1,2} for the linux-user signal frame. Fill in the stores to trap_arg{1,2} that were missing from the previous user-only alpha_cpu_tlb_fill function. Use maperr to simplify computation of trap_arg1. Remove the code for EXCP_MMFAULT from cpu_loop, as that part is now handled by cpu_loop_exit_sigsegv. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02linux-user: Add cpu_loop_exit_sigsegvRichard Henderson
This is a new interface to be provided by the os emulator for raising SIGSEGV on fault. Use the new record_sigsegv target hook. Reviewed by: Warner Losh <imp@bsdimp.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02hw/core: Add TCGCPUOps.record_sigsegvRichard Henderson
Add a new user-only interface for updating cpu state before raising a signal. This will replace tlb_fill for user-only and should result in less boilerplate for each guest. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02linux-user/signal: Drop HOST_SIGNAL_PLACEHOLDERRichard Henderson
Now that all of the linux-user hosts have been converted to host-signal.h, drop the compatibility code. Reviewed by: Warner Losh <imp@bsdimp.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02linux-user/host/riscv: Improve host_signal_writeRichard Henderson
Do not read 4 bytes before we determine the size of the insn. Simplify triple switches in favor of checking major opcodes. Include the missing cases of compact fsd and fsdsp. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02target/arm: Fixup comment re handle_cpu_signalRichard Henderson
The named function no longer exists. Refer to host_signal_handler instead. Reviewed-by: Warner Losh <imp@bsdimp.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02linux-user/host/riscv: Populate host_signal.hRichard Henderson
Split host_signal_pc and host_signal_write out of user-exec.c. Reviewed-by: Warner Losh <imp@bsdimp.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02linux-user/host/mips: Populate host_signal.hRichard Henderson
Split host_signal_pc and host_signal_write out of user-exec.c. Reviewed-by: Warner Losh <imp@bsdimp.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02linux-user/host/s390: Populate host_signal.hRichard Henderson
Split host_signal_pc and host_signal_write out of user-exec.c. Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02linux-user/host/aarch64: Populate host_signal.hRichard Henderson
Split host_signal_pc and host_signal_write out of user-exec.c. Drop the *BSD code, to be re-created under bsd-user/ later. Reviewed-by: Warner Losh <imp@bsdimp.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02linux-user/host/arm: Populate host_signal.hRichard Henderson
Split host_signal_pc and host_signal_write out of user-exec.c. Drop the *BSD code, to be re-created under bsd-user/ later. Reviewed-by: Warner Losh <imp@bsdimp.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02linux-user/host/sparc: Populate host_signal.hRichard Henderson
Split host_signal_pc and host_signal_write out of user-exec.c. Drop the *BSD code, to be re-created under bsd-user/ later. Drop the Solaris code as completely unused. Reviewed-by: Warner Losh <imp@bsdimp.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02linux-user/host/alpha: Populate host_signal.hRichard Henderson
Split host_signal_pc and host_signal_write out of user-exec.c. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02linux-user/host/ppc: Populate host_signal.hRichard Henderson
Split host_signal_pc and host_signal_write out of user-exec.c. Drop the *BSD code, to be re-created under bsd-user/ later. Reviewed-by: Warner Losh <imp@bsdimp.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02linux-user/host/x86: Populate host_signal.hRichard Henderson
Split host_signal_pc and host_signal_write out of user-exec.c. Drop the *BSD code, to be re-created under bsd-user/ later. Reviewed-by: Warner Losh <imp@bsdimp.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02Merge remote-tracking branch 'remotes/philmd/tags/machine-20211101' into stagingRichard Henderson
Machine core patches - Move GPIO code out of qdev.c - Move hotplug code out of qdev.c - Restrict various files to sysemu - Move SMP code out of machine.c - Add SMP parsing unit tests - Move dynamic sysbus device check earlier # gpg: Signature made Mon 01 Nov 2021 02:44:32 PM EDT # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] * remotes/philmd/tags/machine-20211101: machine: remove the done notifier for dynamic sysbus device type check qdev-monitor: Check sysbus device type before creating it machine: add device_type_is_dynamic_sysbus function tests/unit: Add an unit test for smp parsing hw/core/machine: Split out the smp parsing code hw/core: Restrict hotplug to system emulation hw/core: Extract hotplug-related functions to qdev-hotplug.c hw/core: Declare meson source set hw/core: Restrict sysemu specific files machine: Move gpio code to hw/core/gpio.c Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-01hw/i386: fix vmmouse registrationPavel Dovgalyuk
According to the logic of vmmouse_update_handler function, vmmouse should be registered as an event handler when it's status is zero. vmmouse_read_id resets the status but does not register the handler. This patch adds vmmouse registration and activation when status is reset. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Message-Id: <163524204515.1914131.16465061981774791228.stgit@pasha-ThinkPad-X280> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01pci: Export pci_for_each_device_under_bus*()Peter Xu
They're actually more commonly used than the helper without _under_bus, because most callers do have the pci bus on hand. After exporting we can switch a lot of the call sites to use these two helpers. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20211028043129.38871-3-peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au>
2021-11-01pci: Define pci_bus_dev_fn/pci_bus_fn/pci_bus_ret_fnPeter Xu
They're used in quite a few places of pci.[ch] and also in the rest of the code base. Define them so that it doesn't need to be defined all over the places. The pci_bus_fn is similar to pci_bus_dev_fn that only takes a PCIBus* and an opaque. The pci_bus_ret_fn is similar to pci_bus_fn but it allows to return a void* pointer. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20211028043129.38871-2-peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01hw/i386/pc: Allow instantiating a virtio-iommu deviceJean-Philippe Brucker
Allow instantiating a virtio-iommu device by adding an ACPI Virtual I/O Translation table (VIOT), which describes the relation between the virtio-iommu and the endpoints it manages. Add a hotplug handler for virtio-iommu on x86 and set the necessary reserved region property. On x86, the [0xfee00000, 0xfeefffff] DMA region is reserved for MSIs. DMA transactions to this range either trigger IRQ remapping in the IOMMU or bypasses IOMMU translation. Although virtio-iommu does not support IRQ remapping it must be informed of the reserved region so that it can forward DMA transactions targeting this region. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Message-Id: <20211026182024.2642038-5-jean-philippe@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01hw/i386/pc: Move IOMMU singleton into PCMachineStateJean-Philippe Brucker
We're about to support a third vIOMMU for x86, virtio-iommu which doesn't inherit X86IOMMUState. Move the IOMMU singleton into PCMachineState, so it can be shared between all three vIOMMUs. The x86_iommu_get_default() helper is still needed by KVM and IOAPIC to fetch the default IRQ-remapping IOMMU. Since virtio-iommu doesn't support IRQ remapping, this interface doesn't need to change for the moment. We could later replace X86IOMMUState with an "IRQ remapping IOMMU" interface if necessary. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Message-Id: <20211026182024.2642038-4-jean-philippe@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01hw/i386/pc: Remove x86_iommu_get_type()Jean-Philippe Brucker
To generate the IOMMU ACPI table, acpi-build.c can use base QEMU types instead of a special IommuType value. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Message-Id: <20211026182024.2642038-3-jean-philippe@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01hw/acpi: Add VIOT tableJean-Philippe Brucker
Add a function that generates a Virtual I/O Translation table (VIOT), describing the topology of paravirtual IOMMUs. The table is created if a virtio-iommu device is present. It contains a virtio-iommu node and PCI Range nodes for endpoints managed by the IOMMU. By default, a single node describes all PCI devices. When passing the "default_bus_bypass_iommu" machine option and "bypass_iommu" PXB option, only buses that do not bypass the IOMMU are described by PCI Range nodes. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Message-Id: <20211026182024.2642038-2-jean-philippe@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01vhost-vdpa: Set discarding of RAM broken when initializing the backendDavid Hildenbrand
Similar to VFIO, vDPA will go ahead an map+pin all guest memory. Memory that used to be discarded will get re-populated and if we discard+re-access memory after mapping+pinning, the pages mapped into the vDPA IOMMU will go out of sync with the actual pages mapped into the user space page tables. Set discarding of RAM broken such that: - virtio-mem and vhost-vdpa run mutually exclusive - virtio-balloon is inhibited and no memory discards will get issued In the future, we might be able to support coordinated discarding of RAM as used by virtio-mem and already supported by vfio via the RamDiscardManager. Acked-by: Jason Wang <jasowang@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Cindy Lu <lulu@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211027130324.59791-1-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2021-11-01qtest: fix 'expression is always false' build failure in qtest_has_accel()Igor Mammedov
If KVM is disabled or not present, qtest library build may fail with: libqtest.c: In function 'qtest_has_accel': comparison of unsigned expression < 0 is always false [-Werror=type-limits] for (i = 0; i < ARRAY_SIZE(targets); i++) { due to empty 'targets' array. Fix it by making sure that CONFIG_KVM_TARGETS isn't empty. Fixes: e741aff0f43343 ("tests: qtest: add qtest_has_accel() to check if tested binary supports accelerator") Reported-by: Jason Andryuk <jandryuk@gmail.com> Suggested-by: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20211027151012.2639284-1-imammedo@redhat.com> Tested-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01migration/dirtyrate: implement dirty-bitmap dirtyrate calculationHyman Huang(黄勇)
introduce dirty-bitmap mode as the third method of calc-dirty-rate. implement dirty-bitmap dirtyrate calculation, which can be used to measuring dirtyrate in the absence of dirty-ring. introduce "dirty_bitmap:-b" option in hmp calc_dirty_rate to indicate dirty bitmap method should be used for calculation. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01memory: introduce total_dirty_pages to stat dirty pagesHyman Huang(黄勇)
introduce global var total_dirty_pages to stat dirty pages along with memory_global_dirty_log_sync. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01migration/ram: Handle RAMBlocks with a RamDiscardManager on background snapshotsDavid Hildenbrand
We already don't ever migrate memory that corresponds to discarded ranges as managed by a RamDiscardManager responsible for the mapped memory region of the RAMBlock. virtio-mem uses this mechanism to logically unplug parts of a RAMBlock. Right now, we still populate zeropages for the whole usable part of the RAMBlock, which is undesired because: 1. Even populating the shared zeropage will result in memory getting consumed for page tables. 2. Memory backends without a shared zeropage (like hugetlbfs and shmem) will populate an actual, fresh page, resulting in an unintended memory consumption. Discarded ("logically unplugged") parts have to remain discarded. As these pages are never part of the migration stream, there is no need to track modifications via userfaultfd WP reliably for these parts. Further, any writes to these ranges by the VM are invalid and the behavior is undefined. Note that Linux only supports userfaultfd WP on private anonymous memory for now, which usually results in the shared zeropage getting populated. The issue will become more relevant once userfaultfd WP supports shmem and hugetlb. Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01migration/ram: Factor out populating pages readable in ↵David Hildenbrand
ram_block_populate_pages() Let's factor out prefaulting/populating to make further changes easier to review and add a comment what we are actually expecting to happen. While at it, use the actual page size of the ramblock, which defaults to qemu_real_host_page_size for anonymous memory. Further, rename ram_block_populate_pages() to ram_block_populate_read() as well, to make it clearer what we are doing. In the future, we might want to use MADV_POPULATE_READ to speed up population. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01migration: Simplify alignment and alignment checksDavid Hildenbrand
Let's use QEMU_ALIGN_DOWN() and friends to make the code a bit easier to read. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destinationDavid Hildenbrand
Currently, when someone (i.e., the VM) accesses discarded parts inside a RAMBlock with a RamDiscardManager managing the corresponding mapped memory region, postcopy will request migration of the corresponding page from the source. The source, however, will never answer, because it refuses to migrate such pages with undefined content ("logically unplugged"): the pages are never dirty, and get_queued_page() will consequently skip processing these postcopy requests. Especially reading discarded ("logically unplugged") ranges is supposed to work in some setups (for example with current virtio-mem), although it barely ever happens: still, not placing a page would currently stall the VM, as it cannot make forward progress. Let's check the state via the RamDiscardManager (the state e.g., of virtio-mem is migrated during precopy) and avoid sending a request that will never get answered. Place a fresh zero page instead to keep the VM working. This is the same behavior that would happen automatically without userfaultfd being active, when accessing virtual memory regions without populated pages -- "populate on demand". For now, there are valid cases (as documented in the virtio-mem spec) where a VM might read discarded memory; in the future, we will disallow that. Then, we might want to handle that case differently, e.g., warning the user that the VM seems to be mis-behaving. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01virtio-mem: Drop precopy notifierDavid Hildenbrand
Migration code now properly handles RAMBlocks which are indirectly managed by a RamDiscardManager. No need for manual handling via the free page optimization interface, let's get rid of it. Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01migration/ram: Handle RAMBlocks with a RamDiscardManager on the migration sourceDavid Hildenbrand
We don't want to migrate memory that corresponds to discarded ranges as managed by a RamDiscardManager responsible for the mapped memory region of the RAMBlock. The content of these pages is essentially stale and without any guarantees for the VM ("logically unplugged"). Depending on the underlying memory type, even reading memory might populate memory on the source, resulting in an undesired memory consumption. Of course, on the destination, even writing a zeropage consumes memory, which we also want to avoid (similar to free page hinting). Currently, virtio-mem tries achieving that goal (not migrating "unplugged" memory that was discarded) by going via qemu_guest_free_page_hint() - but it's hackish and incomplete. For example, background snapshots still end up reading all memory, as they don't do bitmap syncs. Postcopy recovery code will re-add previously cleared bits to the dirty bitmap and migrate them. Let's consult the RamDiscardManager after setting up our dirty bitmap initially and when postcopy recovery code reinitializes it: clear corresponding bits in the dirty bitmaps (e.g., of the RAMBlock and inside KVM). It's important to fixup the dirty bitmap *after* our initial bitmap sync, such that the corresponding dirty bits in KVM are actually cleared. As colo is incompatible with discarding of RAM and inhibits it, we don't have to bother. Note: if a misbehaving guest would use discarded ranges after migration started we would still migrate that memory: however, then we already populated that memory on the migration source. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01virtio-mem: Implement replay_discarded RamDiscardManager callbackDavid Hildenbrand
Implement it similar to the replay_populated callback. Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01memory: Introduce replay_discarded callback for RamDiscardManagerDavid Hildenbrand
Introduce replay_discarded callback similar to our existing replay_populated callback, to be used my migration code to never migrate discarded memory. Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01dump-guest-memory: Block live migrationPeter Xu
Both dump-guest-memory and live migration caches vm state at the beginning. Either of them entering the other one will cause race on the vm state, and even more severe on that (please refer to the crash report in the bug link). Let's block live migration in dump-guest-memory, and that'll also block dump-guest-memory if it detected that we're during a live migration. Side note: migrate_del_blocker() can be called even if the blocker is not inserted yet, so it's safe to unconditionally delete that blocker in dump_cleanup (g_slist_remove allows no-entry-found case). Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1996609 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01migration: Add migrate_add_blocker_internal()Peter Xu
An internal version that removes -only-migratable implications. It can be used for temporary migration blockers like dump-guest-memory. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01migration: Make migration blocker work for snapshots tooPeter Xu
save_snapshot() checks migration blocker, which looks sane. At the meantime we should also teach the blocker add helper to fail if during a snapshot, just like for migrations. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01migration/dirtyrate: implement dirty-ring dirtyrate calculationHyman Huang(黄勇)
use dirty ring feature to implement dirtyrate calculation. introduce mode option in qmp calc_dirty_rate to specify what method should be used when calculating dirtyrate, either page-sampling or dirty-ring should be passed. introduce "dirty_ring:-r" option in hmp calc_dirty_rate to indicate dirty ring method should be used for calculation. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Message-Id: <7db445109bd18125ce8ec86816d14f6ab5de6a7d.1624040308.git.huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01migration/dirtyrate: move init step of calculation to main threadHyman Huang(黄勇)
since main thread may "query dirty rate" at any time, it's better to move init step into main thead so that synchronization overhead between "main" and "get_dirtyrate" can be reduced. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Message-Id: <109f8077518ed2f13068e3bfb10e625e964780f1.1624040308.git.huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01migration/dirtyrate: adjust order of registering threadHyman Huang(黄勇)
registering get_dirtyrate thread in advance so that both page-sampling and dirty-ring mode can be covered. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Message-Id: <d7727581a8e86d4a42fc3eacf7f310419b9ebf7e.1624040308.git.huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01migration/dirtyrate: introduce struct and adjust DirtyRateStatHyman Huang(黄勇)
introduce "DirtyRateMeasureMode" to specify what method should be used to calculate dirty rate, introduce "DirtyRateVcpu" to store dirty rate for each vcpu. use union to store stat data of specific mode Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Message-Id: <661c98c40f40e163aa58334337af8f3ddf41316a.1624040308.git.huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>