aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2019-08-13spapr/xive: Fix migration of hot-plugged CPUsCédric Le Goater
The migration sequence of a guest using the XIVE exploitation mode relies on the fact that the states of all devices are restored before the machine is. This is not true for hot-plug devices such as CPUs which state come after the machine. This breaks migration because the thread interrupt context registers are not correctly set. Fix migration of hotplugged CPUs by restoring their context in the 'post_load' handler of the XiveTCTX model. Fixes: 277dd3d7712a ("spapr/xive: add migration support for KVM") Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190813064853.29310-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-30pcie_root_port: Allow ACS to be disabledDr. David Alan Gilbert
ACS was added in 4.0 unconditionally, this breaks migration compatibility. Allow ACS to be disabled by adding a property that's checked by pcie_root_port. Unfortunately pcie-root-port doesn't have any instance data, so there's no where for that flag to live, so stuff it into PCIESlot. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190730093719.12958-2-dgilbert@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-07-29Revert "Revert "globals: Allow global properties to be optional""Dr. David Alan Gilbert
This reverts commit 8fa70dbd8bb478d9483c1da3e9976a2d86b3f9a0. Because we're about to revert it's neighbour and thus uses an optional again. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190729162903.4489-2-dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
2019-07-26vmstate.h: Type check VMSTATE_STRUCT_VARRAY macrosPeter Maydell
The VMSTATE_STRUCT_VARRAY_UINT32 macro is intended to handle migrating a field which is an array of structs, but where instead of migrating the entire array we only migrate a variable number of elements of it. The VMSTATE_STRUCT_VARRAY_POINTER_UINT32 macro is intended to handle migrating a field which is of pointer type, and points to a dynamically allocated array of structs of variable size. We weren't actually checking that the field passed to VMSTATE_STRUCT_VARRAY_UINT32 really is an array, with the result that accidentally using it where the _POINTER_ macro was intended would compile but silently corrupt memory on migration. Add type-checking that enforces that the field passed in is really of the right array type. This applies to all the VMSTATE macros which use flags including VMS_VARRAY_* but not VMS_POINTER. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Damien Hedde <damien.hedde@greensocs.com> Tested-by: Damien Hedde <damien.hedde@greensocs.com> Message-id: 20190725163710.11703-3-peter.maydell@linaro.org
2019-07-25virtio-balloon: Use temporary PBP onlyDavid Hildenbrand
We still have multiple issues in the current code - The PBP is not freed during unrealize() - The PBP is not reset on device resets: After a reset, the PBP is stale. - We are not indicating VIRTIO_BALLOON_F_MUST_TELL_HOST, therefore guests (esp. legacy guests) will reuse pages without deflating, turning the PBP stale. Adding that would require compat handling. Instead, let's use the PBP only temporarily, when processing one bulk of inflation requests. This will keep guest_page_size > 4k working (with Linux guests). There is nothing to do for deflation requests anymore. The pbp is only used for a limited amount of time. Fixes: ed48c59875b6 ("virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size") Cc: qemu-stable@nongnu.org #v4.0.0 Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190722134108.22151-7-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-22block: Only the main loop can change AioContextsMax Reitz
bdrv_set_aio_context_ignore() can only work in the main loop: bdrv_drained_begin() only works in the main loop and the node's (old) AioContext; and bdrv_drained_end() really only works in the main loop and the node's (new) AioContext (contrary to its current comment, which is just wrong). Consequentially, bdrv_set_aio_context_ignore() must be called from the main loop. Luckily, assuming that we can make block graph changes only from the main loop as well, all its callers do that already. Note that changing a node's context in a sense is an operation that changes the block graph, so it actually makes sense to require this function to be called from the main loop. Also, fix bdrv_drained_end()'s description. You can only use it from the main loop or the node's AioContext, and in the latter case, the whole subtree must be in the same context. Fixes: e037c09c78520cbdb6da7cfc6ad0256d5870b814 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190722133054.21781-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-22Merge remote-tracking branch ↵Peter Maydell
'remotes/pmaydell/tags/pull-target-arm-20190722' into staging target-arm queue: * target/arm: Add missing break statement for Hypervisor Trap Exception (fixes handling of SMC insn taken to AArch32 Hyp mode via HCR.TSC) * hw/arm/fsl-imx6ul.c: Remove dead SMP-related code * target/arm: Limit ID register assertions to TCG * configure: Clarify URL to source downloads * contrib/elf2dmp: Build download.o with CURL_CFLAGS # gpg: Signature made Mon 22 Jul 2019 14:13:31 BST # gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE # gpg: issuer "peter.maydell@linaro.org" # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate] # gpg: aka "Peter Maydell <pmaydell@gmail.com>" [ultimate] # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate] # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * remotes/pmaydell/tags/pull-target-arm-20190722: contrib/elf2dmp: Build download.o with CURL_CFLAGS configure: Clarify URL to source downloads target/arm: Limit ID register assertions to TCG hw/arm/fsl-imx6ul.c: Remove dead SMP-related code target/arm: Add missing break statement for Hypervisor Trap Exception Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-22hw/arm/fsl-imx6ul.c: Remove dead SMP-related codePeter Maydell
The i.MX6UL always has a single Cortex-A7 CPU (we set FSL_IMX6UL_NUM_CPUS to 1 in line with this). This means that all the code in fsl-imx6ul.c to handle multiple CPUs is dead code, and Coverity is now complaining that it is unreachable (CID 1403008, 1403011). Remove the unreachable code and the only-executes-once loops, and replace the single-entry cpu[] array in the FSLIMX6ULState with a simple cpu member. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190712115030.26895-1-peter.maydell@linaro.org
2019-07-22Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell
Mostly bugfixes, plus a patch to mark accelerator MemoryRegions in "info mtree" that has been lingering for too long. # gpg: Signature made Fri 19 Jul 2019 22:45:46 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: target/i386: sev: fix failed message typos i386: indicate that 'pconfig' feature was removed intentionally build-sys: do no support modules on Windows qmp: don't emit the RESET event on wakeup hmp: Print if memory section is registered with an accelerator test-bitmap: add test for bitmap_set scsi-generic: Check sense key before request snooping and patching vhost-user-scsi: Call virtio_scsi_common_unrealize() when device realize failed vhost-scsi: Call virtio_scsi_common_unrealize() when device realize failed virtio-scsi: remove unused argument to virtio_scsi_common_realize target/i386: skip KVM_GET/SET_NESTED_STATE if VMX disabled, or for SVM target/i386: kvm: Demand nested migration kernel capabilities only when vCPU may have enabled VMX Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-19hmp: Print if memory section is registered with an acceleratorAlexey Kardashevskiy
This adds an accelerator name to the "into mtree -f" to tell the user if a particular memory section is registered with the accelerator; the primary user for this is KVM and such information is useful for debugging purposes. This adds a has_memory() callback to the accelerator class allowing any accelerator to have a label in that memory tree dump. Since memory sections are passed to memory listeners and get registered in accelerators (rather than memory regions), this only prints new labels for flatviews attached to the system address space. An example: Root memory region: system 0000000000000000-0000002fffffffff (prio 0, ram): /objects/mem0 kvm 0000003000000000-0000005fffffffff (prio 0, ram): /objects/mem1 kvm 0000200000000020-000020000000003f (prio 1, i/o): virtio-pci 0000200080000000-000020008000003f (prio 0, i/o): capabilities Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190614015237.82463-1-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-19virtio-scsi: remove unused argument to virtio_scsi_common_realizePaolo Bonzini
The argument is not used and passing it clutters error propagation in the callers. So, get rid of it. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-19Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell
Block layer patches: - block: Fix forbidden use of polling in drained_end - block: Don't wait for I/O throttling while exiting QEMU - iotests: Use read-zeroes for the null driver to be Valgrind-friendly # gpg: Signature made Fri 19 Jul 2019 14:30:14 BST # gpg: using RSA key 7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: iotests: Test quitting with job on throttled node vl: Drain before (block) job cancel when quitting iotests: Test commit with a filter on the chain iotests: Add @has_quit to vm.shutdown() block: Loop unsafely in bdrv*drained_end() tests: Extend commit by drained_end test block: Do not poll in bdrv_do_drained_end() tests: Lock AioContexts in test-block-iothread block: Make bdrv_parent_drained_[^_]*() static block: Add @drained_end_counter tests: Add job commit by drained_end test block: Introduce BdrvChild.parent_quiesce_counter iotests: Set read-zeroes on in null block driver for Valgrind Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-19Merge remote-tracking branch 'remotes/berrange/tags/misc-next-pull-request' ↵Peter Maydell
into staging Merge misc fixes A collection of patches I have fixing crypto code and other pieces without an assigned maintainer * Fixes crypto function signatures to be compatible with both old and new versions of nettle * Fixes deprecation warnings on new nettle * Fixes GPL license header typos * Documents security implications of monitor usage * Optimize linking of capstone to avoid it in tools # gpg: Signature made Fri 19 Jul 2019 14:24:37 BST # gpg: using RSA key DAF3A6FDB26B62912D0E8E3FBE86EBB415104FDF # gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>" [full] # gpg: aka "Daniel P. Berrange <berrange@redhat.com>" [full] # Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF * remotes/berrange/tags/misc-next-pull-request: crypto: Fix LGPL information in the file headers doc: document that the monitor console is a privileged control interface configure: only link capstone to emulation targets crypto: fix function signatures for nettle 2.7 vs 3 crypto: switch to modern nettle AES APIs Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-19crypto: Fix LGPL information in the file headersThomas Huth
It's either "GNU *Library* General Public License version 2" or "GNU Lesser General Public License version *2.1*", but there was no "version 2.0" of the "Lesser" license. So assume that version 2.1 is meant here. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2019-07-19block: Do not poll in bdrv_do_drained_end()Max Reitz
We should never poll anywhere in bdrv_do_drained_end() (including its recursive callees like bdrv_drain_invoke()), because it does not cope well with graph changes. In fact, it has been written based on the postulation that no graph changes will happen in it. Instead, the callers that want to poll must poll, i.e. all currently globally available wrappers: bdrv_drained_end(), bdrv_subtree_drained_end(), bdrv_unapply_subtree_drain(), and bdrv_drain_all_end(). Graph changes there do not matter. They can poll simply by passing a pointer to a drained_end_counter and wait until it reaches 0. This patch also adds a non-polling global wrapper for bdrv_do_drained_end() that takes a drained_end_counter pointer. We need such a variant because now no function called anywhere from bdrv_do_drained_end() must poll. This includes BdrvChildRole.drained_end(), which already must not poll according to its interface documentation, but bdrv_child_cb_drained_end() just violates that by invoking bdrv_drained_end() (which does poll). Therefore, BdrvChildRole.drained_end() must take a *drained_end_counter parameter, which bdrv_child_cb_drained_end() can pass on to the new bdrv_drained_end_no_poll() function. Note that we now have a pattern of all drained_end-related functions either polling or receiving a *drained_end_counter to let the caller poll based on that. A problem with a single poll loop is that when the drained section in bdrv_set_aio_context_ignore() ends, some nodes in the subgraph may be in the old contexts, while others are in the new context already. To let the collective poll in bdrv_drained_end() work correctly, we must not hold a lock to the old context, so that the old context can make progress in case it is different from the current context. (In the process, remove the comment saying that the current context is always the old context, because it is wrong.) In all other places, all nodes in a subtree must be in the same context, so we can just poll that. The exception of course is bdrv_drain_all_end(), but that always runs in the main context, so we can just poll NULL (like bdrv_drain_all_begin() does). Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-19block: Make bdrv_parent_drained_[^_]*() staticMax Reitz
These functions are not used outside of block/io.c, there is no reason why they should be globally available. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-19block: Introduce BdrvChild.parent_quiesce_counterMax Reitz
Commit 5cb2737e925042e6c7cd3fb0b01313950b03cddf laid out why bdrv_do_drained_end() must decrement the quiesce_counter after bdrv_drain_invoke(). It did not give a very good reason why it has to happen after bdrv_parent_drained_end(), instead only claiming symmetry to bdrv_do_drained_begin(). It turns out that delaying it for so long is wrong. Situation: We have an active commit job (i.e. a mirror job) from top to base for the following graph: filter | [file] | v top --[backing]--> base Now the VM is closed, which results in the job being cancelled and a bdrv_drain_all() happening pretty much simultaneously. Beginning the drain means the job is paused once whenever one of its nodes is quiesced. This is reversed when the drain ends. With how the code currently is, after base's drain ends (which means that it will have unpaused the job once), its quiesce_counter remains at 1 while it goes to undrain its parents (bdrv_parent_drained_end()). For some reason or another, undraining filter causes the job to be kicked and enter mirror_exit_common(), where it proceeds to invoke block_job_remove_all_bdrv(). Now base will be detached from the job. Because its quiesce_counter is still 1, it will unpause the job once more. So in total, undraining base will unpause the job twice. Eventually, this will lead to the job's pause_count going negative -- well, it would, were there not an assertion against this, which crashes qemu. The general problem is that if in bdrv_parent_drained_end() we undrain parent A, and then undrain parent B, which then leads to A detaching the child, bdrv_replace_child_noperm() will undrain A as if we had not done so yet; that is, one time too many. It follows that we cannot decrement the quiesce_counter after invoking bdrv_parent_drained_end(). Unfortunately, decrementing it before bdrv_parent_drained_end() would be wrong, too. Imagine the above situation in reverse: Undraining A leads to B detaching the child. If we had already decremented the quiesce_counter by that point, bdrv_replace_child_noperm() would undrain B one time too little; because it expects bdrv_parent_drained_end() to issue this undrain. But bdrv_parent_drained_end() won't do that, because B is no longer a parent. Therefore, we have to do something else. This patch opts for introducing a second quiesce_counter that counts how many times a child's parent has been quiesced (though c->role->drained_*). With that, bdrv_replace_child_noperm() just has to undrain the parent exactly that many times when removing a child, and it will always be right. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-19Merge remote-tracking branch 'remotes/palmer/tags/riscv-for-master-4.1-rc2' ↵Peter Maydell
into staging RISC-V Patches for 4.2-rc2 This contains a pair of patches that add OpenSBI support to QEMU on RISC-V targets. The patches have been floating around for a bit, but everything seems solid now. These pass my standard test of booting OpenEmbedded, and also works when I swap around the various command-line arguments to use the new boot method. # gpg: Signature made Fri 19 Jul 2019 00:54:27 BST # gpg: using RSA key 00CE76D1834960DFCE886DF8EF4CA1502CCBAB41 # gpg: issuer "palmer@dabbelt.com" # gpg: Good signature from "Palmer Dabbelt <palmer@dabbelt.com>" [unknown] # gpg: aka "Palmer Dabbelt <palmer@sifive.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 00CE 76D1 8349 60DF CE88 6DF8 EF4C A150 2CCB AB41 * remotes/palmer/tags/riscv-for-master-4.1-rc2: hw/riscv: Load OpenSBI as the default firmware roms: Add OpenSBI version 0.4 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-18hw/riscv: Load OpenSBI as the default firmwareAlistair Francis
If the user hasn't specified a firmware to load (with -bios) or specified no bios (with -bios none) then load OpenSBI by default. This allows users to boot a RISC-V kernel with just -kernel. Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Bin Meng <bmeng.cn@gmail.com> Tested-by: Bin Meng <bmeng.cn@gmail.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2019-07-18linux-user: check valid address in access_ok()Rémi Denis-Courmont
Fix a crash with LTP testsuite and aarch64: tst_test.c:1015: INFO: Timeout per run is 0h 05m 00s qemu-aarch64: .../qemu/accel/tcg/translate-all.c:2522: page_check_range: Assertion `start < ((target_ulong)1 << L1_MAP_ADDR_SPACE_BITS)' failed. qemu:handle_cpu_signal received signal outside vCPU context @ pc=0x60001554 page_check_range() should never be called with address outside the guest address space. This patch adds a guest_addr_valid() check in access_ok() to only call page_check_range() with a valid address. Fixes: f6768aa1b4c6 ("target/arm: fix AArch64 virtual address space size") Signed-off-by: Rémi Denis-Courmont <remi@remlab.net> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20190704084115.24713-1-lvivier@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-07-16Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell
* VFIO bugfix for AMD SEV (Alex) * Kconfig improvements (Julio, Philippe) * MemoryRegion reference counting bugfix (King Wang) * Build system cleanups (Marc-André, myself) * rdmacm-mux off-by-one (Marc-André) * ZBC passthrough fixes (Shinichiro, myself) * WHPX build fix (Stefan) * char-pty fix (Wei Yang) # gpg: Signature made Tue 16 Jul 2019 08:31:27 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: vl: make sure char-pty message displayed by moving setbuf to the beginning create_config: remove $(CONFIG_SOFTMMU) hack Makefile: do not repeat $(CONFIG_SOFTMMU) in hw/Makefile.objs hw/usb/Kconfig: USB_XHCI_NEC requires USB_XHCI hw/usb/Kconfig: Add CONFIG_USB_EHCI_PCI target/i386: sev: Do not unpin ram device memory region checkpatch: detect doubly-encoded UTF-8 hw/lm32/Kconfig: Milkymist One provides a USB 1.1 Controller util: merge main-loop.c and iohandler.c Fix broken build with WHPX enabled memory: unref the memory region in simplify flatview hw/i386: turn off vmport if CONFIG_VMPORT is disabled rdmacm-mux: fix strcpy string warning build-sys: remove slirp cflags from main-loop.o iscsi: base all handling of check condition on scsi_sense_to_errno iscsi: fix busy/timeout/task set full scsi: add guest-recoverable ZBC errors scsi: explicitly list guest-recoverable sense codes scsi-disk: pass sense correctly for guest-recoverable errors Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-15Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2019-07-15' ↵Peter Maydell
into staging Block patches for 4.1-rc1: - Fixes for the NVMe block driver, the gluster block driver, and for running multiple block jobs concurrently on a single chain # gpg: Signature made Mon 15 Jul 2019 14:51:43 BST # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2019-07-15: gluster: fix .bdrv_reopen_prepare when backing file is a JSON object iotests: Add read-only test case to 030 iotests: Add new case to 030 iotests: Add @use_log to VM.run_job() iotests: Compare error messages in 030 iotests: Fix throttling in 030 block: Deep-clear inherits_from block/stream: Swap backing file change order block/stream: Fix error path block: Add BDS.never_freeze nvme: Set number of queues later in nvme_init() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-15Merge remote-tracking branch ↵Peter Maydell
'remotes/juanquintela/tags/migration-pull-request' into staging Pull request # gpg: Signature made Mon 15 Jul 2019 14:49:41 BST # gpg: using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full] # gpg: aka "Juan Quintela <quintela@trasno.org>" [full] # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration-pull-request: (21 commits) migration: always initial RAMBlock.bmap to 1 for new migration migration/postcopy: remove redundant cpu_synchronize_all_post_init migration/postcopy: fix document of postcopy_send_discard_bm_ram() migration: allow private destination ram with x-ignore-shared migration: Split log_clear() into smaller chunks kvm: Support KVM_CLEAR_DIRTY_LOG kvm: Introduce slots lock for memory listener kvm: Persistent per kvmslot dirty bitmap kvm: Update comments for sync_dirty_bitmap memory: Introduce memory listener hook log_clear() memory: Pass mr into snapshot_and_clear_dirty bitmap: Add bitmap_copy_with_{src|dst}_offset() memory: Don't set migration bitmap when without migration migration: No need to take rcu during sync_dirty_bitmap migration/ram.c: reset complete_round when we gets a queued page migration/multifd: sync packet_num after all thread are done cutils: remove one unnecessary pointer operation migration/xbzrle: update cache and current_data in one place migration/multifd: call multifd_send_sync_main when sending RAM_SAVE_FLAG_EOS migration-test: rename parameter to parameter_int ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-15block: Add BDS.never_freezeMax Reitz
The commit and the mirror block job must be able to drop their filter node at any point. However, this will not be possible if any of the BdrvChild links to them is frozen. Therefore, we need to prevent them from ever becoming frozen. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190703172813.6868-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-15migration: Split log_clear() into smaller chunksPeter Xu
Currently we are doing log_clear() right after log_sync() which mostly keeps the old behavior when log_clear() was still part of log_sync(). This patch tries to further optimize the migration log_clear() code path to split huge log_clear()s into smaller chunks. We do this by spliting the whole guest memory region into memory chunks, whose size is decided by MigrationState.clear_bitmap_shift (an example will be given below). With that, we don't do the dirty bitmap clear operation on the remote node (e.g., KVM) when we fetch the dirty bitmap, instead we explicitly clear the dirty bitmap for the memory chunk for each of the first time we send a page in that chunk. Here comes an example. Assuming the guest has 64G memory, then before this patch the KVM ioctl KVM_CLEAR_DIRTY_LOG will be a single one covering 64G memory. If after the patch, let's assume when the clear bitmap shift is 18, then the memory chunk size on x86_64 will be 1UL<<18 * 4K = 1GB. Then instead of sending a big 64G ioctl, we'll send 64 small ioctls, each of the ioctl will cover 1G of the guest memory. For each of the 64 small ioctls, we'll only send if any of the page in that small chunk was going to be sent right away. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190603065056.25211-12-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15kvm: Introduce slots lock for memory listenerPeter Xu
Introduce KVMMemoryListener.slots_lock to protect the slots inside the kvm memory listener. Currently it is close to useless because all the KVM code path now is always protected by the BQL. But it'll start to make sense in follow up patches where we might do remote dirty bitmap clear and also we'll update the per-slot cached dirty bitmap even without the BQL. So let's prepare for it. We can also use per-slot lock for above reason but it seems to be an overkill. Let's just use this bigger one (which covers all the slots of a single address space) but anyway this lock is still much smaller than the BQL. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190603065056.25211-10-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15kvm: Persistent per kvmslot dirty bitmapPeter Xu
When synchronizing dirty bitmap from kernel KVM we do it in a per-kvmslot fashion and we allocate the userspace bitmap for each of the ioctl. This patch instead make the bitmap cache be persistent then we don't need to g_malloc0() every time. More importantly, the cached per-kvmslot dirty bitmap will be further used when we want to add support for the KVM_CLEAR_DIRTY_LOG and this cached bitmap will be used to guarantee we won't clear any unknown dirty bits otherwise that can be a severe data loss issue for migration code. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190603065056.25211-9-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15memory: Introduce memory listener hook log_clear()Peter Xu
Introduce a new memory region listener hook log_clear() to allow the listeners to hook onto the points where the dirty bitmap is cleared by the bitmap users. Previously log_sync() contains two operations: - dirty bitmap collection, and, - dirty bitmap clear on remote site. Let's take KVM as example - log_sync() for KVM will first copy the kernel dirty bitmap to userspace, and at the same time we'll clear the dirty bitmap there along with re-protecting all the guest pages again. We add this new log_clear() interface only to split the old log_sync() into two separated procedures: - use log_sync() to collect the collection only, and, - use log_clear() to clear the remote dirty bitmap. With the new interface, the memory listener users will still be able to decide how to implement the log synchronization procedure, e.g., they can still only provide log_sync() method only and put all the two procedures within log_sync() (that's how the old KVM works before KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is introduced). However with this new interface the memory listener users will start to have a chance to postpone the log clear operation explicitly if the module supports. That can really benefit users like KVM at least for host kernels that support KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2. There are three places that can clear dirty bits in any one of the dirty bitmap in the ram_list.dirty_memory[3] array: cpu_physical_memory_snapshot_and_clear_dirty cpu_physical_memory_test_and_clear_dirty cpu_physical_memory_sync_dirty_bitmap Currently we hook directly into each of the functions to notify about the log_clear(). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190603065056.25211-7-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15memory: Pass mr into snapshot_and_clear_dirtyPeter Xu
Also we change the 2nd parameter of it to be the relative offset within the memory region. This is to be used in follow up patches. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190603065056.25211-6-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15bitmap: Add bitmap_copy_with_{src|dst}_offset()Peter Xu
These helpers copy the source bitmap to destination bitmap with a shift either on the src or dst bitmap. Meanwhile, we never have bitmap tests but we should. This patch also introduces the initial test cases for utils/bitmap.c but it only tests the newly introduced functions. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190603065056.25211-5-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> --- Bitmap test used sizeof(unsigned long) instead of BITS_PER_LONG.
2019-07-15memory: Don't set migration bitmap when without migrationPeter Xu
Similar to 9460dee4b2 ("memory: do not touch code dirty bitmap unless TCG is enabled", 2015-06-05) but for the migration bitmap - we can skip the MIGRATION bitmap update if migration not enabled. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190603065056.25211-4-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15migration: No need to take rcu during sync_dirty_bitmapPeter Xu
cpu_physical_memory_sync_dirty_bitmap() has one RAMBlock* as parameter, which means that it must be with RCU read lock held already. Taking it again inside seems redundant. Removing it. Instead comment on the functions about the RCU read lock. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190603065056.25211-2-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-07-15pl031: Correctly migrate state when using -rtc clock=hostPeter Maydell
The PL031 RTC tracks the difference between the guest RTC and the host RTC using a tick_offset field. For migration, however, we currently always migrate the offset between the guest and the vm_clock, even if the RTC clock is not the same as the vm_clock; this was an attempt to retain migration backwards compatibility. Unfortunately this results in the RTC behaving oddly across a VM state save and restore -- since the VM clock stands still across save-then-restore, regardless of how much real world time has elapsed, the guest RTC ends up out of sync with the host RTC in the restored VM. Fix this by migrating the raw tick_offset. To retain migration compatibility as far as possible, we have a new property migrate-tick-offset; by default this is 'true' and we will migrate the true tick offset in a new subsection; if the incoming data has no subsection we fall back to the old vm_clock-based offset information, so old->new migration compatibility is preserved. For complete new->old migration compatibility, the property is set to 'false' for 4.0 and earlier machine types (this will only affect 'virt-4.0' and below, as none of the other pl031-using machines are versioned). Reported-by: Russell King <rmk@armlinux.org.uk> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-id: 20190709143912.28905-1-peter.maydell@linaro.org
2019-07-15scsi: explicitly list guest-recoverable sense codesPaolo Bonzini
It's not really possible to fit all sense codes into errno codes, especially in such a way that sense codes can be properly categorized as either guest-recoverable or host-handled. Create a new function that checks for guest recoverable sense, then scsi_sense_buf_to_errno only needs to be called for host handled sense codes. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-15Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20190714' into stagingPeter Maydell
Fixes for 3 tcg bugs # gpg: Signature made Sun 14 Jul 2019 12:11:01 BST # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-tcg-20190714: tcg: Release mmap_lock on translation fault tcg: Remove duplicate #if !defined(CODE_ACCESS) tcg: Remove cpu_ld*_code_ra tcg: Introduce set/clear_helper_retaddr include/qemu/atomic.h: Add signal_barrier tcg/aarch64: Fix output of extract2 opcodes tcg: Fix constant folding of INDEX_op_extract2_i32 Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-14tcg: Release mmap_lock on translation faultRichard Henderson
Turn helper_retaddr into a multi-state flag that may now also indicate when we're performing a read on behalf of the translator. In this case, release the mmap_lock before the longjmp back to the main cpu loop, and thereby avoid a failing assert therein. Fixes: https://bugs.launchpad.net/qemu/+bug/1832353 Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-07-14tcg: Remove duplicate #if !defined(CODE_ACCESS)Richard Henderson
This code block is already surrounded by #ifndef CODE_ACCESS. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-07-14tcg: Remove cpu_ld*_code_raRichard Henderson
These functions are not used, and are not usable in the context of code generation, because we never have a helper return address to pass in to them. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-07-14tcg: Introduce set/clear_helper_retaddrRichard Henderson
At present we have a potential error in that helper_retaddr contains data for handle_cpu_signal, but we have not ensured that those stores will be scheduled properly before the operation that may fault. It might be that these races are not in practice observable, due to our use of -fno-strict-aliasing, but better safe than sorry. Adjust all of the setters of helper_retaddr. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-07-14include/qemu/atomic.h: Add signal_barrierRichard Henderson
We have some potential race conditions vs our user-exec signal handler that will be solved with this barrier. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-07-12virtio-balloon: fix QEMU 4.0 config size migration incompatibilityStefan Hajnoczi
The virtio-balloon config size changed in QEMU 4.0 even for existing machine types. Migration from QEMU 3.1 to 4.0 can fail in some circumstances with the following error: qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0 This happens because the virtio-balloon config size affects the VIRTIO Legacy I/O Memory PCI BAR size. Introduce a qdev property called "qemu-4-0-config-size" and enable it only for the QEMU 4.0 machine types. This way <4.0 machine types use the old size, 4.0 uses the larger size, and >4.0 machine types use the appropriate size depending on enabled virtio-balloon features. Live migration to and from old QEMUs to QEMU 4.1 works again as long as a versioned machine type is specified (do not use just "pc"!). Originally-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20190710141440.27635-1-stefanha@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-07-12pcie: consistent names for function argsMichael S. Tsirkin
The function declarations for pci_cap_slot_get and pci_cap_slot_write_config call the argument "slot_ctl", but the function definitions and all the call sites drop the 'o' and call it "slt_ctl". Let's be consistent. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
2019-07-08qdev: add qdev_add_vm_change_state_handler()Stefan Hajnoczi
Children sometimes depend on their parent's vm change state handler having completed. Add a vm change state handler API for devices that guarantees tree depth ordering. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-08vl: add qemu_add_vm_change_state_handler_prio()Stefan Hajnoczi
Add an API for registering vm change state handlers with a well-defined ordering. This is necessary when handlers depend on each other. Small coding style fixes are included to make checkpatch.pl happy. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-07-05i386: Make unversioned CPU models be aliasesEduardo Habkost
This will make unversioned CPU models behavior depend on the machine type: * "pc-*-4.0" and older will not report them as aliases. This is done to keep compatibility with older QEMU versions after management software starts translating aliases. * "pc-*-4.1" will translate unversioned CPU models to -v1. This is done to keep compatibility with existing management software, that still relies on CPU model runnability promises. * "none" will translate unversioned CPU models to their latest version. This is planned become the default in future machine types (probably in pc-*-4.3). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190628002844.24894-8-ehabkost@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-07-05machine: Refactor smp_parse() in vl.c as MachineClass::smp_parse()Like Xu
To make smp_parse() more flexible and expansive, a smp_parse function pointer is added to MachineClass that machine types could override. The generic smp_parse() code in vl.c is moved to hw/core/machine.c, and become the default implementation of MachineClass::smp_parse. A PC-specific function called pc_smp_parse() has been added to hw/i386/pc.c, which in this patch changes nothing against the default one . Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190620054525.37188-3-like.xu@linux.intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-07-05machine: show if CLI option '-numa node,mem' is supported in QAPI schemaIgor Mammedov
Legacy '-numa node,mem' option has a number of issues and mgmt often defaults to it. Unfortunately it's no possible to replace it with an alternative '-numa memdev' without breaking migration compatibility. What's possible though is to deprecate it, keeping option working with old machine types only. In order to help users to find out if being deprecated CLI option '-numa node,mem' is still supported by particular machine type, add new "numa-mem-supported" property to output of query-machines. "numa-mem-supported" is set to 'true' for machines that currently support NUMA, but it will be flipped to 'false' later on, once deprecation period expires and kept 'true' only for old machine types that used to support the legacy option so it won't break existing configuration that are using it. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1560172207-378962-1-git-send-email-imammedo@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-07-05i386: Update new x86_apicid parsing rules with die_offset supportLike Xu
In new sockets/dies/cores/threads model, the apicid of logical cpu could imply die level info of guest cpu topology thus x86_apicid_from_cpu_idx() need to be refactored with #dies value, so does apicid_*_offset(). To keep semantic compatibility, the legacy pkg_offset which helps to generate CPUIDs such as 0x3 for L3 cache should be mapping to die_offset. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20190612084104.34984-5-like.xu@linux.intel.com> [ehabkost: squash unit test patch] Message-Id: <20190612084104.34984-6-like.xu@linux.intel.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-07-05i386/cpu: Consolidate die-id validity in smp contextLike Xu
The field die_id (default as 0) and has_die_id are introduced to X86CPU. Following the legacy smp check rules, the die_id validity is added to the same contexts as leagcy smp variables such as hmp_hotpluggable_cpus(), machine_set_cpu_numa_node(), cpu_slot_to_string() and pc_cpu_pre_plug(). Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20190612084104.34984-4-like.xu@linux.intel.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-07-05i386: Add die-level cpu topology to x86CPU on PCMachineLike Xu
The die-level as the first PC-specific cpu topology is added to the leagcy cpu topology model, which has one die per package implicitly and only the numbers of sockets/cores/threads are configurable. In the new model with die-level support, the total number of logical processors (including offline) on board will be calculated as: #cpus = #sockets * #dies * #cores * #threads and considering compatibility, the default value for #dies would be initialized to one in x86_cpu_initfn() and pc_machine_initfn(). Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20190612084104.34984-2-like.xu@linux.intel.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>