aboutsummaryrefslogtreecommitdiff
path: root/migration
AgeCommit message (Collapse)Author
2023-10-21migration/qmp: Fix crash on setting tls-authz with nullPeter Xu
QEMU will crash if anyone tries to set tls-authz (which is a type StrOrNull) with 'null' value. Fix it in the easy way by converting it to qstring just like the other two tls parameters. Cc: qemu-stable@nongnu.org # v4.0+ Fixes: d2f1d29b95 ("migration: add support for a "tls-authz" migration parameter") Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230905162335.235619-2-peterx@redhat.com> (cherry picked from commit 86dec715a7339fc61c3bdb9715993b277b2089db) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: the function is in m/migration.c, not m/option.c; minor context tweak)
2023-09-11block-migration: Ensure we don't crash during migration cleanupFabiano Rosas
We can fail the blk_insert_bs() at init_blk_migration(), leaving the BlkMigDevState without a dirty_bitmap and BlockDriverState. Account for the possibly missing elements when doing cleanup. Fix the following crashes: Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. 0x0000555555ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359 359 BlockDriverState *bs = bitmap->bs; #0 0x0000555555ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359 #1 0x0000555555bba331 in unset_dirty_tracking () at ../migration/block.c:371 #2 0x0000555555bbad98 in block_migration_cleanup_bmds () at ../migration/block.c:681 Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. 0x0000555555e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073 7073 QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) { #0 0x0000555555e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073 #1 0x0000555555e9734a in bdrv_op_unblock_all (bs=0x0, reason=0x0) at ../block.c:7095 #2 0x0000555555bbae13 in block_migration_cleanup_bmds () at ../migration/block.c:690 Signed-off-by: Fabiano Rosas <farosas@suse.de> Message-id: 20230731203338.27581-1-farosas@suse.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> (cherry picked from commit f187609f27b261702a17f79d20bf252ee0d4f9cd) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18migration: Attempt disk reactivation in more failure scenariosEric Blake
Commit fe904ea824 added a fail_inactivate label, which tries to reactivate disks on the source after a failure while s->state == MIGRATION_STATUS_ACTIVE, but didn't actually use the label if qemu_savevm_state_complete_precopy() failed. This failure to reactivate is also present in commit 6039dd5b1c (also covering the new s->state == MIGRATION_STATUS_DEVICE state) and 403d18ae (ensuring s->block_inactive is set more reliably). Consolidate the two labels back into one - no matter HOW migration is failed, if there is any chance we can reach vm_start() after having attempted inactivation, it is essential that we have tried to restart disks before then. This also makes the cleanup more like migrate_fd_cancel(). Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20230502205212.134680-1-eblake@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> (cherry picked from commit 6dab4c93ecfae48e2e67b984d1032c1e988d3005) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> (Mjt: minor context tweak near added comment in migration/migration.c)
2023-05-18migration: Minor control flow simplificationEric Blake
No need to declare a temporary variable. Suggested-by: Juan Quintela <quintela@redhat.com> Fixes: 1df36e8c6289 ("migration: Handle block device inactivation failures better") Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> (cherry picked from commit 5d39f44d7ac5c63f53d4d0900ceba9521bc27e49) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-05-18migration: Handle block device inactivation failures betterEric Blake
Consider what happens when performing a migration between two host machines connected to an NFS server serving multiple block devices to the guest, when the NFS server becomes unavailable. The migration attempts to inactivate all block devices on the source (a necessary step before the destination can take over); but if the NFS server is non-responsive, the attempt to inactivate can itself fail. When that happens, the destination fails to get the migrated guest (good, because the source wasn't able to flush everything properly): (qemu) qemu-kvm: load of migration failed: Input/output error at which point, our only hope for the guest is for the source to take back control. With the current code base, the host outputs a message, but then appears to resume: (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1) (src qemu)info status VM status: running but a second migration attempt now asserts: (src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed. Whether the guest is recoverable on the source after the first failure is debatable, but what we do not want is to have qemu itself fail due to an assertion. It looks like the problem is as follows: In migration.c:migration_completion(), the source sets 'inactivate' to true (since COLO is not enabled), then tries savevm.c:qemu_savevm_state_complete_precopy() with a request to inactivate block devices. In turn, this calls block.c:bdrv_inactivate_all(), which fails when flushing runs up against the non-responsive NFS server. With savevm failing, we are now left in a state where some, but not all, of the block devices have been inactivated; but migration_completion() then jumps to 'fail' rather than 'fail_invalidate' and skips an attempt to reclaim those those disks by calling bdrv_activate_all(). Even if we do attempt to reclaim disks, we aren't taking note of failure there, either. Thus, we have reached a state where the migration engine has forgotten all state about whether a block device is inactive, because we did not set s->block_inactive in enough places; so migration allows the source to reach vm_start() and resume execution, violating the block layer invariant that the guest CPUs should not be restarted while a device is inactive. Note that the code in migration.c:migrate_fd_cancel() will also try to reactivate all block devices if s->block_inactive was set, but because we failed to set that flag after the first failure, the source assumes it has reclaimed all devices, even though it still has remaining inactivated devices and does not try again. Normally, qmp_cont() will also try to reactivate all disks (or correctly fail if the disks are not reclaimable because NFS is not yet back up), but the auto-resumption of the source after a migration failure does not go through qmp_cont(). And because we have left the block layer in an inconsistent state with devices still inactivated, the later migration attempt is hitting the assertion failure. Since it is important to not resume the source with inactive disks, this patch marks s->block_inactive before attempting inactivation, rather than after succeeding, in order to prevent any vm_start() until it has successfully reactivated all devices. See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982 Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Acked-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Juan Quintela <quintela@redhat.com> (cherry picked from commit 403d18ae384239876764bbfa111d6cc5dcb673d1) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-03-29migration/ram: Fix populate_read_range()David Hildenbrand
Unfortunately, commit f7b9dcfbcf44 broke populate_read_range(): the loop end condition is very wrong, resulting in that function not populating the full range. Lets' fix that. Fixes: f7b9dcfbcf44 ("migration/ram: Factor out populating pages readable in ram_block_populate_pages()") Cc: qemu-stable@nongnu.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> (cherry picked from commit 5f19a4491941fdc5c5b50ce4ade6ffffe0f591b4) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-03-29migration/ram: Fix error handling in ram_write_tracking_start()David Hildenbrand
If something goes wrong during uffd_change_protection(), we would miss to unregister uffd-wp and not release our reference. Fix it by performing the uffd_change_protection(true) last. Note that a uffd_change_protection(false) on the recovery path without a prior uffd_change_protection(false) is fine. Fixes: 278e2f551a09 ("migration: support UFFD write fault processing in ram_save_iterate()") Cc: qemu-stable@nongnu.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> (cherry picked from commit 72ef3a370836aa07261ad7aaeea27ed5cbcee342) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2022-11-21migration: Block migration comment or code is wrongJuan Quintela
And it appears that what is wrong is the code. During bulk stage we need to make sure that some block is dirty, but no games with max_size at all. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-11-21migration: Disable multifd explicitly with compressionPeter Xu
Multifd thread model does not work for compression, explicitly disable it. Note that previuosly even we can enable both of them, nothing will go wrong, because the compression code has higher priority so multifd feature will just be ignored. Now we'll fail even earlier at config time so the user should be aware of the consequence better. Note that there can be a slight chance of breaking existing users, but let's assume they're not majority and not serious users, or they should have found that multifd is not working already. With that, we can safely drop the check in ram_save_target_page() for using multifd, because when multifd=on then compression=off, then the removed check on save_page_use_compression() will also always return false too. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-11-21migration: Disallow postcopy preempt to be used with compressPeter Xu
The preempt mode requires the capability to assign channel for each of the page, while the compression logic will currently assign pages to different compress thread/local-channel so potentially they're incompatible. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-11-21migration: Fix race on qemu_file_shutdown()Peter Xu
In qemu_file_shutdown(), there's a possible race if with current order of operation. There're two major things to do: (1) Do real shutdown() (e.g. shutdown() syscall on socket) (2) Update qemufile's last_error We must do (2) before (1) otherwise there can be a race condition like: page receiver other thread ------------- ------------ qemu_get_buffer() do shutdown() returns 0 (buffer all zero) (meanwhile we didn't check this retcode) try to detect IO error last_error==NULL, IO okay install ALL-ZERO page set last_error --> guest crash! To fix this, we can also check retval of qemu_get_buffer(), but not all APIs can be properly checked and ultimately we still need to go back to qemu_file_get_error(). E.g. qemu_get_byte() doesn't return error. Maybe some day a rework of qemufile API is really needed, but for now keep using qemu_file_get_error() and fix it by not allowing that race condition to happen. Here shutdown() is indeed special because the last_error was emulated. For real -EIO errors it'll always be set when e.g. sendmsg() error triggers so we won't miss those ones, only shutdown() is a bit tricky here. Cc: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-11-21migration: Fix possible infinite loop of ram save processPeter Xu
When starting ram saving procedure (especially at the completion phase), always set last_seen_block to non-NULL to make sure we can always correctly detect the case where "we've migrated all the dirty pages". Then we'll guarantee both last_seen_block and pss.block will be valid always before the loop starts. See the comment in the code for some details. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-11-21migration/multifd/zero-copy: Create helper function for flushingLeonardo Bras
Move flushing code from multifd_send_sync_main() to a new helper, and call it in multifd_send_sync_main(). Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-11-21migration/channel-block: fix return value for qio_channel_block_{readv,writev}Fiona Ebner
in the error case. The documentation in include/io/channel.h states that -1 or QIO_CHANNEL_ERR_BLOCK should be returned upon error. Simply passing along the return value from the bdrv-functions has the potential to confuse the call sides. Non-blocking mode is not implemented currently, so -1 it is. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-10-27reset: allow registering handlers that aren't called by snapshot loadingJason A. Donenfeld
Snapshot loading only expects to call deterministic handlers, not non-deterministic ones. So introduce a way of registering handlers that won't be called when reseting for snapshots. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Message-id: 20221025004327.568476-2-Jason@zx2c4.com [PMM: updated json doc comment with Markus' text; fixed checkpatch style nit] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-07migration: add missing coroutine_fn annotationsMarc-André Lureau
Callers of coroutine_fn must be coroutine_fn themselves, or the call must be within "if (qemu_in_coroutine())". Apply coroutine_fn to functions where this holds. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220922084924.201610-26-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-10-04Use g_new() & friends where that makes obvious senseMarkus Armbruster
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. This commit only touches allocations with size arguments of the form sizeof(T). Patch created mechanically with: $ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \ --macro-file scripts/cocci-macro-file.h FILES... The previous iteration was commit a95942b50c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20220923084254.4173111-1-armbru@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-08-02migration: Define BLK_MIG_BLOCK_SIZE as unsigned long longPeter Maydell
When we use BLK_MIG_BLOCK_SIZE in expressions like block_mig_state.submitted * BLK_MIG_BLOCK_SIZE, this multiplication is done as 32 bits, because both operands are 32 bits. Coverity complains about possible overflows because we then accumulate that into a 64 bit variable. Define BLK_MIG_BLOCK_SIZE as unsigned long long using the ULL suffix. The only two current uses of it with this problem are both in block_save_pending(), so we could just cast to uint64_t there, but using the ULL suffix is simpler and ensures that we don't accidentally introduce new variants of the same issue in future. Resolves: Coverity CID 1487136, 1487175 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20220721115207.729615-3-peter.maydell@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-08-02migration: Assert that migrate_multifd_compression() returns an in-range valuePeter Maydell
Coverity complains that when we use the return value from migrate_multifd_compression() as an array index: multifd_recv_state->ops = multifd_ops[migrate_multifd_compression()]; that this might overrun the array (which is declared to have size MULTIFD_COMPRESSION__MAX). This is because the function return type is MultiFDCompression, which is an autogenerated enum. The code generator includes the "one greater than the maximum possible value" MULTIFD_COMPRESSION__MAX in the enum, even though this is not actually a valid value for the enum, and this makes Coverity think that migrate_multifd_compression() could return that __MAX value and index off the end of the array. Suppress the Coverity error by asserting that the value we're going to return is within range. Resolves: Coverity CID 1487239, 1487254 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20220721115207.729615-2-peter.maydell@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-08-02Revert "migration: Simplify unqueue_page()"Thomas Huth
This reverts commit cfd66f30fb0f735df06ff4220e5000290a43dad3. The simplification of unqueue_page() introduced a bug that sometimes breaks migration on s390x hosts. The problem is not fully understood yet, but since we are already in the freeze for QEMU 7.1 and we need something working there, let's revert this patch for the upcoming release. The optimization can be redone later again in a proper way if necessary. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2099934 Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20220802061949.331576-1-thuth@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-08-02migration: add remaining params->has_* = true in migration_instance_init()Leonardo Bras
Some of params->has_* = true are missing in migration_instance_init, this causes migrate_params_check() to skip some tests, allowing some unsupported scenarios. Fix this by adding all missing params->has_* = true in migration_instance_init(). Fixes: 69ef1f36b0 ("migration: define 'tls-creds' and 'tls-hostname' migration parameters") Fixes: 1d58872a91 ("migration: do not wait for free thread") Fixes: d2f1d29b95 ("migration: add support for a "tls-authz" migration parameter") Signed-off-by: Leonardo Bras <leobras@redhat.com> Message-Id: <20220726010235.342927-1-leobras@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Avoid false-positive on non-supported scenarios for zero-copy-sendLeonardo Bras
Migration with zero-copy-send currently has it's limitations, as it can't be used with TLS nor any kind of compression. In such scenarios, it should output errors during parameter / capability setting. But currently there are some ways of setting this not-supported scenarios without printing the error message: !) For 'compression' capability, it works by enabling it together with zero-copy-send. This happens because the validity test for zero-copy uses the helper unction migrate_use_compression(), which check for compression presence in s->enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS]. The point here is: the validity test happens before the capability gets enabled. If all of them get enabled together, this test will not return error. In order to fix that, replace migrate_use_compression() by directly testing the cap_list parameter migrate_caps_check(). 2) For features enabled by parameters such as TLS & 'multifd_compression', there was also a possibility of setting non-supported scenarios: setting zero-copy-send first, then setting the unsupported parameter. In order to fix that, also add a check for parameters conflicting with zero-copy-send on migrate_params_check(). 3) XBZRLE is also a compression capability, so it makes sense to also add it to the list of capabilities which are not supported with zero-copy-send. Fixes: 1abaec9a1b2c ("migration: Change zero_copy_send from migration parameter to migration capability") Signed-off-by: Leonardo Bras <leobras@redhat.com> Message-Id: <20220719122345.253713-1-leobras@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20multifd: Document the locking of MultiFD{Send/Recv}ParamsJuan Quintela
Reorder the structures so we can know if the fields are: - Read only - Their own locking (i.e. sems) - Protected by 'mutex' - Only for the multifd channel Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20220531104318.7494-2-quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Typo fixes from Chen Zhang
2022-07-20migration/multifd: Report to user when zerocopy not workingLeonardo Bras
Some errors, like the lack of Scatter-Gather support by the network interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using zero-copy, which causes it to fall back to the default copying mechanism. After each full dirty-bitmap scan there should be a zero-copy flush happening, which checks for errors each of the previous calls to sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then increment dirty_sync_missed_zero_copy migration stat to let the user know about it. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Message-Id: <20220711211112.18951-4-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20Add dirty-sync-missed-zero-copy migration statLeonardo Bras
Signed-off-by: Leonardo Bras <leobras@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220711211112.18951-3-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: remove unreachable code after reading dataDaniel P. Berrangé
The code calls qio_channel_read() in a loop when it reports QIO_CHANNEL_ERR_BLOCK. This code is reported when errno==EAGAIN. As such the later block of code will always hit the 'errno != EAGAIN' condition, making the final 'else' unreachable. Fixes: Coverity CID 1490203 Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220627135318.156121-1-berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Respect postcopy request order in preemption modePeter Xu
With preemption mode on, when we see a postcopy request that was requesting for exactly the page that we have preempted before (so we've partially sent the page already via PRECOPY channel and it got preempted by another postcopy request), currently we drop the request so that after all the other postcopy requests are serviced then we'll go back to precopy stream and start to handle that. We dropped the request because we can't send it via postcopy channel since the precopy channel already contains partial of the data, and we can only send a huge page via one channel as a whole. We can't split a huge page into two channels. That's a very corner case and that works, but there's a change on the order of postcopy requests that we handle since we're postponing this (unlucky) postcopy request to be later than the other queued postcopy requests. The problem is there's a possibility that when the guest was very busy, the postcopy queue can be always non-empty, it means this dropped request will never be handled until the end of postcopy migration. So, there's a chance that there's one dest QEMU vcpu thread waiting for a page fault for an extremely long time just because it's unluckily accessing the specific page that was preempted before. The worst case time it needs can be as long as the whole postcopy migration procedure. It's extremely unlikely to happen, but when it happens it's not good. The root cause of this problem is because we treat pss->postcopy_requested variable as with two meanings bound together, as the variable shows: 1. Whether this page request is urgent, and, 2. Which channel we should use for this page request. With the old code, when we set postcopy_requested it means either both (1) and (2) are true, or both (1) and (2) are false. We can never have (1) and (2) to have different values. However it doesn't necessarily need to be like that. It's very legal that there's one request that has (1) very high urgency, but (2) we'd like to use the precopy channel. Just like the corner case we were discussing above. To differenciate the two meanings better, introduce a new field called postcopy_target_channel, showing which channel we should use for this page request, so as to cover the old meaning (2) only. Then we leave the postcopy_requested variable to stand only for meaning (1), which is the urgency of this page request. With this change, we can easily boost priority of a preempted precopy page as long as we know that page is also requested as a postcopy page. So with the new approach in get_queued_page() instead of dropping that request, we send it right away with the precopy channel so we get back the ordering of the page faults just like how they're requested on dest. Reported-by: Manish Mishra <manish.mishra@nutanix.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Manish Mishra <manish.mishra@nutanix.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185520.27583-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Enable TLS for preempt channelPeter Xu
This patch is based on the async preempt channel creation. It continues wiring up the new channel with TLS handshake to destionation when enabled. Note that only the src QEMU needs such operation; the dest QEMU does not need any change for TLS support due to the fact that all channels are established synchronously there, so all the TLS magic is already properly handled by migration_tls_channel_process_incoming(). Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185518.27529-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Export tls-[creds|hostname|authz] params to cmdline tooPeter Xu
It's useful for specifying tls credentials all in the cmdline (along with the -object tls-creds-*), especially for debugging purpose. The trick here is we must remember to not free these fields again in the finalize() function of migration object, otherwise it'll cause double-free. The thing is when destroying an object, we'll first destroy the properties that bound to the object, then the object itself. To be explicit, when destroy the object in object_finalize() we have such sequence of operations: object_property_del_all(obj); object_deinit(obj, ti); So after this change the two fields are properly released already even before reaching the finalize() function but in object_property_del_all(), hence we don't need to free them anymore in finalize() or it's double-free. This also fixes a trivial memory leak for tls-authz as we forgot to free it before this patch. Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185515.27475-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Add helpers to detect TLS capabilityPeter Xu
Add migrate_channel_requires_tls() to detect whether the specific channel requires TLS, leveraging the recently introduced migrate_use_tls(). No functional change intended. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185513.27421-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Add property x-postcopy-preempt-break-hugePeter Xu
Add a property field that can conditionally disable the "break sending huge page" behavior in postcopy preemption. By default it's enabled. It should only be used for debugging purposes, and we should never remove the "x-" prefix. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Manish Mishra <manish.mishra@nutanix.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185511.27366-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Create the postcopy preempt channel asynchronouslyPeter Xu
This patch allows the postcopy preempt channel to be created asynchronously. The benefit is that when the connection is slow, we won't take the BQL (and potentially block all things like QMP) for a long time without releasing. A function postcopy_preempt_wait_channel() is introduced, allowing the migration thread to be able to wait on the channel creation. The channel is always created by the main thread, in which we'll kick a new semaphore to tell the migration thread that the channel has created. We'll need to wait for the new channel in two places: (1) when there's a new postcopy migration that is starting, or (2) when there's a postcopy migration to resume. For the start of migration, we don't need to wait for this channel until when we want to start postcopy, aka, postcopy_start(). We'll fail the migration if we found that the channel creation failed (which should probably not happen at all in 99% of the cases, because the main channel is using the same network topology). For a postcopy recovery, we'll need to wait in postcopy_pause(). In that case if the channel creation failed, we can't fail the migration or we'll crash the VM, instead we keep in PAUSED state, waiting for yet another recovery. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Manish Mishra <manish.mishra@nutanix.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185509.27311-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Postcopy recover with preempt enabledPeter Xu
To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread needs similar handling on fault tolerance. When ram_load_postcopy() fails, instead of stopping the thread it halts with a semaphore, preparing to be kicked again when recovery is detected. A mutex is introduced to make sure there's no concurrent operation upon the socket. To make it simple, the fast ram load thread will take the mutex during its whole procedure, and only release it if it's paused. The fast-path socket will be properly released by the main loading thread safely when there's network failures during postcopy with that mutex held. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185506.27257-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Postcopy preemption enablementPeter Xu
This patch enables postcopy-preempt feature. It contains two major changes to the migration logic: (1) Postcopy requests are now sent via a different socket from precopy background migration stream, so as to be isolated from very high page request delays. (2) For huge page enabled hosts: when there's postcopy requests, they can now intercept a partial sending of huge host pages on src QEMU. After this patch, we'll live migrate a VM with two channels for postcopy: (1) PRECOPY channel, which is the default channel that transfers background pages; and (2) POSTCOPY channel, which only transfers requested pages. There's no strict rule of which channel to use, e.g., if a requested page is already being transferred on precopy channel, then we will keep using the same precopy channel to transfer the page even if it's explicitly requested. In 99% of the cases we'll prioritize the channels so we send requested page via the postcopy channel as long as possible. On the source QEMU, when we found a postcopy request, we'll interrupt the PRECOPY channel sending process and quickly switch to the POSTCOPY channel. After we serviced all the high priority postcopy pages, we'll switch back to PRECOPY channel so that we'll continue to send the interrupted huge page again. There's no new thread introduced on src QEMU. On the destination QEMU, one new thread is introduced to receive page data from the postcopy specific socket (done in the preparation patch). This patch has a side effect: after sending postcopy pages, previously we'll assume the guest will access follow up pages so we'll keep sending from there. Now it's changed. Instead of going on with a postcopy requested page, we'll go back and continue sending the precopy huge page (which can be intercepted by a postcopy request so the huge page can be sent partially before). Whether that's a problem is debatable, because "assuming the guest will continue to access the next page" may not really suite when huge pages are used, especially if the huge page is large (e.g. 1GB pages). So that locality hint is much meaningless if huge pages are used. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185504.27203-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Postcopy preemption preparation on channel creationPeter Xu
Create a new socket for postcopy to be prepared to send postcopy requested pages via this specific channel, so as to not get blocked by precopy pages. A new thread is also created on dest qemu to receive data from this new channel based on the ram_load_postcopy() routine. The ram_load_postcopy(POSTCOPY) branch and the thread has not started to function, and that'll be done in follow up patches. Cleanup the new sockets on both src/dst QEMUs, meanwhile look after the new thread too to make sure it'll be recycled properly. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185502.27149-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: With Peter's fix to quieten compiler warning on start_migration
2022-07-20migration: Add postcopy-preempt capabilityPeter Xu
Firstly, postcopy already preempts precopy due to the fact that we do unqueue_page() first before looking into dirty bits. However that's not enough, e.g., when there're host huge page enabled, when sending a precopy huge page, a postcopy request needs to wait until the whole huge page that is sending to finish. That could introduce quite some delay, the bigger the huge page is the larger delay it'll bring. This patch adds a new capability to allow postcopy requests to preempt existing precopy page during sending a huge page, so that postcopy requests can be serviced even faster. Meanwhile to send it even faster, bypass the precopy stream by providing a standalone postcopy socket for sending requested pages. Since the new behavior will not be compatible with the old behavior, this will not be the default, it's enabled only when the new capability is set on both src/dst QEMUs. This patch only adds the capability itself, the logic will be added in follow up patches. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185342.26794-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20multifd: Copy pages before compressing them with zlibIlya Leoshkevich
zlib_send_prepare() compresses pages of a running VM. zlib does not make any thread-safety guarantees with respect to changing deflate() input concurrently with deflate() [1]. One can observe problems due to this with the IBM zEnterprise Data Compression accelerator capable zlib [2]. When the hardware acceleration is enabled, migration/multifd/tcp/plain/zlib test fails intermittently [3] due to sliding window corruption. The accelerator's architecture explicitly discourages concurrent accesses [4]: Page 26-57, "Other Conditions": As observed by this CPU, other CPUs, and channel programs, references to the parameter block, first, second, and third operands may be multiple-access references, accesses to these storage locations are not necessarily block-concurrent, and the sequence of these accesses or references is undefined. Mark Adler pointed out that vanilla zlib performs double fetches under certain circumstances as well [5], therefore we need to copy data before passing it to deflate(). [1] https://zlib.net/manual.html [2] https://github.com/madler/zlib/pull/410 [3] https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03988.html [4] http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf [5] https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg00889.html Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-Id: <20220705203559.2960949-1-iii@linux.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration/dirtyrate: Refactor dirty page rate calculationHyman Huang(黄勇)
abstract out dirty log change logic into function global_dirty_log_change. abstract out dirty page rate calculation logic via dirty-ring into function vcpu_calculate_dirtyrate. abstract out mathematical dirty page rate calculation into do_calculate_dirtyrate, decouple it from DirtyStat. rename set_sample_page_period to dirty_stat_wait, which is well-understood and will be reused in dirtylimit. handle cpu hotplug/unplug scenario during measurement of dirty page rate. export util functions outside migration. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <7b6f6f4748d5b3d017b31a0429e630229ae97538.1656177590.git.huangy81@chinatelecom.cn> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-12block: Change blk_{pread,pwrite}() param orderAlberto Faria
Swap 'buf' and 'bytes' around for consistency with blk_co_{pread,pwrite}(), and in preparation to implement these functions using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression blk, offset, buf, bytes, flags; @@ - blk_pread(blk, offset, buf, bytes, flags) + blk_pread(blk, offset, bytes, buf, flags) @@ expression blk, offset, buf, bytes, flags; @@ - blk_pwrite(blk, offset, buf, bytes, flags) + blk_pwrite(blk, offset, bytes, buf, flags) It had no effect on hw/block/nand.c, presumably due to the #if, so that file was updated manually. Overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220705161527.1054072-4-afaria@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-07-12block: Add a 'flags' param to blk_pread()Alberto Faria
For consistency with other I/O functions, and in preparation to implement it using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression blk, offset, buf, bytes; @@ - blk_pread(blk, offset, buf, bytes) + blk_pread(blk, offset, buf, bytes, 0) It had no effect on hw/block/nand.c, presumably due to the #if, so that file was updated manually. Overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220705161527.1054072-3-afaria@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-06-23migration: remove the QEMUFileOps abstractionDaniel P. Berrangé
Now that all QEMUFile callbacks are removed, the entire concept can be deleted. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-23migration: remove the QEMUFileOps 'get_return_path' callbackDaniel P. Berrangé
This directly implements the get_return_path logic using QIOChannel APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-23migration: remove the QEMUFileOps 'writev_buffer' callbackDaniel P. Berrangé
This directly implements the writev_buffer logic using QIOChannel APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-23migration: remove the QEMUFileOps 'get_buffer' callbackDaniel P. Berrangé
This directly implements the get_buffer logic using QIOChannel APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Fixup len = *-*EIO as spotted by Peter Xu
2022-06-22migration: remove the QEMUFileOps 'close' callbackDaniel P. Berrangé
This directly implements the close logic using QIOChannel APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-22migration: remove the QEMUFileOps 'set_blocking' callbackDaniel P. Berrangé
This directly implements the set_blocking logic using QIOChannel APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-22migration: remove the QEMUFileOps 'shut_down' callbackDaniel P. Berrangé
This directly implements the shutdown logic using QIOChannel APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-22migration: remove unused QEMUFileGetFD typedef / qemu_get_fd methodDaniel P. Berrangé
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-22migration: introduce new constructors for QEMUFileDaniel P. Berrangé
Prepare for the elimination of QEMUFileOps by introducing a pair of new constructors. This lets us distinguish between an input and output file object explicitly rather than via the existance of specific callbacks. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-22migration: hardcode assumption that QEMUFile is backed with QIOChannelDaniel P. Berrangé
The only callers of qemu_fopen_ops pass 'true' for the 'has_ioc' parameter, so hardcode this assumption in QEMUFile, by passing in the QIOChannel object as a non-opaque parameter. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Fixed long line