aboutsummaryrefslogtreecommitdiff
path: root/migration/postcopy-ram.c
AgeCommit message (Collapse)Author
2024-10-31migration: Put thread names together with macrosPeter Xu
Keep migration thread names together, so it's easier to see a list of all possible migration threads. Still two functional changes below besides the macro defintions: - There's one dirty rate thread that we overlooked before, now we add that too and name it as "mig/dirtyrate" following the old rules. - The old name "mig/src/rp-thr" has "-thr" but it may not be useful if it's a thread name anyway, while "rp" can be slightly hard to read. Taking this chance to rename it to "mig/src/return", hopefully a better name. Reviewed-by: Fabiano Rosas <farosas@suse.de> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Link: https://lore.kernel.org/r/20241011153652.517440-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration/postcopy: Use uffd helpersDr. David Alan Gilbert
Use the uffd_copy_page, uffd_zero_page and uffd_wakeup helpers rather than calling ioctl ourselves. They return -errno on error, and print an error_report themselves. I think this actually makes postcopy_place_page actually more consistent in it's callers. Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240919134626.166183-7-dave@treblig.org [peterx: fix i386 build] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-24migration: remove return after g_assert_not_reached()Pierrick Bouvier
This patch is part of a series that moves towards a consistent use of g_assert_not_reached() rather than an ad hoc mix of different assertion mechanisms. Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240919044641.386068-31-pierrick.bouvier@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-09-24migration: replace assert(0) with g_assert_not_reached()Pierrick Bouvier
This patch is part of a series that moves towards a consistent use of g_assert_not_reached() rather than an ad hoc mix of different assertion mechanisms. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240919044641.386068-5-pierrick.bouvier@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-06-21migration/postcopy: Add postcopy-recover-setup phasePeter Xu
This patch adds a migration state on src called "postcopy-recover-setup". The new state will describe the intermediate step starting from when the src QEMU received a postcopy recovery request, until the migration channels are properly established, but before the recovery process take place. The request came from Libvirt where Libvirt currently rely on the migration state events to detect migration state changes. That works for most of the migration process but except postcopy recovery failures at the beginning. Currently postcopy recovery only has two major states: - postcopy-paused: this is the state that both sides of QEMU will be in for a long time as long as the migration channel was interrupted. - postcopy-recover: this is the state where both sides of QEMU handshake with each other, preparing for a continuation of postcopy which used to be interrupted. The issue here is when the recovery port is invalid, the src QEMU will take the URI/channels, noticing the ports are not valid, and it'll silently keep in the postcopy-paused state, with no event sent to Libvirt. In this case, the only thing Libvirt can do is to poll the migration status with a proper interval, however that's less optimal. Considering that this is the only case where Libvirt won't get a notification from QEMU on such events, let's add postcopy-recover-setup state to mimic what we have with the "setup" state of a newly initialized migration, describing the phase of connection establishment. With that, postcopy recovery will have two paths to go now, and either path will guarantee an event generated. Now the events will look like this during a recovery process on src QEMU: - Initially when the recovery is initiated on src, QEMU will go from "postcopy-paused" -> "postcopy-recover-setup". Old QEMUs don't have this event. - Depending on whether the channel re-establishment is succeeded: - In succeeded case, src QEMU will move from "postcopy-recover-setup" to "postcopy-recover". Old QEMUs also have this event. - In failure case, src QEMU will move from "postcopy-recover-setup" to "postcopy-paused" again. Old QEMUs don't have this event. This guarantees that Libvirt will always receive a notification for recovery process properly. One thing to mention is, such new status is only needed on src QEMU not both. On dest QEMU, the state machine doesn't change. Hence the events don't change either. It's done like so because dest QEMU may not have an explicit point of setup start. E.g., it can happen that when dest QEMUs doesn't use migrate-recover command to use a new URI/channel, but the old URI/channels can be reused in recovery, in which case the old ports simply can work again after the network routes are fixed up. Add a new helper postcopy_is_paused() detecting whether postcopy is still paused, taking RECOVER_SETUP into account too. When using it on both src/dst, a slight change is done altogether to always wait for the semaphore before checking the status, because for both sides a sem_post() will be required for a recovery. Cc: Jiri Denemark <jdenemar@redhat.com> Cc: Prasad Pandit <ppandit@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Buglink: https://issues.redhat.com/browse/RHEL-38485 Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-06-21migration: Rename thread debug namesPeter Xu
The postcopy thread names on dest QEMU are slightly confusing, partly I'll need to blame myself on 36f62f11e4 ("migration: Postcopy preemption preparation on channel creation"). E.g., "fault-fast" reads like a fast version of "fault-default", but it's actually the fast version of "postcopy/listen". Taking this chance, rename all the migration threads with proper rules. Considering we only have 15 chars usable, prefix all threads with "mig/", meanwhile identify src/dst threads properly this time. So now most thread names will look like "mig/DIR/xxx", where DIR will be "src"/"dst", except the bg-snapshot thread which doesn't have a direction. For multifd threads, making them "mig/{src|dst}/{send|recv}_%d". We used to have "live_migration" thread for a very long time, now it's called "mig/src/main". We may hope to have "mig/dst/main" soon but not yet. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-05-03migration: remove PostcopyDiscardState from typedefs.hPaolo Bonzini
It is defined and referred to exclusively from a .c file. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-12error: Move ERRP_GUARD() to the beginning of the functionZhao Liu
Since the commit 05e385d2a9 ("error: Move ERRP_GUARD() to the beginning of the function"), there are new codes that don't put ERRP_GUARD() at the beginning of the functions. As stated in the commit 05e385d2a9: "include/qapi/error.h advises to put ERRP_GUARD() right at the beginning of the function, because only then can it guard the whole function.", so clean up the few spots disregarding the advice. Inspired-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240312060337.3240965-1-zhao1.liu@linux.intel.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-28migration: remove error from notifier dataSteve Sistare
Remove the error object from opaque data passed to notifiers. Use the new error parameter passed to the notifier instead. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28notify: pass error to notifier with returnSteve Sistare
Pass an error object as the third parameter to "notifier with return" notifiers, so clients no longer need to bundle an error object in the opaque data. The new parameter is used in a later patch. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29userfaultfd: use 1ULL to build ioctl masksPaolo Bonzini
There is no need to use the Linux-internal __u64 type, 1ULL is guaranteed to be wide enough. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20240117160313.175609-1-pbonzini@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2023-09-27migration: Fix race that dest preempt thread close too earlyPeter Xu
We hit intermit CI issue on failing at migration-test over the unit test preempt/plain: qemu-system-x86_64: Unable to read from socket: Connection reset by peer Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1 ** ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0) (test program exited with status code -6) Fabiano debugged into it and found that the preempt thread can quit even without receiving all the pages, which can cause guest not receiving all the pages and corrupt the guest memory. To make sure preempt thread finished receiving all the pages, we can rely on the page_requested_count being zero because preempt channel will only receive requested page faults. Note, not all the faulted pages are required to be sent via the preempt channel/thread; imagine the case when a requested page is just queued into the background main channel for migration, the src qemu will just still send it via the background channel. Here instead of spinning over reading the count, we add a condvar so the main thread can wait on it if that unusual case happened, without burning the cpu for no good reason, even if the duration is short; so even if we spin in this rare case is probably fine. It's just better to not do so. The condvar is only used when that special case is triggered. Some memory ordering trick is needed to guarantee it from happening (against the preempt thread status field), so the main thread will always get a kick when that triggers correctly. Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886 Debugged-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-2-farosas@suse.de>
2023-07-12migration/ram: Expose ramblock_is_ignored() as migrate_ram_is_ignored()David Hildenbrand
virtio-mem wants to know whether it should not mess with the RAMBlock content (e.g., discard RAM, preallocate memory) on incoming migration. So let's expose that function as migrate_ram_is_ignored() in migration/misc.h Message-ID: <20230706075612.67404-4-david@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-05-03migration: Drop unused parameter for migration_tls_client_create()Juan Quintela
It is not needed since we moved the accessor for tls properties to options.c. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-27migration: Allow postcopy_ram_supported_by_host() to report errPeter Xu
Instead of print it to STDERR, bring the error upwards so that it can be reported via QMP responses. E.g.: { "execute": "migrate-set-capabilities" , "arguments": { "capabilities": [ { "capability": "postcopy-ram", "state": true } ] } } { "error": { "class": "GenericError", "desc": "Postcopy is not supported: Host backend files need to be TMPFS or HUGETLBFS only" } } Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-24migration: Create options.cJuan Quintela
We move there all capabilities helpers from migration.c. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- Following David advise: - looked through the history, capabilities are newer than 2012, so we can remove that bit of the header. - This part is posterior to Anthony. Original Author is Orit. Once there, I put myself. Peter Xu also did quite a bit of work here. Anyone else wants/needs to be there? I didn't search too hard because nobody asked before to be added. What do you think?
2023-04-24migration/postcopy: Detect file system on dest hostPeter Xu
Postcopy requires the memory support userfaultfd to work. Right now we check it but it's a bit too late (when switching to postcopy migration). Do that early right at enabling of postcopy. Note that this is still only a best effort because ramblocks can be dynamically created. We can add check in hostmem creations and fail if postcopy enabled, but maybe that's too aggressive. Still, we have chance to fail the most obvious where we know there's an existing unsupported ramblock. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-20postcopy-ram: do not use qatomic_mb_readPaolo Bonzini
It does not even pair with a qatomic_mb_set(), so it is clearer to use load-acquire in this case; they are synonyms. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-12migration: Recover behavior of preempt channel creation for pre-7.2Peter Xu
In 8.0 devel window we reworked preempt channel creation, so that there'll be no race condition when the migration channel and preempt channel got established in the wrong order in commit 5655aab079. However no one noticed that the change will also be not compatible with older qemus, majorly 7.1/7.2 versions where preempt mode started to be supported. Leverage the same pre-7.2 flag introduced in the previous patch to recover the behavior hopefully before 8.0 releases, so we don't break migration when we migrate from 8.0 to older qemu binaries. Fixes: 5655aab079 ("migration: Postpone postcopy preempt channel to be after main") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-12migration: Fix potential race on postcopy_qemufile_srcPeter Xu
postcopy_qemufile_src object should be owned by one thread, either the main thread (e.g. when at the beginning, or at the end of migration), or by the return path thread (when during a preempt enabled postcopy migration). If that's not the case the access to the object might be racy. postcopy_preempt_shutdown_file() can be potentially racy, because it's called at the end phase of migration on the main thread, however during which the return path thread hasn't yet been recycled; the recycle happens in await_return_path_close_on_source() which is after this point. It means, logically it's posslbe the main thread and the return path thread are both operating on the same qemufile. While I don't think qemufile is thread safe at all. postcopy_preempt_shutdown_file() used to be needed because that's where we send EOS to dest so that dest can safely shutdown the preempt thread. To avoid the possible race, remove this only place that a race can happen. Instead we figure out another way to safely close the preempt thread on dest. The core idea during postcopy on deciding "when to stop" is that dest will send a postcopy SHUT message to src, telling src that all data is there. Hence to shut the dest preempt thread maybe better to do it directly on dest node. This patch proposed such a way that we change postcopy_prio_thread_created into PreemptThreadStatus, so that we kick the preempt thread on dest qemu by a sequence of: mis->preempt_thread_status = PREEMPT_THREAD_QUIT; qemu_file_shutdown(mis->postcopy_qemufile_dst); While here shutdown() is probably so far the easiest way to kick preempt thread from a blocked qemu_get_be64(). Then it reads preempt_thread_status to make sure it's not a network failure but a willingness to quit the thread. We could have avoided that extra status but just rely on migration status. The problem is postcopy_ram_incoming_cleanup() is just called early enough so we're still during POSTCOPY_ACTIVE no matter what.. So just make it simple to have the status introduced. One flag x-preempt-pre-7-2 is added to keep old pre-7.2 behaviors of postcopy preempt. Fixes: 9358982744 ("migration: Send requested page directly in rp-return thread") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-03-16migration: Wait on preempt channel in preempt threadPeter Xu
QEMU main thread will wait until dest preempt channel established during processing the LISTEN command (within the whole postcopy PACKAGED data), by waiting on the semaphore postcopy_qemufile_dst_done. That's racy, because it's possible that the dest QEMU main thread hasn't yet accept()ed the new connection when processing the LISTEN event. The sem_wait() will yield the main thread without being able to run anything else including the accept() of the new socket, which can cause deadlock within the main thread. To avoid the race, move the "wait channel" from main thread to the preempt thread right at the start. Reported-by: Peter Maydell <peter.maydell@linaro.org> Fixes: 5655aab079 ("migration: Postpone postcopy preempt channel to be after main") Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-11migration: Postpone postcopy preempt channel to be after mainPeter Xu
Postcopy with preempt-mode enabled needs two channels to communicate. The order of channel establishment is not guaranteed. It can happen that the dest QEMU got the preempt channel connection request before the main channel is established, then the migration may make no progress even during precopy due to the wrong order. To fix it, create the preempt channel only if we know the main channel is established. For a general postcopy migration, we delay it until postcopy_start(), that's where we already went through some part of precopy on the main channel. To make sure dest QEMU has already established the channel, we wait until we got the first PONG received. That's something we do at the start of precopy when postcopy enabled so it's guaranteed to happen sooner or later. For a postcopy recovery, we delay it to qemu_savevm_state_resume_prepare() where we'll have round trips of data on bitmap synchronizations, which means the main channel must have been established. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-11migration: Cleanup postcopy_preempt_setup()Peter Xu
Since we just dropped the only case where postcopy_preempt_setup() can return an error, it doesn't need a retval anymore because it never fails. Move the preempt check to the caller, preparing it to be used elsewhere to do nothing but as simple as kicking the async connection. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-11migration: Rework multi-channel checks on URIPeter Xu
The whole idea of multi-channel checks was not properly done, IMHO. Currently we check multi-channel in a lot of places, but actually that's not needed because we only need to check it right after we get the URI and that should be it. If the URI check succeeded, we should never need to check it again because we must have it. If it check fails, we should fail immediately on either the qmp_migrate or qmp_migrate_incoming, instead of failingg it later after the connection established. Neither should we fail any set capabiliities like what we used to do here: 5ad15e8614 ("migration: allow enabling mutilfd for specific protocol only", 2021-10-19) Because logically the URI will only be set later after the capability is set, so it doesn't make a lot of sense to check the URI type when setting the capability, because we're checking the cap with an old URI passed in, and that may not even be the URI we're going to use later. This patch mostly reverted all such checks for before, dropping the variable migrate_allow_multi_channels and helpers. Instead, add a common helper to check URI for multi-channels for either qmp_migrate and qmp_migrate_incoming and that should do all the proper checks. The failure will only trigger with the "migrate" or "migrate_incoming" command, or when user specified "-incoming xxx" where "xxx" is not "defer". Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-08Drop duplicate #includeMarkus Armbruster
Tracked down with the help of scripts/clean-includes. Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230202133830.2152150-21-armbru@redhat.com>
2023-02-06migration: check magic value for deciding the mapping of channelsmanish.mishra
Current logic assumes that channel connections on the destination side are always established in the same order as the source and the first one will always be the main channel followed by the multifid or post-copy preemption channel. This may not be always true, as even if a channel has a connection established on the source side it can be in the pending state on the destination side and a newer connection can be established first. Basically causing out of order mapping of channels on the destination side. Currently, all channels except post-copy preempt send a magic number, this patch uses that magic number to decide the type of channel. This logic is applicable only for precopy(multifd) live migration, as mentioned, the post-copy preempt channel does not send any magic number. Also, tls live migrations already does tls handshake before creating other channels, so this issue is not possible with tls, hence this logic is avoided for tls live migrations. This patch uses read peek to check the magic number of channels so that current data/control stream management remains un-effected. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: manish.mishra <manish.mishra@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06util/userfaultfd: Add uffd_open()Peter Xu
Add a helper to create the uffd handle. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-07-20migration: Enable TLS for preempt channelPeter Xu
This patch is based on the async preempt channel creation. It continues wiring up the new channel with TLS handshake to destionation when enabled. Note that only the src QEMU needs such operation; the dest QEMU does not need any change for TLS support due to the fact that all channels are established synchronously there, so all the TLS magic is already properly handled by migration_tls_channel_process_incoming(). Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185518.27529-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Create the postcopy preempt channel asynchronouslyPeter Xu
This patch allows the postcopy preempt channel to be created asynchronously. The benefit is that when the connection is slow, we won't take the BQL (and potentially block all things like QMP) for a long time without releasing. A function postcopy_preempt_wait_channel() is introduced, allowing the migration thread to be able to wait on the channel creation. The channel is always created by the main thread, in which we'll kick a new semaphore to tell the migration thread that the channel has created. We'll need to wait for the new channel in two places: (1) when there's a new postcopy migration that is starting, or (2) when there's a postcopy migration to resume. For the start of migration, we don't need to wait for this channel until when we want to start postcopy, aka, postcopy_start(). We'll fail the migration if we found that the channel creation failed (which should probably not happen at all in 99% of the cases, because the main channel is using the same network topology). For a postcopy recovery, we'll need to wait in postcopy_pause(). In that case if the channel creation failed, we can't fail the migration or we'll crash the VM, instead we keep in PAUSED state, waiting for yet another recovery. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Manish Mishra <manish.mishra@nutanix.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185509.27311-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Postcopy recover with preempt enabledPeter Xu
To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread needs similar handling on fault tolerance. When ram_load_postcopy() fails, instead of stopping the thread it halts with a semaphore, preparing to be kicked again when recovery is detected. A mutex is introduced to make sure there's no concurrent operation upon the socket. To make it simple, the fast ram load thread will take the mutex during its whole procedure, and only release it if it's paused. The fast-path socket will be properly released by the main loading thread safely when there's network failures during postcopy with that mutex held. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185506.27257-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Postcopy preemption preparation on channel creationPeter Xu
Create a new socket for postcopy to be prepared to send postcopy requested pages via this specific channel, so as to not get blocked by precopy pages. A new thread is also created on dest qemu to receive data from this new channel based on the ram_load_postcopy() routine. The ram_load_postcopy(POSTCOPY) branch and the thread has not started to function, and that'll be done in follow up patches. Cleanup the new sockets on both src/dst QEMUs, meanwhile look after the new thread too to make sure it'll be recycled properly. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185502.27149-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: With Peter's fix to quieten compiler warning on start_migration
2022-04-06Replace qemu_real_host_page variables with inlined functionsMarc-André Lureau
Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-02migration: postcopy_pause_fault_thread() never failsPeter Xu
Per the title, remove the return code and simplify the callers as the errors will never be triggered. No functional change intended. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220301083925.33483-12-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-03-02migration: Enlarge postcopy recovery to capture !-EIO tooPeter Xu
We used to have quite a few places making sure -EIO happened and that's the only way to trigger postcopy recovery. That's based on the assumption that we'll only return -EIO for channel issues. It'll work in 99.99% cases but logically that won't cover some corner cases. One example is e.g. ram_block_from_stream() could fail with an interrupted network, then -EINVAL will be returned instead of -EIO. I remembered Dave Gilbert pointed that out before, but somehow this is overlooked. Neither did I encounter anything outside the -EIO error. However we'd better touch that up before it triggers a rare VM data loss during live migrating. To cover as much those cases as possible, remove the -EIO restriction on triggering the postcopy recovery, because even if it's not a channel failure, we can't do anything better than halting QEMU anyway - the corpse of the process may even be used by a good hand to dig out useful memory regions, or the admin could simply kill the process later on. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220301083925.33483-11-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-03-02migration: Add postcopy_thread_create()Peter Xu
Postcopy create threads. A common manner is we init a sem and use it to sync with the thread. Namely, we have fault_thread_sem and listen_thread_sem and they're only used for this. Make it a shared infrastructure so it's easier to create yet another thread. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220301083925.33483-7-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-03-02migration: Introduce postcopy channels on dest nodePeter Xu
Postcopy handles huge pages in a special way that currently we can only have one "channel" to transfer the page. It's because when we install pages using UFFDIO_COPY, we need to have the whole huge page ready, it also means we need to have a temp huge page when trying to receive the whole content of the page. Currently all maintainance around this tmp page is global: firstly we'll allocate a temp huge page, then we maintain its status mostly within ram_load_postcopy(). To enable multiple channels for postcopy, the first thing we need to do is to prepare N temp huge pages as caching, one for each channel. Meanwhile we need to maintain the tmp huge page status per-channel too. To give some example, some local variables maintained in ram_load_postcopy() are listed; they are responsible for maintaining temp huge page status: - all_zero: this keeps whether this huge page contains all zeros - target_pages: this counts how many target pages have been copied - host_page: this keeps the host ptr for the page to install Move all these fields to be together with the temp huge pages to form a new structure called PostcopyTmpPage. Then for each (future) postcopy channel, we need one structure to keep the state around. For vanilla postcopy, obviously there's only one channel. It contains both precopy and postcopy pages. This patch teaches the dest migration node to start realize the possible number of postcopy channels by introducing the "postcopy_channels" variable. Its value is calculated when setup postcopy on dest node (during POSTCOPY_LISTEN phase). Vanilla postcopy will have channels=1, but when postcopy-preempt capability is enabled (in the future), we will boost it to 2 because even during partial sending of a precopy huge page we still want to preempt it and start sending the postcopy requested page right away (so we start to keep two temp huge pages; more if we want to enable multifd). In this patch there's a TODO marked for that; so far the channels is always set to 1. We need to send one "host huge page" on one channel only and we cannot split them, because otherwise the data upon the same huge page can locate on more than one channel so we need more complicated logic to manage. One temp host huge page for each channel will be enough for us for now. Postcopy will still always use the index=0 huge page even after this patch. However it prepares for the latter patches where it can start to use multiple channels (which needs src intervention, because only src knows which channel we should use). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220301083925.33483-5-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Fixed up long line
2022-02-21include: Move qemu_madvise() and related #defines to new qemu/madvise.hPeter Maydell
The function qemu_madvise() and the QEMU_MADV_* constants associated with it are used in only 10 files. Move them out of osdep.h to a new qemu/madvise.h header that is included where it is needed. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20220208200856.3558249-2-peter.maydell@linaro.org
2022-01-28migration: Move temp page setup and cleanup into separate functionsPeter Xu
Temp pages will need to grow if we want to have multiple channels for postcopy, because each channel will need its own temp page to cache huge page data. Before doing that, cleanup the related code. No functional change intended. Since at it, touch up the errno handling a little bit on the setup side. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-01-28migration: Enable UFFD_FEATURE_THREAD_ID even without blocktime featPeter Xu
This patch allows us to read the tid even without blocktime feature enabled. It's useful when tracing postcopy fault thread on faulted pages to show thread id too with the address. Remove the comments - they're merely not helpful at all. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-06migration: Check that postcopy fd's are not NULLJuan Quintela
If postcopy has finished, it frees the array. But vhost-user unregister it at cleanup time. fixes: c4f7538 Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-11-01migration: Simplify alignment and alignment checksDavid Hildenbrand
Let's use QEMU_ALIGN_DOWN() and friends to make the code a bit easier to read. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-01migration/postcopy: Handle RAMBlocks with a RamDiscardManager on the destinationDavid Hildenbrand
Currently, when someone (i.e., the VM) accesses discarded parts inside a RAMBlock with a RamDiscardManager managing the corresponding mapped memory region, postcopy will request migration of the corresponding page from the source. The source, however, will never answer, because it refuses to migrate such pages with undefined content ("logically unplugged"): the pages are never dirty, and get_queued_page() will consequently skip processing these postcopy requests. Especially reading discarded ("logically unplugged") ranges is supposed to work in some setups (for example with current virtio-mem), although it barely ever happens: still, not placing a page would currently stall the VM, as it cannot make forward progress. Let's check the state via the RamDiscardManager (the state e.g., of virtio-mem is migrated during precopy) and avoid sending a request that will never get answered. Place a fresh zero page instead to keep the VM working. This is the same behavior that would happen automatically without userfaultfd being active, when accessing virtual memory regions without populated pages -- "populate on demand". For now, there are valid cases (as documented in the virtio-mem spec) where a VM might read discarded memory; in the future, we will disallow that. Then, we might want to handle that case differently, e.g., warning the user that the VM seems to be mis-behaving. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-05-13migration/ram: Handle RAM block resizes during postcopyDavid Hildenbrand
Resizing while migrating is dangerous and does not work as expected. The whole migration code works with the usable_length of a ram block and does not expect this value to change at random points in time. In the case of postcopy, relying on used_length is racy as soon as the guest is running. Also, when used_length changes we might leave the uffd handler registered for some memory regions, reject valid pages when migrating and fail when sending the recv bitmap to the source. Resizing can be trigger *after* (but not during) a reset in ACPI code by the guest - hw/arm/virt-acpi-build.c:acpi_ram_update() - hw/i386/acpi-build.c:acpi_ram_update() Let's remember the original used_length in a separate variable and use it in relevant postcopy code. Make sure to update it when we resize during precopy, when synchronizing the RAM block sizes with the source. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-9-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-19qapi: Use QAPI_LIST_PREPEND() where possibleEric Blake
Anywhere we create a list of just one item or by prepending items (typically because order doesn't matter), we can use QAPI_LIST_PREPEND(). But places where we must keep the list in order by appending remain open-coded until later patches. Note that as a side effect, this also performs a cleanup of two minor issues in qga/commands-posix.c: the old code was performing new = g_malloc0(sizeof(*ret)); which 1) is confusing because you have to verify whether 'new' and 'ret' are variables with the same type, and 2) would conflict with C++ compilation (not an actual problem for this file, but makes copy-and-paste harder). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201113011340.463563-5-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> [Straightforward conflicts due to commit a8aa94b5f8 "qga: update schema for guest-get-disks 'dependents' field" and commit a10b453a52 "target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c" resolved. Commit message tweaked.] Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-11-02migration: Unify reset of last_rb on destination node when recoverPeter Xu
When postcopy recover happens, we need to reset last_rb after each return of postcopy_pause_fault_thread() because that means we just got the postcopy migration continued. Unify this reset to the place right before we want to kick the fault thread again, when we get the command MIG_CMD_POSTCOPY_RESUME from source. This is actually more than that - because the main thread on destination will now be able to call migrate_send_rp_req_pages_pending() too, so the fault thread is not the only user of last_rb now. Move the reset earlier will allow the first call to migrate_send_rp_req_pages_pending() to use the reset value even if called from the main thread. (NOTE: this is not a real fix to 0c26781c09 mentioned below, however it is just a mark that when picking up 0c26781c09 we'd better have this one too; the real fix will come later) Fixes: 0c26781c09 ("migration: Sync requested pages after postcopy recovery") Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201102153010.11979-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26migration: Maintain postcopy faulted addressesPeter Xu
Maintain a list of faulted addresses on the destination host for which we're waiting on. This is implemented using a GTree rather than a real list to make sure even there're plenty of vCPUs/threads that are faulting, the lookup will still be fast with O(log(N)) (because we'll do that after placing each page). It should bring a slight overhead, but ideally that shouldn't be a big problem simply because in most cases the requested page list will be short. Actually we did similar things for postcopy blocktime measurements. This patch didn't use that simply because: (1) blocktime measurement is towards vcpu threads only, but here we need to record all faulted addresses, including main thread and external thread (like, DPDK via vhost-user). (2) blocktime measurement will require UFFD_FEATURE_THREAD_ID, but here we don't want to add that extra dependency on the kernel version since not necessary. E.g., we don't need to know which thread faulted on which page, we also don't care about multiple threads faulting on the same page. But we only care about what addresses are faulted so waiting for a page copying from src. (3) blocktime measurement is not enabled by default. However we need this by default especially for postcopy recover. Another thing to mention is that this patch introduced a new mutex to serialize the receivedmap and the page_requested tree, however that serialization does not cover other procedures like UFFDIO_COPY. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-4-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26migration: Pass incoming state into qemu_ufd_copy_ioctl()Peter Xu
It'll be used in follow up patches to access more fields out of it. Meanwhile fetch the userfaultfd inside the function. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26migration: Add spaces around operatorBihong Yu
Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-4-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25migration: Rework migrate_send_rp_req_pages() functionPeter Xu
We duplicated the logic of maintaining the last_rb variable at both callers of this function. Pass *rb pointer into the function so that we can avoid duplicating the logic. Also, when we have the rb pointer, it's also easier to remove the original 2nd & 4th parameters, because both of them (name of the ramblock when needed, or the page size) can be fetched from the ramblock pointer too. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200908203022.341615-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-23qemu/atomic.h: rename atomic_ to qatomic_Stefan Hajnoczi
clang's C11 atomic_fetch_*() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>