aboutsummaryrefslogtreecommitdiff
path: root/migration/ram.c
AgeCommit message (Collapse)Author
2023-02-15migration: Rename res_{postcopy,precopy}_onlyJuan Quintela
Once that res_compatible is removed, they don't make sense anymore. We remove the _only preffix. And to make things clearer we rename them to must_precopy and can_postcopy. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-15migration: Remove unused res_compatibleJuan Quintela
Nothing assigns to it after previous commit. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-15migration: In case of postcopy, the memory ends in res_postcopy_onlyJuan Quintela
So remove last assignation of res_compatible. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-13ram: Document migration ram flagsJuan Quintela
0x80 is RAM_SAVE_FLAG_HOOK, it is in qemu-file now. Bigger usable flag is 0x200, noticing that. We can reuse RAM_SAVe_FLAG_FULL. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-11AVX512 support for xbzrle_encode_bufferling xu
This commit is the same with [PATCH v6 1/2], and provides avx512 support for xbzrle_encode_buffer function to accelerate xbzrle encoding speed. Runtime check of avx512 support and benchmark for this feature are added. Compared with C version of xbzrle_encode_buffer function, avx512 version can achieve 50%-70% performance improvement on benchmarking. In addition, if dirty data is randomly located in 4K page, the avx512 version can achieve almost 140% performance gain. Signed-off-by: ling xu <ling1.xu@intel.com> Co-authored-by: Zhou Zhao <zhou.zhao@intel.com> Co-authored-by: Jun Jin <jun.i.jin@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-11migration: Make ram_save_target_page() a pointerJuan Quintela
We are going to create a new function for multifd latest in the series. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2023-02-11migration: Calculate ram size onceJuan Quintela
We are recalculating ram size continously, when we know that it don't change during migration. Create a field in RAMState to track it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-02-11migration: Split ram_bytes_total_common() in two functionsJuan Quintela
It is just a big if in the middle of the function, and we need two functions anways. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Juan Quintela <quintela@redhat.com> --- Reindent to make Phillipe happy (and CODING_STYLE)
2023-02-11migration: Make find_dirty_block() return a single parameterJuan Quintela
We used to return two bools, just return a single int with the following meaning: old return / again / new return false false PAGE_ALL_CLEAN false true PAGE_TRY_AGAIN true true PAGE_DIRTY_FOUND /* We don't care about again at all */ Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-11migration: Simplify ram_find_and_save_block()Juan Quintela
We will need later that find_dirty_block() return errors, so simplify the loop. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/ram: Factor out check for advised postcopyDavid Hildenbrand
Let's factor out this check, to be used in virtio-mem context next. While at it, fix a spelling error in a related comment. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>S Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/ram: Optimize ram_write_tracking_start() for RamDiscardManagerDavid Hildenbrand
ram_block_populate_read() already optimizes for RamDiscardManager. However, ram_write_tracking_start() will still try protecting discarded memory ranges. Let's optimize, because discarded ranges don't map any pages and (1) For anonymous memory, trying to protect using uffd-wp without a mapped page is ignored by the kernel and consequently a NOP. (2) For shared/file-backed memory, we will fill present page tables in the range with PTE markers. However, we will even allocate page tables just to fill them with unnecessary PTE markers and effectively waste memory. So let's exclude these ranges, just like ram_block_populate_read() already does. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/ram: Rely on used_length for uffd_change_protection()David Hildenbrand
ram_mig_ram_block_resized() will abort migration (including background snapshots) when resizing a RAMBlock. ram_block_populate_read() will only populate RAM up to used_length, so at least for anonymous memory protecting everything between used_length and max_length won't actually be protected and is just a NOP. So let's only protect everything up to used_length. Note: it still makes sense to register uffd-wp for max_length, such that RAM_UF_WRITEPROTECT is independent of a changing used_length. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/ram: Don't explicitly unprotect when unregistering uffd-wpDavid Hildenbrand
When unregistering uffd-wp, older kernels before commit f369b07c86143 ("mm/uffd:reset write protection when unregister with wp-mode") won't clear the uffd-wp PTE bit. When re-registering uffd-wp, the previous uffd-wp PTE bits would trigger again. With above commit, the kernel will clear the uffd-wp PTE bits when unregistering itself. Consequently, we'll clear the uffd-wp PTE bits now twice -- whereby we don't care about clearing them at all: a new background snapshot will re-register uffd-wp and re-protect all memory either way. So let's skip the manual clearing of uffd-wp. If ever relevant, we could clear conditionally in uffd_unregister_memory() -- we just need a way to figure out more recent kernels. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/ram: Fix error handling in ram_write_tracking_start()David Hildenbrand
If something goes wrong during uffd_change_protection(), we would miss to unregister uffd-wp and not release our reference. Fix it by performing the uffd_change_protection(true) last. Note that a uffd_change_protection(false) on the recovery path without a prior uffd_change_protection(false) is fine. Fixes: 278e2f551a09 ("migration: support UFFD write fault processing in ram_save_iterate()") Cc: qemu-stable@nongnu.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration/ram: Fix populate_read_range()David Hildenbrand
Unfortunately, commit f7b9dcfbcf44 broke populate_read_range(): the loop end condition is very wrong, resulting in that function not populating the full range. Lets' fix that. Fixes: f7b9dcfbcf44 ("migration/ram: Factor out populating pages readable in ram_block_populate_pages()") Cc: qemu-stable@nongnu.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06migration: Remove unused threshold_size parameterJuan Quintela
Until previous commit, save_live_pending() was used for ram. Now with the split into state_pending_estimate() and state_pending_exact() it is not needed anymore, so remove them. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2023-02-06migration: Split save_live_pending() into state_pending_*Juan Quintela
We split the function into to: - state_pending_estimate: We estimate the remaining state size without stopping the machine. - state pending_exact: We calculate the exact amount of remaining state. The only "device" that implements different functions for _estimate() and _exact() is ram. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2023-02-06migration: No save_live_pending() method uses the QEMUFile parameterJuan Quintela
So remove it everywhere. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2023-02-06migration: Fix migration crash when target psize larger than hostPeter Xu
Commit d9e474ea56 overlooked the case where the target psize is even larger than the host psize. One example is Alpha has 8K page size and migration will start to crash the source QEMU when running Alpha migration on x86. Fix it by detecting that case and set host start/end just to cover the single page to be migrated. This will slightly optimize the common case where host psize equals to guest psize so we don't even need to do the roundups, but that's trivial. Cc: qemu-stable@nongnu.org Reported-by: Thomas Huth <thuth@redhat.com> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1456 Fixes: d9e474ea56 ("migration: Teach PSS about host page") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Drop rs->fPeter Xu
Now with rs->pss we can already cache channels in pss->pss_channels. That pss_channel contains more infromation than rs->f because it's per-channel. So rs->f could be replaced by rss->pss[RAM_CHANNEL_PRECOPY].pss_channel, while rs->f itself is a bit vague now. Note that vanilla postcopy still send pages via pss[RAM_CHANNEL_PRECOPY], that's slightly confusing but it reflects the reality. Then, after the replacement we can safely drop rs->f. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Remove old preempt code around state maintainancePeter Xu
With the new code to send pages in rp-return thread, there's little help to keep lots of the old code on maintaining the preempt state in migration thread, because the new way should always be faster.. Then if we'll always send pages in the rp-return thread anyway, we don't need those logic to maintain preempt state anymore because now we serialize things using the mutex directly instead of using those fields. It's very unfortunate to have those code for a short period, but that's still one intermediate step that we noticed the next bottleneck on the migration thread. Now what we can do best is to drop unnecessary code as long as the new code is stable to reduce the burden. It's actually a good thing because the new "sending page in rp-return thread" model is (IMHO) even cleaner and with better performance. Remove the old code that was responsible for maintaining preempt states, at the meantime also remove x-postcopy-preempt-break-huge parameter because with concurrent sender threads we don't really need to break-huge anymore. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Send requested page directly in rp-return threadPeter Xu
With all the facilities ready, send the requested page directly in the rp-return thread rather than queuing it in the request queue, if and only if postcopy preempt is enabled. It can achieve so because it uses separate channel for sending urgent pages. The only shared data is bitmap and it's protected by the bitmap_mutex. Note that since we're moving the ownership of the urgent channel from the migration thread to rp thread it also means the rp thread is responsible for managing the qemufile, e.g. properly close it when pausing migration happens. For this, let migration_release_from_dst_file to cover shutdown of the urgent channel too, renaming it as migration_release_dst_files() to better show what it does. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Move last_sent_block into PageSearchStatusPeter Xu
Since we use PageSearchStatus to represent a channel, it makes perfect sense to keep last_sent_block (aka, leverage RAM_SAVE_FLAG_CONTINUE) to be per-channel rather than global because each channel can be sending different pages on ramblocks. Hence move it from RAMState into PageSearchStatus. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Make PageSearchStatus part of RAMStatePeter Xu
We used to allocate PSS structure on the stack for precopy when sending pages. Make it static, so as to describe per-channel ram migration status. Here we declared RAM_CHANNEL_MAX instances, preparing for postcopy to use it, even though this patch has not yet to start using the 2nd instance. This should not have any functional change per se, but it already starts to export PSS information via the RAMState, so that e.g. one PSS channel can start to reference the other PSS channel. Always protect PSS access using the same RAMState.bitmap_mutex. We already do so, so no code change needed, just some comment update. Maybe we should consider renaming bitmap_mutex some day as it's going to be a more commonly and big mutex we use for ram states, but just leave it for later. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Add pss_init()Peter Xu
Helper to init PSS structures. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Introduce pss_channelPeter Xu
Introduce pss_channel for PageSearchStatus, define it as "the migration channel to be used to transfer this host page". We used to have rs->f, which is a mirror to MigrationState.to_dst_file. After postcopy preempt initial version, rs->f can be dynamically changed depending on which channel we want to use. But that later work still doesn't grant full concurrency of sending pages in e.g. different threads, because rs->f can either be the PRECOPY channel or POSTCOPY channel. This needs to be per-thread too. PageSearchStatus is actually a good piece of struct which we can leverage if we want to have multiple threads sending pages. Sending a single guest page may not make sense, so we make the granule to be "host page", and in the PSS structure we allow specify a QEMUFile* to migrate a specific host page. Then we open the possibility to specify different channels in different threads with different PSS structures. The PSS prefix can be slightly misleading here because e.g. for the upcoming usage of postcopy channel/thread it's not "searching" (or, scanning) at all but sending the explicit page that was requested. However since PSS existed for some years keep it as-is until someone complains. This patch mostly (simply) replace rs->f with pss->pss_channel only. No functional change intended for this patch yet. But it does prepare to finally drop rs->f, and make ram_save_guest_page() thread safe. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Teach PSS about host pagePeter Xu
Migration code has a lot to do with host pages. Teaching PSS core about the idea of host page helps a lot and makes the code clean. Meanwhile, this prepares for the future changes that can leverage the new PSS helpers that this patch introduces to send host page in another thread. Three more fields are introduced for this: (1) host_page_sending: this is set to true when QEMU is sending a host page, false otherwise. (2) host_page_{start|end}: these point to the start/end of host page we're sending, and it's only valid when host_page_sending==true. For example, when we look up the next dirty page on the ramblock, with host_page_sending==true, we'll not try to look for anything beyond the current host page boundary. This can be slightly efficient than current code because currently we'll set pss->page to next dirty bit (which can be over current host page boundary) and reset it to host page boundary if we found it goes beyond that. With above, we can easily make migration_bitmap_find_dirty() self contained by updating pss->page properly. rs* parameter is removed because it's not even used in old code. When sending a host page, we should use the pss helpers like this: - pss_host_page_prepare(pss): called before sending host page - pss_within_range(pss): whether we're still working on the cur host page? - pss_host_page_finish(pss): called after sending a host page Then we can use ram_save_target_page() to save one small page. Currently ram_save_host_page() is still the only user. If there'll be another function to send host page (e.g. in return path thread) in the future, it should follow the same style. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Use atomic ops properly for page accountingsPeter Xu
To prepare for thread-safety on page accountings, at least below counters need to be accessed only atomically, they are: ram_counters.transferred ram_counters.duplicate ram_counters.normal ram_counters.postcopy_bytes There are a lot of other counters but they won't be accessed outside migration thread, then they're still safe to be accessed without atomic ops. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Yield bitmap_mutex properly when sending/sleepingPeter Xu
Don't take the bitmap mutex when sending pages, or when being throttled by migration_rate_limit() (which is a bit tricky to call it here in ram code, but seems still helpful). It prepares for the possibility of concurrently sending pages in >1 threads using the function ram_save_host_page() because all threads may need the bitmap_mutex to operate on bitmaps, so that either sendmsg() or any kind of qemu_sem_wait() blocking for one thread will not block the other from progressing. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Remove RAMState.f references in compression codePeter Xu
Removing referencing to RAMState.f in compress_page_with_multi_thread() and flush_compressed_data(). Compression code by default isn't compatible with having >1 channels (or it won't currently know which channel to flush the compressed data), so to make it simple we always flush on the default to_dst_file port until someone wants to add >1 ports support, as rs->f right now can really change (after postcopy preempt is introduced). There should be no functional change at all after patch applied, since as long as rs->f referenced in compression code, it must be to_dst_file. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Trivial cleanup save_page_header() on same block checkPeter Xu
The 2nd check on RAM_SAVE_FLAG_CONTINUE is a bit redundant. Use a boolean to be clearer. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Cleanup xbzrle zero page cache update logicPeter Xu
The major change is to replace "!save_page_use_compression()" with "xbzrle_enabled" to make it clear. Reasonings: (1) When compression enabled, "!save_page_use_compression()" is exactly the same as checking "xbzrle_enabled". (2) When compression disabled, "!save_page_use_compression()" always return true. We used to try calling the xbzrle code, but after this change we won't, and we shouldn't need to. Since at it, drop the xbzrle_enabled check in xbzrle_cache_zero_page() because with this change it's not needed anymore. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Add postcopy_preempt_active()Peter Xu
Add the helper to show that postcopy preempt enabled, meanwhile active. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Take bitmap mutex when completing ram migrationPeter Xu
Any call to ram_find_and_save_block() needs to take the bitmap mutex. We used to not take it for most of ram_save_complete() because we thought we're the only one left using the bitmap, but it's not true after the preempt full patchset applied, since the return path can be taking it too. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15migration: Export ram_release_page()Juan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Leonardo Bras <leobras@redhat.com>
2022-12-15migration: Export ram_transferred_ram()Juan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Leonardo Bras <leobras@redhat.com>
2022-11-21migration: Disable multifd explicitly with compressionPeter Xu
Multifd thread model does not work for compression, explicitly disable it. Note that previuosly even we can enable both of them, nothing will go wrong, because the compression code has higher priority so multifd feature will just be ignored. Now we'll fail even earlier at config time so the user should be aware of the consequence better. Note that there can be a slight chance of breaking existing users, but let's assume they're not majority and not serious users, or they should have found that multifd is not working already. With that, we can safely drop the check in ram_save_target_page() for using multifd, because when multifd=on then compression=off, then the removed check on save_page_use_compression() will also always return false too. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-11-21migration: Fix possible infinite loop of ram save processPeter Xu
When starting ram saving procedure (especially at the completion phase), always set last_seen_block to non-NULL to make sure we can always correctly detect the case where "we've migrated all the dirty pages". Then we'll guarantee both last_seen_block and pss.block will be valid always before the loop starts. See the comment in the code for some details. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-08-02Revert "migration: Simplify unqueue_page()"Thomas Huth
This reverts commit cfd66f30fb0f735df06ff4220e5000290a43dad3. The simplification of unqueue_page() introduced a bug that sometimes breaks migration on s390x hosts. The problem is not fully understood yet, but since we are already in the freeze for QEMU 7.1 and we need something working there, let's revert this patch for the upcoming release. The optimization can be redone later again in a proper way if necessary. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2099934 Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20220802061949.331576-1-thuth@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration/multifd: Report to user when zerocopy not workingLeonardo Bras
Some errors, like the lack of Scatter-Gather support by the network interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using zero-copy, which causes it to fall back to the default copying mechanism. After each full dirty-bitmap scan there should be a zero-copy flush happening, which checks for errors each of the previous calls to sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then increment dirty_sync_missed_zero_copy migration stat to let the user know about it. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Message-Id: <20220711211112.18951-4-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Respect postcopy request order in preemption modePeter Xu
With preemption mode on, when we see a postcopy request that was requesting for exactly the page that we have preempted before (so we've partially sent the page already via PRECOPY channel and it got preempted by another postcopy request), currently we drop the request so that after all the other postcopy requests are serviced then we'll go back to precopy stream and start to handle that. We dropped the request because we can't send it via postcopy channel since the precopy channel already contains partial of the data, and we can only send a huge page via one channel as a whole. We can't split a huge page into two channels. That's a very corner case and that works, but there's a change on the order of postcopy requests that we handle since we're postponing this (unlucky) postcopy request to be later than the other queued postcopy requests. The problem is there's a possibility that when the guest was very busy, the postcopy queue can be always non-empty, it means this dropped request will never be handled until the end of postcopy migration. So, there's a chance that there's one dest QEMU vcpu thread waiting for a page fault for an extremely long time just because it's unluckily accessing the specific page that was preempted before. The worst case time it needs can be as long as the whole postcopy migration procedure. It's extremely unlikely to happen, but when it happens it's not good. The root cause of this problem is because we treat pss->postcopy_requested variable as with two meanings bound together, as the variable shows: 1. Whether this page request is urgent, and, 2. Which channel we should use for this page request. With the old code, when we set postcopy_requested it means either both (1) and (2) are true, or both (1) and (2) are false. We can never have (1) and (2) to have different values. However it doesn't necessarily need to be like that. It's very legal that there's one request that has (1) very high urgency, but (2) we'd like to use the precopy channel. Just like the corner case we were discussing above. To differenciate the two meanings better, introduce a new field called postcopy_target_channel, showing which channel we should use for this page request, so as to cover the old meaning (2) only. Then we leave the postcopy_requested variable to stand only for meaning (1), which is the urgency of this page request. With this change, we can easily boost priority of a preempted precopy page as long as we know that page is also requested as a postcopy page. So with the new approach in get_queued_page() instead of dropping that request, we send it right away with the precopy channel so we get back the ordering of the page faults just like how they're requested on dest. Reported-by: Manish Mishra <manish.mishra@nutanix.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Manish Mishra <manish.mishra@nutanix.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185520.27583-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Add property x-postcopy-preempt-break-hugePeter Xu
Add a property field that can conditionally disable the "break sending huge page" behavior in postcopy preemption. By default it's enabled. It should only be used for debugging purposes, and we should never remove the "x-" prefix. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Manish Mishra <manish.mishra@nutanix.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185511.27366-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Postcopy preemption enablementPeter Xu
This patch enables postcopy-preempt feature. It contains two major changes to the migration logic: (1) Postcopy requests are now sent via a different socket from precopy background migration stream, so as to be isolated from very high page request delays. (2) For huge page enabled hosts: when there's postcopy requests, they can now intercept a partial sending of huge host pages on src QEMU. After this patch, we'll live migrate a VM with two channels for postcopy: (1) PRECOPY channel, which is the default channel that transfers background pages; and (2) POSTCOPY channel, which only transfers requested pages. There's no strict rule of which channel to use, e.g., if a requested page is already being transferred on precopy channel, then we will keep using the same precopy channel to transfer the page even if it's explicitly requested. In 99% of the cases we'll prioritize the channels so we send requested page via the postcopy channel as long as possible. On the source QEMU, when we found a postcopy request, we'll interrupt the PRECOPY channel sending process and quickly switch to the POSTCOPY channel. After we serviced all the high priority postcopy pages, we'll switch back to PRECOPY channel so that we'll continue to send the interrupted huge page again. There's no new thread introduced on src QEMU. On the destination QEMU, one new thread is introduced to receive page data from the postcopy specific socket (done in the preparation patch). This patch has a side effect: after sending postcopy pages, previously we'll assume the guest will access follow up pages so we'll keep sending from there. Now it's changed. Instead of going on with a postcopy requested page, we'll go back and continue sending the precopy huge page (which can be intercepted by a postcopy request so the huge page can be sent partially before). Whether that's a problem is debatable, because "assuming the guest will continue to access the next page" may not really suite when huge pages are used, especially if the huge page is large (e.g. 1GB pages). So that locality hint is much meaningless if huge pages are used. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185504.27203-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20migration: Postcopy preemption preparation on channel creationPeter Xu
Create a new socket for postcopy to be prepared to send postcopy requested pages via this specific channel, so as to not get blocked by precopy pages. A new thread is also created on dest qemu to receive data from this new channel based on the ram_load_postcopy() routine. The ram_load_postcopy(POSTCOPY) branch and the thread has not started to function, and that'll be done in follow up patches. Cleanup the new sockets on both src/dst QEMUs, meanwhile look after the new thread too to make sure it'll be recycled properly. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185502.27149-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: With Peter's fix to quieten compiler warning on start_migration
2022-06-23migration: remove the QEMUFileOps abstractionDaniel P. Berrangé
Now that all QEMUFile callbacks are removed, the entire concept can be deleted. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-22migration: rename qemu_update_position to qemu_file_credit_transferDaniel P. Berrangé
The qemu_update_position method name gives the misleading impression that it is changing the current file offset. Most of the files are just streams, however, so there's no concept of a file offset in the general case. What this method is actually used for is to report on the number of bytes that have been transferred out of band from the main I/O methods. This new name better reflects this purpose. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-22migration: switch to use QIOChannelNull for dummy channelDaniel P. Berrangé
This removes one further custom impl of QEMUFile, in favour of a QIOChannel based impl. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-05-16multifd: multifd_send_sync_main now returns negative on errorLeonardo Bras
Even though multifd_send_sync_main() currently emits error_reports, it's callers don't really check it before continuing. Change multifd_send_sync_main() to return -1 on error and 0 on success. Also change all it's callers to make use of this change and possibly fail earlier. (This change is important to next patch on multifd zero copy implementation, to make it sure an error in zero-copy flush does not go unnoticed. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20220513062836.965425-7-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-04-21migration: Fix operator typeDr. David Alan Gilbert
Clang spotted an & that should have been an &&; fix it. Reported by: David Binderman / https://gitlab.com/dcb Fixes: 65dacaa04fa ("migration: introduce save_normal_page()") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/963 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20220406102515.96320-1-dgilbert@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>