path: root/migration/multifd.c
Age | Commit message | Author
2023-03-16 | migration/multifd: correct multifd_send_thread to trace the flags | Wei Wang
The p->flags could be updated via the send_prepare callback, e.g. OR-ed with MULTIFD_FLAG_ZLIB via zlib_send_prepare. Assigning p->flags to the local "flags" before the send_prepare callback captures only part of p->flags. Fix it by moving the assignment of p->flags to the local flags after the callback, so that the correct flags can be traced. Fixes: ab7cbb0b9a3b ("multifd: Make no compression operations into its own structure") Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-13 | migration/multifd: Move load_cleanup inside incoming_state_destroy | Leonardo Bras
Currently running migration_incoming_state_destroy() without first running multifd_load_cleanup() will cause a yank error: qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. (core dumped) The above error happens in the target host, when multifd is being used for precopy, and then postcopy is triggered and the migration finishes. This will crash the VM in the target host. To avoid that, move multifd_load_cleanup() inside migration_incoming_state_destroy(), so that the load cleanup becomes part of the incoming state destroying process. Running multifd_load_cleanup() twice can become an issue, though, but the only scenario in which it could be run twice is in process_incoming_migration_bh(). So removing this extra call is necessary. On the other hand, this multifd_load_cleanup() call happens way before the migration_incoming_state_destroy(), and having this happen before dirty_bitmap_mig_before_vm_start() and vm_start() may be needed. So introduce a new function multifd_load_shutdown() that will mainly stop all multifd threads and close their QIOChannels. Then use this function instead of multifd_load_cleanup() to make sure nothing else is received before dirty_bitmap_mig_before_vm_start(). Fixes: b5eea99ec2 ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-13 | migration/multifd: Join all multifd threads in order to avoid leaks | Leonardo Bras
The current approach will only join threads that are still running. For the threads not joined, resources or private memory are always kept in the process space and never reclaimed before process end, and this risks serious memory leaks. This should usually not represent a big problem, since multifd migration is usually just run at most a few times, and after it succeeds there is not much to be done before exiting the process. Yet still, it should not hurt performance to join all of them. Fixes: b5eea99ec2 ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-13 | migration/multifd: Remove unnecessary assignment on multifd_load_cleanup() | Leonardo Bras
Before assigning "p->quit = true" for every multifd channel, multifd_load_cleanup() will call multifd_recv_terminate_threads() which already does the same assignment, while protected by a mutex. So there is no point doing the same assignment again. Fixes: b5eea99ec2 ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-13 | migration/multifd: Change multifd_load_cleanup() signature and usage | Leonardo Bras
Since its introduction in commit f986c3d256 ("migration: Create multifd migration threads"), multifd_load_cleanup() never returned any value different than 0, nor set up any error on errp. Even so, in process_incoming_migration_bh() an if clause uses its return value to decide on setting autostart = false, which will never happen. In order to simplify the codebase, change multifd_load_cleanup() signature to 'void multifd_load_cleanup(void)', and for every usage remove error handling or decision made based on return value != 0. Fixes: b5eea99ec2 ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-11 | migration: Rework multi-channel checks on URI | Peter Xu
The whole idea of multi-channel checks was not properly done, IMHO. Currently we check multi-channel in a lot of places, but actually that's not needed because we only need to check it right after we get the URI and that should be it. If the URI check succeeded, we should never need to check it again because we must have it. If the check fails, we should fail immediately on either the qmp_migrate or qmp_migrate_incoming, instead of failing it later after the connection is established. Neither should we fail any set capabilities like what we used to do here: 5ad15e8614 ("migration: allow enabling mutilfd for specific protocol only", 2021-10-19) Because logically the URI will only be set later after the capability is set, so it doesn't make a lot of sense to check the URI type when setting the capability, because we're checking the cap with an old URI passed in, and that may not even be the URI we're going to use later. This patch mostly reverts all such checks from before, dropping the variable migrate_allow_multi_channels and helpers. Instead, add a common helper to check URI for multi-channels for both qmp_migrate and qmp_migrate_incoming and that should do all the proper checks. The failure will only trigger with the "migrate" or "migrate_incoming" command, or when user specified "-incoming xxx" where "xxx" is not "defer". Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
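A single check performed right after the URI is known might look like the sketch below. The function names and the list of accepted prefixes are assumptions made for illustration (the log only states that multifd is limited to socket protocols); this is not the helper the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the URI scheme is one of the (assumed) socket-based transports. */
bool sketch_uri_supports_multi_channels(const char *uri)
{
    static const char *const prefixes[] = { "tcp:", "unix:", "vsock:" };

    assert(uri != NULL);
    for (size_t i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
        if (strncmp(uri, prefixes[i], strlen(prefixes[i])) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * One check, done once when the URI arrives: a migration that needs
 * several channels is only compatible with a multi-channel transport.
 */
bool sketch_channels_and_uri_compatible(bool needs_multi_channels,
                                        const char *uri)
{
    return !needs_multi_channels || sketch_uri_supports_multi_channels(uri);
}
```

With a check of this shape, a command such as "migrate" can fail immediately for an incompatible URI, and setting a capability never needs to look at a URI at all.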
2023-02-11 | multifd: Remove some redundant code | Li Zhang
Clean up some unnecessary code Signed-off-by: Li Zhang <lizhang@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-11 | multifd: cleanup the function multifd_channel_connect | Li Zhang
Cleanup multifd_channel_connect Signed-off-by: Li Zhang <lizhang@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06 | migration: save/delete migration thread info | Jiang Jiacheng
To support querying migration thread information, save and delete thread (live_migration and multifdsend) information at thread creation and finish. Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06 | multifd: Fix flush of zero copy page send request | Zhenzhong Duan
Make the IO channel flush call after the inflight request has been drained in the multifd thread, or else we may miss flushing the inflight request. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06 | multifd: Fix a race on reading MultiFDPages_t.block | Zhenzhong Duan
In multifd_queue_page() MultiFDPages_t.block is checked twice. Between the two checks, MultiFDPages_t.block may be reset to NULL by the multifd thread. This leads to the 2nd check always being true, and then a redundant page is submitted to the multifd thread again. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06 | migration: check magic value for deciding the mapping of channels | manish.mishra
Current logic assumes that channel connections on the destination side are always established in the same order as the source and the first one will always be the main channel followed by the multifd or post-copy preemption channel. This may not always be true, as even if a channel has a connection established on the source side it can be in the pending state on the destination side and a newer connection can be established first, basically causing out of order mapping of channels on the destination side. Currently, all channels except post-copy preempt send a magic number; this patch uses that magic number to decide the type of channel. This logic is applicable only for precopy (multifd) live migration, as mentioned, the post-copy preempt channel does not send any magic number. Also, tls live migrations already do the tls handshake before creating other channels, so this issue is not possible with tls, hence this logic is avoided for tls live migrations. This patch uses read peek to check the magic number of channels so that current data/control stream management remains unaffected. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: manish.mishra <manish.mishra@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
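The "read peek" idea can be sketched with plain POSIX sockets: recv() with MSG_PEEK inspects the first bytes without consuming them, so the normal stream handling still sees the magic afterwards. Everything named sketch_* or SKETCH_* below is hypothetical, the magic values and the big-endian wire order are assumptions, and the patch itself works through QEMU's own channel layer rather than raw recv().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Magic values assumed for this sketch; check the real headers before reuse. */
#define SKETCH_MAIN_MAGIC    0x5145564dU
#define SKETCH_MULTIFD_MAGIC 0x11223344U

typedef enum {
    SKETCH_CH_MAIN, SKETCH_CH_MULTIFD, SKETCH_CH_UNKNOWN, SKETCH_CH_ERROR
} SketchChannelType;

/* Decide the channel type from its first four bytes without consuming them. */
SketchChannelType sketch_classify_channel(int fd)
{
    uint32_t magic_be = 0;
    /* MSG_PEEK copies the bytes but leaves them queued on the socket. */
    ssize_t n = recv(fd, &magic_be, sizeof(magic_be), MSG_PEEK);

    if (n != (ssize_t)sizeof(magic_be)) {
        return SKETCH_CH_ERROR; /* real code would wait for more data */
    }
    switch (ntohl(magic_be)) {
    case SKETCH_MAIN_MAGIC:    return SKETCH_CH_MAIN;
    case SKETCH_MULTIFD_MAGIC: return SKETCH_CH_MULTIFD;
    default:                   return SKETCH_CH_UNKNOWN;
    }
}

/* Sends one magic over a socketpair, classifies it, then re-reads it. */
int sketch_peek_demo(uint32_t magic, SketchChannelType *type)
{
    int sv[2];
    uint32_t be = htonl(magic), again = 0;
    ssize_t n;

    assert(type != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (send(sv[0], &be, sizeof(be), 0) != (ssize_t)sizeof(be)) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    *type = sketch_classify_channel(sv[1]);
    /* The peeked bytes must still be readable by the normal stream code. */
    n = recv(sv[1], &again, sizeof(again), 0);
    close(sv[0]);
    close(sv[1]);
    return (n == (ssize_t)sizeof(again) && again == be) ? 0 : -1;
}
```

sketch_peek_demo() returns 0 only when the magic is still available after classification, which is the property that keeps the existing stream management untouched.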
2022-12-15 | migration: Use atomic ops properly for page accountings | Peter Xu
To prepare for thread-safety on page accountings, at least the counters below need to be accessed only atomically: ram_counters.transferred, ram_counters.duplicate, ram_counters.normal and ram_counters.postcopy_bytes. There are a lot of other counters, but they won't be accessed outside the migration thread, so they're still safe to be accessed without atomic ops. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15 | multifd: Create page_count fields into both MultiFD{Recv,Send}Params | Juan Quintela
We were recalculating it left and right. We plan to change those values in the next patches. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Leonardo Bras <leobras@redhat.com>
2022-12-15 | multifd: Create page_size fields into both MultiFD{Recv,Send}Params | Juan Quintela
We were calling qemu_target_page_size() left and right. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Leonardo Bras <leobras@redhat.com>
2022-11-21 | migration/multifd/zero-copy: Create helper function for flushing | Leonardo Bras
Move flushing code from multifd_send_sync_main() to a new helper, and call it in multifd_send_sync_main(). Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-07-20 | migration/multifd: Report to user when zerocopy not working | Leonardo Bras
Some errors, like the lack of Scatter-Gather support by the network interface (NETIF_F_SG), may cause sendmsg(...,MSG_ZEROCOPY) to fail on using zero-copy, which causes it to fall back to the default copying mechanism. After each full dirty-bitmap scan there should be a zero-copy flush happening, which checks for errors in each of the previous calls to sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then increment the dirty_sync_missed_zero_copy migration stat to let the user know about it. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Message-Id: <20220711211112.18951-4-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20 | migration: Add helpers to detect TLS capability | Peter Xu
Add migrate_channel_requires_tls() to detect whether the specific channel requires TLS, leveraging the recently introduced migrate_use_tls(). No functional change intended. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185513.27421-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-22 | migration: rename qemu_file_update_transfer to qemu_file_acct_rate_limit | Daniel P. Berrangé
The qemu_file_update_transfer name doesn't give a clear guide on what its purpose is, and how it differs from the qemu_file_credit_transfer method. The latter is specifically for accumulating for total migration traffic, while the former is specifically for accounting in the rate limit calculations. The new name gives better guidance on its usage. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-05-16 | multifd: Implement zero copy write in multifd migration (multifd-zero-copy) | Leonardo Bras
Implement zero copy send on nocomp_send_write(), by making use of QIOChannel writev + flags & flush interface. Change multifd_send_sync_main() so flush_zero_copy() can be called after each iteration in order to make sure all dirty pages are sent before a new iteration is started. It will also flush at the beginning and at the end of migration. Also make it return -1 if flush_zero_copy() fails, in order to cancel the migration process, and avoid resuming the guest in the target host without receiving all current RAM. This will work fine on RAM migration because the RAM pages are not usually freed, and there is no problem on changing the pages content between writev_zero_copy() and the actual sending of the buffer, because this change will dirty the page and cause it to be re-sent on a next iteration anyway. A lot of locked memory may be needed in order to use multifd migration with zero-copy enabled, so disabling the feature should be necessary for low-privileged users trying to perform multifd migrations. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220513062836.965425-9-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-05-16 | multifd: Send header packet without flags if zero-copy-send is enabled | Leonardo Bras
Since d48c3a0445 ("multifd: Use a single writev on the send side"), sending the header packet and the memory pages happens in the same writev, which can potentially make the migration faster. Using channel-socket as example, this works well with the default copying mechanism of sendmsg(), but with zero-copy-send=true, it will cause the migration to often break. This happens because the header packet buffer gets reused quite often, and there is a high chance that by the time the MSG_ZEROCOPY mechanism gets to send the buffer, it has already changed, sending the wrong data and causing the migration to abort. It means that, as it is, the buffer for the header packet is not suitable for sending with MSG_ZEROCOPY. In order to enable zero copy for multifd, send the header packet on an individual write(), without any flags, and the remaining pages with a writev(), as it was happening before. This only changes how a migration with zero-copy-send=true works, not changing any current behavior for migrations with zero-copy-send=false. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220513062836.965425-8-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
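The two send shapes described above can be sketched with plain POSIX calls: one writev() carrying header plus pages, versus a separate write() for the header followed by a writev() for the pages. The names are hypothetical, short writes are not handled, and the sketch uses an ordinary writev() where real zero-copy code would pass a zero-copy flag to the socket layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SKETCH_MAX_IOV 16

/*
 * Sends a header and `npages` page buffers on `fd`.
 * Returns the number of bytes written, or -1 on error.
 * Assumes every call writes its data in full (fine for small demo
 * payloads; real code must loop on short writes).
 */
ssize_t sketch_send_packet(int fd, void *hdr, size_t hdr_len,
                           const struct iovec *pages, int npages,
                           bool zero_copy)
{
    struct iovec iov[SKETCH_MAX_IOV + 1];

    assert(hdr != NULL && pages != NULL);
    if (npages < 0 || npages > SKETCH_MAX_IOV) {
        return -1;
    }
    if (zero_copy) {
        /* Header goes out on its own, copied, so its buffer can be reused. */
        ssize_t h = write(fd, hdr, hdr_len);
        if (h != (ssize_t)hdr_len) {
            return -1;
        }
        /* Only the page buffers would be sent with the zero-copy flag. */
        ssize_t p = writev(fd, pages, npages);
        return p < 0 ? -1 : h + p;
    }
    /* Default path: header and pages in a single writev(). */
    iov[0].iov_base = hdr;
    iov[0].iov_len = hdr_len;
    for (int i = 0; i < npages; i++) {
        iov[i + 1] = pages[i];
    }
    return writev(fd, iov, npages + 1);
}
```

Both paths put the same bytes on the wire in the same order; only the number of system calls and the lifetime requirements on the header buffer differ.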
2022-05-16 | multifd: multifd_send_sync_main now returns negative on error | Leonardo Bras
Even though multifd_send_sync_main() currently emits error_reports, its callers don't really check it before continuing. Change multifd_send_sync_main() to return -1 on error and 0 on success. Also change all its callers to make use of this change and possibly fail earlier. (This change is important to the next patch on multifd zero copy implementation, to make sure an error in zero-copy flush does not go unnoticed.) Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20220513062836.965425-7-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-05-16 | migration: Add migrate_use_tls() helper | Leonardo Bras
A lot of places check parameters.tls_creds in order to evaluate if TLS is in use, and sometimes call migrate_get_current() just for that test. Add new helper function migrate_use_tls() in order to simplify testing for TLS usage. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220513062836.965425-6-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-04-21 | migration: Move migrate_allow_multifd and helpers into migration.c | Peter Xu
This variable, along with its helpers, is used to detect whether multiple channels will be supported for migration. In follow up patches, there'll be another capability that requires multi-channels. Hence move it outside multifd specific code and make it public. Meanwhile rename it from "multifd" to "multi_channels" to show its real meaning. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220331150857.74406-5-peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-04-21 | migration: Drop multifd tls_hostname cache | Peter Xu
The hostname is cached N times, where N equals the number of multifd channels. Drop that cache because after the previous patch we've got s->hostname being alive for the whole lifecycle of the migration procedure. Cc: Juan Quintela <quintela@redhat.com> Cc: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220331150857.74406-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-01-28 | multifd: Rename pages_used to normal_pages | Juan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-01-28 | multifd: recv side only needs the RAMBlock host address | Juan Quintela
So we can remove the MultiFDPages. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-01-28 | multifd: Use normal pages array on the recv side | Juan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- Rename num_normal_pages to total_normal_pages (peter)
2022-01-28 | multifd: Use normal pages array on the send side | Juan Quintela
We are only sending normal pages through multifd channels. Later in this series, we are going to also send zero pages. We are going to detect if a page is zero or non zero in the multifd channel thread, not on the main thread. So we receive an array of pages pages->offset[N] and we will end with: p->normal[N - zero_pages] p->zero[zero_pages]. In this patch, we just copy all the pages in offset to normal. for (i = 0; i < pages->num; i++) { p->normal[p->normal_num] = pages->offset[i]; p->normal_num++; } Later in the series this becomes: for (i = 0; i < pages->num; i++) { if (buffer_is_zero(pages->offset[i])) { p->zero[p->zero_num] = pages->offset[i]; p->zero_num++; } else { p->normal[p->normal_num] = pages->offset[i]; p->normal_num++; } } Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- Improving comment (dave) Renaming num_normal_pages to total_normal_pages (peter)
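The split into normal and zero pages sketched in the message above can be written as a small self-contained C example. The types, sizes and the byte-wise zero test are hypothetical simplifications for illustration; they are not the structures or the buffer_is_zero() implementation used by QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096
#define SKETCH_MAX_PAGES 8

/* Hypothetical per-channel state: offsets sorted into two arrays. */
typedef struct {
    size_t normal[SKETCH_MAX_PAGES];
    size_t normal_num;
    size_t zero[SKETCH_MAX_PAGES];
    size_t zero_num;
} SketchSendParams;

/* Naive stand-in for a zero-page test: scans every byte of the page. */
static bool sketch_page_is_zero(const uint8_t *page)
{
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Sort each queued offset into p->zero[] or p->normal[]. */
void sketch_classify_pages(SketchSendParams *p, const uint8_t *host,
                           const size_t *offset, size_t num)
{
    assert(p != NULL && host != NULL && offset != NULL);
    assert(num <= SKETCH_MAX_PAGES);
    p->normal_num = 0;
    p->zero_num = 0;
    for (size_t i = 0; i < num; i++) {
        if (sketch_page_is_zero(host + offset[i])) {
            p->zero[p->zero_num++] = offset[i];
        } else {
            p->normal[p->normal_num++] = offset[i];
        }
    }
}
```

After the call, normal_num + zero_num equals the number of queued pages, which is the invariant the message states as p->normal[N - zero_pages] and p->zero[zero_pages].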
2022-01-28 | multifd: Unfold "used" variable by its value | Juan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-01-28 | multifd: Use a single writev on the send side | Juan Quintela
Until now, we wrote the packet header with write(), and the rest of the pages with writev(). Just increase the size of the iovec and do a single writev(). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-01-28 | multifd: Remove send_write() method | Juan Quintela
Everything uses iovs now. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-01-28 | multifd: Move iov from pages to params | Juan Quintela
This will allow us to reduce the number of system calls on the next patch. Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-01-28 | migration: All this fields are unsigned | Juan Quintela
So printing them as %d is wrong. Notice that the channel id is a uint8_t, but I changed it anyway for consistency. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
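As a general C illustration of the point (not the trace-event change itself), unsigned fixed-width fields are printed with unsigned conversions, for example via the <inttypes.h> macros. The function name and field choice below are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Formats three unsigned fields with matching conversions.
 * A uint8_t is promoted to int in a variadic call, so %d would happen to
 * work for it, but %u keeps all fields consistent.
 */
int sketch_format_packet(char *buf, size_t len, uint8_t id,
                         uint32_t flags, uint64_t packet_num)
{
    assert(buf != NULL);
    return snprintf(buf, len,
                    "channel %u flags 0x%" PRIx32 " packet %" PRIu64,
                    (unsigned)id, flags, packet_num);
}
```

Values above INT_MAX, such as a flags word with the top bit set, are where a signed conversion would print a misleading negative number.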
2021-12-15 | multifd: Shut down the QIO channels to avoid blocking the send threads when they are terminated. | Li Zhang
When doing live migration with 8, 16 or more multifd channels, the guest hangs in the presence of network errors such as missing TCP ACKs. At sender's side: The main thread is blocked on qemu_thread_join, migration_fd_cleanup is called because one thread fails on qio_channel_write_all when the network problem happens and other send threads are blocked on sendmsg. They could not be terminated. So the main thread is blocked on qemu_thread_join, waiting for the threads to terminate. (gdb) bt 0 0x00007f30c8dcffc0 in __pthread_clockjoin_ex () at /lib64/libpthread.so.0 1 0x000055cbb716084b in qemu_thread_join (thread=0x55cbb881f418) at ../util/qemu-thread-posix.c:627 2 0x000055cbb6b54e40 in multifd_save_cleanup () at ../migration/multifd.c:542 3 0x000055cbb6b4de06 in migrate_fd_cleanup (s=0x55cbb8024000) at ../migration/migration.c:1808 4 0x000055cbb6b4dfb4 in migrate_fd_cleanup_bh (opaque=0x55cbb8024000) at ../migration/migration.c:1850 5 0x000055cbb7173ac1 in aio_bh_call (bh=0x55cbb7eb98e0) at ../util/async.c:141 6 0x000055cbb7173bcb in aio_bh_poll (ctx=0x55cbb7ebba80) at ../util/async.c:169 7 0x000055cbb715ba4b in aio_dispatch (ctx=0x55cbb7ebba80) at ../util/aio-posix.c:381 8 0x000055cbb7173ffe in aio_ctx_dispatch (source=0x55cbb7ebba80, callback=0x0, user_data=0x0) at ../util/async.c:311 9 0x00007f30c9c8cdf4 in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0 10 0x000055cbb71851a2 in glib_pollfds_poll () at ../util/main-loop.c:232 11 0x000055cbb718521c in os_host_main_loop_wait (timeout=42251070366) at ../util/main-loop.c:255 12 0x000055cbb7185321 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531 13 0x000055cbb6e6ba27 in qemu_main_loop () at ../softmmu/runstate.c:726 14 0x000055cbb6ad6fd7 in main (argc=68, argv=0x7ffc0c578888, envp=0x7ffc0c578ab0) at ../softmmu/main.c:50 To make sure that the send threads can be terminated, IO channels should be shut down to avoid waiting on IO.
Signed-off-by: Li Zhang <lizhang@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-12-15 | multifd: Fill offset and block for reception | Juan Quintela
We were using the iov directly, but we will need this info on the following patch. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 | multifd: remove used parameter from send_recv_pages() method | Juan Quintela
It is already there as p->pages->num. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 | multifd: remove used parameter from send_prepare() method | Juan Quintela
It is already there as p->pages->num. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 | multifd: The variable is only used inside the loop | Juan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 | multifd: Add missing documention | Juan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 | multifd: Rename used field to num | Juan Quintela
We will need to split it later in zero_num (number of zero pages) and normal_num (number of normal pages). This name is better. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 | migration: Never call twice qemu_target_page_size() | Juan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-10-19 | migration: allow enabling mutilfd for specific protocol only | Li Zhijian
To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org> CC: Li Zhijian <lizhijian@cn.fujitsu.com> Date: Sat, 31 Jul 2021 22:05:52 +0800 And change the default to true so that in '-incoming defer' case, user is able to change multifd capability. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-10-19 | migration: allow multifd for socket protocol only | Li Zhijian
To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org> CC: Li Zhijian <lizhijian@cn.fujitsu.com> Date: Sat, 31 Jul 2021 22:05:51 +0800 multifd with unsupported protocol will cause a segmentation fault. (gdb) bt #0 0x0000563b4a93faf8 in socket_connect (addr=0x0, errp=0x7f7f02675410) at ../util/qemu-sockets.c:1190 #1 0x0000563b4a797a03 in qio_channel_socket_connect_sync (ioc=0x563b4d16e8c0, addr=0x0, errp=0x7f7f02675410) at ../io/channel-socket.c:145 #2 0x0000563b4a797abf in qio_channel_socket_connect_worker (task=0x563b4cd86c30, opaque=0x0) at ../io/channel-socket.c:168 #3 0x0000563b4a792631 in qio_task_thread_worker (opaque=0x563b4cd86c30) at ../io/task.c:124 #4 0x0000563b4a91da69 in qemu_thread_start (args=0x563b4c44bb80) at ../util/qemu-thread-posix.c:541 #5 0x00007f7fe9b5b3f9 in ?? () #6 0x0000000000000000 in ?? () It's enough to check migrate_multifd_is_allowed() in multifd cleanup() and multifd setup() though there are so many other places using migrate_use_multifd(). Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-10-19 | multifd: Unconditionally unregister yank function | Lukas Straub
To: qemu-devel <qemu-devel@nongnu.org> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares Passos <lsoaresp@redhat.com> Date: Wed, 4 Aug 2021 21:26:32 +0200 Unconditionally unregister yank function in multifd_load_cleanup(). If it is not unregistered here, it will leak and cause a crash in yank_unregister_instance(). Now if the ioc is still in use afterwards, it will only lead to qemu not being able to recover from a hang related to that ioc. After checking the code, I am pretty sure that ref is always 1 when arriving here. So all this currently does is remove the unneeded check. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-10-19 | multifd: Implement yank for multifd send side | Lukas Straub
To: qemu-devel <qemu-devel@nongnu.org> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares Passos <lsoaresp@redhat.com> Date: Wed, 1 Sep 2021 17:58:57 +0200 When introducing yank functionality in the migration code I forgot to cover the multifd send side. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-07-26 | migration: Introduce migration_ioc_[un]register_yank() | Peter Xu
There are plenty of places in migration/* that check against either socket or tls typed ioc for yank operations. Provide two helpers to hide all this information. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-4-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-06-08 | migration/socket: Close the listener at the end | Dr. David Alan Gilbert
Delay closing the listener until the cleanup hook at the end; mptcp needs the listener to stay open while the other paths come in. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210421112834.107651-5-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-06-08 | yank: Unregister function when using TLS migration | Leonardo Bras
After yank feature was introduced in migration, whenever migration is started using TLS, the following error happens in both source and destination hosts: (qemu) qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. This happens because of a missing yank_unregister_function() when using qio-channel-tls. Fix this by also allowing TYPE_QIO_CHANNEL_TLS object type to perform yank_unregister_function() in channel_close() and multifd_load_cleanup(). Also, inside migration_channel_connect() and migration_channel_process_incoming() move yank_register_function() so it only runs once on a TLS migration. Fixes: b5eea99ec2f ("migration: Add yank feature", 2021-01-13) Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1964326 Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Peter Xu <peterx@redhat.com> -- Changes since v2: - Dropped all references to ioc->master - yank_register_function() and yank_unregister_function() now only run once in a TLS migration. Changes since v1: - Cast p->c to QIOChannelTLS into multifd_load_cleanup() Message-Id: <20210601054030.1153249-1-leobras.c@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-13 | migration/multifd: Print used_length of memory block | David Hildenbrand
We actually want to print the used_length, against which we check. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-10-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>