slackcoder/qemu - QEMU is a generic and open source machine & userspace emulator and virtualizer

Age	Commit message (Collapse)	Author
2023-10-11	migration/rdma: Replace int error_state by bool errored	Markus Armbruster
	All we do with the value of RDMAContext member @error_state is test whether it's zero. Change to bool and rename to @errored. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-28-armbru@redhat.com>
2023-10-11	migration/rdma: Dumb down remaining int error values to -1	Markus Armbruster
	This is just to make the error value more obvious. Callers don't mind. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-27-armbru@redhat.com>
2023-10-11	migration/rdma: Return -1 instead of negative errno code	Markus Armbruster
	Several functions return negative errno codes on failure. Callers check for specific codes exactly never. For some of the functions, callers couldn't check even if they wanted to, because the functions also return negative values that aren't errno codes, leaving readers confused on what the function actually returns. Clean up and simplify: return -1 instead of negative errno code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-26-armbru@redhat.com>
2023-10-11	migration/rdma: Fix rdma_getaddrinfo() error checking	Markus Armbruster
	rdma_getaddrinfo() returns 0 on success. On error, it returns one of the EAI_ error codes like getaddrinfo() does, or -1 with errno set. This is broken by design: POSIX implicitly specifies the EAI_ error codes to be non-zero, no more. They could clash with -1. Nothing we can do about this design flaw. Both callers of rdma_getaddrinfo() only recognize negative values as error. Works only because systems elect to make the EAI_ error codes negative. Best not to rely on that: change the callers to treat any non-zero value as failure. Also change them to return -1 instead of the value received from getaddrinfo() on failure, to avoid positive error values. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-25-armbru@redhat.com>
2023-10-11	migration/rdma: Fix QEMUFileHooks method return values	Markus Armbruster
	The QEMUFileHooks methods don't come with a written contract. Digging through the code calling them, we find: * save_page(): Negative values RAM_SAVE_CONTROL_DELAYED and RAM_SAVE_CONTROL_NOT_SUPP are special. Any other negative value is an unspecified error. qemu_rdma_save_page() returns -EIO or rdma->error_state on error. I believe the latter is always negative. Nothing stops either of them to clash with the special values, though. Feels unlikely, but fix it anyway to return only the special values and -1. * before_ram_iterate(), after_ram_iterate(): Negative value means error. qemu_rdma_registration_start() and qemu_rdma_registration_stop() comply as far as I can tell. Make them comply obviously, by returning -1 on error. * hook_ram_load: Negative value means error. rdma_load_hook() already returns -1 on error. Leave it alone. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-24-armbru@redhat.com>
2023-10-11	migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port	Markus Armbruster
	qemu_rdma_data_init() neglects to set an Error when it fails because @host_port is null. Fortunately, no caller passes null, so this is merely a latent bug. Drop the flawed code handling null argument. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-23-armbru@redhat.com>
2023-10-11	migration/rdma: Fix qemu_get_cm_event_timeout() to always set error	Markus Armbruster
	qemu_get_cm_event_timeout() neglects to set an error when it fails because rdma_get_cm_event() fails. Harmless, as its caller qemu_rdma_connect() substitutes a generic error then. Fix it anyway. qemu_rdma_connect() also sets the generic error when its own call of rdma_get_cm_event() fails. Make the error handling more obvious: set a specific error right after rdma_get_cm_event() fails. Delete the generic error. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-22-armbru@redhat.com>
2023-10-11	migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error	Markus Armbruster
	qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until they find on that works. If none works, they return the first Error set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one. qemu_rdma_broken_ipv6_kernel() neglects to set an Error when ibv_open_device() fails. If a later address fails differently, we use that Error instead, or else the generic one. Harmless enough, but needs fixing all the same. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-21-armbru@redhat.com>
2023-10-11	migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()	Markus Armbruster
	Hiding return statements in macros is a bad idea. Use a function instead, and open code the return part. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-20-armbru@redhat.com>
2023-10-11	migration/rdma: Fix io_writev(), io_readv() methods to obey contract	Markus Armbruster
	QIOChannelClass methods qio_channel_rdma_readv() and qio_channel_rdma_writev() violate their method contract when rdma->error_state is non-zero: 1. They return whatever is in rdma->error_state then. Only -1 will be fine. -2 will be misinterpreted as "would block". Anything less than -2 isn't defined in the contract. A positive value would be misinterpreted as success, but I believe that's not actually possible. 2. They neglect to set an error then. If something up the call stack dereferences the error when failure is returned, it will crash. If it ignores the return value and checks the error instead, it will miss the error. Crap like this happens when return statements hide in macros, especially when their uses are far away from the definition. I elected not to investigate how callers are impacted. Expand the two bad macro uses, so we can set an error and return -1. The next commit will then get rid of the macro altogether. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-19-armbru@redhat.com>
2023-10-11	migration/rdma: Ditch useless numeric error codes in error messages	Markus Armbruster
	Several error messages include numeric error codes returned by failed functions: * ibv_poll_cq() returns an unspecified negative value. Useless. * rdma_accept and rdma_get_cm_event() return -1. Useless. * qemu_rdma_poll() returns either -1 or an unspecified negative value. Useless. * qemu_rdma_block_for_wrid(), qemu_rdma_write_flush(), qemu_rdma_exchange_send(), qemu_rdma_exchange_recv(), qemu_rdma_write() return a negative value that may or may not be an errno value. While reporting human-readable errno information (which a number is not) can be useful, reporting an error code that may or may not be an errno value is useless. Drop these error codes from the error messages. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-18-armbru@redhat.com>
2023-10-11	migration/rdma: Fix or document problematic uses of errno	Markus Armbruster
	We use errno after calling Libibverbs functions that are not documented to set errno (manual page does not mention errno), or where the documentation is unclear ("returns [...] the value of errno on failure"). While this could be read as "sets errno and returns it", a glance at the source code[] kills that hope: static inline int ibv_post_send(struct ibv_qp qp, struct ibv_send_wr wr, struct ibv_send_wr bad_wr) { return qp->context->ops.post_send(qp, wr, bad_wr); } The callback can be static int mana_post_send(struct ibv_qp ibqp, struct ibv_send_wr wr, struct ibv_send_wr bad) { / This version of driver supports RAW QP only. * Posting WR is done directly in the application. / return EOPNOTSUPP; } Neither of them touches errno. One of these errno uses is easy to fix, so do that now. Several more will go away later in the series; add temporary FIXME commments. Three will remain; add TODO comments. TODO, not FIXME, because the bug might be in Libibverbs documentation. [] https://github.com/linux-rdma/rdma-core.git commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-17-armbru@redhat.com>
2023-10-11	migration/rdma: Use bool for two RDMAContext flags	Markus Armbruster
	@error_reported and @received_error are flags. The latter is even assigned bool true. Change them from int to bool. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-16-armbru@redhat.com>
2023-10-11	migration/rdma: Make qemu_rdma_buffer_mergeable() return bool	Markus Armbruster
	qemu_rdma_buffer_mergeable() is semantically a predicate. It returns int 0 or 1. Return bool instead, and fix the function name's spelling. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-15-armbru@redhat.com>
2023-10-11	migration/rdma: Drop qemu_rdma_search_ram_block() error handling	Markus Armbruster
	qemu_rdma_search_ram_block() can't fail. Return void, and drop the unreachable error handling. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-14-armbru@redhat.com>
2023-10-11	migration/rdma: Drop rdma_add_block() error handling	Markus Armbruster
	rdma_add_block() can't fail. Return void, and drop the unreachable error handling. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-13-armbru@redhat.com>
2023-10-11	migration/rdma: Eliminate error_propagate()	Markus Armbruster
	When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-12-armbru@redhat.com>
2023-10-11	migration/rdma: Put @errp parameter last	Markus Armbruster
	include/qapi/error.h demands: * - Functions that use Error to report errors have an Error *errp parameter. It should be the last parameter, except for functions * taking variable arguments. qemu_rdma_connect() does not conform. Clean it up. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-11-armbru@redhat.com>
2023-10-11	migration/rdma: Fix qemu_rdma_accept() to return failure on errors	Markus Armbruster
	qemu_rdma_accept() returns 0 in some cases even when it didn't complete its job due to errors. Impact is not obvious. I figure the caller will soon fail again with a misleading error message. Fix it to return -1 on any failure. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-10-armbru@redhat.com>
2023-10-11	migration/rdma: Give qio_channel_rdma_source_funcs internal linkage	Markus Armbruster
	Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-9-armbru@redhat.com>
2023-10-11	migration/rdma: Clean up two more harmless signed vs. unsigned issues	Markus Armbruster
	qemu_rdma_exchange_get_response() compares int parameter @expecting with uint32_t head->type. Actual arguments are non-negative enumeration constants, RDMAControlHeader uint32_t member type, or qemu_rdma_exchange_recv() int parameter expecting. Actual arguments for the latter are non-negative enumeration constants. Change both parameters to uint32_t. In qio_channel_rdma_readv(), loop control variable @i is ssize_t, and counts from 0 up to @niov, which is size_t. Change @i to size_t. While there, make qio_channel_rdma_readv() and qio_channel_rdma_writev() more consistent: change the former's @done to ssize_t, and delete the latter's useless initialization of @len. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-8-armbru@redhat.com>
2023-10-11	migration/rdma: Fix unwanted integer truncation	Markus Armbruster
	qio_channel_rdma_readv() assigns the size_t value of qemu_rdma_fill() to an int variable before it adds it to @done / subtracts it from @want, both size_t. Truncation when qemu_rdma_fill() copies more than INT_MAX bytes. Seems vanishingly unlikely, but needs fixing all the same. Fixes: 6ddd2d76ca6f (migration: convert RDMA to use QIOChannel interface) Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-7-armbru@redhat.com>
2023-10-11	migration/rdma: Consistently use uint64_t for work request IDs	Markus Armbruster
	We use int instead of uint64_t in a few places. Change them to uint64_t. This cleans up a comparison of signed qemu_rdma_block_for_wrid() parameter @wrid_requested with unsigned @wr_id. Harmless, because the actual arguments are non-negative enumeration constants. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-6-armbru@redhat.com>
2023-10-11	migration/rdma: Drop fragile wr_id formatting	Markus Armbruster
	wrid_desc[] uses 4001 pointers to map four integer values to strings. print_wrid() accesses wrid_desc[] out of bounds when passed a negative argument. It returns null for values 2..1999 and 2001..3999. qemu_rdma_poll() and qemu_rdma_block_for_wrid() print wrid_desc[wr_id] and passes print_wrid(wr_id) to tracepoints. Could conceivably crash trying to format a null string. I believe access out of bounds is not possible. Not worth cleaning up. Dumb down to show just numeric wr_id. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-5-armbru@redhat.com>
2023-10-11	migration/rdma: Clean up rdma_delete_block()'s return type	Markus Armbruster
	rdma_delete_block() always returns 0, which its only caller ignores. Return void instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-4-armbru@redhat.com>
2023-10-11	migration/rdma: Clean up qemu_rdma_data_init()'s return type	Markus Armbruster
	qemu_rdma_data_init() return type is void . It actually returns RDMAContext , and all its callers assign the value to an RDMAContext . Unclean. Return RDMAContext instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-3-armbru@redhat.com>
2023-10-11	migration/rdma: Clean up qemu_rdma_poll()'s return type	Markus Armbruster
	qemu_rdma_poll()'s return type is uint64_t, even though it returns 0, -1, or @ret, which is int. Its callers assign the return value to int variables, then check whether it's negative. Unclean. Return int instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230928132019.2544702-2-armbru@redhat.com>
2023-10-11	migration: Allow RECOVER->PAUSED convertion for dest qemu	Peter Xu
	There's a bug on dest that if a double fault triggered on dest qemu (a network issue during postcopy-recover), we won't set PAUSED correctly because we assumed we always came from ACTIVE. Fix that by always overwriting the state to PAUSE. We could also check for these two states, but maybe it's an overkill. We did the same on the src QEMU to unconditionally switch to PAUSE anyway. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231004220240.167175-10-peterx@redhat.com>
2023-10-11	migration: Set migration status early in incoming side	Fabiano Rosas
	We are sending a migration event of MIGRATION_STATUS_SETUP at qemu_start_incoming_migration but never actually setting the state. This creates a window between qmp_migrate_incoming and process_incoming_migration_co where the migration status is still MIGRATION_STATUS_NONE. Calling query-migrate during this time will return an empty response even though the incoming migration command has already been issued. Commit 7cf1fe6d68 ("migration: Add migration events on target side") has added support to the 'events' capability to the incoming part of migration, but chose to send the SETUP event without setting the state. I'm assuming this was a mistake. This introduces a change in behavior, any QMP client waiting for the SETUP event will hang, unless it has previously enabled the 'events' capability. Having the capability enabled is sufficient to continue to receive the event. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230712190742.22294-5-farosas@suse.de>
2023-10-11	migration/qmp: Fix crash on setting tls-authz with null	Peter Xu
	QEMU will crash if anyone tries to set tls-authz (which is a type StrOrNull) with 'null' value. Fix it in the easy way by converting it to qstring just like the other two tls parameters. Cc: qemu-stable@nongnu.org # v4.0+ Fixes: d2f1d29b95 ("migration: add support for a "tls-authz" migration parameter") Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230905162335.235619-2-peterx@redhat.com>
2023-10-10	migration/dirtyrate: use QEMU_CLOCK_HOST to report start-time	Andrei Gudkov
	Currently query-dirty-rate uses QEMU_CLOCK_REALTIME as the source for start-time field. This translates to clock_gettime(CLOCK_MONOTONIC), i.e. number of seconds since host boot. This is not very useful. The only reasonable use case of start-time I can imagine is to check whether previously completed measurements are too old or not. But this makes sense only if start-time is reported as host wall-clock time. This patch replaces source of start-time from QEMU_CLOCK_REALTIME to QEMU_CLOCK_HOST. Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com> Reviewed-by: Hyman Huang <yong.huang@smartx.com> Message-Id: <399861531e3b24a1ecea2ba453fb2c3d129fb03a.1693905328.git.gudkov.andrei@huawei.com> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
2023-10-10	migration/calc-dirty-rate: millisecond-granularity period	Andrei Gudkov
	This patch allows to measure dirty page rate for sub-second intervals of time. An optional argument is introduced -- calc-time-unit. For example: {"execute": "calc-dirty-rate", "arguments": {"calc-time": 500, "calc-time-unit": "millisecond"} } Millisecond granularity allows to make predictions whether migration will succeed or not. To do this, calculate dirty rate with calc-time set to max allowed downtime (e.g. 300ms), convert measured rate into volume of dirtied memory, and divide by network throughput. If the value is lower than max allowed downtime, then migration will converge. Measurement results for single thread randomly writing to a 1/4/24GiB memory region: +----------------+-----------------------------------------------+ \| calc-time \| dirty rate MiB/s \| \| (milliseconds) +----------------+---------------+--------------+ \| \| theoretical \| page-sampling \| dirty-bitmap \| \| \| (at 3M wr/sec) \| \| \| +----------------+----------------+---------------+--------------+ \| 1GiB \| +----------------+----------------+---------------+--------------+ \| 100 \| 6996 \| 7100 \| 3192 \| \| 200 \| 4606 \| 4660 \| 2655 \| \| 300 \| 3305 \| 3280 \| 2371 \| \| 400 \| 2534 \| 2525 \| 2154 \| \| 500 \| 2041 \| 2044 \| 1871 \| \| 750 \| 1365 \| 1341 \| 1358 \| \| 1000 \| 1024 \| 1052 \| 1025 \| \| 1500 \| 683 \| 678 \| 684 \| \| 2000 \| 512 \| 507 \| 513 \| +----------------+----------------+---------------+--------------+ \| 4GiB \| +----------------+----------------+---------------+--------------+ \| 100 \| 10232 \| 8880 \| 4070 \| \| 200 \| 8954 \| 8049 \| 3195 \| \| 300 \| 7889 \| 7193 \| 2881 \| \| 400 \| 6996 \| 6530 \| 2700 \| \| 500 \| 6245 \| 5772 \| 2312 \| \| 750 \| 4829 \| 4586 \| 2465 \| \| 1000 \| 3865 \| 3780 \| 2178 \| \| 1500 \| 2694 \| 2633 \| 2004 \| \| 2000 \| 2041 \| 2031 \| 1789 \| +----------------+----------------+---------------+--------------+ \| 24GiB \| +----------------+----------------+---------------+--------------+ \| 100 \| 11495 \| 8640 \| 5597 \| \| 200 \| 11226 \| 8616 \| 3527 \| \| 300 \| 10965 \| 8386 \| 2355 \| \| 400 \| 10713 \| 8370 \| 2179 \| \| 500 \| 10469 \| 8196 \| 2098 \| \| 750 \| 9890 \| 7885 \| 2556 \| \| 1000 \| 9354 \| 7506 \| 2084 \| \| 1500 \| 8397 \| 6944 \| 2075 \| \| 2000 \| 7574 \| 6402 \| 2062 \| +----------------+----------------+---------------+--------------+ Theoretical values are computed according to the following formula: size * (1 - (1-(4096/size))^(timewps)) / (time 2^20), where size is in bytes, time is in seconds, and wps is number of writes per second. Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com> Reviewed-by: Hyman Huang <yong.huang@smartx.com> Message-Id: <d802e6b8053eb60fbec1a784cf86f67d9528e0a8.1693895970.git.gudkov.andrei@huawei.com> Signed-off-by: Hyman Huang <yong.huang@smartx.com>
2023-10-04	migration: Unify and trace vmstate field_exists() checks	Peter Xu
	For both save/load we actually share the logic on deciding whether a field should exist. Merge the checks into a helper and use it for both save and load. When doing so, add documentations and reformat the code to make it much easier to read. The real benefit here (besides code cleanups) is we add a trace-point for this; this is a known spot where we can easily break migration compatibilities between binaries, and this trace point will be critical for us to identify such issues. For example, this will be handy when debugging things like: https://gitlab.com/qemu-project/qemu/-/issues/932 Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230906204722.514474-1-peterx@redhat.com>
2023-10-04	migration: file URI offset	Steve Sistare
	Allow an offset option to be specified as part of the file URI, in the form "file:filename,offset=offset", where offset accepts the common size suffixes, or the 0x prefix, but not both. Migration data is written to and read from the file starting at offset. If unspecified, it defaults to 0. This is needed by libvirt to store its own data at the head of the file. Suggested-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <1694182931-61390-3-git-send-email-steven.sistare@oracle.com>
2023-10-04	migration: file URI	Steve Sistare
	Extend the migration URI to support file:<filename>. This can be used for any migration scenario that does not require a reverse path. It can be used as an alternative to 'exec:cat > file' in minimized containers that do not contain /bin/sh, and it is easier to use than the fd:<fdname> URI. It can be used in HMP commands, and as a qemu command-line parameter. For best performance, guest ram should be shared and x-ignore-shared should be true, so guest pages are not written to the file, in which case the guest may remain running. If ram is not so configured, then the user is advised to stop the guest first. Otherwise, a busy guest may re-dirty the same page, causing it to be appended to the file multiple times, and the file may grow unboundedly. That issue is being addressed in the "fixed-ram" patch series. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Tested-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <1694182931-61390-2-git-send-email-steven.sistare@oracle.com>
2023-10-04	migration/rdma: zore out head.repeat to make the error more clear	Li Zhijian
	Previously, we got a confusion error that complains the RDMAControlHeader.repeat: qemu-system-x86_64: rdma: Too many requests in this message (3638950032).Bailing. Actually, it's caused by an unexpected RDMAControlHeader.type. After this patch, error will become: qemu-system-x86_64: Unknown control message QEMU FILE Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230926100103.201564-2-lizhijian@fujitsu.com>
2023-10-04	migration: Update error description outside migration.c	Tejus GK
	A few code paths exist in the source code,where a migration is marked as failed via MIGRATION_STATUS_FAILED, but the failure happens outside of migration.c In such cases, an error_report() call is made, however the current MigrationState is never updated with the error description, and hence clients like libvirt never know the actual reason for the failure. This patch covers such cases outside of migration.c and updates the error description at the appropriate places. Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Tejus GK <tejus.gk@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231003065538.244752-3-tejus.gk@nutanix.com>
2023-10-04	migration/vmstate: Introduce vmstate_save_state_with_err	Tejus GK
	Currently, a few code paths exist in the function vmstate_save_state_v, which ultimately leads to a migration failure. However, an update in the current MigrationState for the error description is never done. vmstate.c somehow doesn't seem to allow the use of migrate_set_error due to some dependencies for unit tests. Hence, this patch introduces a new function vmstate_save_state_with_err, which will eventually propagate the error message to savevm.c where a migrate_set_error call can be eventually done. Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Tejus GK <tejus.gk@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231003065538.244752-2-tejus.gk@nutanix.com>
2023-10-02	Merge tag 'migration-20231002-pull-request' of ↵	Stefan Hajnoczi
	https://gitlab.com/juan.quintela/qemu into staging Migration Pull request (20231002) In this migration pull request: - Refactor repeated call of yank_unregister_instance (tejus) - More migraton-test changes Please, apply. # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmUatX4ACgkQ9IfvGFhy # 1yMlbQ/+Kp7m1Mr5LUM/8mvh9LZTVvWauBHch1pdvpCsJO+Grdtv6MtZL5UKT2ue # xYksZvf/rT4bdt2H1lSsG1o2GOcIf4qyWICgYNDo8peaxm1IrvgAbimaWHWLeORX # sBxKcBBuTac55vmEKzbPSbwGCGGTU/11UGXQ4ruGN3Hwbd2JZHAK6GxGIzANToZc # JtwBr/31SxJ2YndNLaPMEnD3cHbRbD2UyODeTt1KI5LdTGgXHoB6PgCk2AMQP1Ko # LlaPLsrEKC06h2CJ27BB36CNVEGMN2iFa3aKz1FC85Oj2ckatspAFw78t9guj6eM # MYxn0ipSsjjWjMsc3zEDxi7JrA///5bp1e6e7WdLpOaMBPpV4xuvVvA6Aku2es7D # fMPOMdftBp6rrXp8edBMTs1sOHdE1k8ZsyJ90m96ckjfLX39TPAiJRm4pWD2UuP5 # Wjr+/IU+LEp/KCqimMj0kYMRz4rM3PP8hOakPZLiRR5ZG6sgbHZK44iPXB/Udz/g # TCZ87siIpI8YHb3WCaO5CvbdjPrszg1j9v7RimtDeGLDR/hNokkQ1EEeszDTGpgt # xst4S4wVmex2jYyi53woH4V1p8anP7iqa8elPehAaYPobp47pmBV53ZaSwibqzPN # TmO7P9rfyQGCiXXZRvrAQJa+gmAkQlSEI7mSssV77pU+1gdEj9c= # =hD/8 # -----END PGP SIGNATURE----- # gpg: Signature made Mon 02 Oct 2023 08:20:14 EDT # gpg: using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full] # gpg: aka "Juan Quintela <quintela@trasno.org>" [full] # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * tag 'migration-20231002-pull-request' of https://gitlab.com/juan.quintela/qemu: migration/rdma: Simplify the function that saves a page migration: Remove unused qemu_file_credit_transfer() migration/rdma: Don't use imaginary transfers migration/rdma: Remove QEMUFile parameter when not used migration/RDMA: It is accounting for zero/normal pages in two places migration: Don't abuse qemu_file transferred for RDMA migration: Use qemu_file_transferred_noflush() for block migration. migration: Refactor repeated call of yank_unregister_instance migration-test: simplify shmem_opts handling migration-test: dirtylimit checks for x86_64 arch before migration-test: Add bootfile_create/delete() functions migration-test: bootpath is the same for all tests and for all archs migration-test: Create kvm_opts Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-29	migration/rdma: Simplify the function that saves a page	Juan Quintela
	When we sent a page through QEMUFile hooks (RDMA) there are three posiblities: - We are not using RDMA. return RAM_SAVE_CONTROL_DELAYED and control_save_page() returns false to let anything else to proceed. - There is one error but we are using RDMA. Then we return a negative value, control_save_page() needs to return true. - Everything goes well and RDMA start the sent of the page asynchronously. It returns RAM_SAVE_CONTROL_DELAYED and we need to return 1 for ram_save_page_legacy. Clear? I know, I know, the interface is as bad as it gets. I think that now it is a bit clearer, but this needs to be done some other way. Reviewed-by: Leonardo Bras <leobras@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230515195709.63843-16-quintela@redhat.com>
2023-09-29	migration: Remove unused qemu_file_credit_transfer()	Juan Quintela
	After this change, nothing abuses QEMUFile to account for data transferrefd during migration. Reviewed-by: Leonardo Bras <leobras@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230515195709.63843-15-quintela@redhat.com>
2023-09-29	migration/rdma: Don't use imaginary transfers	Juan Quintela
	RDMA protocol is completely asynchronous, so in qemu_rdma_save_page() they "invent" that a byte has been transferred. And then they call qemu_file_credit_transfer() and ram_transferred_add() with that byte. Just remove that calls as nothing has been sent. Reviewed-by: Leonardo Bras <leobras@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230515195709.63843-14-quintela@redhat.com>
2023-09-29	migration/rdma: Remove QEMUFile parameter when not used	Juan Quintela
	Reviewed-by: Leonardo Bras <leobras@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230515195709.63843-13-quintela@redhat.com>
2023-09-29	migration/RDMA: It is accounting for zero/normal pages in two places	Juan Quintela
	Remove the one in control_save_page(). Reviewed-by: Leonardo Bras <leobras@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230515195709.63843-12-quintela@redhat.com>
2023-09-29	migration: Don't abuse qemu_file transferred for RDMA	Juan Quintela
	Just create a variable for it, the same way that multifd does. This way it is safe to use for other thread, etc, etc. Reviewed-by: Leonardo Bras <leobras@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230515195709.63843-11-quintela@redhat.com>
2023-09-29	migration: Use qemu_file_transferred_noflush() for block migration.	Juan Quintela
	We only care about the amount of bytes transferred. Flushing is done by the system somewhere else. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230530183941.7223-4-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-09-29	migration: Refactor repeated call of yank_unregister_instance	Tejus GK
	In the function qmp_migrate(), yank_unregister_instance() gets called twice which isn't required. Hence, refactoring it so that it gets called during the local_error cleanup. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Tejus GK <tejus.gk@nutanix.com> Message-ID: <20230621130940.178659-3-tejus.gk@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-09-29	migration: Clean up local variable shadowing	Markus Armbruster
	Local variables shadowing other local variables or parameters make the code needlessly hard to understand. Tracked down with -Wshadow=local. Clean up: delete inner declarations when they are actually redundant, else rename variables. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Message-ID: <20230921121312.1301864-3-armbru@redhat.com>
2023-09-29	migration/rdma: Fix save_page method to fail on polling error	Markus Armbruster
	qemu_rdma_save_page() reports polling error with error_report(), then succeeds anyway. This is because the variable holding the polling status shadows the variable the function returns. The latter remains zero. Broken since day one, and duplicated more recently. Fixes: 2da776db4846 (rdma: core logic) Fixes: b390afd8c50b (migration/rdma: Fix out of order wrid) Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Message-ID: <20230921121312.1301864-2-armbru@redhat.com>
2023-09-27	migration: Move return path cleanup to main migration thread	Fabiano Rosas
	Now that the return path thread is allowed to finish during a paused migration, we can move the cleanup of the QEMUFiles to the main migration thread. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-9-farosas@suse.de>