aboutsummaryrefslogtreecommitdiff
path: root/migration/ram.c
AgeCommit message (Collapse)Author
2018-06-15migration: calculate expected_downtime with ram_bytes_remaining()Balamuruhan S
expected_downtime value is not accurate with dirty_pages_rate * page_size, using ram_bytes_remaining() would yeild it resonable. consider to read the remaining ram just after having updated the dirty pages count later migration_bitmap_sync_range() in migration_bitmap_sync() and reuse the `remaining` field in ram_counters to hold ram_bytes_remaining() for calculating expected_downtime. Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20180612085009.17594-2-bala24@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration/postcopy: Wake rate limit sleep on postcopy requestDr. David Alan Gilbert
Use the 'urgent request' mechanism added in the previous patch for entries added to the postcopy request queue for RAM. Ignore the rate limiting while we have requests. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180613102642.23995-4-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration: introduce migration_update_ratesXiao Guangrong
It is used to slightly clean the code up, no logic is changed Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180604095520.8563-5-xiaoguangrong@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration: fix counting xbzrle cache_miss_rateXiao Guangrong
Sync up xbzrle_cache_miss_prev only after migration iteration goes forward Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180604095520.8563-4-xiaoguangrong@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration: Poison ramblock loops in migrationDr. David Alan Gilbert
The migration code should be using the RAMBLOCK_FOREACH_MIGRATABLE and qemu_ram_foreach_block_migratable not the all-block versions; poison them so that we can't accidentally use them. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180605162545.80778-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration: Fixes for non-migratable RAMBlocksDr. David Alan Gilbert
There are still a few cases where migration code is using the macros and functions that do all RAMBlocks rather than just the migratable blocks; fix those up. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180605162545.80778-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-04Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20180604' ↵Peter Maydell
into staging migration/next for 20180604 # gpg: Signature made Mon 04 Jun 2018 05:14:24 BST # gpg: using RSA key F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20180604: migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect migration: remove unnecessary variables len in QIOChannelRDMA migration: Don't activate block devices if using -S migration: discard non-migratable RAMBlocks migration: introduce decompress-error-check Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-04Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
acpi, vhost, misc: fixes, features vDPA support, fix to vhost blk RO bit handling, some include path cleanups, NFIT ACPI table. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 01 Jun 2018 17:25:19 BST # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (31 commits) vhost-blk: turn on pre-defined RO feature bit ACPI testing: test NFIT platform capabilities nvdimm, acpi: support NFIT platform capabilities tests/.gitignore: add entry for generated file arch_init: sort architectures ui: use local path for local headers qga: use local path for local headers colo: use local path for local headers migration: use local path for local headers usb: use local path for local headers sd: fix up include vhost-scsi: drop an unused include ppc: use local path for local headers rocker: drop an unused include e1000e: use local path for local headers ioapic: fix up includes ide: use local path for local headers display: use local path for local headers trace: use local path for local headers migration: drop an unused include ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-04migration: discard non-migratable RAMBlocksCédric Le Goater
On the POWER9 processor, the XIVE interrupt controller can control interrupt sources using MMIO to trigger events, to EOI or to turn off the sources. Priority management and interrupt acknowledgment is also controlled by MMIO in the presenter sub-engine. These MMIO regions are exposed to guests in QEMU with a set of 'ram device' memory mappings, similarly to VFIO, and the VMAs are populated dynamically with the appropriate pages using a fault handler. But, these regions are an issue for migration. We need to discard the associated RAMBlocks from the RAM state on the source VM and let the destination VM rebuild the memory mappings on the new host in the post_load() operation just before resuming the system. To achieve this goal, the following introduces a new RAMBlock flag RAM_MIGRATABLE which is updated in the vmstate_register_ram() and vmstate_unregister_ram() routines. This flag is then used by the migration to identify RAMBlocks to discard on the source. Some checks are also performed on the destination to make sure nothing invalid was sent. This change impacts the boston, malta and jazz mips boards for which migration compatibility is broken. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-04migration: introduce decompress-error-checkXiao Guangrong
QEMU 3.0 enables strict check for compression & decompression to make the migration more robust, that depends on the source to fix the internal design which triggers the unexpected error conditions To make it work for migrating old version QEMU to 2.13 QEMU, we introduce this parameter to disable the error check on the destination which is the default behavior of the machine type which is older than 2.13, alternately, the strict check can be enabled explicitly as followings: -M pc-q35-2.11 -global migration.decompress-error-check=true Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-01migration: use local path for local headersMichael S. Tsirkin
When pulling in headers that are in the same directory as the C file (as opposed to one in include/), we should use its relative path, without a directory. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>
2018-05-25migration: use g_free for ram load bitmapPeter Xu
Buffers allocated with bitmap_new() should be freed with g_free(). Both reported by Coverity: *** CID 1391300: API usage errors (ALLOC_FREE_MISMATCH) /migration/ram.c: 3517 in ram_dirty_bitmap_reload() 3511 * the last one to sync, we need to notify the main send thread. 3512 */ 3513 ram_dirty_bitmap_reload_notify(s); 3514 3515 ret = 0; 3516 out: >>> CID 1391300: API usage errors (ALLOC_FREE_MISMATCH) >>> Calling "free" frees "le_bitmap" using "free" but it should have been freed using "g_free". 3517 free(le_bitmap); 3518 return ret; 3519 } 3520 3521 static int ram_resume_prepare(MigrationState *s, void *opaque) 3522 { *** CID 1391292: API usage errors (ALLOC_FREE_MISMATCH) /migration/ram.c: 249 in ramblock_recv_bitmap_send() 243 * Mark as an end, in case the middle part is screwed up due to 244 * some "misterious" reason. 245 */ 246 qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING); 247 qemu_fflush(file); 248 >>> CID 1391292: API usage errors (ALLOC_FREE_MISMATCH) >>> Calling "free" frees "le_bitmap" using "free" but it should have been freed using "g_free". 249 free(le_bitmap); 250 251 if (qemu_file_get_error(file)) { 252 return qemu_file_get_error(file); 253 } 254 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20180525015042.31778-1-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15migration: setup ramstate for resumePeter Xu
After we updated the dirty bitmaps of ramblocks, we also need to update the critical fields in RAMState to make sure it is ready for a resume. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-18-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15migration: synchronize dirty bitmap for resumePeter Xu
This patch implements the first part of core RAM resume logic for postcopy. ram_resume_prepare() is provided for the work. When the migration is interrupted by network failure, the dirty bitmap on the source side will be meaningless, because even the dirty bit is cleared, it is still possible that the sent page was lost along the way to destination. Here instead of continue the migration with the old dirty bitmap on source, we ask the destination side to send back its received bitmap, then invert it to be our initial dirty bitmap. The source side send thread will issue the MIG_CMD_RECV_BITMAP requests, once per ramblock, to ask for the received bitmap. On destination side, MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap. Data will be received on the return-path thread of source, and the main migration thread will be notified when all the ramblock bitmaps are synchronized. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-17-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15migration: new message MIG_RP_MSG_RECV_BITMAPPeter Xu
Introducing new return path message MIG_RP_MSG_RECV_BITMAP to send received bitmap of ramblock back to source. This is the reply message of MIG_CMD_RECV_BITMAP, it contains not only the header (including the ramblock name), and it was appended with the whole ramblock received bitmap on the destination side. When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP), it parses it, convert it to the dirty bitmap by inverting the bits. One thing to mention is that, when we send the recv bitmap, we are doing these things in extra: - converting the bitmap to little endian, to support when hosts are using different endianess on src/dst. - do proper alignment for 8 bytes, to support when hosts are using different word size (32/64 bits) on src/dst. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-13-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15migration: Define MultifdRecvParams soonerJuan Quintela
Once there, we don't need the struct names anywhere, just the typedefs. And now also document all fields. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-05-15migration: Transmit initial package through the multifd channelsJuan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> -- Be network agnostic. Add error checking for all values.
2018-05-15migration: Delay start of migration main routinesJuan Quintela
We need to make sure that we have started all the multifd threads. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15migration: Create multifd channelsJuan Quintela
In both sides. We still don't transmit anything through them. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15migration: Be sure all recv channels are createdJuan Quintela
We need them before we start migration. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15migration: terminate_* can be called for other threadsJuan Quintela
Once there, make count field to always be accessed with atomic operations. To make blocking operations, we need to know that the thread is running, so create a bool to indicate that. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> -- Once here, s/terminate_multifd_*-threads/multifd_*_terminate_threads/ This is consistente with every other function
2018-05-15migration: Introduce multifd_recv_new_channel()Juan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15migration: Set error state in case of errorJuan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15migration: fix saving normal page even if it's been compressedXiao Guangrong
Fix the bug introduced by da3f56cb2e767016 (migration: remove ram_save_compressed_page()), It should be 'return' rather than 'res' Sorry for this stupid mistake :( Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180428081045.8878-1-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-04-25migration: remove ram_save_compressed_page()Xiao Guangrong
Now, we can reuse the path in ram_save_page() to post the page out as normal, then the only thing remained in ram_save_compressed_page() is compression that we can move it out to the caller Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-11-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25migration: introduce save_normal_page()Xiao Guangrong
It directly sends the page to the stream neither checking zero nor using xbzrle or compression Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-10-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25migration: move calling save_zero_page to the common placeXiao Guangrong
save_zero_page() is always our first approach to try, move it to the common place before calling ram_save_compressed_page and ram_save_page Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-9-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25migration: move calling control_save_page to the common placeXiao Guangrong
The function is called by both ram_save_page and ram_save_target_page, so move it to the common caller to cleanup the code Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-8-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25migration: move some code to ram_save_host_pageXiao Guangrong
Move some code from ram_save_target_page() to ram_save_host_page() to make it be more readable for latter patches that dramatically clean ram_save_target_page() up Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-7-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25migration: introduce control_save_page()Xiao Guangrong
Abstract the common function control_save_page() to cleanup the code, no logic is changed Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-6-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25migration: detect compression and decompression errorsXiao Guangrong
Currently the page being compressed is allowed to be updated by the VM on the source QEMU, correspondingly the destination QEMU just ignores the decompression error. However, we completely miss the chance to catch real errors, then the VM is corrupted silently To make the migration more robuster, we copy the page to a buffer first to avoid it being written by VM, then detect and handle the errors of both compression and decompression errors properly Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-5-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25migration: stop decompression to allocate and free memory frequentlyXiao Guangrong
Current code uses uncompress() to decompress memory which manages memory internally, that causes huge memory is allocated and freed very frequently, more worse, frequently returning memory to kernel will flush TLBs So, we maintain the memory by ourselves and reuse it for each decompression Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-4-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25migration: stop compression to allocate and free memory frequentlyXiao Guangrong
Current code uses compress2() to compress memory which manages memory internally, that causes huge memory is allocated and freed very frequently More worse, frequently returning memory to kernel will flush TLBs and trigger invalidation callbacks on mmu-notification which interacts with KVM MMU, that dramatically reduce the performance of VM So, we maintain the memory by ourselves and reuse it for each compression Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-3-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25migration: stop compressing page in migration threadXiao Guangrong
As compression is a heavy work, do not do it in migration thread, instead, we post it out as a normal page Reviewed-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-2-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-03-20Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
virtio,vhost,pci,pc: features, cleanups SRAT tables for DIMM devices new virtio net flags for speed/duplex post-copy migration support in vhost cleanups in pci Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 20 Mar 2018 14:40:43 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (51 commits) postcopy shared docs libvhost-user: Claim support for postcopy postcopy: Allow shared memory vhost: Huge page align and merge vhost+postcopy: Wire up POSTCOPY_END notify vhost-user: Add VHOST_USER_POSTCOPY_END message libvhost-user: mprotect & madvises for postcopy vhost+postcopy: Call wakeups vhost+postcopy: Add vhost waker postcopy: postcopy_notify_shared_wake postcopy: helper for waking shared vhost+postcopy: Resolve client address postcopy-ram: add a stub for postcopy_request_shared_page vhost+postcopy: Helper to send requests to source for shared pages vhost+postcopy: Stash RAMBlock and offset vhost+postcopy: Send address back to qemu libvhost-user+postcopy: Register new regions with the ufd migration/ram: ramblock_recv_bitmap_test_byte_offset postcopy+vhost-user: Split set_mem_table for postcopy vhost+postcopy: Transmit 'listen' to slave ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # scripts/update-linux-headers.sh
2018-03-20migration/ram: ramblock_recv_bitmap_test_byte_offsetDr. David Alan Gilbert
Utility for testing the map when you already know the offset in the RAMBlock. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-03-13migration: introduce postcopy-only pendingVladimir Sementsov-Ogievskiy
There would be savevm states (dirty-bitmap) which can migrate only in postcopy stage. The corresponding pending is introduced here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-id: 20180313180320.339796-6-vsementsov@virtuozzo.com
2018-03-09migration: do not transfer ram during bulk storage migrationPeter Lieven
this patch makes the bulk phase of a block migration to take place before we start transferring ram. As the bulk block migration can take a long time its pointless to transfer ram during that phase. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <1520507908-16743-2-git-send-email-pl@kamp.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-03-02Include less of the generated modular QAPI headersMarkus Armbruster
In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects. The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need. To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-24-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-02-14migration: better error handling with QEMUFilePeter Xu
If the postcopy down due to some reason, we can always see this on dst: qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000 However in most cases that's not the real issue. The problem is that qemu_get_be16() has no way to show whether the returned data is valid or not, and we are _always_ assuming it is valid. That's possibly not wise. The best approach to solve this would be: refactoring QEMUFile interface to allow the APIs to return error if there is. However it needs quite a bit of work and testing. For now, let's explicitly check the validity first before using the data in all places for qemu_get_*(). This patch tries to fix most of the cases I can see. Only if we are with this, can we make sure we are processing the valid data, and also can we make sure we can capture the channel down events correctly. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180208103132.28452-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-02-14migration: Fix early failure cleanupDr. David Alan Gilbert
Avoid crash in cleanup after a very early migration failure (possibly due to my 688a3dcba980bf01344a 'Route errors down ...') Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180212160340.15333-2-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
2018-02-09Include qapi/error.h exactly where neededMarkus Armbruster
This cleanup makes the number of objects depending on qapi/error.h drop from 1910 (out of 4743) to 1612 in my "build everything" tree. While there, separate #include from file comment with a blank line, and drop a useless comment on why qemu/osdep.h is included first. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-5-armbru@redhat.com> [Semantic conflict with commit 34e304e975 resolved, OSX breakage fixed]
2018-02-06migration: Drop current address parameter from save_zero_page()Juan Quintela
It already has RAMBlock and offset, it can calculate it itself. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-01-15migration: Guard ram_bytes_remaining against early callDr. David Alan Gilbert
Calling ram_bytes_remaining during the early part of setup is unsafe because the ram_state isn't yet initialised. This can happen in the sequence: migrate migrate_cancel info migrate if the migrate sticks trying to connect (e.g. to an unresponsive destination due to the connect timeout). Here 'info migrate' sees a state of CANCELLING and so assumes the migrate has partially happened. partial fix for: RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1525899 Reported-by: Xianxian Wang <xianwang@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-11-22migration/ram.c: do not set 'postcopy_running' in POSTCOPY_INCOMING_ENDDaniel Henrique Barboza
When migrating a VM with 'migrate_set_capability postcopy-ram on' a postcopy_state is set during the process, ending up with the state POSTCOPY_INCOMING_END when the migration is over. This postcopy_state is taken into account inside ram_load to check how it will load the memory pages. This same ram_load is called when in a loadvm command. Inside ram_load, the logic to see if we're at postcopy_running state is: postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING postcopy_state_get() returns this enum type: typedef enum { POSTCOPY_INCOMING_NONE = 0, POSTCOPY_INCOMING_ADVISE, POSTCOPY_INCOMING_DISCARD, POSTCOPY_INCOMING_LISTENING, POSTCOPY_INCOMING_RUNNING, POSTCOPY_INCOMING_END } PostcopyState; In the case where ram_load is executed and postcopy_state is POSTCOPY_INCOMING_END, postcopy_running will be set to 'true' and ram_load will behave like a postcopy is in progress. This scenario isn't achievable in a migration but it is reproducible when executing savevm/loadvm after migrating with 'postcopy-ram on', causing loadvm to fail with Error -22: Source: (qemu) migrate_set_capability postcopy-ram on (qemu) migrate tcp:127.0.0.1:4444 Dest: (qemu) migrate_set_capability postcopy-ram on (qemu) ubuntu1704-intel login: Ubuntu 17.04 ubuntu1704-intel ttyS0 ubuntu1704-intel login: (qemu) (qemu) savevm test1 (qemu) loadvm test1 Unknown combination of migration flags: 0x4 (postcopy mode) error while loading state for instance 0x0 of device 'ram' Error -22 while loading VM state (qemu) This patch fixes this problem by changing the existing logic for postcopy_advised and postcopy_running in ram_load, making them 'false' if we're at POSTCOPY_INCOMING_END state. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> CC: Juan Quintela <quintela@redhat.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-10-29migration: Make xbzrle_cache_size a migration parameterJuan Quintela
Right now it is a variable in MigrationState instead of a MigrationParameter. The change allows to set it as the rest of the Migration parameters, from the command line, with query_migration_paramters, set_migrate_parameters, etc. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-10-29migration: No need to return the size of the cacheJuan Quintela
After the previous commits, we make sure that the value passed is right, or we just drop an error. So now we return if there is one error or we have setup correctly the value passed. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Improve error messasge Return 0 always for success
2017-10-29migration: Don't play games with the requested cache sizeJuan Quintela
Now that we check that the value passed is a power of 2, we don't need to play games when comparing what is the size that is going to take the cache. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-10-23migration: add bitmap for received pageAlexey Perevalov
This patch adds ability to track down already received pages, it's necessary for calculation vCPU block time in postcopy migration feature, and for recovery after postcopy migration failure. Also it's necessary to solve shared memory issue in postcopy livemigration. Information about received pages will be transferred to the software virtual bridge (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for already received pages. fallocate syscall is required for remmaped shared memory, due to remmaping itself blocks ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT error (struct page is exists after remmap). Bitmap is placed into RAMBlock as another postcopy/precopy related bitmaps. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-10-23migration: postcopy_place_page factoring outAlexey Perevalov
Need to mark copied pages as closer as possible to the place where it tracks down. That will be necessary in futher patch. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Juan Quintela <quintela@redhat.com>