diff options
author | Richard Henderson <richard.henderson@linaro.org> | 2023-06-30 08:11:08 +0200 |
---|---|---|
committer | Richard Henderson <richard.henderson@linaro.org> | 2023-06-30 08:11:08 +0200 |
commit | 408015a97dbe48a9dde8c0d2526c9312691952e7 (patch) | |
tree | b41b1a9349392da698def212ff60e19f7043f851 /docs/devel | |
parent | f7884164cbe3743c3bd2acc9daf877497fdb5fa3 (diff) | |
parent | 0cc889c8826cefa5b80110d31a62273b56aa1832 (diff) |
Merge tag 'pull-vfio-20230630' of https://github.com/legoater/qemu into staging
vfio queue:
* migration: New switchover ack to reduce downtime
* VFIO migration pre-copy support
* Removal of the VFIO migration experimental flag
* Alternate offset for GPUDirect Cliques
* Misc fixes
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmSeVHYACgkQUaNDx8/7
# 7KHeZw/+LRe9QQpx8hU//vKBvLet2QvI3WUaXGHiHbblbRT6HhiHjWHB2/8j6jji
# QhAGJ6w9yoKODyY0kGpVFEnkmXOKyqwWssBheV219ntZs09pFGxZr/ldUhT22aBN
# kH8mHU9BZ3J+zF/kKphpcIC1sPxVu/DlrtnJu5vDGuRAOu8+3kFV217JC1yGs1Vh
# n+KOho8a8oP9qxtzfvQ9iZ4dpBOOKpE9vscS12wJAlen93AGB6esR7VaLxDjExRP
# yL1pguQ8ZZ1gEXXbXO62djKo3IViobtD08KmCXTzQ6TVquLleJzqgjp+A0THnYAe
# J9Rlja7LpsO9MYSxmRE9WcQccC+sAGn/t/ufB0tL8zR43FvfhbF5H0PzBBY0H7YA
# JlzN+fgrKEEHJwMhXANNvSddhWCwvrkjNxo/80u3ySYMQR1Hav/tsXYBlk16e5nS
# fmtrFGTwhsVdy1Q6ZqEOyTni1eiYt5stEQMZFODdUNj6b9FugSZ0BK+2WN/M0CzU
# 6mKmJQgZAG/nBoRJm/XCO5OKQ6wm/4tm6F4HSH5EJ6mDT+DqETAk4GRUWTbYa2/G
# yAAOlhTMu8Xc/NhMeJ7Z99dyq0SM8pi/XpVEIv7p9yBak8ix60iCWZtDE8vlDv3M
# UfMVMTAvTS30kbS6FDN2Yyl6l8/ETdcwVIN4l02ipGzpMCtn9EQ=
# =dKUj
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 30 Jun 2023 06:05:10 AM CEST
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-vfio-20230630' of https://github.com/legoater/qemu:
vfio/pci: Free leaked timer in vfio_realize error path
vfio/pci: Fix a segfault in vfio_realize
MAINTAINERS: Promote Cédric to VFIO co-maintainer
vfio/migration: Make VFIO migration non-experimental
vfio/migration: Reset bytes_transferred properly
vfio/pci: Call vfio_prepare_kvm_msi_virq_batch() in MSI retry path
hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques
vfio: Implement a common device info helper
vfio/migration: Add support for switchover ack capability
vfio/migration: Add VFIO migration pre-copy support
vfio/migration: Store VFIO migration flags in VFIOMigration
vfio/migration: Refactor vfio_save_block() to return saved data size
tests: Add migration switchover ack capability test
migration: Enable switchover ack capability
migration: Implement switchover ack logic
migration: Add switchover ack capability
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Diffstat (limited to 'docs/devel')
-rw-r--r-- | docs/devel/vfio-migration.rst | 45 |
1 files changed, 35 insertions, 10 deletions
diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst index 1b68ccf115..b433cb5bb2 100644 --- a/docs/devel/vfio-migration.rst +++ b/docs/devel/vfio-migration.rst @@ -7,12 +7,21 @@ the guest is running on source host and restoring this saved state on the destination host. This document details how saving and restoring of VFIO devices is done in QEMU. -Migration of VFIO devices currently consists of a single stop-and-copy phase. -During the stop-and-copy phase the guest is stopped and the entire VFIO device -data is transferred to the destination. - -The pre-copy phase of migration is currently not supported for VFIO devices. -Support for VFIO pre-copy will be added later on. +Migration of VFIO devices consists of two phases: the optional pre-copy phase, +and the stop-and-copy phase. The pre-copy phase is iterative and allows to +accommodate VFIO devices that have a large amount of data that needs to be +transferred. The iterative pre-copy phase of migration allows for the guest to +continue whilst the VFIO device state is transferred to the destination, this +helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy +support by reporting the VFIO_MIGRATION_PRE_COPY flag in the +VFIO_DEVICE_FEATURE_MIGRATION ioctl. + +When pre-copy is supported, it's possible to further reduce downtime by +enabling "switchover-ack" migration capability. +VFIO migration uAPI defines "initial bytes" as part of its pre-copy data stream +and recommends that the initial bytes are sent and loaded in the destination +before stopping the source VM. Enabling this migration capability will +guarantee that and thus, can potentially reduce downtime even further. Note that currently VFIO migration is supported only for a single device. This is due to VFIO migration's lack of P2P support. However, P2P support is planned @@ -29,10 +38,23 @@ VFIO implements the device hooks for the iterative approach as follows: * A ``load_setup`` function that sets the VFIO device on the destination in _RESUMING state. +* A ``state_pending_estimate`` function that reports an estimate of the + remaining pre-copy data that the vendor driver has yet to save for the VFIO + device. + * A ``state_pending_exact`` function that reads pending_bytes from the vendor driver, which indicates the amount of data that the vendor driver has yet to save for the VFIO device. +* An ``is_active_iterate`` function that indicates ``save_live_iterate`` is + active only when the VFIO device is in pre-copy states. + +* A ``save_live_iterate`` function that reads the VFIO device's data from the + vendor driver during iterative pre-copy phase. + +* A ``switchover_ack_needed`` function that checks if the VFIO device uses + "switchover-ack" migration capability when this capability is enabled. + * A ``save_state`` function to save the device config space if it is present. * A ``save_live_complete_precopy`` function that sets the VFIO device in @@ -111,8 +133,10 @@ Flow of state changes during Live migration =========================================== Below is the flow of state change during live migration. -The values in the brackets represent the VM state, the migration state, and +The values in the parentheses represent the VM state, the migration state, and the VFIO device state, respectively. +The text in the square brackets represents the flow if the VFIO device supports +pre-copy. Live migration save path ------------------------ @@ -124,11 +148,12 @@ Live migration save path | migrate_init spawns migration_thread Migration thread then calls each device's .save_setup() - (RUNNING, _SETUP, _RUNNING) + (RUNNING, _SETUP, _RUNNING [_PRE_COPY]) | - (RUNNING, _ACTIVE, _RUNNING) - If device is active, get pending_bytes by .state_pending_exact() + (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY]) + If device is active, get pending_bytes by .state_pending_{estimate,exact}() If total pending_bytes >= threshold_size, call .save_live_iterate() + [Data of VFIO device for pre-copy phase is copied] Iterate till total pending bytes converge and are less than threshold | On migration completion, vCPU stops and calls .save_live_complete_precopy for |