author | Stefan Hajnoczi <stefanha@redhat.com> | 2023-12-20 09:39:18 -0500
committer | Stefan Hajnoczi <stefanha@redhat.com> | 2023-12-20 09:39:18 -0500
commit | dd7d3e35401f80ffef4e209fa9e27db9087501b0
tree | e5af65ff04d761a2f390540471373f47e8ffe54b /docs/devel
parent | bd00730ec0f621706d0179768436f82c39048499
parent | 4278df9d1d2383b738338c857406357660f11e42
Merge tag 'pull-vfio-20231219' of https://github.com/legoater/qemu into staging
vfio queue:
* Introduce an IOMMU interface backend for VFIO devices
* Convert IOMMU type1 and sPAPR IOMMU to respective backends
* Introduce a new IOMMUFD backend for ARM, x86_64 and s390x platforms
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmWB34AACgkQUaNDx8/7
# 7KGOMxAAqXegvAneHqIlu4c8TzTuUR2rkYgev9RdfIHRDuY2XtaX14xlWn/rpTXZ
# qSgeta+iT8Cv4YV1POJeHWFDNs9E29p1w+R7nLcH1qTIIaZHtxwbVVQ3s7kAo1Vb
# 1S1G0/zIznzGVI50a0lj1gO2yQJnu/79nXpnICgA5REW0CscMssnvboQODlwq17V
# ZLNVM8CSAvKl6ppkmzRdfNXCfq6x7bf4MsvnuXsqda4TBbvyyTjAqdo/8sjKiGly
# gSDQqhgy6cvEXIF0UUHPJzFApf0YdXUDlL8hzH90hvRVu4W/t24dPmT7UkVIX9Ek
# TA7RVxv7iJlHtFDqfSTAJFr7nKO9Tm2V9N7xbD1OJUKrMoPZRT6+0R1hMKqsZ5z+
# nG6khqHGzuo/aI9n70YxYIPXt+vs/EHI4WUtslGLUTL0xv8lUzk6cxyIJupFRmDS
# ix6GM9TXOV8RyOveL2knHVymlFnAR6dekkMB+6ljUTuzDwG0oco4vno8z9bi7Vct
# j36bM56U3lhY+w+Ljoy0gPwgrw/FROnGG3mp1mwp1KRHqtEDnUQu8CaLbJOBsBGE
# JJDP6AKAYMczdmYVkd4CvE0WaeSxtOUxW5H5NCPjtaFQt0qEcght2lA2K15g521q
# jeojoJ/QK5949jnNCqm1Z66/YQVL79lPyL0E+mxEohwu+yTORk4=
# =U0x5
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 19 Dec 2023 13:22:56 EST
# gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [unknown]
# gpg: aka "Cédric Le Goater <clg@kaod.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1
* tag 'pull-vfio-20231219' of https://github.com/legoater/qemu: (47 commits)
hw/ppc/Kconfig: Imply VFIO_PCI
docs/devel: Add VFIO iommufd backend documentation
vfio: Introduce a helper function to initialize VFIODevice
vfio/ccw: Move VFIODevice initializations in vfio_ccw_instance_init
vfio/ap: Move VFIODevice initializations in vfio_ap_instance_init
vfio/platform: Move VFIODevice initializations in vfio_platform_instance_init
vfio/pci: Move VFIODevice initializations in vfio_instance_init
hw/i386: Activate IOMMUFD for q35 machines
kconfig: Activate IOMMUFD for s390x machines
hw/arm: Activate IOMMUFD for virt machines
vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps callbacks
vfio/ccw: Make vfio cdev pre-openable by passing a file handle
vfio/ccw: Allow the selection of a given iommu backend
vfio/ap: Make vfio cdev pre-openable by passing a file handle
vfio/ap: Allow the selection of a given iommu backend
vfio/platform: Make vfio cdev pre-openable by passing a file handle
vfio/platform: Allow the selection of a given iommu backend
vfio/pci: Make vfio cdev pre-openable by passing a file handle
vfio/pci: Allow the selection of a given iommu backend
vfio/iommufd: Enable pci hot reset through iommufd cdev interface
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Diffstat (limited to 'docs/devel')
-rw-r--r-- | docs/devel/index-internals.rst | 1 |
-rw-r--r-- | docs/devel/vfio-iommufd.rst | 166 |
2 files changed, 167 insertions, 0 deletions
diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst
index 6f81df92bc..3def4a138b 100644
--- a/docs/devel/index-internals.rst
+++ b/docs/devel/index-internals.rst
@@ -18,5 +18,6 @@ Details about QEMU's various subsystems including how to add features to them.
    s390-dasd-ipl
    tracing
    vfio-migration
+   vfio-iommufd
    writing-monitor-commands
    virtio-backends
diff --git a/docs/devel/vfio-iommufd.rst b/docs/devel/vfio-iommufd.rst
new file mode 100644
index 0000000000..3d1c11f175
--- /dev/null
+++ b/docs/devel/vfio-iommufd.rst
@@ -0,0 +1,166 @@
+===============================
+IOMMUFD BACKEND usage with VFIO
+===============================
+
+(Same meaning for backend/container/BE)
+
+With the introduction of iommufd, the Linux kernel provides a generic
+interface for user space drivers to propagate their DMA mappings to kernel
+for assigned devices. While the legacy kernel interface is group-centric,
+the new iommufd interface is device-centric, relying on device fd and iommufd.
+
+To support both interfaces in the QEMU VFIO device, introduce a base container
+to abstract the common part of VFIO legacy and iommufd container. So that the
+generic VFIO code can use either container.
+
+The base container implements generic functions such as memory_listener and
+address space management whereas the derived container implements callbacks
+specific to either legacy or iommufd. Each container has its own way to setup
+secure context and dma management interface. The below diagram shows how it
+looks like with both containers.
+
+::
+
+                        VFIO                           AddressSpace/Memory
+        +-------+  +----------+  +-----+  +-----+
+        |  pci  |  | platform |  |  ap |  | ccw |
+        +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
+            |           |           |        |        |   AddressSpace       |
+            |           |           |        |        +------------+---------+
+        +---V-----------V-----------V--------V----+               /
+        |           VFIOAddressSpace              | <------------+
+        |                  |                      |  MemoryListener
+        |        VFIOContainerBase list           |
+        +-------+----------------------------+----+
+                |                            |
+                |                            |
+        +-------V------+            +--------V----------+
+        |   iommufd    |            |    vfio legacy    |
+        |  container   |            |     container     |
+        +-------+------+            +--------+----------+
+                |                            |
+                | /dev/iommu                 | /dev/vfio/vfio
+                | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
+    Userspace   |                            |
+    ============+============================+===========================
+    Kernel      |  device fd                 |
+                +---------------+            | group/container fd
+                | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
+                |  ATTACH_IOAS) |            | device fd
+                |               |            |
+                |       +-------V------------V-----------------+
+        iommufd |       |                vfio                  |
+    (map/unmap  |       +---------+--------------------+-------+
+    ioas_copy)  |                 |                    | map/unmap
+                |                 |                    |
+         +------V------+    +-----V------+      +------V--------+
+         | iommfd core |    |  device    |      |  vfio iommu   |
+         +-------------+    +------------+      +---------------+
+
+* Secure Context setup
+
+  - iommufd BE: uses device fd and iommufd to setup secure context
+    (bind_iommufd, attach_ioas)
+  - vfio legacy BE: uses group fd and container fd to setup secure context
+    (set_container, set_iommu)
+
+* Device access
+
+  - iommufd BE: device fd is opened through ``/dev/vfio/devices/vfioX``
+  - vfio legacy BE: device fd is retrieved from group fd ioctl
+
+* DMA Mapping flow
+
+  1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
+  2. VFIO populates DMA map/unmap via the container BEs
+     * iommufd BE: uses iommufd
+     * vfio legacy BE: uses container fd
+
+Example configuration
+=====================
+
+Step 1: configure the host device
+---------------------------------
+
+It's exactly same as the VFIO device with legacy VFIO container.
+
+Step 2: configure QEMU
+----------------------
+
+Interactions with the ``/dev/iommu`` are abstracted by a new iommufd
+object (compiled in with the ``CONFIG_IOMMUFD`` option).
+
+Any QEMU device (e.g. VFIO device) wishing to use ``/dev/iommu`` must
+be linked with an iommufd object. It gets a new optional property
+named iommufd which allows to pass an iommufd object. Take ``vfio-pci``
+device for example:
+
+.. code-block:: bash
+
+    -object iommufd,id=iommufd0
+    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
+
+Note the ``/dev/iommu`` and VFIO cdev can be externally opened by a
+management layer. In such a case the fd is passed, the fd supports a
+string naming the fd or a number, for example:
+
+.. code-block:: bash
+
+    -object iommufd,id=iommufd0,fd=22
+    -device vfio-pci,iommufd=iommufd0,fd=23
+
+If the ``fd`` property is not passed, the fd is opened by QEMU.
+
+If no ``iommufd`` object is passed to the ``vfio-pci`` device, iommufd
+is not used and the user gets the behavior based on the legacy VFIO
+container:
+
+.. code-block:: bash
+
+    -device vfio-pci,host=0000:02:00.0
+
+Supported platform
+==================
+
+Supports x86, ARM and s390x currently.
+
+Caveats
+=======
+
+Dirty page sync
+---------------
+
+Dirty page sync with iommufd backend is unsupported yet, live migration is
+disabled by default. But it can be force enabled like below, low efficient
+though.
+
+.. code-block:: bash
+
+    -object iommufd,id=iommufd0
+    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0,enable-migration=on
+
+P2P DMA
+-------
+
+PCI p2p DMA is unsupported as IOMMUFD doesn't support mapping hardware PCI
+BAR region yet. Below warning shows for assigned PCI device, it's not a bug.
+
+.. code-block:: none
+
+    qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
+    qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)
+
+FD passing with mdev
+--------------------
+
+``vfio-pci`` device checks sysfsdev property to decide if backend is a mdev.
+If FD passing is used, there is no way to know that and the mdev is treated
+like a real PCI device. There is an error as below if user wants to enable
+RAM discarding for mdev.
+
+.. code-block:: none
+
+    qemu-system-x86_64: -device vfio-pci,iommufd=iommufd0,x-balloon-allowed=on,fd=9: vfio VFIO_FD9: x-balloon-allowed only potentially compatible with mdev devices
+
+``vfio-ap`` and ``vfio-ccw`` devices don't have same issue as their backend
+devices are always mdev and RAM discarding is force enabled.
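
Reviewer note (illustrative, not part of the commit): the new document says that ``/dev/iommu`` and the VFIO cdev can be opened by a management layer and handed to QEMU through the ``fd`` properties. A minimal sketch of such a launcher is given below, assuming a POSIX host and Python. The helper names ``build_qemu_args`` and ``launch_with_passed_fds``, the default binary name, and the choice of ``O_RDWR`` are assumptions made for this example; only the ``-object iommufd,...,fd=`` and ``-device vfio-pci,...,fd=`` option strings come from the documentation above.

```python
import os
import subprocess


def build_qemu_args(iommufd_fd, device_fd, qemu_binary="qemu-system-x86_64"):
    """Build a QEMU argv that refers to two already-open file descriptors.

    Hypothetical helper: the option strings mirror the fd-passing example
    in vfio-iommufd.rst; everything else is illustrative.
    """
    return [
        qemu_binary,
        "-object", f"iommufd,id=iommufd0,fd={iommufd_fd}",
        "-device", f"vfio-pci,iommufd=iommufd0,fd={device_fd}",
    ]


def launch_with_passed_fds(cdev_path, qemu_binary="qemu-system-x86_64"):
    """Open /dev/iommu and a VFIO cdev, then start QEMU inheriting both fds.

    cdev_path is the caller-supplied VFIO character device node, of the
    form /dev/vfio/devices/vfioX. Opening with O_RDWR is an assumption.
    """
    iommufd_fd = os.open("/dev/iommu", os.O_RDWR)
    try:
        device_fd = os.open(cdev_path, os.O_RDWR)
        try:
            argv = build_qemu_args(iommufd_fd, device_fd, qemu_binary)
            # pass_fds keeps these descriptors open, with the same numbers,
            # in the child process, so the numbers in argv stay valid.
            return subprocess.Popen(argv, pass_fds=(iommufd_fd, device_fd))
        finally:
            # The parent's copy is no longer needed once the child exists.
            os.close(device_fd)
    finally:
        os.close(iommufd_fd)
```

Calling ``build_qemu_args(22, 23)`` yields the same two options as the fd-passing example in the document. Keeping argument construction separate from the launch step makes the command line easy to inspect without access to ``/dev/iommu`` or an assigned device.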