aboutsummaryrefslogtreecommitdiff
path: root/hw
AgeCommit message (Collapse)Author
2024-07-23hw/intc/loongson_ipi: Fix resource leakPhilippe Mathieu-Daudé
Once initialised, QOM objects can be realized and unrealized multiple times before being finalized. Resources allocated in REALIZE must be deallocated in an equivalent UNREALIZE handler. Free the CPU array in loongson_ipi_unrealize() instead of loongson_ipi_finalize(). Cc: qemu-stable@nongnu.org Fixes: 5e90b8db382 ("hw/loongarch: Set iocsr address space per-board rather than percpu") Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Song Gao <gaosong@loongson.cn> Message-Id: <20240723111405.14208-3-philmd@linaro.org>
2024-07-23hw/intc/loongson_ipi: Access memory in little endianBibo Mao
Loongson IPI is only available in little-endian, so use that to access the guest memory (in case we run on a big-endian host). Cc: qemu-stable@nongnu.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Fixes: f6783e3438 ("hw/loongarch: Add LoongArch ipi interrupt support") [PMD: Extracted from bigger commit, added commit description] Co-Developed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Bibo Mao <maobibo@loongson.cn> Tested-by: Bibo Mao <maobibo@loongson.cn> Acked-by: Song Gao <gaosong@loongson.cn> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Message-Id: <20240718133312.10324-3-philmd@linaro.org>
2024-07-23hw/i386/intel_iommu: Extract device IOTLB invalidation logicClément Mathieu--Drif
This piece of code can be shared by both IOTLB invalidation and PASID-based IOTLB invalidation No functional changes intended. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-ID: <20240718081636.879544-12-zhenzhong.duan@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-07-23vfio/common: Allow disabling device dirty page trackingJoao Martins
The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole tracking of VF pre-copy phase of dirty page tracking, though it means that it will only be used at the start of the switchover phase. Add an option that disables the VF dirty page tracking, and fall back into container-based dirty page tracking. This also allows to use IOMMU dirty tracking even on VFs with their own dirty tracker scheme. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/migration: Don't block migration device dirty tracking is unsupportedJoao Martins
By default VFIO migration is set to auto, which will support live migration if the migration capability is set *and* also dirty page tracking is supported. For testing purposes one can force enable without dirty page tracking via enable-migration=on, but that option is generally left for testing purposes. So starting with IOMMU dirty tracking it can use to accommodate the lack of VF dirty page tracking allowing us to minimize the VF requirements for migration and thus enabling migration by default for those too. While at it change the error messages to mention IOMMU dirty tracking as well. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> [ clg: - spelling in commit log ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-07-23vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap supportJoao Martins
ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI that fetches the bitmap that tells what was dirty in an IOVA range. A single bitmap is allocated and used across all the hwpts sharing an IOAS which is then used in log_sync() to set Qemu global bitmaps. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking supportJoao Martins
ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that enables or disables dirty page tracking. The ioctl is used if the hwpt has been created with dirty tracking supported domain (stored in hwpt::flags) and it is called on the whole list of iommu domains. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Probe and request hwpt dirty tracking capabilityJoao Martins
In preparation to using the dirty tracking UAPI, probe whether the IOMMU supports dirty tracking. This is done via the data stored in hiod::caps::hw_caps initialized from GET_HW_INFO. Qemu doesn't know if VF dirty tracking is supported when allocating hardware pagetable in iommufd_cdev_autodomains_get(). This is because VFIODevice migration state hasn't been initialized *yet* hence it can't pick between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU supports dirty tracking it always creates HWPTs with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if later on VFIOMigration decides to use VF dirty tracking instead. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> [ clg: - Fixed vbasedev->iommu_dirty_tracking assignment in iommufd_cdev_autodomains_get() - Added warning for heterogeneous dirty page tracking support in iommufd_cdev_autodomains_get() ] Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during ↵Joao Martins
attach_device() Move the HostIOMMUDevice::realize() to be invoked during the attach of the device before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU behind the device supports dirty tracking. Note: The HostIOMMUDevice data from legacy backend is static and doesn't need any information from the (type1-iommu) backend to be initialized. In contrast however, the IOMMUFD HostIOMMUDevice data requires the iommufd FD to be connected and having a devid to be able to successfully GET_HW_INFO. This means vfio_device_hiod_realize() is called in different places within the backend .attach_device() implementation. Suggested-by: Cédric Le Goater <clg@redhat.cm> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> [ clg: Fixed error handling in iommufd_cdev_attach() ] Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCapsJoao Martins
Store the value of @caps returned by iommufd_backend_get_device_info() in a new field HostIOMMUDeviceCaps::hw_caps. Right now the only value is whether device IOMMU supports dirty tracking (IOMMU_HW_CAP_DIRTY_TRACKING). This is in preparation for HostIOMMUDevice::realize() being called early during attach_device(). Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/{iommufd,container}: Remove caps::aw_bitsJoao Martins
Remove caps::aw_bits which requires the bcontainer::iova_ranges being initialized after device is actually attached. Instead defer that to .get_cap() and call vfio_device_get_aw_bits() directly. This is in preparation for HostIOMMUDevice::realize() being called early during attach_device(). Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Introduce auto domain creationJoao Martins
There's generally two modes of operation for IOMMUFD: 1) The simple user API which intends to perform relatively simple things with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches to VFIO and mainly performs IOAS_MAP and UNMAP. 2) The native IOMMUFD API where you have fine grained control of the IOMMU domain and model it accordingly. This is where most new feature are being steered to. For dirty tracking 2) is required, as it needs to ensure that the stage-2/parent IOMMU domain will only attach devices that support dirty tracking (so far it is all homogeneous in x86, likely not the case for smmuv3). Such invariant on dirty tracking provides a useful guarantee to VMMs that will refuse incompatible device attachments for IOMMU domains. Dirty tracking insurance is enforced via HWPT_ALLOC, which is responsible for creating an IOMMU domain. This is contrast to the 'simple API' where the IOMMU domain is created by IOMMUFD automatically when it attaches to VFIO (usually referred as autodomains) but it has the needed handling for mdevs. To support dirty tracking with the advanced IOMMUFD API, it needs similar logic, where IOMMU domains are created and devices attached to compatible domains. Essentially mimicking kernel iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain it falls back to IOAS attach. The auto domain logic allows different IOMMU domains to be created when DMA dirty tracking is not desired (and VF can provide it), and others where it is. Here it is not used in this way given how VFIODevice migration state is initialized after the device attachment. But such mixed mode of IOMMU dirty tracking + device dirty tracking is an improvement that can be added on. Keep the 'all of nothing' of type1 approach that we have been using so far between container vs device dirty tracking. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> [ clg: Added ERRP_GUARD() in iommufd_cdev_autodomains_get() ] Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/ccw: Don't initialize HOST_IOMMU_DEVICE with mdevZhenzhong Duan
mdevs aren't "physical" devices and when asking for backing IOMMU info, it fails the entire provisioning of the guest. Fix that by setting vbasedev->mdev true so skipping HostIOMMUDevice initialization in the presence of mdevs. Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/ap: Don't initialize HOST_IOMMU_DEVICE with mdevZhenzhong Duan
mdevs aren't "physical" devices and when asking for backing IOMMU info, it fails the entire provisioning of the guest. Fix that by setting vbasedev->mdev true so skipping HostIOMMUDevice initialization in the presence of mdevs. Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()Joao Martins
In preparation to implement auto domains have the attach function return the errno it got during domain attach instead of a bool. -EINVAL is tracked to track domain incompatibilities, and decide whether to create a new IOMMU domain. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW ↵Joao Martins
capabilities The helper will be able to fetch vendor agnostic IOMMU capabilities supported both by hardware and software. Right now it is only iommu dirty tracking. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdevJoao Martins
mdevs aren't "physical" devices and when asking for backing IOMMU info, it fails the entire provisioning of the guest. Fix that by skipping HostIOMMUDevice initialization in the presence of mdevs, and skip setting an iommu device when it is known to be an mdev. Cc: Zhenzhong Duan <zhenzhong.duan@intel.com> Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler") Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/pci: Extract mdev check into an helperJoao Martins
In preparation to skip initialization of the HostIOMMUDevice for mdev, extract the checks that validate if a device is an mdev into helpers. A vfio_device_is_mdev() is created, and subsystems consult VFIODevice::mdev to check if it's mdev or not. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23hw/vfio/container: Fix SIGSEV on vfio_container_instance_finalize()Eric Auger
In vfio_connect_container's error path, the base container is removed twice form the VFIOAddressSpace QLIST: first on the listener_release_exit label and second, on free_container_exit label, through object_unref(container), which calls vfio_container_instance_finalize(). Let's remove the first instance. Fixes: 938026053f4 ("vfio/container: Switch to QOM") Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23Merge tag 'ui-pull-request' of https://gitlab.com/marcandre.lureau/qemu into ↵Richard Henderson
staging UI-related for 9.1 # -----BEGIN PGP SIGNATURE----- # # iQJQBAABCAA6FiEEh6m9kz+HxgbSdvYt2ujhCXWWnOUFAmaeu44cHG1hcmNhbmRy # ZS5sdXJlYXVAcmVkaGF0LmNvbQAKCRDa6OEJdZac5ZvNEACES6y1D4rzBtZBV/FY # OvWHzM/2Uycma3CO2pTl8DzwucgUuVxVjrAppi+iIXza+qEHlN0e9tbmR8u3ypdV # tu0ijRm1MWeV9EHw8fQxSIci9cgoPzJzfvrmGD9rPEJTPh44yifL3CiE97y/5SJx # FkrmYoDeuLQ4WAgZqIhkFOZ3eX+bQ+sI49ZVm0vSIeZ2wYuWlw7JwMKq2Xb4fCsZ # 7wJZcL7gNGHk3rsH2Sfukv5LRw64+eDwpQMkXS2scYp64xwhdd5bAqKchicBA0zh # jBw+KszCpAW7XunQtXjiiQZco9x6auu2c+4erDyNcTfqBtSRNjArMauL2/609EVv # 7xsLmwZvXgrbO7fRCGCnC4M5NCuisDbMeON+7tKdS8kfEMgFX0FNfM1Jp9z4Rh7T # I/vy8mLlBIy4BNZA7jV1jyIJZeVYBYGc+ieBEeE1sK7L5RIxeoOwP1S20Xu9A9bO # VFBohKcMt5x0HlUg0oSH8OJLbpQ8vDQDkIcDMIOQCqj+PX0erc2u9oHQ7xB1k3BB # os83zWDTLJTJ+ZdoI2tp9FHQj56wdGJxDQNrRjFOP5KL1AoHGz+Y5fF7BvGB3jnK # JsPV2OSkEs6Q/be6pLTiVEoUUEpqy40Kh/7NlzdbM+oHX5h0TlcIqJ16I2QsfM/N # sRXAmzqCe00STyhxopR1BMZnjg== # =aCj6 # -----END PGP SIGNATURE----- # gpg: Signature made Tue 23 Jul 2024 06:05:34 AM AEST # gpg: using RSA key 87A9BD933F87C606D276F62DDAE8E10975969CE5 # gpg: issuer "marcandre.lureau@redhat.com" # gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" [full] # gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>" [full] * tag 'ui-pull-request' of https://gitlab.com/marcandre.lureau/qemu: chardev/char-win-stdio.c: restore old console mode ui/vdagent: send caps on fe_open ui/vdagent: notify clipboard peers of serial reset ui/vdagent: improve vdagent_fe_open() trace ui: add more tracing for dbus Cursor: 8 -> 1 bit alpha downsampling improvement virtio-gpu-gl: declare dependency on ui-opengl vnc: increase max display size Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23Merge tag 'nvme-next-pull-request' of https://gitlab.com/birkelund/qemu into ↵Richard Henderson
staging hw/nvme patches # -----BEGIN PGP SIGNATURE----- # # iQEzBAABCgAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmaeiz4ACgkQTeGvMW1P # Dem5DggAkudAwZYUlKLz/FuxmOJsZ/CKL7iIu6wE3P93WTTbi4m2AL5lMFz1bOUH # 33LtjHz51bDvOsnhAwLs2TwjfhICiMJCOXEmxF9zJnO4Yo8ih9UbeE7sEukpxsVr # FJlAg5OXhdIHuo48ow7hu7BqMs58jnXhVA6zSvLU5rbKTSdG/369jyQKy5aoFPN0 # Rk+S6hqDmVMiN7u6E+QqPyB2tSbmNKkhPICu3O9fbHmaOoMFmrcvyxkd1wJ9JxwF # 8MWbuEZlIpLIIL/mCN4wzDw8VKlJ26sBJJC1b+NHmWIWmPkqMeXwcmQtWhUqsrcs # xAGUcjgJuJ3Fu6Xzt+09Y+FXO8v0oQ== # =vCDb # -----END PGP SIGNATURE----- # gpg: Signature made Tue 23 Jul 2024 02:39:26 AM AEST # gpg: using RSA key 522833AA75E2DCE6A24766C04DE1AF316D4F0DE9 # gpg: Good signature from "Klaus Jensen <its@irrelevant.dk>" [unknown] # gpg: aka "Klaus Jensen <k.jensen@samsung.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DDCA 4D9C 9EF9 31CC 3468 4272 63D5 6FC5 E55D A838 # Subkey fingerprint: 5228 33AA 75E2 DCE6 A247 66C0 4DE1 AF31 6D4F 0DE9 * tag 'nvme-next-pull-request' of https://gitlab.com/birkelund/qemu: hw/nvme: remove useless type cast hw/nvme: actually implement abort hw/nvme: add cross namespace copy support hw/nvme: fix memory leak in nvme_dsm Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-22hw/nvme: Add SPDM over DOE supportWilfred Mallawa
Setup Data Object Exchange (DOE) as an extended capability for the NVME controller and connect SPDM to it (CMA) to it. Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Klaus Jensen <k.jensen@samsung.com> Message-Id: <20240703092027.644758-4-alistair.francis@wdc.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22acpi/gpex: Create PCI link devices outside PCI root bridgeSunil V L
Currently, PCI link devices (PNP0C0F) are always created within the scope of the PCI root bridge. However, RISC-V needs these link devices to be created outside to ensure the probing order in the OS. This matches the example given in the ACPI specification [1] as well. Hence, create these link devices directly under _SB instead of under the PCI root bridge. To keep these link device names unique for multiple PCI bridges, change the device name from GSIx to LXXY format where XX is the PCI bus number and Y is the INTx. GPEX is currently used by riscv, aarch64/virt and x86/microvm machines. So, this change will alter the DSDT for those systems. [1] - ACPI 5.1: 6.2.13.1 Example: Using _PRT to Describe PCI IRQ Routing Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716144306.2432257-5-sunilvl@ventanamicro.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/riscv/virt-acpi-build.c: Update the HID of RISC-V UARTSunil V L
The requirement ACPI_060 in the RISC-V BRS specification [1], requires NS16550 compatible UART to have the HID RSCV0003. So, update the HID for the UART. [1] - https://github.com/riscv-non-isa/riscv-brs/releases/download/v0.0.2/riscv-brs-spec.pdf (Chapter 6) Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716144306.2432257-3-sunilvl@ventanamicro.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/riscv/virt-acpi-build.c: Add namespace devices for PLIC and APLICSunil V L
As per the requirement ACPI_080 in the RISC-V Boot and Runtime Services (BRS) specification [1], PLIC and APLIC should be in namespace as well. So, add them using the defined HID. [1] - https://github.com/riscv-non-isa/riscv-brs/releases/download/v0.0.2/riscv-brs-spec.pdf (Chapter 6) Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716144306.2432257-2-sunilvl@ventanamicro.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-iommu: Add trace point on virtio_iommu_detach_endpoint_from_domainEric Auger
Add a trace point on virtio_iommu_detach_endpoint_from_domain(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-7-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/vfio/common: Add vfio_listener_region_del_iommu trace eventEric Auger
Trace when VFIO gets notified about the deletion of an IOMMU MR. Also trace the name of the region in the add_iommu trace message. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-6-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-iommu: Remove the end point on detachEric Auger
We currently miss the removal of the endpoint in case of detach. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-5-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-iommu: Free [host_]resv_ranges on unset_iommu_devicesEric Auger
We are currently missing the deallocation of the [host_]resv_regions in case of hot unplug. Also to make things more simple let's rule out the case where multiple HostIOMMUDevices would be aliased and attached to the same IOMMUDevice. This allows to remove the handling of conflicting Host reserved regions. Anyway this is not properly supported at guest kernel level. On hotunplug the reserved regions are reset to the ones set by virtio-iommu property. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-4-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-iommu: Remove probe_doneEric Auger
Now we have switched to PCIIOMMUOps to convey host IOMMU information, the host reserved regions are transmitted when the PCIe topology is built. This happens way before the virtio-iommu driver calls the probe request. So let's remove the probe_done flag that allowed to check the probe was not done before the IOMMU MR got enabled. Besides this probe_done flag had a flaw wrt migration since it was not saved/restored. The only case at risk is if 2 devices were plugged to a PCIe to PCI bridge and thus aliased. First of all we discovered in the past this case was not properly supported for neither SMMU nor virtio-iommu on guest kernel side: see [RFC] virtio-iommu: Take into account possible aliasing in virtio_iommu_mr() https://lore.kernel.org/all/20230116124709.793084-1-eric.auger@redhat.com/ If this were supported by the guest kernel, it is unclear what the call sequence would be from a virtio-iommu driver point of view. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-3-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22Revert "virtio-iommu: Clear IOMMUDevice when VFIO device is unplugged"Eric Auger
This reverts commit 1b889d6e39c32d709f1114699a014b381bcf1cb1. There are different problems with that tentative fix: - Some resources are left dangling (resv_regions, host_resv_ranges) and memory subregions are left attached to the root MR although freed as embedded in the sdev IOMMUDevice. Finally the sdev->as is not destroyed and associated listeners are left. - Even when fixing the above we observe a memory corruption associated with the deallocation of the IOMMUDevice. This can be observed when a VFIO device is hotplugged, hot-unplugged and a system reset is issued. At this stage we have not been able to identify the root cause (IOMMU MR or as structs beeing overwritten and used later on?). - Another issue is HostIOMMUDevice are indexed by non aliased BDF whereas the IOMMUDevice is indexed by aliased BDF - yes the current naming is really misleading -. Given the state of the code I don't think the virtio-iommu device works in non singleton group case though. So let's revert the patch for now. This means the IOMMU MR/as survive the hotunplug. This is what is done in the intel_iommu for instance. It does not sound very logical to keep those but currently there is no symetric function to pci_device_iommu_address_space(). probe_done issue will be handled in a subsequent patch. Also resv_regions and host_resv_regions will be deallocated separately. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-2-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22gdbstub: Add helper function to unregister GDB register spaceSalil Mehta
Add common function to help unregister the GDB register space. This shall be done in context to the CPU unrealization. Note: These are common functions exported to arch specific code. For example, for ARM this code is being referred in associated arch specific patch-set: Link: https://lore.kernel.org/qemu-devel/20230926103654.34424-1-salil.mehta@huawei.com/ Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-8-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/acpi: Update CPUs AML with cpu-(ctrl)dev changeSalil Mehta
CPUs Control device(\\_SB.PCI0) register interface for the x86 arch is IO port based and existing CPUs AML code assumes _CRS objects would evaluate to a system resource which describes IO Port address. But on ARM arch CPUs control device(\\_SB.PRES) register interface is memory-mapped hence _CRS object should evaluate to system resource which describes memory-mapped base address. Update build CPUs AML function to accept both IO/MEMORY region spaces and accordingly update the _CRS object. Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-6-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/acpi: Update GED _EVT method AML with CPU scanSalil Mehta
OSPM evaluates _EVT method to map the event. The CPU hotplug event eventually results in start of the CPU scan. Scan figures out the CPU and the kind of event(plug/unplug) and notifies it back to the guest. Update the GED AML _EVT method with the call to method \\_SB.CPUS.CSCN (via \\_SB.GED.CSCN) Architecture specific code [1] might initialize its CPUs AML code by calling common function build_cpus_aml() like below for ARM: build_cpus_aml(scope, ms, opts, xx_madt_cpu_entry, memmap[VIRT_CPUHP_ACPI].base, "\\_SB", "\\_SB.GED.CSCN", AML_SYSTEM_MEMORY); [1] https://lore.kernel.org/qemu-devel/20240613233639.202896-13-salil.mehta@huawei.com/ Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-5-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/acpi: Update ACPI GED framework to support vCPU HotplugSalil Mehta
ACPI GED (as described in the ACPI 6.4 spec) uses an interrupt listed in the _CRS object of GED to intimate OSPM about an event. Later then demultiplexes the notified event by evaluating ACPI _EVT method to know the type of event. Use ACPI GED to also notify the guest kernel about any CPU hot(un)plug events. Note, GED interface is used by many hotplug events like memory hotplug, NVDIMM hotplug and non-hotplug events like system power down event. Each of these can be selected using a bit in the 32 bit GED IO interface. A bit has been reserved for the CPU hotplug event. ACPI CPU hotplug related initialization should only happen if ACPI_CPU_HOTPLUG support has been enabled for particular architecture. Add cpu_hotplug_hw_init() stub to avoid compilation break. Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20240716111502.202344-4-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Igor Mammedov <imammedo@redhat.com>
2024-07-22hw/acpi: Move CPU ctrl-dev MMIO region len macro to common header fileSalil Mehta
CPU ctrl-dev MMIO region length could be used in ACPI GED and various other architecture specific places. Move ACPI_CPU_HOTPLUG_REG_LEN macro to more appropriate common header file. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-3-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22smbios: make memory device size configurable per MachineIgor Mammedov
Currently QEMU describes initial[1] RAM* in SMBIOS as a series of virtual DIMMs (capped at 16Gb max) using type 17 structure entries. Which is fine for the most cases. However when starting guest with terabytes of RAM this leads to too many memory device structures, which eventually upsets linux kernel as it reserves only 64K for these entries and when that border is crossed out it runs out of reserved memory. Instead of partitioning initial RAM on 16Gb DIMMs, use maximum possible chunk size that SMBIOS spec allows[2]. Which lets encode RAM in lower 31 bits of 32bit field (which amounts upto 2047Tb per DIMM). As result initial RAM will generate only one type 17 structure until host/guest reach ability to use more RAM in the future. Compat changes: We can't unconditionally change chunk size as it will break QEMU<->guest ABI (and migration). Thus introduce a new machine class field that would let older versioned machines to use legacy 16Gb chunks, while new(er) machine type[s] use maximum possible chunk size. PS: While it might seem to be risky to rise max entry size this large (much beyond of what current physical RAM modules support), I'd not expect it causing much issues, modulo uncovering bugs in software running within guest. And those should be fixed on guest side to handle SMBIOS spec properly, especially if guest is expected to support so huge RAM configs. In worst case, QEMU can reduce chunk size later if we would care enough about introducing a workaround for some 'unfixable' guest OS, either by fixing up the next machine type or giving users a CLI option to customize it. 1) Initial RAM - is RAM configured with help '-m SIZE' CLI option/ implicitly defined by machine. It doesn't include memory configured with help of '-device' option[s] (pcdimm,nvdimm,...) 2) SMBIOS 3.1.0 7.18.5 Memory Device — Extended Size PS: * tested on 8Tb host with RHEL6 guest, which seems to parse type 17 SMBIOS table entries correctly (according to 'dmidecode'). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240715122417.4059293-1-imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-net: Implement SR-IOV VFAkihiko Odaki
A virtio-net device can be added as a SR-IOV VF to another virtio-pci device that will be the PF. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-7-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22virtio-pci: Implement SR-IOV PFAkihiko Odaki
Allow user to attach SR-IOV VF to a virtio-pci PF. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-6-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22pcie_sriov: Allow user to create SR-IOV deviceAkihiko Odaki
A user can create a SR-IOV device by specifying the PF with the sriov-pf property of the VFs. The VFs must be added before the PF. A user-creatable VF must have PCIDeviceClass::sriov_vf_user_creatable set. Such a VF cannot refer to the PF because it is created before the PF. A PF that user-creatable VFs can be attached calls pcie_sriov_pf_init_from_user_created_vfs() during realization and pcie_sriov_pf_exit() when exiting. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-5-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22pcie_sriov: Check PCI Express for SR-IOV PFAkihiko Odaki
SR-IOV requires PCI Express. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-4-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22pcie_sriov: Ensure PF and VF are mutually exclusiveAkihiko Odaki
A device cannot be a SR-IOV PF and a VF at the same time. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-3-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hw/pci: Fix SR-IOV VF number calculationAkihiko Odaki
pci_config_get_bar_addr() had a division by vf_stride. vf_stride needs to be non-zero when there are multiple VFs, but the specification does not prohibit to make it zero when there is only one VF. Do not perform the division for the first VF to avoid division by zero. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240715-sriov-v5-2-3f5539093ffc@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22hpet: avoid timer storms on periodic timersPaolo Bonzini
If the period is set to a value that is too low, there could be no time left to run the rest of QEMU. Do not trigger interrupts faster than 1 MHz. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hpet: store full 64-bit target value of the counterPaolo Bonzini
Store the full 64-bit value at which the timer should fire. This makes it possible to skip the imprecise hpet_calculate_diff() step, and to remove the clamping of the period to 31 or 63 bits. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hpet: accept 64-bit reads and writesPaolo Bonzini
Declare the MemoryRegionOps so that 64-bit reads and writes to the HPET are received directly. This makes it possible to unify the code to process low and high parts: for 32-bit reads, extract the desired word; for 32-bit writes, just merge the desired part into the old value and proceed as with a 64-bit write. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hpet: place read-only bits directly in "new_val"Paolo Bonzini
The variable "val" is used for two different purposes. As an intermediate value when writing configuration registers, and to store the cleared bits when writing ISR. Use "new_val" for the former, and rename the variable so that it is clearer for the latter case. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hpet: remove unnecessary variable "index"Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hpet: ignore high bits of comparator in 32-bit modePaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-22hpet: fix and cleanup persistence of interrupt statusPaolo Bonzini
There are several bugs in the handling of the ISR register: - switching level->edge was not lowering the interrupt and clearing ISR - switching on the enable bit was not raising a level-triggered interrupt if the timer had fired - the timer must be kept running even if not enabled, in order to set the ISR flag, so writes to HPET_TN_CFG must not call hpet_del_timer() Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>