aboutsummaryrefslogtreecommitdiff
path: root/hw/virtio
AgeCommit message (Collapse)Author
2022-03-22Replace GCC_FMT_ATTR with G_GNUC_PRINTFMarc-André Lureau
One less qemu-specific macro. It also helps to make some headers/units only depend on glib, and thus moved in standalone projects eventually. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
2022-03-21Use g_new() & friends where that makes obvious senseMarkus Armbruster
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. This commit only touches allocations with size arguments of the form sizeof(T). Patch created mechanically with: $ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \ --macro-file scripts/cocci-macro-file.h FILES... Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20220315144156.1595462-4-armbru@redhat.com> Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
2022-03-18virtio/virtio-balloon: Prefer Object* over void* parameterBernhard Beschow
*opaque is an alias to *obj. Using the ladder makes the code consistent with with other devices, e.g. accel/kvm/kvm-all and accel/tcg/tcg-all. It also makes the cast more typesafe. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20220301222301.103821-2-shentey@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-03-15Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into stagingPeter Maydell
* whpx fixes in preparation for GDB support (Ivan) * VSS header fixes (Marc-André) * 5-level EPT support (Vitaly) * AMX support (Jing Liu & Yang Zhong) * Bundle changes to MSI routes (Longpeng) * More precise emulation of #SS (Gareth) * Disable ASAN testing # gpg: Signature made Tue 15 Mar 2022 10:51:00 GMT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (22 commits) gitlab-ci: do not run tests with address sanitizer KVM: SVM: always set MSR_AMD64_TSC_RATIO to default value i386: Add Icelake-Server-v6 CPU model with 5-level EPT support x86: Support XFD and AMX xsave data migration x86: add support for KVM_CAP_XSAVE2 and AMX state migration x86: Add AMX CPUIDs enumeration x86: Add XFD faulting bit for state components x86: Grant AMX permission for guest x86: Add AMX XTILECFG and XTILEDATA components x86: Fix the 64-byte boundary enumeration for extended state linux-headers: include missing changes from 5.17 target/i386: Throw a #SS when loading a non-canonical IST target/i386: only include bits in pg_mode if they are not ignored kvm/msi: do explicit commit when adding msi routes kvm-irqchip: introduce new API to support route change update meson-buildoptions.sh qga/vss: update informative message about MinGW qga/vss-win32: check old VSS SDK headers meson: fix generic location of vss headers vmxcap: Add 5-level EPT bit ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-03-15kvm/msi: do explicit commit when adding msi routesLongpeng(Mike)
We invoke the kvm_irqchip_commit_routes() for each addition to MSI route table, which is not efficient if we are adding lots of routes in some cases. This patch lets callers invoke the kvm_irqchip_commit_routes(), so the callers can decide how to optimize. [1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00967.html Signed-off-by: Longpeng <longpeng2@huawei.com> Message-Id: <20220222141116.2091-3-longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-15vdpa: Expose VHOST_F_LOG_ALL on SVQEugenio Pérez
SVQ is able to log the dirty bits by itself, so let's use it to not block migration. Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is enabled. Even if the device supports it, the reports would be nonsense because SVQ memory is in the qemu region. The log region is still allocated. Future changes might skip that, but this series is already long enough. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2022-03-15vdpa: Never set log_base addr if SVQ is enabledEugenio Pérez
Setting the log address would make the device start reporting invalid dirty memory because the SVQ vrings are located in qemu's memory. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2022-03-15vdpa: Adapt vhost_vdpa_get_vring_base to SVQEugenio Pérez
This is needed to achieve migration, so the destination can restore its index. Setting base as last used idx, so destination will see as available all the entries that the device did not use, including the in-flight processing ones. This is ok for networking, but other kinds of devices might have problems with these retransmissions. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2022-03-15vdpa: Add custom IOTLB translations to SVQEugenio Pérez
Use translations added in VhostIOVATree in SVQ. Only introduce usage here, not allocation and deallocation. As with previous patches, we use the dead code paths of shadow_vqs_enabled to avoid commiting too many changes at once. These are impossible to take at the moment. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2022-03-15vhost: Add VhostIOVATreeEugenio Pérez
This tree is able to look for a translated address from an IOVA address. At first glance it is similar to util/iova-tree. However, SVQ working on devices with limited IOVA space need more capabilities, like allocating IOVA chunks or performing reverse translations (qemu addresses to iova). The allocation capability, as "assign a free IOVA address to this chunk of memory in qemu's address space" allows shadow virtqueue to create a new address space that is not restricted by guest's addressable one, so we can allocate shadow vqs vrings outside of it. It duplicates the tree so it can search efficiently in both directions, and it will signal overlap if iova or the translated address is present in any tree. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2022-03-15vhost: Shadow virtqueue buffers forwardingEugenio Pérez
Initial version of shadow virtqueue that actually forward buffers. There is no iommu support at the moment, and that will be addressed in future patches of this series. Since all vhost-vdpa devices use forced IOMMU, this means that SVQ is not usable at this point of the series on any device. For simplicity it only supports modern devices, that expects vring in little endian, with split ring and no event idx or indirect descriptors. Support for them will not be added in this series. It reuses the VirtQueue code for the device part. The driver part is based on Linux's virtio_ring driver, but with stripped functionality and optimizations so it's easier to review. However, forwarding buffers have some particular pieces: One of the most unexpected ones is that a guest's buffer can expand through more than one descriptor in SVQ. While this is handled gracefully by qemu's emulated virtio devices, it may cause unexpected SVQ queue full. This patch also solves it by checking for this condition at both guest's kicks and device's calls. The code may be more elegant in the future if SVQ code runs in its own iocontext. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2022-03-15vdpa: adapt vhost_ops callbacks to svqEugenio Pérez
First half of the buffers forwarding part, preparing vhost-vdpa callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so this is effectively dead code at the moment, but it helps to reduce patch size. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2022-03-15virtio: Add vhost_svq_get_vring_addrEugenio Pérez
It reports the shadow virtqueue address from qemu virtual address space. Since this will be different from the guest's vaddr, but the device can access it, SVQ takes special care about its alignment & lack of garbage data. It assumes that IOMMU will work in host_page_size ranges for that. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2022-03-15vhost: Add vhost_svq_valid_features to shadow vqEugenio Pérez
This allows SVQ to negotiate features with the guest and the device. For the device, SVQ is a driver. While this function bypasses all non-transport features, it needs to disable the features that SVQ does not support when forwarding buffers. This includes packed vq layout, indirect descriptors or event idx. Future changes can add support to offer more features to the guest, since the use of VirtQueue gives this for free. This is left out at the moment for simplicity. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2022-03-15vhost: Add Shadow VirtQueue call forwarding capabilitiesEugenio Pérez
This will make qemu aware of the device used buffers, allowing it to write the guest memory with its contents if needed. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2022-03-15vhost: Add Shadow VirtQueue kick forwarding capabilitiesEugenio Pérez
At this mode no buffer forwarding will be performed in SVQ mode: Qemu will just forward the guest's kicks to the device. Host memory notifiers regions are left out for simplicity, and they will not be addressed in this series. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2022-03-15vhost: Add VhostShadowVirtqueueEugenio Pérez
Vhost shadow virtqueue (SVQ) is an intermediate jump for virtqueue notifications and buffers, allowing qemu to track them. While qemu is forwarding the buffers and virtqueue changes, it is able to commit the memory it's being dirtied, the same way regular qemu's VirtIO devices do. This commit only exposes basic SVQ allocation and free. Next patches of the series add functionality like notifications and buffers forwarding. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2022-03-06vhost: use wfd on functions setting vring call fdSergio Lopez
When ioeventfd is emulated using qemu_pipe(), only EventNotifier's wfd can be used for writing. Use the recently introduced event_notifier_get_wfd() function to obtain the fd that our peer must use to signal the vring. Signed-off-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220304100854.14829-3-slp@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-06vhost-vsock: detach the virqueue element in case of errorStefano Garzarella
In vhost_vsock_common_send_transport_reset(), if an element popped from the virtqueue is invalid, we should call virtqueue_detach_element() to detach it from the virtqueue before freeing its memory. Fixes: fc0b9b0e1c ("vhost-vsock: add virtio sockets device") Fixes: CVE-2022-26354 Cc: qemu-stable@nongnu.org Reported-by: VictorV <vv474172261@gmail.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220228095058.27899-1-sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-06virtio-iommu: Support bypass domainJean-Philippe Brucker
The driver can create a bypass domain by passing the VIRTIO_IOMMU_ATTACH_F_BYPASS flag on the ATTACH request. Bypass domains perform slightly better than domains with identity mappings since they skip translation. Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Message-Id: <20220214124356.872985-4-jean-philippe@linaro.org> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-06virtio-iommu: Default to bypass during bootJean-Philippe Brucker
Currently the virtio-iommu device must be programmed before it allows DMA from any PCI device. This can make the VM entirely unusable when a virtio-iommu driver isn't present, for example in a bootloader that loads the OS from storage. Similarly to the other vIOMMU implementations, default to DMA bypassing the IOMMU during boot. Add a "boot-bypass" property, defaulting to true, that lets users change this behavior. Replace the VIRTIO_IOMMU_F_BYPASS feature, which didn't support bypass before feature negotiation, with VIRTIO_IOMMU_F_BYPASS_CONFIG. We add the bypass field to the migration stream without introducing subsections, based on the assumption that this virtio-iommu device isn't being used in production enough to require cross-version migration at the moment (all previous version required workarounds since they didn't support ACPI and boot-bypass). Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Message-Id: <20220214124356.872985-3-jean-philippe@linaro.org> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-06vhost-vdpa: make notifiers _init()/_uninit() symmetricLaurent Vivier
vhost_vdpa_host_notifiers_init() initializes queue notifiers for queues "dev->vq_index" to queue "dev->vq_index + dev->nvqs", whereas vhost_vdpa_host_notifiers_uninit() uninitializes the same notifiers for queue "0" to queue "dev->nvqs". This asymmetry seems buggy, fix that by using dev->vq_index as the base for both. Fixes: d0416d487bd5 ("vhost-vdpa: map virtqueue notification area if possible") Cc: jasowang@redhat.com Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20220211161309.1385839-1-lvivier@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-06hw/virtio: vdpa: Fix leak of host-notifier memory-regionLaurent Vivier
If call virtio_queue_set_host_notifier_mr fails, should free host-notifier memory-region. This problem can trigger a coredump with some vDPA drivers (mlx5, but not with the vdpasim), if we unplug the virtio-net card from the guest after a stop/start. The same fix has been done for vhost-user: 1f89d3b91e3e ("hw/virtio: Fix leak of host-notifier memory-region") Fixes: d0416d487bd5 ("vhost-vdpa: map virtqueue notification area if possible") Cc: jasowang@redhat.com Resolves: https://bugzilla.redhat.com/2027208 Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20220211170259.1388734-1-lvivier@redhat.com> Cc: qemu-stable@nongnu.org Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04hw/vhost-user-i2c: Add support for VIRTIO_I2C_F_ZERO_LENGTH_REQUESTViresh Kumar
VIRTIO_I2C_F_ZERO_LENGTH_REQUEST is a mandatory feature, that must be implemented by everyone. Add its support. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Message-Id: <fc47ab63b1cd414319c9201e8d6c7705b5ec3bd9.1644490993.git.viresh.kumar@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04virtio: fix the condition for iommu_platform not supportedHalil Pasic
The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but unsupported") claims to fail the device hotplug when iommu_platform is requested, but not supported by the (vhost) device. On the first glance the condition for detecting that situation looks perfect, but because a certain peculiarity of virtio_platform it ain't. In fact the aforementioned commit introduces a regression. It breaks virtio-fs support for Secure Execution, and most likely also for AMD SEV or any other confidential guest scenario that relies encrypted guest memory. The same also applies to any other vhost device that does not support _F_ACCESS_PLATFORM. The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM collates "device can not access all of the guest RAM" and "iova != gpa, thus device needs to translate iova". Confidential guest technologies currently rely on the device/hypervisor offering _F_ACCESS_PLATFORM, so that, after the feature has been negotiated, the guest grants access to the portions of memory the device needs to see. So in for confidential guests, generally, _F_ACCESS_PLATFORM is about the restricted access to memory, but not about the addresses used being something else than guest physical addresses. This is the very reason for which commit f7ef7e6e3b ("vhost: correctly turn on VIRTIO_F_IOMMU_PLATFORM") fences _F_ACCESS_PLATFORM from the vhost device that does not need it, because on the vhost interface it only means "I/O address translation is needed". This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on VIRTIO_F_IOMMU_PLATFORM"), and uses the same condition for detecting the situation when _F_ACCESS_PLATFORM is requested, but no I/O translation by the device, and thus no device capability is needed. In this situation claiming that the device does not support iommu_plattform=on is counter-productive. So let us stop doing that! Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Reported-by: Jakob Naucke <Jakob.Naucke@ibm.com> Fixes: 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but unsupported") Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: qemu-stable@nongnu.org Message-Id: <20220207112857.607829-1-pasic@linux.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-03-04vhost-user: fix VirtQ notifier cleanupXueming Li
When vhost-user device cleanup, remove notifier MR and munmaps notifier address in the event-handling thread, VM CPU thread writing the notifier in concurrent fails with an error of accessing invalid address. It happens because MR is still being referenced and accessed in another thread while the underlying notifier mmap address is being freed and becomes invalid. This patch calls RCU and munmap notifiers in the callback after the memory flatview update finish. Fixes: 44866521bd6e ("vhost-user: support registering external host notifiers") Cc: qemu-stable@nongnu.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Message-Id: <20220207071929.527149-3-xuemingl@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04vhost-user: remove VirtQ notifier restoreXueming Li
Notifier set when vhost-user backend asks qemu to mmap an FD and offset. When vhost-user backend restart or getting killed, VQ notifier FD and mmap addresses become invalid. After backend restart, MR contains the invalid address will be restored and fail on notifier access. On the other hand, qemu should munmap the notifier, release underlying hardware resources to enable backend restart and allocate hardware notifier resources correctly. Qemu shouldn't reference and use resources of disconnected backend. This patch removes VQ notifier restore, uses the default vhost-user notifier to avoid invalid address access. After backend restart, the backend should ask qemu to install a hardware notifier if needed. Fixes: 44866521bd6e ("vhost-user: support registering external host notifiers") Cc: qemu-stable@nongnu.org Signed-off-by: Xueming Li <xuemingl@nvidia.com> Message-Id: <20220207071929.527149-2-xuemingl@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-02-21include: Move qemu_madvise() and related #defines to new qemu/madvise.hPeter Maydell
The function qemu_madvise() and the QEMU_MADV_* constants associated with it are used in only 10 files. Move them out of osdep.h to a new qemu/madvise.h header that is included where it is needed. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20220208200856.3558249-2-peter.maydell@linaro.org
2022-02-21Mark remaining global TypeInfo instances as constBernhard Beschow
More than 1k of TypeInfo instances are already marked as const. Mark the remaining ones, too. This commit was created with: git grep -z -l 'static TypeInfo' -- '*.c' | \ xargs -0 sed -i 's/static TypeInfo/static const TypeInfo/' Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Corey Minyard <cminyard@mvista.com> Message-id: 20220117145805.173070-2-shentey@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-01-28Remove unnecessary minimum_version_id_old fieldsPeter Maydell
The migration code will not look at a VMStateDescription's minimum_version_id_old field unless that VMSD has set the load_state_old field to something non-NULL. (The purpose of minimum_version_id_old is to specify what migration version is needed for the code in the function pointed to by load_state_old to be able to handle it on incoming migration.) We have exactly one VMSD which still has a load_state_old, in the PPC CPU; every other VMSD which sets minimum_version_id_old is doing so unnecessarily. Delete all the unnecessary ones. Commit created with: sed -i '/\.minimum_version_id_old/d' $(git grep -l '\.minimum_version_id_old') with the one legitimate use then hand-edited back in. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> --- It missed vmstate_ppc_cpu.
2022-01-20hw/arm/virt: Support for virtio-mem-pciGavin Shan
This supports virtio-mem-pci device on "virt" platform, by simply following the implementation on x86. * This implements the hotplug handlers to support virtio-mem-pci device hot-add, while the hot-remove isn't supported as we have on x86. * The block size is 512MB on ARM64 instead of 128MB on x86. * It has been passing the tests with various combinations like 64KB and 4KB page sizes on host and guest, different memory device backends like normal, transparent huge page and HugeTLB, plus migration. Co-developed-by: David Hildenbrand <david@redhat.com> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-id: 20220111063329.74447-3-gshan@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-01-20virtio-mem: Correct default THP size for ARM64Gavin Shan
The default block size is same as to the THP size, which is either retrieved from "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" or hardcoded to 2MB. There are flaws in both mechanisms and this intends to fix them up. * When "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size" is used to getting the THP size, 32MB and 512MB are valid values when we have 16KB and 64KB page size on ARM64. * When the hardcoded THP size is used, 2MB, 32MB and 512MB are valid values when we have 4KB, 16KB and 64KB page sizes on ARM64. Co-developed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-id: 20220111063329.74447-2-gshan@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-01-12virtio: unify dataplane and non-dataplane ->handle_output()Stefan Hajnoczi
Now that virtio-blk and virtio-scsi are ready, get rid of the handle_aio_output() callback. It's no longer needed. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-01-12virtio: use ->handle_output() instead of ->handle_aio_output()Stefan Hajnoczi
The difference between ->handle_output() and ->handle_aio_output() was that ->handle_aio_output() returned a bool return value indicating progress. This was needed by the old polling API but now that the bool return value is gone, the two functions can be unified. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-01-12virtio: get rid of VirtIOHandleAIOOutputStefan Hajnoczi
The virtqueue host notifier API virtio_queue_aio_set_host_notifier_handler() polls the virtqueue for new buffers. AioContext previously required a bool progress return value indicating whether an event was handled or not. This is no longer necessary because the AioContext polling API has been split into a poll check function and an event handler function. The event handler is only run when we know there is work to do, so it doesn't return bool. The VirtIOHandleAIOOutput function signature is now the same as VirtIOHandleOutput. Get rid of the bool return value. Further simplifications will be made for virtio-blk and virtio-scsi in the next patch. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-01-12aio-posix: split poll check from ready handlerStefan Hajnoczi
Adaptive polling measures the execution time of the polling check plus handlers called when a polled event becomes ready. Handlers can take a significant amount of time, making it look like polling was running for a long time when in fact the event handler was running for a long time. For example, on Linux the io_submit(2) syscall invoked when a virtio-blk device's virtqueue becomes ready can take 10s of microseconds. This can exceed the default polling interval (32 microseconds) and cause adaptive polling to stop polling. By excluding the handler's execution time from the polling check we make the adaptive polling calculation more accurate. As a result, the event loop now stays in polling mode where previously it would have fallen back to file descriptor monitoring. The following data was collected with virtio-blk num-queues=2 event_idx=off using an IOThread. Before: 168k IOPS, IOThread syscalls: 9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0) = 16 9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8) = 8 9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8) = 8 9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3 9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0) = 32 174k IOPS (+3.6%), IOThread syscalls: 9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0) = 32 9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8) = 8 9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8) = 8 9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50) = 32 Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because the IOThread stays in polling mode instead of falling back to file descriptor monitoring. As usual, polling is not implemented on Windows so this patch ignores the new io_poll_read() callback in aio-win32.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-2-stefanha@redhat.com [Fixed up aio_set_event_notifier() calls in tests/unit/test-fdmon-epoll.c added after this series was queued. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-01-10Revert "virtio: introduce macro IRTIO_CONFIG_IRQ_IDX"Michael S. Tsirkin
This reverts commit bf1d85c166c19af95dbd27b1faba1d2909732323. Fixes: bf1d85c166 ("virtio: introduce macro IRTIO_CONFIG_IRQ_IDX") Cc: "Cindy Lu" <lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-10Revert "virtio-pci: decouple notifier from interrupt process"Michael S. Tsirkin
This reverts commit e3480ef81f6fb61cc9c04e3b5be8b7e84484fc05. Fixes: e3480ef81f ("virtio-pci: decouple notifier from interrupt process") Cc: "Cindy Lu" <lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-10Revert "virtio-pci: decouple the single vector from the interrupt process"Michael S. Tsirkin
This reverts commit 316011b8a74e777eb3ba03171cd701a291c28867. Fixes: 316011b8a7 ("virtio-pci: decouple the single vector from the interrupt process") Cc: "Cindy Lu" <lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-10Revert "vhost-vdpa: add support for config interrupt"Michael S. Tsirkin
This reverts commit 634f7c89fbd78f57d00d5d6b39c0ade9df1fe27f. Fixes: 634f7c89fb ("vhost-vdpa: add support for config interrupt") Cc: "Cindy Lu" <lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-10Revert "virtio: add support for configure interrupt"Michael S. Tsirkin
This reverts commit 081f864f56307551f59c5e934e3f30a7290d0faa. Fixes: 081f864f56 ("virtio: add support for configure interrupt") Cc: "Cindy Lu" <lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-10Revert "vhost: add support for configure interrupt"Michael S. Tsirkin
This reverts commit f7220a7ce21604a4bc6260ccca4dc9068c1f27f2. Fixes: f7220a7ce2 ("vhost: add support for configure interrupt") Cc: "Cindy Lu" <lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-10Revert "virtio-mmio: add support for configure interrupt"Michael S. Tsirkin
This reverts commit d48185f1a40d4e4ed2fa2873a42b2a5eb8748256. Fixes: d48185f1a4 ("virtio-mmio: add support for configure interrupt") Cc: "Cindy Lu" <lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-10Revert "virtio-pci: add support for configure interrupt"Michael S. Tsirkin
This reverts commit d5d24d859c3957ea1674d0e102f96439cdbfe93a. Fixes: d5d24d859c ("virtio-pci: add support for configure interrupt") Cc: "Cindy Lu" <lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07virtio/vhost-vsock: don't double close vhostfd, remove redundant cleanupDaniil Tatianin
In case of an error during initialization in vhost_dev_init, vhostfd is closed in vhost_dev_cleanup. Remove close from err_virtio as it's both redundant and causes a double close on vhostfd. Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Message-Id: <20211129125204.1108088-1-d-tatianin@yandex-team.ru> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07virtio-mem: Set "unplugged-inaccessible=auto" for the 7.0 machine on x86David Hildenbrand
Set the new default to "auto", keeping it set to "off" for compat machines. This property is only available for x86 targets. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134039.29670-4-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07virtio-mem: Support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLEDavid Hildenbrand
With VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we signal the VM that reading unplugged memory is not supported. We have to fail feature negotiation in case the guest does not support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. First, VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE is required to properly handle memory backends (or architectures) without support for the shared zeropage in the hypervisor cleanly. Without the shared zeropage, even reading an unpopulated virtual memory location can populate real memory and consequently consume memory in the hypervisor. We have a guaranteed shared zeropage only on MAP_PRIVATE anonymous memory. Second, we want VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE to be the default long-term as even populating the shared zeropage can be problematic: for example, without THP support (possible) or without support for the shared huge zeropage with THP (unlikely), the PTE page tables to hold the shared zeropage entries can consume quite some memory that cannot be reclaimed easily. Third, there are other optimizations+features (e.g., protection of unplugged memory, reducing the total memory slot size and bitmap sizes) that will require VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE. We really only support x86 targets with virtio-mem for now (and Linux similarly only support x86), but that might change soon, so prepare for different targets already. Add a new "unplugged-inaccessible" tristate property for x86 targets: - "off" will keep VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE unset and legacy guests working. - "on" will set VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE and stop legacy guests from using the device. - "auto" selects the default based on support for the shared zeropage. Warn in case the property is set to "off" and we don't have support for the shared zeropage. For existing compat machines, the property will default to "off", to not change the behavior but eventually warn about a problematic setup. Short-term, we'll set the property default to "auto" for new QEMU machines. Mid-term, we'll set the property default to "on" for new QEMU machines. Long-term, we'll deprecate the parameter and disallow legacy guests completely. The property has to match on the migration source and destination. "auto" will result in the same VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE setting as long as the qemu command line (esp. memdev) match -- so "auto" is good enough for migration purposes and the parameter doesn't have to be migrated explicitly. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134039.29670-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07virtio: signal after wrapping packed used_idxStefan Hajnoczi
Packed Virtqueues wrap used_idx instead of letting it run freely like Split Virtqueues do. If the used ring wraps more than once there is no way to compare vq->signalled_used and vq->used_idx in virtio_packed_should_notify() since they are modulo vq->vring.num. This causes the device to stop sending used buffer notifications when when virtio_packed_should_notify() is called less than once each time around the used ring. It is possible to trigger this with virtio-blk's dataplane notify_guest_bh() irq coalescing optimization. The call to virtio_notify_irqfd() (and virtio_packed_should_notify()) is deferred to a BH. If the guest driver is polling it can complete and submit more requests before the BH executes, causing the used ring to wrap more than once. The result is that the virtio-blk device ceases to raise interrupts and I/O hangs. Cc: Tiwei Bie <tiwei.bie@intel.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20211130134510.267382-1-stefanha@redhat.com> Fixes: 86044b24e865fb9596ed77a4d0f3af8b90a088a1 ("virtio: basic packed virtqueue support") Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07virtio-mem: Support "prealloc=on" optionDavid Hildenbrand
For scarce memory resources, such as hugetlb, we want to be able to prealloc such memory resources in order to not crash later on access. On simple user errors we could otherwise easily run out of memory resources an crash the VM -- pretty much undesired. For ordinary memory devices, such as DIMMs, we preallocate memory via the memory backend for such use cases; however, with virtio-mem we're dealing with sparse memory backends; preallocating the whole memory backend destroys the whole purpose of virtio-mem. Instead, we want to preallocate memory when actually exposing memory to the VM dynamically, and fail plugging memory gracefully + warn the user in case preallocation fails. A common use case for hugetlb will be using "reserve=off,prealloc=off" for the memory backend and "prealloc=on" for the virtio-mem device. This way, no huge pages will be reserved for the process, but we can recover if there are no actual huge pages when plugging memory. Libvirt is already prepared for this. Note that preallocation cannot protect from the OOM killer -- which holds true for any kind of preallocation in QEMU. It's primarily useful only for scarce memory resources such as hugetlb, or shared file-backed memory. It's of little use for ordinary anonymous memory that can be swapped, KSM merged, ... but we won't forbid it. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-9-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07vhost: stick to -errno error return conventionRoman Kagan
The generic vhost code expects that many of the VhostOps methods in the respective backends set errno on errors. However, none of the existing backends actually bothers to do so. In a number of those methods errno from the failed call is clobbered by successful later calls to some library functions; on a few code paths the generic vhost code then negates and returns that errno, thus making failures look as successes to the caller. As a result, in certain scenarios (e.g. live migration) the device doesn't notice the first failure and goes on through its state transitions as if everything is ok, instead of taking recovery actions (break and reestablish the vhost-user connection, cancel migration, etc) before it's too late. To fix this, consolidate on the convention to return negated errno on failures throughout generic vhost, and use it for error propagation. Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru> Message-Id: <20211111153354.18807-10-rvkagan@yandex-team.ru> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>