aboutsummaryrefslogtreecommitdiff
path: root/hw/vfio
AgeCommit message (Collapse)Author
2019-03-04Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190304' into stagingPeter Maydell
s390x updates: - tcg: support the floating-point extension facility - vfio-ap: support hot(un)plug of vfio-ap device - fixes + cleanups # gpg: Signature made Mon 04 Mar 2019 11:55:39 GMT # gpg: using RSA key C3D0D66DC3624FF6A8C018CEDECF6B93C6F02FAF # gpg: issuer "cohuck@redhat.com" # gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>" [unknown] # gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>" [full] # gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>" [full] # gpg: aka "Cornelia Huck <cohuck@kernel.org>" [unknown] # gpg: aka "Cornelia Huck <cohuck@redhat.com>" [unknown] # Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF * remotes/cohuck/tags/s390x-20190304: (27 commits) s390x: Add floating-point extension facility to "qemu" cpu model s390x/tcg: Handle all rounding modes overwritten by BFP instructions s390x/tcg: Implement rounding mode and XxC for LOAD ROUNDED s390x/tcg: Implement XxC and checks for most FP instructions s390x/tcg: Prepare for IEEE-inexact-exception control (XxC) s390x/tcg: Refactor saving/restoring the bfp rounding mode s390x/tcg: Check for exceptions in SET BFP ROUNDING MODE s390x/tcg: Handle SET FPC AND LOAD FPC 3-bit BFP rounding modes s390x/tcg: Fix simulated-IEEE exceptions s390x/tcg: Refactor SET FPC AND SIGNAL handling s390x/tcg: Hide IEEE underflows in some scenarios s390x/tcg: Fix parts of IEEE exception handling s390x/tcg: Factor out conversion of softfloat exceptions s390x/tcg: Fix rounding from float128 to uint64_t/uint32_t s390x/tcg: Fix TEST DATA CLASS instructions s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP s390x/tcg: Factor out gen_addi_and_wrap_i64() from get_address() s390x/tcg: Factor out vec_full_reg_offset() s390x/tcg: Clarify terminology in vec_reg_offset() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-04Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
pci, pc, virtio: fixes, cleanups, tests Lots of work on tests: BiosTablesTest UEFI app, vhost-user testing for non-Linux hosts. Misc cleanups and fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 22 Feb 2019 15:51:40 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (26 commits) pci: Sanity test minimum downstream LNKSTA hw/smbios: fix offset of type 3 sku field pci: Move NVIDIA vendor id to the rest of ids virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size virtio-balloon: Use ram_block_discard_range() instead of raw madvise() virtio-balloon: Rework ballon_page() interface virtio-balloon: Corrections to address verification virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate i386/kvm: ignore masked irqs when update msi routes contrib/vhost-user-blk: fix the compilation issue Revert "contrib/vhost-user-blk: fix the compilation issue" pc-dimm: use same mechanism for [get|set]_addr tests/data: introduce "uefi-boot-images" with the "bios-tables-test" ISOs tests/uefi-test-tools: add build scripts tests: introduce "uefi-test-tools" with the BiosTablesTest UEFI app roms: build the EfiRom utility from the roms/edk2 submodule roms: add the edk2 project as a git submodule vhost-user-test: create a temporary directory per TestServer vhost-user-test: small changes to init_hugepagefs vhost-user-test: create a main loop per TestServer ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-04s390x/vfio-ap: Implement hot plug/unplug of vfio-ap deviceTony Krowiak
Introduces hot plug/unplug support for the vfio-ap device. To hot plug a vfio-ap device using the QEMU device_add command: (qemu) device_add vfio-ap,sysfsdev=$path-to-mdev Where $path-to-mdev is the absolute path to the mediated matrix device to which AP resources to be used by the guest have been assigned. A vfio-ap device can be hot plugged only if: 1. A vfio-ap device has not been attached to the virtual machine's ap-bus via the QEMU command line or a prior hot plug action. 2. The guest was started with the CPU model feature for AP enabled (e.g., -cpu host,ap=on) To hot unplug a vfio-ap device using the QEMU device_del command: (qemu) device_del vfio-ap,sysfsdev=$path-to-mdev Where $path-to-mdev is the absolute path to the mediated matrix device specified when the vfio-ap device was attached to the virtual machine's ap-bus. A vfio-ap device can be hot unplugged only if: 1. A vfio-ap device has been attached to the virtual machine's ap-bus via the QEMU command line or a prior hot plug action. 2. The guest was started with the CPU model feature for AP enabled (e.g., -cpu host,ap=on) Please note that a hot plug handler is not necessary for the vfio-ap device because the AP matrix configuration for the guest is performed by the kernel device driver when the vfio-ap device is realized. The vfio-ap device represents a VFIO mediated device created in the host sysfs for use by a guest. The mdev device is configured with an AP matrix (i.e., adapters and domains) via its sysfs attribute interfaces prior to starting the guest or plugging a vfio-ap device in. When the device is realized, a file descriptor is opened on the mdev device which results in a callback to the vfio_ap kernel device driver. The device driver then configures the AP matrix in the guest's SIE state description from the AP matrix assigned via the mdev device's sysfs interfaces. The AP devices will be created for the guest when the AP bus running on the guest subsequently performs its periodic scan for AP devices. The qdev_simple_device_unplug_cb() callback function is used for the same reaons; namely, the vfio_ap kernel device driver will perform the AP resource de-configuration for the guest when the vfio-ap device is unplugged. When the vfio-ap device is unrealized, the mdev device file descriptor is closed which results in a callback to the vfio_ap kernel device driver. The device driver then clears the AP matrix configuration in the guest's SIE state description and resets all of the affected queues. The AP devices created for the guest will be removed when the AP bus running on the guest subsequently performs its periodic scan and finds there are no longer any AP resources assigned to the guest. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Pierre Morel <pmorel@linux.ibm.com> Message-Id: <1550519397-25359-2-git-send-email-akrowiak@linux.ibm.com> [CH: adapt to changed qbus_set_hotplug_handler() signature] Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-02-22pci: Move NVIDIA vendor id to the rest of idsAlexey Kardashevskiy
sPAPR code will use it too so move it from VFIO to the common code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190214051440.59167-1-aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-21hw/vfio/common: Refactor container initializationEric Auger
We introduce the vfio_init_container_type() helper. It computes the highest usable iommu type and then set the container and the iommu type. Its usage in vfio_connect_container() makes the code ready for addition of new iommu types. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-21vfio/common: Work around kernel overflow bug in DMA unmapAlex Williamson
A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which adds a test for address space wrap-around in the vfio DMA unmap path. Unfortunately due to overflow, the kernel detects an unmap of the last page in the 64-bit address space as a wrap-around. In QEMU, a Q35 guest with VT-d emulation and guest IOMMU enabled will attempt to make such an unmap request during VM system reset, triggering an error: qemu-kvm: VFIO_UNMAP_DMA: -22 qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument) Here the IOVA start address (0xfef00000) and the size parameter (0xffffffff01100000) add to exactly 2^64, triggering the bug. A kernel fix is queued for the Linux v5.0 release to address this. This patch implements a workaround to retry the unmap, excluding the final page of the range when we detect an unmap failing which matches the requirements for this issue. This is expected to be a safe and complete workaround as the VT-d address space does not extend to the full 64-bit space and therefore the last page should never be mapped. This workaround can be removed once all kernels with this bug are sufficiently deprecated. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291 Reported-by: Pei Zhang <pezhang@redhat.com> Debugged-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-05hw/vfio/Makefile.objs: Create new CONFIG_* variables for VFIO core and PCIPaolo Bonzini
Make hw/vfio configurable and add new CONFIG_VFIO_* to the default-configs/s390x*-softmmu.mak. This allow a finer-grain selection of the various VFIO backends. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20190202072456.6468-28-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-05vfio: move conditional up to hw/Makefile.objsPaolo Bonzini
Instead of wrapping the entire Makefile.objs with an ifeq/endif, just include the directory only for Linux. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20190202072456.6468-4-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-24trace: forbid use of %m in trace event format stringsDaniel P. Berrangé
The '%m' format instructs glibc's printf()/syslog() implementation to insert the contents of strerror(errno). Since this is a glibc extension it should generally be avoided in QEMU due to need for portability to a variety of platforms. Even though vfio is Linux-only code that could otherwise use "%m", it must still be avoided in trace-events files because several of the backends do not use the format string and so this error information is invisible to them. The errno string value should be given as an explicit trace argument instead, making it accessible to all backends. This also allows it to work correctly with future patches that use the format string with systemtap's simple printf code. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-id: 20190123120016.4538-4-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-01-11qemu/queue.h: typedef QTAILQ headsPaolo Bonzini
This will be needed when we change the QTAILQ head and elem structs to unions. However, it is also consistent with the usage elsewhere in QEMU for other list head structs (see for example FsMountList). Note that most QTAILQs only need their name in order to do backwards walks. Those do not break with the struct->union change, and anyway the change will also remove the need to name heads when doing backwards walks, so those are not touched here. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11vfio: make vfio_address_spaces staticPaolo Bonzini
It is not used outside hw/vfio/common.c, so it does not need to be extern. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
pci, pc, virtio: fixes, features VTD fixes IR and split irqchip are now the default for Q35 ACPI refactoring hotplug refactoring new names for virtio devices multiple pcie link width/speeds PCI fixes Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Thu 20 Dec 2018 18:26:03 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (44 commits) x86-iommu: turn on IR by default if proper x86-iommu: switch intr_supported to OnOffAuto type q35: set split kernel irqchip as default pci: Adjust PCI config limit based on bus topology spapr_pci: perform unplug via the hotplug handler pci/shpc: perform unplug via the hotplug handler pci: Reuse pci-bridge hotplug handler handlers for pcie-pci-bridge pci/pcie: perform unplug via the hotplug handler pci/pcihp: perform unplug via the hotplug handler pci/pcihp: overwrite hotplug handler recursively from the start pci/pcihp: perform check for bus capability in pre_plug handler s390x/pci: rename hotplug handler callbacks pci/shpc: rename hotplug handler callbacks pci/pcie: rename hotplug handler callbacks hw/i386: Remove deprecated machines pc-0.10 and pc-0.11 hw: acpi: Remove AcpiRsdpDescriptor and fix tests hw: acpi: Export and share the ARM RSDP build hw: arm: Support both legacy and current RSDP build hw: arm: Convert the RSDP build to the buid_append_foo() API hw: arm: Carry RSDP specific data through AcpiRsdpData ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-12-20Clean up includesMarkus Armbruster
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes, with the changes to the following files manually reverted: contrib/libvhost-user/libvhost-user-glib.h contrib/libvhost-user/libvhost-user.c contrib/libvhost-user/libvhost-user.h linux-user/mips64/cpu_loop.c linux-user/mips64/signal.c linux-user/sparc64/cpu_loop.c linux-user/sparc64/signal.c linux-user/x86_64/cpu_loop.c linux-user/x86_64/signal.c target/s390x/gen-features.c tests/migration/s390x/a-b-bios.c tests/test-rcu-simpleq.c tests/test-rcu-tailq.c Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20181204172535.2799-1-armbru@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Acked-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
2018-12-19vfio/pci: Remove PCIe Link Status emulationAlex Williamson
Now that the downstream port will virtually negotiate itself to the link status of the downstream device, we can remove this emulation. It's not clear that it was every terribly useful anyway. Tested-by: Geoffrey McRae <geoff@hostfission.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19pcie: Create enums for link speed and widthAlex Williamson
In preparation for reporting higher virtual link speeds and widths, create enums and macros to help us manage them. Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Tested-by: Geoffrey McRae <geoff@hostfission.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-12vfio-ap: flag as compatible with balloonCornelia Huck
vfio-ap devices do not pin any pages in the host. Therefore, they are compatible with memory ballooning. Flag them as compatible, so both vfio-ap and a balloon can be used simultaneously. Cc: qemu-stable@nongnu.org Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-11-05s390x/vfio-ap: report correct errorCornelia Huck
If ioctl(..., VFIO_DEVICE_RESET) fails, we want to report errno instead of ret (which is always -1 on error). Fixes Coverity issue CID 1396176. Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-10-19vfio: Clean up error reporting after previous commitMarkus Armbruster
The previous commit changed vfio's warning messages from vfio warning: DEV-NAME: Could not frobnicate to warning: vfio DEV-NAME: Could not frobnicate To match this change, change error messages from vfio error: DEV-NAME: On fire to vfio DEV-NAME: On fire Note the loss of "error". If we think marking error messages that way is a good idea, we should mark *all* error messages, i.e. make error_report() print it. Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20181017082702.5581-7-armbru@redhat.com>
2018-10-19vfio: Use warn_report() & friends to report warningsMarkus Armbruster
The vfio code reports warnings like error_report(WARN_PREFIX "Could not frobnicate", DEV-NAME); where WARN_PREFIX is defined so the message comes out as vfio warning: DEV-NAME: Could not frobnicate This usage predates the introduction of warn_report() & friends in commit 97f40301f1d. It's time to convert to that interface. Since these functions already prefix the message with "warning: ", replace WARN_PREFIX by VFIO_MSG_PREFIX, so the messages come out like warning: vfio DEV-NAME: Could not frobnicate The next commit will replace ERR_PREFIX. Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-6-armbru@redhat.com>
2018-10-19error: Fix use of error_prepend() with &error_fatal, &error_abortMarkus Armbruster
From include/qapi/error.h: * Pass an existing error to the caller with the message modified: * error_propagate(errp, err); * error_prepend(errp, "Could not frobnicate '%s': ", name); Fei Li pointed out that doing error_propagate() first doesn't work well when @errp is &error_fatal or &error_abort: the error_prepend() is never reached. Since I doubt fixing the documentation will stop people from getting it wrong, introduce error_propagate_prepend(), in the hope that it lures people away from using its constituents in the wrong order. Update the instructions in error.h accordingly. Convert existing error_prepend() next to error_propagate to error_propagate_prepend(). If any of these get reached with &error_fatal or &error_abort, the error messages improve. I didn't check whether that's the case anywhere. Cc: Fei Li <fli@suse.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-2-armbru@redhat.com>
2018-10-15vfio-pci: make vfio-pci device more QOM conventionalLi Qiang
Define a TYPE_VFIO_PCI and drop DO_UPCAST. Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-15vfio/platform: Make the vfio-platform device non-abstractEric Auger
Up to now the vfio-platform device has been abstract and could not be instantiated. The integration of a new vfio platform device required creating a dummy derived device which only set the compatible string. Following the few vfio-platform device integrations we have seen the actual requested adaptation happens on device tree node creation (sysbus-fdt). Hence remove the abstract setting, and read the list of compatible values from sysfs if not set by a derived device. Update the amd-xgbe and calxeda-xgmac drivers to fill in the number of compatible values, as there can now be more than one. Note that sysbus-fdt does not support the instantiation of the vfio-platform device yet. Signed-off-by: Eric Auger <eric.auger@redhat.com> [geert: Rebase, set user_creatable=true, use compatible values in sysfs instead of user-supplied manufacturer/model options, reword] Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-15hw/vfio/display: add ramfb supportGerd Hoffmann
So we have a boot display when using a vgpu as primary display. ramfb depends on a fw_cfg file. fw_cfg files can not be added and removed at runtime, therefore a ramfb-enabled vfio device can't be hotplugged. Add a nohotplug variant of the vfio-pci device (as child class). Add the ramfb property to the nohotplug variant only. So to enable the vgpu display with boot support use this: -device vfio-pci-nohotplug,display=on,ramfb=on,sysfsdev=... Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-12s390x/vfio: ap: Introduce VFIO AP deviceTony Krowiak
Introduces a VFIO based AP device. The device is defined via the QEMU command line by specifying: -device vfio-ap,sysfsdev=<path-to-mediated-matrix-device> There may be only one vfio-ap device configured for a guest. The mediated matrix device is created by the VFIO AP device driver by writing a UUID to a sysfs attribute file (see docs/vfio-ap.txt). The mediated matrix device will be named after the UUID. Symbolic links to the $uuid are created in many places, so the path to the mediated matrix device $uuid can be specified in any of the following ways: /sys/devices/vfio_ap/matrix/$uuid /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/devices/$uuid /sys/bus/mdev/devices/$uuid /sys/bus/mdev/drivers/vfio_mdev/$uuid When the vfio-ap device is realized, it acquires and opens the VFIO iommu group to which the mediated matrix device is bound. This causes a VFIO group notification event to be signaled. The vfio_ap device driver's group notification handler will get called at which time the device driver will configure the the AP devices to which the guest will be granted access. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Tested-by: Pierre Morel <pmorel@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Pierre Morel <pmorel@linux.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <20181010170309.12045-6-akrowiak@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> [CH: added missing g_free and device category] Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-09-24qemu-error: add {error, warn}_report_once_condCornelia Huck
Add two functions to print an error/warning report once depending on a passed-in condition variable and flip it if printed. This is useful if you want to print a message not once-globally, but e.g. once-per-device. Inspired by warn_once() in hw/vfio/ccw.c, which has been replaced with warn_report_once_cond(). Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20180830145902.27376-2-cohuck@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Function comments reworded] Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-08-23vfio/pci: Fix failure to close file descriptor on errorAlex Williamson
A new error path fails to close the device file descriptor when triggered by a ballooning incompatibility within the group. Fix it. Fixes: 238e91728503 ("vfio/ccw/pci: Allow devices to opt-in for ballooning") Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-23vfio/pci: Handle subsystem realpath() returning NULLAlex Williamson
Fix error reported by Coverity where realpath can return NULL, resulting in a segfault in strcmp(). This should never happen given that we're working through regularly structured sysfs paths, but trivial enough to easily avoid. Fixes: 238e91728503 ("vfio/ccw/pci: Allow devices to opt-in for ballooning") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-21vfio/spapr: Allow backing bigger guest IOMMU pages with smaller physical pagesAlexey Kardashevskiy
At the moment the PPC64/pseries guest only supports 4K/64K/16M IOMMU pages and POWER8 CPU supports the exact same set of page size so so far things worked fine. However POWER9 supports different set of sizes - 4K/64K/2M/1G and the last two - 2M and 1G - are not even allowed in the paravirt interface (RTAS DDW) so we always end up using 64K IOMMU pages, although we could back guest's 16MB IOMMU pages with 2MB pages on the host. This stores the supported host IOMMU page sizes in VFIOContainer and uses this later when creating a new DMA window. This uses the system page size (64k normally, 2M/16M/1G if hugepages used) as the upper limit of the IOMMU pagesize. This changes the type of @pagesize to uint64_t as this is what memory_region_iommu_get_min_page_size() returns and clz64() takes. There should be no behavioral changes on platforms other than pseries. The guest will keep using the IOMMU page size selected by the PHB pagesize property as this only changes the underlying hardware TCE table granularity. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-17vfio/ccw/pci: Allow devices to opt-in for ballooningAlex Williamson
If a vfio assigned device makes use of a physical IOMMU, then memory ballooning is necessarily inhibited due to the page pinning, lack of page level granularity at the IOMMU, and sufficient notifiers to both remove the page on balloon inflation and add it back on deflation. However, not all devices are backed by a physical IOMMU. In the case of mediated devices, if a vendor driver is well synchronized with the guest driver, such that only pages actively used by the guest driver are pinned by the host mdev vendor driver, then there should be no overlap between pages available for the balloon driver and pages actively in use by the device. Under these conditions, ballooning should be safe. vfio-ccw devices are always mediated devices and always operate under the constraints above. Therefore we can consider all vfio-ccw devices as balloon compatible. The situation is far from straightforward with vfio-pci. These devices can be physical devices with physical IOMMU backing or mediated devices where it is unknown whether a physical IOMMU is in use or whether the vendor driver is well synchronized to the working set of the guest driver. The safest approach is therefore to assume all vfio-pci devices are incompatible with ballooning, but allow user opt-in should they have further insight into mediated devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-17vfio: Inhibit ballooning based on group attachment to a containerAlex Williamson
We use a VFIOContainer to associate an AddressSpace to one or more VFIOGroups. The VFIOContainer represents the DMA context for that AdressSpace for those VFIOGroups and is synchronized to changes in that AddressSpace via a MemoryListener. For IOMMU backed devices, maintaining the DMA context for a VFIOGroup generally involves pinning a host virtual address in order to create a stable host physical address and then mapping a translation from the associated guest physical address to that host physical address into the IOMMU. While the above maintains the VFIOContainer synchronized to the QEMU memory API of the VM, memory ballooning occurs outside of that API. Inflating the memory balloon (ie. cooperatively capturing pages from the guest for use by the host) simply uses MADV_DONTNEED to "zap" pages from QEMU's host virtual address space. The page pinning and IOMMU mapping above remains in place, negating the host's ability to reuse the page, but the host virtual to host physical mapping of the page is invalidated outside of QEMU's memory API. When the balloon is later deflated, attempting to cooperatively return pages to the guest, the page is simply freed by the guest balloon driver, allowing it to be used in the guest and incurring a page fault when that occurs. The page fault maps a new host physical page backing the existing host virtual address, meanwhile the VFIOContainer still maintains the translation to the original host physical address. At this point the guest vCPU and any assigned devices will map different host physical addresses to the same guest physical address. Badness. The IOMMU typically does not have page level granularity with which it can track this mapping without also incurring inefficiencies in using page size mappings throughout. MMU notifiers in the host kernel also provide indicators for invalidating the mapping on balloon inflation, not for updating the mapping when the balloon is deflated. For these reasons we assume a default behavior that the mapping of each VFIOGroup into the VFIOContainer is incompatible with memory ballooning and increment the balloon inhibitor to match the attached VFIOGroups. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-07-11vfio/pci: do not set the PCIDevice 'has_rom' attributeCédric Le Goater
PCI devices needing a ROM allocate an optional MemoryRegion with pci_add_option_rom(). pci_del_option_rom() does the cleanup when the device is destroyed. The only action taken by this routine is to call vmstate_unregister_ram() which clears the id string of the optional ROM RAMBlock and now, also flags the RAMBlock as non-migratable. This was recently added by commit b895de502717 ("migration: discard non-migratable RAMBlocks"), . VFIO devices do their own loading of the PCI option ROM in vfio_pci_size_rom(). The memory region is switched to an I/O region and the PCI attribute 'has_rom' is set but the RAMBlock of the ROM region is not allocated. When the associated PCI device is deleted, pci_del_option_rom() calls vmstate_unregister_ram() which tries to flag a NULL RAMBlock, leading to a SEGV. It seems that 'has_rom' was set to have memory_region_destroy() called, but since commit 469b046ead06 ("memory: remove memory_region_destroy") this is not necessary anymore as the MemoryRegion is freed automagically. Remove the PCIDevice 'has_rom' attribute setting in vfio. Fixes: b895de502717 ("migration: discard non-migratable RAMBlocks") Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-07-02hw/vfio: Use the IEC binary prefix definitionsPhilippe Mathieu-Daudé
It eases code review, unit is explicit. Patch generated using: $ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/ and modified manually. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180625124238.25339-38-f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-18vfio-ccw: add force unlimited prefetch propertyHalil Pasic
There is at least one guest (OS) such that although it does not rely on the guarantees provided by ORB 1 word 9 bit (aka unlimited prefetch, aka P bit) not being set, it fails to tell this to the machine. Usually this ain't a big deal, as the original purpose of the P bit is to allow for performance optimizations. vfio-ccw however can not provide the guarantees required if the bit is not set. It is not possible to implement support for the P bit not set without transitioning to lower level protocols for vfio-ccw. So let's give the user the opportunity to force setting the P bit, if the user knows this is safe. For self modifying channel programs forcing the P bit is not safe. If the P bit is forced for a self modifying channel program things are expected to break in strange ways. Let's also avoid warning multiple about P bit not set in the ORB in case P bit is not told to be forced, and designate the affected vfio-ccw device. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Suggested-by: Dong Jia Shi <bjsdjshi@linux.ibm.com> Acked-by: Jason J. Herne <jjherne@linux.ibm.com> Tested-by: Jason J. Herne <jjherne@linux.ibm.com> Message-Id: <20180524175828.3143-2-pasic@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-06-15iommu: Add IOMMU index argument to notifier APIsPeter Maydell
Add support for multiple IOMMU indexes to the IOMMU notifier APIs. When initializing a notifier with iommu_notifier_init(), the caller must pass the IOMMU index that it is interested in. When a change happens, the IOMMU implementation must pass memory_region_notify_iommu() the IOMMU index that has changed and that notifiers must be called for. IOMMUs which support only a single index don't need to change. Callers which only really support working with IOMMUs with a single index can use the result of passing MEMTXATTRS_UNSPECIFIED to memory_region_iommu_attrs_to_index(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180604152941.20374-3-peter.maydell@linaro.org
2018-06-05vfio/pci: Default display option to "off"Alex Williamson
Commit a9994687cb9b ("vfio/display: core & wireup") added display support to vfio-pci with the default being "auto", which breaks existing VMs when the vGPU requires GL support but had no previous requirement for a GL compatible configuration. "Off" is the safer default as we impose no new requirements to VM configurations. Fixes: a9994687cb9b ("vfio/display: core & wireup") Cc: qemu-stable@nongnu.org Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05vfio/quirks: Enable ioeventfd quirks to be handled by vfio directlyAlex Williamson
With vfio ioeventfd support, we can program vfio-pci to perform a specified BAR write when an eventfd is triggered. This allows the KVM ioeventfd to be wired directly to vfio-pci, entirely avoiding userspace handling for these events. On the same micro-benchmark where the ioeventfd got us to almost 90% of performance versus disabling the GeForce quirks, this gets us to within 95%. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05vfio/quirks: ioeventfd quirk accelerationAlex Williamson
The NVIDIA BAR0 quirks virtualize the PCI config space mirrors found in device MMIO space. Normally PCI config space is considered a slow path and further optimization is unnecessary, however NVIDIA uses a register here to enable the MSI interrupt to re-trigger. Exiting to QEMU for this MSI-ACK handling can therefore rate limit our interrupt handling. Fortunately the MSI-ACK write is easily detected since the quirk MemoryRegion otherwise has very few accesses, so simply looking for consecutive writes with the same data is sufficient, in this case 10 consecutive writes with the same data and size is arbitrarily chosen. We configure the KVM ioeventfd with data match, so there's no risk of triggering for the wrong data or size, but we do risk that pathological driver behavior might consume all of QEMU's file descriptors, so we cap ourselves to 10 ioeventfds for this purpose. In support of the above, generic ioeventfd infrastructure is added for vfio quirks. This automatically initializes an ioeventfd list per quirk, disables and frees ioeventfds on exit, and allows ioeventfds marked as dynamic to be dropped on device reset. The rationale for this latter feature is that useful ioeventfds may depend on specific driver behavior and since we necessarily place a cap on our use of ioeventfds, a machine reset is a reasonable point at which to assume a new driver and re-profile. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05vfio/quirks: Add quirk reset callbackAlex Williamson
Quirks can be self modifying, provide a hook to allow them to cleanup on device reset if desired. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05vfio/quirks: Add common quirk alloc helperAlex Williamson
This will later be used to include list initialization. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-01Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell
* Linux header upgrade (Peter) * firmware.json definition (Laszlo) * IPMI migration fix (Corey) * QOM improvements (Alexey, Philippe, me) * Memory API cleanups (Jay, me, Tristan, Peter) * WHPX fixes and improvements (Lucian) * Chardev fixes (Marc-André) * IOMMU documentation improvements (Peter) * Coverity fixes (Peter, Philippe) * Include cleanup (Philippe) * -clock deprecation (Thomas) * Disable -sandbox unless CONFIG_SECCOMP (Yi Min Zhao) * Configurability improvements (me) # gpg: Signature made Fri 01 Jun 2018 17:42:13 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (56 commits) hw: make virtio devices configurable via default-configs/ hw: allow compiling out SCSI memory: Make operations using MemoryRegionIoeventfd struct pass by pointer. char: Remove unwanted crlf conversion qdev: Remove DeviceClass::init() and ::exit() qdev: Simplify the SysBusDeviceClass::init path hw/i2c: Use DeviceClass::realize instead of I2CSlaveClass::init hw/i2c/smbus: Use DeviceClass::realize instead of SMBusDeviceClass::init target/i386/kvm.c: Remove compatibility shim for KVM_HINTS_REALTIME Update Linux headers to 4.17-rc6 target/i386/kvm.c: Handle renaming of KVM_HINTS_DEDICATED scripts/update-linux-headers: Handle kernel license no longer being one file scripts/update-linux-headers: Handle __aligned_u64 virtio-gpu-3d: Define VIRTIO_GPU_CAPSET_VIRGL2 elsewhere gdbstub: Prevent fd leakage docs/interop: add "firmware.json" ipmi: Use proper struct reference for KCS vmstate vmstate: Add a VSTRUCT type tcg: remove softfloat from --disable-tcg builds qemu-options: Mark the non-functional -clock option as deprecated ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-31vfio: Include "exec/address-spaces.h" directly in the source filePhilippe Mathieu-Daudé
No declaration of "hw/vfio/vfio-common.h" directly requires to include the "exec/address-spaces.h" header. To simplify dependencies and ease the upcoming cleanup of "exec/address-spaces.h", directly include it in the source file where the declaration are used. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180528232719.4721-2-f4bug@amsat.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-05-31Make address_space_translate{, _cached}() take a MemTxAttrs argumentPeter Maydell
As part of plumbing MemTxAttrs down to the IOMMU translate method, add MemTxAttrs as an argument to address_space_translate() and address_space_translate_cached(). Callers either have an attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180521140402.23318-4-peter.maydell@linaro.org
2018-04-30vfio-ccw: introduce vfio_ccw_get_device()Greg Kurz
A recent patch fixed leaks of the dynamically allocated vcdev->vdev.name field in vfio_ccw_realize(), but we now have three freeing sites for it. This is unfortunate and seems to indicate something is wrong with its life cycle. The root issue is that vcdev->vdev.name is set before vfio_get_device() is called, which theoretically prevents to call vfio_put_device() to do the freeing. Well actually, we could call it anyway because vfio_put_base_device() is a nop if the device isn't attached, but this would be confusing. This patch hence moves all the logic of attaching the device, including the "already attached" check, to a separate vfio_ccw_get_device() function, counterpart of vfio_put_device(). While here, vfio_put_device() is renamed to vfio_ccw_put_device() for consistency. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <152326891065.266543.9487977590811413472.stgit@bahia.lan> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-04-27ui: introduce vfio_display_resetTina Zhang
During guest OS reboot, guest framebuffer is invalid. It will cause bugs, if the invalid guest framebuffer is still used by host. This patch is to introduce vfio_display_reset which is invoked during vfio display reset. This vfio_display_reset function is used to release the invalid display resource, disable scanout mode and replace the invalid surface with QemuConsole's DisplaySurafce. This patch can fix the GPU hang issue caused by gd_egl_draw during guest OS reboot. Changes v3->v4: - Move dma-buf based display check into the vfio_display_reset(). (Gerd) Changes v2->v3: - Limit vfio_display_reset to dma-buf based vfio display. (Gerd) Changes v1->v2: - Use dpy_gfx_update_full() update screen after reset. (Gerd) - Remove dpy_gfx_switch_surface(). (Gerd) Signed-off-by: Tina Zhang <tina.zhang@intel.com> Message-id: 1524820266-27079-3-git-send-email-tina.zhang@intel.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2018-04-09vfio-ccw: fix memory leaks in vfio_ccw_realize()Greg Kurz
If the subchannel is already attached or if vfio_get_device() fails, the code jumps to the 'out_device_err' label and doesn't free the string it has just allocated. The code should be reworked so that vcdev->vdev.name only gets set when the device has been attached, and freed when it is about to be detached. This could be achieved with the addition of a vfio_ccw_get_device() function that would be the counterpart of vfio_put_device(). But this is a more elaborate cleanup that should be done in a follow-up. For now, let's just add calls to g_free() on the buggy error paths. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <152311222681.203086.8874800175539040298.stgit@bahia> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-04-05vfio: Use a trace point when a RAM section cannot be DMA mappedEric Auger
Commit 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added an error message if a passed memory section address or size is not aligned to the page size and thus cannot be DMA mapped. This patch fixes the trace by printing the region name and the memory region section offset within the address space (instead of offset_within_region). We also turn the error_report into a trace event. Indeed, In some cases, the traces can be confusing to non expert end-users and let think the use case does not work (whereas it works as before). This is the case where a BAR is successively mapped at different GPAs and its sections are not compatible with dma map. The listener is called several times and traces are issued for each intermediate mapping. The end-user cannot easily match those GPAs against the final GPA output by lscpi. So let's keep those information to informed users. In mid term, the plan is to advise the user about BAR relocation relevance. Fixes: 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-03-13ppc/spapr, vfio: Turn off MSIX emulation for VFIO devicesAlexey Kardashevskiy
This adds a possibility for the platform to tell VFIO not to emulate MSIX so MMIO memory regions do not get split into chunks in flatview and the entire page can be registered as a KVM memory slot and make direct MMIO access possible for the guest. This enables the entire MSIX BAR mapping to the guest for the pseries platform in order to achieve the maximum MMIO preformance for certain devices. Tested on: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02) Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-03-13vfio-pci: Allow mmap of MSIX BARAlexey Kardashevskiy
At the moment we unconditionally avoid mapping MSIX data of a BAR and emulate MSIX table in QEMU. However it is 1) not always necessary as a platform may provide a paravirt interface for MSIX configuration; 2) can affect the speed of MMIO access by emulating them in QEMU when frequently accessed registers share same system page with MSIX data, this is particularly a problem for systems with the page size bigger than 4KB. A new capability - VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - has been added to the kernel [1] which tells the userspace that mapping of the MSIX data is possible now. This makes use of it so from now on QEMU tries mapping the entire BAR as a whole and emulate MSIX on top of that. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-03-13vfio/pci: Relax DMA map errors for MMIO regionsAlexey Kardashevskiy
At the moment if vfio_memory_listener is registered in the system memory address space, it maps/unmaps every RAM memory region for DMA. It expects system page size aligned memory sections so vfio_dma_map would not fail and so far this has been the case. A mapping failure would be fatal. A side effect of such behavior is that some MMIO pages would not be mapped silently. However we are going to change MSIX BAR handling so we will end having non-aligned sections in vfio_memory_listener (more details is in the next patch) and vfio_dma_map will exit QEMU. In order to avoid fatal failures on what previously was not a failure and was just silently ignored, this checks the section alignment to the smallest supported IOMMU page size and prints an error if not aligned; it also prints an error if vfio_dma_map failed despite the page size check. Both errors are not fatal; only MMIO RAM regions are checked (aka "RAM device" regions). If the amount of errors printed is overwhelming, the MSIX relocation could be used to avoid excessive error output. This is unlikely to cause any behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: Fix Int128 bit ops] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-03-13vfio/display: adding dmabuf supportGerd Hoffmann
Wire up dmabuf-based display. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>