aboutsummaryrefslogtreecommitdiff
path: root/hw/vfio
AgeCommit message (Collapse)Author
2019-03-12Merge remote-tracking branch 'remotes/awilliam/tags/vfio-updates-20190311.0' ↵Peter Maydell
into staging VFIO updates 2019-03-11 - Resolution support for mdev displays supporting EDID interface (Gerd Hoffmann) # gpg: Signature made Mon 11 Mar 2019 19:17:39 GMT # gpg: using RSA key 239B9B6E3BB08B22 # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" [full] # gpg: aka "Alex Williamson <alex@shazbot.org>" [full] # gpg: aka "Alex Williamson <alwillia@redhat.com>" [full] # gpg: aka "Alex Williamson <alex.l.williamson@gmail.com>" [full] # Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B 8A90 239B 9B6E 3BB0 8B22 * remotes/awilliam/tags/vfio-updates-20190311.0: vfio/display: delay link up event vfio/display: add xres + yres properties vfio/display: add edid support. Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-12vfio: Make vfio_get_region_info_cap publicAlexey Kardashevskiy
This makes vfio_get_region_info_cap() to be used in quirks. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20190307050518.64968-3-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12vfio/spapr: Rename local systempagesize variableAlexey Kardashevskiy
The "systempagesize" name suggests that it is the host system page size while it is the smallest page size of memory backing the guest RAM so let's rename it to stop confusion. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190227085149.38596-4-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12vfio/spapr: Fix indirect levels calculationAlexey Kardashevskiy
The current code assumes that we can address more bits on a PCI bus for DMA than we really can but there is no way knowing the actual limit. This makes a better guess for the number of levels and if the kernel fails to allocate that, this increases the level numbers till succeeded or reached the 64bit limit. This adds levels to the trace point. This may cause the kernel to warn about failed allocation: [65122.837458] Failed to allocate a TCE memory, level shift=28 which might happen if MAX_ORDER is not large enough as it can vary: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/Kconfig?h=v5.0-rc2#n727 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190227085149.38596-3-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-11vfio/display: delay link up eventGerd Hoffmann
Kick the display link up event with a 0.1 sec delay, so the guest has a chance to notice the link down first. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> [update for redefined macro] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-03-11vfio/display: add xres + yres propertiesGerd Hoffmann
This allows configure the display resolution which the vgpu should use. The information will be passed to the guest using EDID, so the mdev driver must support the vfio edid region for this to work. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-03-11vfio/display: add edid support.Gerd Hoffmann
This patch adds EDID support to the vfio display (aka vgpu) code. When supported by the mdev driver qemu will generate a EDID blob and pass it on using the new vfio edid region. The EDID blob will be updated on UI changes (i.e. window resize), so the guest can adapt. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> [remove control flow via macro, use unsigned format specifier] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-03-11vfio-pci: enable by defaultPaolo Bonzini
CONFIG_VFIO_PCI was not "default y" - and once you do that, it is also important to disable it if PCI is not there. Reported-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-07vfio: express vfio dependencies with KconfigPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-07build: switch to KconfigPaolo Bonzini
The make_device_config.sh script is replaced by minikconf, which is modified to support the same command line as its predecessor. The roots of the parsing are default-configs/*.mak, Kconfig.host and hw/Kconfig. One difference with make_device_config.sh is that all symbols have to be defined in a Kconfig file, including those coming from the configure script. This is the reason for the Kconfig.host file introduced in the previous patch. Whenever a file in default-configs/*.mak used $(...) to refer to a config-host.mak symbol, this is replaced by a Kconfig dependency; this part must be done already in this patch for bisectability. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Acked-by: Thomas Huth <thuth@redhat.com> Message-Id: <20190123065618.3520-28-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-07kconfig: introduce kconfig filesPaolo Bonzini
The Kconfig files were generated mostly with this script: for i in `grep -ho CONFIG_[A-Z0-9_]* default-configs/* | sort -u`; do set fnord `git grep -lw $i -- 'hw/*/Makefile.objs' ` shift if test $# = 1; then cat >> $(dirname $1)/Kconfig << EOF config ${i#CONFIG_} bool EOF git add $(dirname $1)/Kconfig else echo $i $* fi done sed -i '$d' hw/*/Kconfig for i in hw/*; do if test -d $i && ! test -f $i/Kconfig; then touch $i/Kconfig git add $i/Kconfig fi done Whenever a symbol is referenced from multiple subdirectories, the script prints the list of directories that reference the symbol. These symbols have to be added manually to the Kconfig files. Kconfig.host and hw/Kconfig were created manually. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20190123065618.3520-27-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-04Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190304' into stagingPeter Maydell
s390x updates: - tcg: support the floating-point extension facility - vfio-ap: support hot(un)plug of vfio-ap device - fixes + cleanups # gpg: Signature made Mon 04 Mar 2019 11:55:39 GMT # gpg: using RSA key C3D0D66DC3624FF6A8C018CEDECF6B93C6F02FAF # gpg: issuer "cohuck@redhat.com" # gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>" [unknown] # gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>" [full] # gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>" [full] # gpg: aka "Cornelia Huck <cohuck@kernel.org>" [unknown] # gpg: aka "Cornelia Huck <cohuck@redhat.com>" [unknown] # Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF * remotes/cohuck/tags/s390x-20190304: (27 commits) s390x: Add floating-point extension facility to "qemu" cpu model s390x/tcg: Handle all rounding modes overwritten by BFP instructions s390x/tcg: Implement rounding mode and XxC for LOAD ROUNDED s390x/tcg: Implement XxC and checks for most FP instructions s390x/tcg: Prepare for IEEE-inexact-exception control (XxC) s390x/tcg: Refactor saving/restoring the bfp rounding mode s390x/tcg: Check for exceptions in SET BFP ROUNDING MODE s390x/tcg: Handle SET FPC AND LOAD FPC 3-bit BFP rounding modes s390x/tcg: Fix simulated-IEEE exceptions s390x/tcg: Refactor SET FPC AND SIGNAL handling s390x/tcg: Hide IEEE underflows in some scenarios s390x/tcg: Fix parts of IEEE exception handling s390x/tcg: Factor out conversion of softfloat exceptions s390x/tcg: Fix rounding from float128 to uint64_t/uint32_t s390x/tcg: Fix TEST DATA CLASS instructions s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP s390x/tcg: Factor out gen_addi_and_wrap_i64() from get_address() s390x/tcg: Factor out vec_full_reg_offset() s390x/tcg: Clarify terminology in vec_reg_offset() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-04Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
pci, pc, virtio: fixes, cleanups, tests Lots of work on tests: BiosTablesTest UEFI app, vhost-user testing for non-Linux hosts. Misc cleanups and fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 22 Feb 2019 15:51:40 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (26 commits) pci: Sanity test minimum downstream LNKSTA hw/smbios: fix offset of type 3 sku field pci: Move NVIDIA vendor id to the rest of ids virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size virtio-balloon: Use ram_block_discard_range() instead of raw madvise() virtio-balloon: Rework ballon_page() interface virtio-balloon: Corrections to address verification virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate i386/kvm: ignore masked irqs when update msi routes contrib/vhost-user-blk: fix the compilation issue Revert "contrib/vhost-user-blk: fix the compilation issue" pc-dimm: use same mechanism for [get|set]_addr tests/data: introduce "uefi-boot-images" with the "bios-tables-test" ISOs tests/uefi-test-tools: add build scripts tests: introduce "uefi-test-tools" with the BiosTablesTest UEFI app roms: build the EfiRom utility from the roms/edk2 submodule roms: add the edk2 project as a git submodule vhost-user-test: create a temporary directory per TestServer vhost-user-test: small changes to init_hugepagefs vhost-user-test: create a main loop per TestServer ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-04s390x/vfio-ap: Implement hot plug/unplug of vfio-ap deviceTony Krowiak
Introduces hot plug/unplug support for the vfio-ap device. To hot plug a vfio-ap device using the QEMU device_add command: (qemu) device_add vfio-ap,sysfsdev=$path-to-mdev Where $path-to-mdev is the absolute path to the mediated matrix device to which AP resources to be used by the guest have been assigned. A vfio-ap device can be hot plugged only if: 1. A vfio-ap device has not been attached to the virtual machine's ap-bus via the QEMU command line or a prior hot plug action. 2. The guest was started with the CPU model feature for AP enabled (e.g., -cpu host,ap=on) To hot unplug a vfio-ap device using the QEMU device_del command: (qemu) device_del vfio-ap,sysfsdev=$path-to-mdev Where $path-to-mdev is the absolute path to the mediated matrix device specified when the vfio-ap device was attached to the virtual machine's ap-bus. A vfio-ap device can be hot unplugged only if: 1. A vfio-ap device has been attached to the virtual machine's ap-bus via the QEMU command line or a prior hot plug action. 2. The guest was started with the CPU model feature for AP enabled (e.g., -cpu host,ap=on) Please note that a hot plug handler is not necessary for the vfio-ap device because the AP matrix configuration for the guest is performed by the kernel device driver when the vfio-ap device is realized. The vfio-ap device represents a VFIO mediated device created in the host sysfs for use by a guest. The mdev device is configured with an AP matrix (i.e., adapters and domains) via its sysfs attribute interfaces prior to starting the guest or plugging a vfio-ap device in. When the device is realized, a file descriptor is opened on the mdev device which results in a callback to the vfio_ap kernel device driver. The device driver then configures the AP matrix in the guest's SIE state description from the AP matrix assigned via the mdev device's sysfs interfaces. The AP devices will be created for the guest when the AP bus running on the guest subsequently performs its periodic scan for AP devices. The qdev_simple_device_unplug_cb() callback function is used for the same reaons; namely, the vfio_ap kernel device driver will perform the AP resource de-configuration for the guest when the vfio-ap device is unplugged. When the vfio-ap device is unrealized, the mdev device file descriptor is closed which results in a callback to the vfio_ap kernel device driver. The device driver then clears the AP matrix configuration in the guest's SIE state description and resets all of the affected queues. The AP devices created for the guest will be removed when the AP bus running on the guest subsequently performs its periodic scan and finds there are no longer any AP resources assigned to the guest. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Pierre Morel <pmorel@linux.ibm.com> Message-Id: <1550519397-25359-2-git-send-email-akrowiak@linux.ibm.com> [CH: adapt to changed qbus_set_hotplug_handler() signature] Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-02-22pci: Move NVIDIA vendor id to the rest of idsAlexey Kardashevskiy
sPAPR code will use it too so move it from VFIO to the common code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190214051440.59167-1-aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-21hw/vfio/common: Refactor container initializationEric Auger
We introduce the vfio_init_container_type() helper. It computes the highest usable iommu type and then set the container and the iommu type. Its usage in vfio_connect_container() makes the code ready for addition of new iommu types. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-21vfio/common: Work around kernel overflow bug in DMA unmapAlex Williamson
A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which adds a test for address space wrap-around in the vfio DMA unmap path. Unfortunately due to overflow, the kernel detects an unmap of the last page in the 64-bit address space as a wrap-around. In QEMU, a Q35 guest with VT-d emulation and guest IOMMU enabled will attempt to make such an unmap request during VM system reset, triggering an error: qemu-kvm: VFIO_UNMAP_DMA: -22 qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument) Here the IOVA start address (0xfef00000) and the size parameter (0xffffffff01100000) add to exactly 2^64, triggering the bug. A kernel fix is queued for the Linux v5.0 release to address this. This patch implements a workaround to retry the unmap, excluding the final page of the range when we detect an unmap failing which matches the requirements for this issue. This is expected to be a safe and complete workaround as the VT-d address space does not extend to the full 64-bit space and therefore the last page should never be mapped. This workaround can be removed once all kernels with this bug are sufficiently deprecated. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291 Reported-by: Pei Zhang <pezhang@redhat.com> Debugged-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-05hw/vfio/Makefile.objs: Create new CONFIG_* variables for VFIO core and PCIPaolo Bonzini
Make hw/vfio configurable and add new CONFIG_VFIO_* to the default-configs/s390x*-softmmu.mak. This allow a finer-grain selection of the various VFIO backends. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20190202072456.6468-28-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-05vfio: move conditional up to hw/Makefile.objsPaolo Bonzini
Instead of wrapping the entire Makefile.objs with an ifeq/endif, just include the directory only for Linux. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20190202072456.6468-4-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-24trace: forbid use of %m in trace event format stringsDaniel P. Berrangé
The '%m' format instructs glibc's printf()/syslog() implementation to insert the contents of strerror(errno). Since this is a glibc extension it should generally be avoided in QEMU due to need for portability to a variety of platforms. Even though vfio is Linux-only code that could otherwise use "%m", it must still be avoided in trace-events files because several of the backends do not use the format string and so this error information is invisible to them. The errno string value should be given as an explicit trace argument instead, making it accessible to all backends. This also allows it to work correctly with future patches that use the format string with systemtap's simple printf code. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-id: 20190123120016.4538-4-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-01-11qemu/queue.h: typedef QTAILQ headsPaolo Bonzini
This will be needed when we change the QTAILQ head and elem structs to unions. However, it is also consistent with the usage elsewhere in QEMU for other list head structs (see for example FsMountList). Note that most QTAILQs only need their name in order to do backwards walks. Those do not break with the struct->union change, and anyway the change will also remove the need to name heads when doing backwards walks, so those are not touched here. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11vfio: make vfio_address_spaces staticPaolo Bonzini
It is not used outside hw/vfio/common.c, so it does not need to be extern. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
pci, pc, virtio: fixes, features VTD fixes IR and split irqchip are now the default for Q35 ACPI refactoring hotplug refactoring new names for virtio devices multiple pcie link width/speeds PCI fixes Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Thu 20 Dec 2018 18:26:03 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (44 commits) x86-iommu: turn on IR by default if proper x86-iommu: switch intr_supported to OnOffAuto type q35: set split kernel irqchip as default pci: Adjust PCI config limit based on bus topology spapr_pci: perform unplug via the hotplug handler pci/shpc: perform unplug via the hotplug handler pci: Reuse pci-bridge hotplug handler handlers for pcie-pci-bridge pci/pcie: perform unplug via the hotplug handler pci/pcihp: perform unplug via the hotplug handler pci/pcihp: overwrite hotplug handler recursively from the start pci/pcihp: perform check for bus capability in pre_plug handler s390x/pci: rename hotplug handler callbacks pci/shpc: rename hotplug handler callbacks pci/pcie: rename hotplug handler callbacks hw/i386: Remove deprecated machines pc-0.10 and pc-0.11 hw: acpi: Remove AcpiRsdpDescriptor and fix tests hw: acpi: Export and share the ARM RSDP build hw: arm: Support both legacy and current RSDP build hw: arm: Convert the RSDP build to the buid_append_foo() API hw: arm: Carry RSDP specific data through AcpiRsdpData ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-12-20Clean up includesMarkus Armbruster
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes, with the changes to the following files manually reverted: contrib/libvhost-user/libvhost-user-glib.h contrib/libvhost-user/libvhost-user.c contrib/libvhost-user/libvhost-user.h linux-user/mips64/cpu_loop.c linux-user/mips64/signal.c linux-user/sparc64/cpu_loop.c linux-user/sparc64/signal.c linux-user/x86_64/cpu_loop.c linux-user/x86_64/signal.c target/s390x/gen-features.c tests/migration/s390x/a-b-bios.c tests/test-rcu-simpleq.c tests/test-rcu-tailq.c Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20181204172535.2799-1-armbru@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Acked-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
2018-12-19vfio/pci: Remove PCIe Link Status emulationAlex Williamson
Now that the downstream port will virtually negotiate itself to the link status of the downstream device, we can remove this emulation. It's not clear that it was every terribly useful anyway. Tested-by: Geoffrey McRae <geoff@hostfission.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19pcie: Create enums for link speed and widthAlex Williamson
In preparation for reporting higher virtual link speeds and widths, create enums and macros to help us manage them. Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Tested-by: Geoffrey McRae <geoff@hostfission.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-12vfio-ap: flag as compatible with balloonCornelia Huck
vfio-ap devices do not pin any pages in the host. Therefore, they are compatible with memory ballooning. Flag them as compatible, so both vfio-ap and a balloon can be used simultaneously. Cc: qemu-stable@nongnu.org Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-11-05s390x/vfio-ap: report correct errorCornelia Huck
If ioctl(..., VFIO_DEVICE_RESET) fails, we want to report errno instead of ret (which is always -1 on error). Fixes Coverity issue CID 1396176. Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-10-19vfio: Clean up error reporting after previous commitMarkus Armbruster
The previous commit changed vfio's warning messages from vfio warning: DEV-NAME: Could not frobnicate to warning: vfio DEV-NAME: Could not frobnicate To match this change, change error messages from vfio error: DEV-NAME: On fire to vfio DEV-NAME: On fire Note the loss of "error". If we think marking error messages that way is a good idea, we should mark *all* error messages, i.e. make error_report() print it. Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20181017082702.5581-7-armbru@redhat.com>
2018-10-19vfio: Use warn_report() & friends to report warningsMarkus Armbruster
The vfio code reports warnings like error_report(WARN_PREFIX "Could not frobnicate", DEV-NAME); where WARN_PREFIX is defined so the message comes out as vfio warning: DEV-NAME: Could not frobnicate This usage predates the introduction of warn_report() & friends in commit 97f40301f1d. It's time to convert to that interface. Since these functions already prefix the message with "warning: ", replace WARN_PREFIX by VFIO_MSG_PREFIX, so the messages come out like warning: vfio DEV-NAME: Could not frobnicate The next commit will replace ERR_PREFIX. Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-6-armbru@redhat.com>
2018-10-19error: Fix use of error_prepend() with &error_fatal, &error_abortMarkus Armbruster
From include/qapi/error.h: * Pass an existing error to the caller with the message modified: * error_propagate(errp, err); * error_prepend(errp, "Could not frobnicate '%s': ", name); Fei Li pointed out that doing error_propagate() first doesn't work well when @errp is &error_fatal or &error_abort: the error_prepend() is never reached. Since I doubt fixing the documentation will stop people from getting it wrong, introduce error_propagate_prepend(), in the hope that it lures people away from using its constituents in the wrong order. Update the instructions in error.h accordingly. Convert existing error_prepend() next to error_propagate to error_propagate_prepend(). If any of these get reached with &error_fatal or &error_abort, the error messages improve. I didn't check whether that's the case anywhere. Cc: Fei Li <fli@suse.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-2-armbru@redhat.com>
2018-10-15vfio-pci: make vfio-pci device more QOM conventionalLi Qiang
Define a TYPE_VFIO_PCI and drop DO_UPCAST. Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-15vfio/platform: Make the vfio-platform device non-abstractEric Auger
Up to now the vfio-platform device has been abstract and could not be instantiated. The integration of a new vfio platform device required creating a dummy derived device which only set the compatible string. Following the few vfio-platform device integrations we have seen the actual requested adaptation happens on device tree node creation (sysbus-fdt). Hence remove the abstract setting, and read the list of compatible values from sysfs if not set by a derived device. Update the amd-xgbe and calxeda-xgmac drivers to fill in the number of compatible values, as there can now be more than one. Note that sysbus-fdt does not support the instantiation of the vfio-platform device yet. Signed-off-by: Eric Auger <eric.auger@redhat.com> [geert: Rebase, set user_creatable=true, use compatible values in sysfs instead of user-supplied manufacturer/model options, reword] Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-15hw/vfio/display: add ramfb supportGerd Hoffmann
So we have a boot display when using a vgpu as primary display. ramfb depends on a fw_cfg file. fw_cfg files can not be added and removed at runtime, therefore a ramfb-enabled vfio device can't be hotplugged. Add a nohotplug variant of the vfio-pci device (as child class). Add the ramfb property to the nohotplug variant only. So to enable the vgpu display with boot support use this: -device vfio-pci-nohotplug,display=on,ramfb=on,sysfsdev=... Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-12s390x/vfio: ap: Introduce VFIO AP deviceTony Krowiak
Introduces a VFIO based AP device. The device is defined via the QEMU command line by specifying: -device vfio-ap,sysfsdev=<path-to-mediated-matrix-device> There may be only one vfio-ap device configured for a guest. The mediated matrix device is created by the VFIO AP device driver by writing a UUID to a sysfs attribute file (see docs/vfio-ap.txt). The mediated matrix device will be named after the UUID. Symbolic links to the $uuid are created in many places, so the path to the mediated matrix device $uuid can be specified in any of the following ways: /sys/devices/vfio_ap/matrix/$uuid /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/devices/$uuid /sys/bus/mdev/devices/$uuid /sys/bus/mdev/drivers/vfio_mdev/$uuid When the vfio-ap device is realized, it acquires and opens the VFIO iommu group to which the mediated matrix device is bound. This causes a VFIO group notification event to be signaled. The vfio_ap device driver's group notification handler will get called at which time the device driver will configure the the AP devices to which the guest will be granted access. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Tested-by: Pierre Morel <pmorel@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Pierre Morel <pmorel@linux.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <20181010170309.12045-6-akrowiak@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> [CH: added missing g_free and device category] Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-09-24qemu-error: add {error, warn}_report_once_condCornelia Huck
Add two functions to print an error/warning report once depending on a passed-in condition variable and flip it if printed. This is useful if you want to print a message not once-globally, but e.g. once-per-device. Inspired by warn_once() in hw/vfio/ccw.c, which has been replaced with warn_report_once_cond(). Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20180830145902.27376-2-cohuck@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Function comments reworded] Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-08-23vfio/pci: Fix failure to close file descriptor on errorAlex Williamson
A new error path fails to close the device file descriptor when triggered by a ballooning incompatibility within the group. Fix it. Fixes: 238e91728503 ("vfio/ccw/pci: Allow devices to opt-in for ballooning") Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-23vfio/pci: Handle subsystem realpath() returning NULLAlex Williamson
Fix error reported by Coverity where realpath can return NULL, resulting in a segfault in strcmp(). This should never happen given that we're working through regularly structured sysfs paths, but trivial enough to easily avoid. Fixes: 238e91728503 ("vfio/ccw/pci: Allow devices to opt-in for ballooning") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-21vfio/spapr: Allow backing bigger guest IOMMU pages with smaller physical pagesAlexey Kardashevskiy
At the moment the PPC64/pseries guest only supports 4K/64K/16M IOMMU pages and POWER8 CPU supports the exact same set of page size so so far things worked fine. However POWER9 supports different set of sizes - 4K/64K/2M/1G and the last two - 2M and 1G - are not even allowed in the paravirt interface (RTAS DDW) so we always end up using 64K IOMMU pages, although we could back guest's 16MB IOMMU pages with 2MB pages on the host. This stores the supported host IOMMU page sizes in VFIOContainer and uses this later when creating a new DMA window. This uses the system page size (64k normally, 2M/16M/1G if hugepages used) as the upper limit of the IOMMU pagesize. This changes the type of @pagesize to uint64_t as this is what memory_region_iommu_get_min_page_size() returns and clz64() takes. There should be no behavioral changes on platforms other than pseries. The guest will keep using the IOMMU page size selected by the PHB pagesize property as this only changes the underlying hardware TCE table granularity. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-17vfio/ccw/pci: Allow devices to opt-in for ballooningAlex Williamson
If a vfio assigned device makes use of a physical IOMMU, then memory ballooning is necessarily inhibited due to the page pinning, lack of page level granularity at the IOMMU, and sufficient notifiers to both remove the page on balloon inflation and add it back on deflation. However, not all devices are backed by a physical IOMMU. In the case of mediated devices, if a vendor driver is well synchronized with the guest driver, such that only pages actively used by the guest driver are pinned by the host mdev vendor driver, then there should be no overlap between pages available for the balloon driver and pages actively in use by the device. Under these conditions, ballooning should be safe. vfio-ccw devices are always mediated devices and always operate under the constraints above. Therefore we can consider all vfio-ccw devices as balloon compatible. The situation is far from straightforward with vfio-pci. These devices can be physical devices with physical IOMMU backing or mediated devices where it is unknown whether a physical IOMMU is in use or whether the vendor driver is well synchronized to the working set of the guest driver. The safest approach is therefore to assume all vfio-pci devices are incompatible with ballooning, but allow user opt-in should they have further insight into mediated devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-17vfio: Inhibit ballooning based on group attachment to a containerAlex Williamson
We use a VFIOContainer to associate an AddressSpace to one or more VFIOGroups. The VFIOContainer represents the DMA context for that AdressSpace for those VFIOGroups and is synchronized to changes in that AddressSpace via a MemoryListener. For IOMMU backed devices, maintaining the DMA context for a VFIOGroup generally involves pinning a host virtual address in order to create a stable host physical address and then mapping a translation from the associated guest physical address to that host physical address into the IOMMU. While the above maintains the VFIOContainer synchronized to the QEMU memory API of the VM, memory ballooning occurs outside of that API. Inflating the memory balloon (ie. cooperatively capturing pages from the guest for use by the host) simply uses MADV_DONTNEED to "zap" pages from QEMU's host virtual address space. The page pinning and IOMMU mapping above remains in place, negating the host's ability to reuse the page, but the host virtual to host physical mapping of the page is invalidated outside of QEMU's memory API. When the balloon is later deflated, attempting to cooperatively return pages to the guest, the page is simply freed by the guest balloon driver, allowing it to be used in the guest and incurring a page fault when that occurs. The page fault maps a new host physical page backing the existing host virtual address, meanwhile the VFIOContainer still maintains the translation to the original host physical address. At this point the guest vCPU and any assigned devices will map different host physical addresses to the same guest physical address. Badness. The IOMMU typically does not have page level granularity with which it can track this mapping without also incurring inefficiencies in using page size mappings throughout. MMU notifiers in the host kernel also provide indicators for invalidating the mapping on balloon inflation, not for updating the mapping when the balloon is deflated. For these reasons we assume a default behavior that the mapping of each VFIOGroup into the VFIOContainer is incompatible with memory ballooning and increment the balloon inhibitor to match the attached VFIOGroups. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-07-11vfio/pci: do not set the PCIDevice 'has_rom' attributeCédric Le Goater
PCI devices needing a ROM allocate an optional MemoryRegion with pci_add_option_rom(). pci_del_option_rom() does the cleanup when the device is destroyed. The only action taken by this routine is to call vmstate_unregister_ram() which clears the id string of the optional ROM RAMBlock and now, also flags the RAMBlock as non-migratable. This was recently added by commit b895de502717 ("migration: discard non-migratable RAMBlocks"), . VFIO devices do their own loading of the PCI option ROM in vfio_pci_size_rom(). The memory region is switched to an I/O region and the PCI attribute 'has_rom' is set but the RAMBlock of the ROM region is not allocated. When the associated PCI device is deleted, pci_del_option_rom() calls vmstate_unregister_ram() which tries to flag a NULL RAMBlock, leading to a SEGV. It seems that 'has_rom' was set to have memory_region_destroy() called, but since commit 469b046ead06 ("memory: remove memory_region_destroy") this is not necessary anymore as the MemoryRegion is freed automagically. Remove the PCIDevice 'has_rom' attribute setting in vfio. Fixes: b895de502717 ("migration: discard non-migratable RAMBlocks") Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-07-02hw/vfio: Use the IEC binary prefix definitionsPhilippe Mathieu-Daudé
It eases code review, unit is explicit. Patch generated using: $ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/ and modified manually. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180625124238.25339-38-f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-18vfio-ccw: add force unlimited prefetch propertyHalil Pasic
There is at least one guest (OS) such that although it does not rely on the guarantees provided by ORB 1 word 9 bit (aka unlimited prefetch, aka P bit) not being set, it fails to tell this to the machine. Usually this ain't a big deal, as the original purpose of the P bit is to allow for performance optimizations. vfio-ccw however can not provide the guarantees required if the bit is not set. It is not possible to implement support for the P bit not set without transitioning to lower level protocols for vfio-ccw. So let's give the user the opportunity to force setting the P bit, if the user knows this is safe. For self modifying channel programs forcing the P bit is not safe. If the P bit is forced for a self modifying channel program things are expected to break in strange ways. Let's also avoid warning multiple about P bit not set in the ORB in case P bit is not told to be forced, and designate the affected vfio-ccw device. Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Suggested-by: Dong Jia Shi <bjsdjshi@linux.ibm.com> Acked-by: Jason J. Herne <jjherne@linux.ibm.com> Tested-by: Jason J. Herne <jjherne@linux.ibm.com> Message-Id: <20180524175828.3143-2-pasic@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-06-15iommu: Add IOMMU index argument to notifier APIsPeter Maydell
Add support for multiple IOMMU indexes to the IOMMU notifier APIs. When initializing a notifier with iommu_notifier_init(), the caller must pass the IOMMU index that it is interested in. When a change happens, the IOMMU implementation must pass memory_region_notify_iommu() the IOMMU index that has changed and that notifiers must be called for. IOMMUs which support only a single index don't need to change. Callers which only really support working with IOMMUs with a single index can use the result of passing MEMTXATTRS_UNSPECIFIED to memory_region_iommu_attrs_to_index(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180604152941.20374-3-peter.maydell@linaro.org
2018-06-05vfio/pci: Default display option to "off"Alex Williamson
Commit a9994687cb9b ("vfio/display: core & wireup") added display support to vfio-pci with the default being "auto", which breaks existing VMs when the vGPU requires GL support but had no previous requirement for a GL compatible configuration. "Off" is the safer default as we impose no new requirements to VM configurations. Fixes: a9994687cb9b ("vfio/display: core & wireup") Cc: qemu-stable@nongnu.org Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05vfio/quirks: Enable ioeventfd quirks to be handled by vfio directlyAlex Williamson
With vfio ioeventfd support, we can program vfio-pci to perform a specified BAR write when an eventfd is triggered. This allows the KVM ioeventfd to be wired directly to vfio-pci, entirely avoiding userspace handling for these events. On the same micro-benchmark where the ioeventfd got us to almost 90% of performance versus disabling the GeForce quirks, this gets us to within 95%. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05vfio/quirks: ioeventfd quirk accelerationAlex Williamson
The NVIDIA BAR0 quirks virtualize the PCI config space mirrors found in device MMIO space. Normally PCI config space is considered a slow path and further optimization is unnecessary, however NVIDIA uses a register here to enable the MSI interrupt to re-trigger. Exiting to QEMU for this MSI-ACK handling can therefore rate limit our interrupt handling. Fortunately the MSI-ACK write is easily detected since the quirk MemoryRegion otherwise has very few accesses, so simply looking for consecutive writes with the same data is sufficient, in this case 10 consecutive writes with the same data and size is arbitrarily chosen. We configure the KVM ioeventfd with data match, so there's no risk of triggering for the wrong data or size, but we do risk that pathological driver behavior might consume all of QEMU's file descriptors, so we cap ourselves to 10 ioeventfds for this purpose. In support of the above, generic ioeventfd infrastructure is added for vfio quirks. This automatically initializes an ioeventfd list per quirk, disables and frees ioeventfds on exit, and allows ioeventfds marked as dynamic to be dropped on device reset. The rationale for this latter feature is that useful ioeventfds may depend on specific driver behavior and since we necessarily place a cap on our use of ioeventfds, a machine reset is a reasonable point at which to assume a new driver and re-profile. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05vfio/quirks: Add quirk reset callbackAlex Williamson
Quirks can be self modifying, provide a hook to allow them to cleanup on device reset if desired. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05vfio/quirks: Add common quirk alloc helperAlex Williamson
This will later be used to include list initialization. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>