aboutsummaryrefslogtreecommitdiff
path: root/hw/vfio
AgeCommit message (Collapse)Author
2019-04-27Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190426' ↵Peter Maydell
into staging ppc patch queue 2019-04-26 Here's the first ppc target pull request for qemu-4.1. This has a number of things that have accumulated while qemu-4.0 was frozen. * A number of emulated MMU improvements from Ben Herrenschmidt * Assorted cleanups fro Greg Kurz * A large set of mostly mechanical cleanups from me to make target/ppc much closer to compliant with the modern coding style * Support for passthrough of NVIDIA GPUs using NVLink2 As well as some other assorted fixes. # gpg: Signature made Fri 26 Apr 2019 07:02:19 BST # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-4.1-20190426: (36 commits) target/ppc: improve performance of large BAT invalidations ppc/hash32: Rework R and C bit updates ppc/hash64: Rework R and C bit updates ppc/spapr: Use proper HPTE accessors for H_READ target/ppc: Don't check UPRT in radix mode when in HV real mode target/ppc/kvm: Convert DPRINTF to traces target/ppc/trace-events: Fix trivial typo spapr: Drop duplicate PCI swizzle code spapr_pci: Get rid of duplicate code for node name creation target/ppc: Style fixes for translate/spe-impl.inc.c target/ppc: Style fixes for translate/vmx-impl.inc.c target/ppc: Style fixes for translate/vsx-impl.inc.c target/ppc: Style fixes for translate/fp-impl.inc.c target/ppc: Style fixes for translate.c target/ppc: Style fixes for translate_init.inc.c target/ppc: Style fixes for monitor.c target/ppc: Style fixes for mmu_helper.c target/ppc: Style fixes for mmu-hash64.[ch] target/ppc: Style fixes for mmu-hash32.[ch] target/ppc: Style fixes for misc_helper.c ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-04-26spapr: Support NVIDIA V100 GPU with NVLink2Alexey Kardashevskiy
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver implements special regions for such GPUs and emulates an NVLink bridge. NVLink2-enabled POWER9 CPUs also provide address translation services which includes an ATS shootdown (ATSD) register exported via the NVLink bridge device. This adds a quirk to VFIO to map the GPU memory and create an MR; the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses this to get the MR and map it to the system address space. Another quirk does the same for ATSD. This adds additional steps to sPAPR PHB setup: 1. Search for specific GPUs and NPUs, collect findings in sPAPRPHBState::nvgpus, manage system address space mappings; 2. Add device-specific properties such as "ibm,npu", "ibm,gpu", "memory-block", "link-speed" to advertise the NVLink2 function to the guest; 3. Add "mmio-atsd" to vPHB to advertise the ATSD capability; 4. Add new memory blocks (with extra "linux,memory-usable" to prevent the guest OS from accessing the new memory until it is onlined) and npuphb# nodes representing an NPU unit for every vPHB as the GPU driver uses it for link discovery. This allocates space for GPU RAM and ATSD like we do for MMIOs by adding 2 new parameters to the phb_placement() hook. Older machine types set these to zero. This puts new memory nodes in a separate NUMA node to as the GPU RAM needs to be configured equally distant from any other node in the system. Unlike the host setup which assigns numa ids from 255 downwards, this adds new NUMA nodes after the user configures nodes or from 1 if none were configured. This adds requirement similar to EEH - one IOMMU group per vPHB. The reason for this is that ATSD registers belong to a physical NPU so they cannot invalidate translations on GPUs attached to another NPU. It is guaranteed by the host platform as it does not mix NVLink bridges or GPUs from different NPU in the same IOMMU group. If more than one IOMMU group is detected on a vPHB, this disables ATSD support for that vPHB and prints a warning. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for vfio portions] Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20190312082103.130561-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-25Merge tag 's390-ccw-bios-2019-04-12' into s390-next-stagingCornelia Huck
Support for booting from a vfio-ccw passthrough dasd device # gpg: Signature made Fri 12 Apr 2019 01:17:03 PM CEST # gpg: using RSA key 2ED9D774FE702DB5 # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [undefined] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [undefined] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] * tag 's390-ccw-bios-2019-04-12': pc-bios/s390: Update firmware images s390-bios: Use control unit type to find bootable devices s390-bios: Support booting from real dasd device s390-bios: Add channel command codes/structs needed for dasd-ipl s390-bios: Use control unit type to determine boot method s390-bios: Refactor virtio to run channel programs via cio s390-bios: Factor finding boot device out of virtio code path s390-bios: Extend find_dev() for non-virtio devices s390-bios: cio error handling s390-bios: Support for running format-0/1 channel programs s390-bios: ptr2u32 and u32toptr s390-bios: Map low core memory s390-bios: Decouple channel i/o logic from virtio s390-bios: Clean up cio.h s390-bios: decouple common boot logic from virtio s390-bios: decouple cio setup from virtio s390 vfio-ccw: Add bootindex property and IPLB data Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-04-25exec: Introduce qemu_maxrampagesize() and rename qemu_getrampagesize()David Hildenbrand
Rename qemu_getrampagesize() to qemu_minrampagesize(). While at it, properly rename find_max_supported_pagesize() to find_min_backend_pagesize(). s390x is actually interested into the maximum ram pagesize, so introduce and use qemu_maxrampagesize(). Add a TODO, indicating that looking at any mapped memory backends is not 100% correct in some cases. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190417113143.5551-3-david@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-04-18vfio: Report warnings with warn_report(), not error_printf()Markus Armbruster
Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190417190641.26814-8-armbru@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com>
2019-04-12s390 vfio-ccw: Add bootindex property and IPLB dataJason J. Herne
Add bootindex property and iplb data for vfio-ccw devices. This allows us to forward boot information into the bios for vfio-ccw devices. Refactor s390_get_ccw_device() to return device type. This prevents us from having to use messy casting logic in several places. Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <1554388475-18329-2-git-send-email-jjherne@linux.ibm.com> [thuth: fixed "typedef struct VFIOCCWDevice" build failure with clang] Signed-off-by: Thomas Huth <thuth@redhat.com>
2019-04-03hw/vfio/ccw: avoid taking address members in packed structsDaniel P. Berrangé
The GCC 9 compiler complains about many places in s390 code that take the address of members of the 'struct SCHIB' which is marked packed: hw/vfio/ccw.c: In function ‘vfio_ccw_io_notifier_handler’: hw/vfio/ccw.c:133:15: warning: taking address of packed member of ‘struct SCHIB’ may result in an unaligned pointer value \ [-Waddress-of-packed-member] 133 | SCSW *s = &sch->curr_status.scsw; | ^~~~~~~~~~~~~~~~~~~~~~ hw/vfio/ccw.c:134:15: warning: taking address of packed member of ‘struct SCHIB’ may result in an unaligned pointer value \ [-Waddress-of-packed-member] 134 | PMCW *p = &sch->curr_status.pmcw; | ^~~~~~~~~~~~~~~~~~~~~~ ...snip many more... Almost all of these are just done for convenience to avoid typing out long variable/field names when referencing struct members. We can get most of this convenience by taking the address of the 'struct SCHIB' instead, avoiding triggering the compiler warnings. In a couple of places we copy via a local variable which is a technique already applied elsewhere in s390 code for this problem. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20190329111104.17223-12-berrange@redhat.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-03-22trace-events: Fix attribution of trace points to sourceMarkus Armbruster
Some trace points are attributed to the wrong source file. Happens when we neglect to update trace-events for code motion, or add events in the wrong place, or misspell the file name. Clean up with help of cleanup-trace-events.pl. Same funnies as in the previous commit, of course. Manually shorten its change to linux-user/trace-events to */signal.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190314180929.27722-6-armbru@redhat.com Message-Id: <20190314180929.27722-6-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-22trace-events: Delete unused trace pointsMarkus Armbruster
Tracked down with cleanup-trace-events.pl. Funnies requiring manual post-processing: * block.c and blockdev.c trace points are in block/trace-events. * hw/block/nvme.c uses the preprocessor to hide its trace point use from cleanup-trace-events.pl. * include/hw/xen/xen_common.h trace points are in hw/xen/trace-events. * net/colo-compare and net/filter-rewriter.c use pseudo trace points colo_compare_udp_miscompare and colo_filter_rewriter_debug to guard debug code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190314180929.27722-5-armbru@redhat.com Message-Id: <20190314180929.27722-5-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-22trace-events: Shorten file names in commentsMarkus Armbruster
We spell out sub/dir/ in sub/dir/trace-events' comments pointing to source files. That's because when trace-events got split up, the comments were moved verbatim. Delete the sub/dir/ part from these comments. Gets rid of several misspellings. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190314180929.27722-3-armbru@redhat.com Message-Id: <20190314180929.27722-3-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-12Merge remote-tracking branch 'remotes/awilliam/tags/vfio-updates-20190311.0' ↵Peter Maydell
into staging VFIO updates 2019-03-11 - Resolution support for mdev displays supporting EDID interface (Gerd Hoffmann) # gpg: Signature made Mon 11 Mar 2019 19:17:39 GMT # gpg: using RSA key 239B9B6E3BB08B22 # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" [full] # gpg: aka "Alex Williamson <alex@shazbot.org>" [full] # gpg: aka "Alex Williamson <alwillia@redhat.com>" [full] # gpg: aka "Alex Williamson <alex.l.williamson@gmail.com>" [full] # Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B 8A90 239B 9B6E 3BB0 8B22 * remotes/awilliam/tags/vfio-updates-20190311.0: vfio/display: delay link up event vfio/display: add xres + yres properties vfio/display: add edid support. Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-12vfio: Make vfio_get_region_info_cap publicAlexey Kardashevskiy
This makes vfio_get_region_info_cap() to be used in quirks. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20190307050518.64968-3-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12vfio/spapr: Rename local systempagesize variableAlexey Kardashevskiy
The "systempagesize" name suggests that it is the host system page size while it is the smallest page size of memory backing the guest RAM so let's rename it to stop confusion. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190227085149.38596-4-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12vfio/spapr: Fix indirect levels calculationAlexey Kardashevskiy
The current code assumes that we can address more bits on a PCI bus for DMA than we really can but there is no way knowing the actual limit. This makes a better guess for the number of levels and if the kernel fails to allocate that, this increases the level numbers till succeeded or reached the 64bit limit. This adds levels to the trace point. This may cause the kernel to warn about failed allocation: [65122.837458] Failed to allocate a TCE memory, level shift=28 which might happen if MAX_ORDER is not large enough as it can vary: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/Kconfig?h=v5.0-rc2#n727 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190227085149.38596-3-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-11vfio/display: delay link up eventGerd Hoffmann
Kick the display link up event with a 0.1 sec delay, so the guest has a chance to notice the link down first. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> [update for redefined macro] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-03-11vfio/display: add xres + yres propertiesGerd Hoffmann
This allows configure the display resolution which the vgpu should use. The information will be passed to the guest using EDID, so the mdev driver must support the vfio edid region for this to work. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-03-11vfio/display: add edid support.Gerd Hoffmann
This patch adds EDID support to the vfio display (aka vgpu) code. When supported by the mdev driver qemu will generate a EDID blob and pass it on using the new vfio edid region. The EDID blob will be updated on UI changes (i.e. window resize), so the guest can adapt. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> [remove control flow via macro, use unsigned format specifier] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-03-11vfio-pci: enable by defaultPaolo Bonzini
CONFIG_VFIO_PCI was not "default y" - and once you do that, it is also important to disable it if PCI is not there. Reported-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-07vfio: express vfio dependencies with KconfigPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-07build: switch to KconfigPaolo Bonzini
The make_device_config.sh script is replaced by minikconf, which is modified to support the same command line as its predecessor. The roots of the parsing are default-configs/*.mak, Kconfig.host and hw/Kconfig. One difference with make_device_config.sh is that all symbols have to be defined in a Kconfig file, including those coming from the configure script. This is the reason for the Kconfig.host file introduced in the previous patch. Whenever a file in default-configs/*.mak used $(...) to refer to a config-host.mak symbol, this is replaced by a Kconfig dependency; this part must be done already in this patch for bisectability. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Acked-by: Thomas Huth <thuth@redhat.com> Message-Id: <20190123065618.3520-28-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-07kconfig: introduce kconfig filesPaolo Bonzini
The Kconfig files were generated mostly with this script: for i in `grep -ho CONFIG_[A-Z0-9_]* default-configs/* | sort -u`; do set fnord `git grep -lw $i -- 'hw/*/Makefile.objs' ` shift if test $# = 1; then cat >> $(dirname $1)/Kconfig << EOF config ${i#CONFIG_} bool EOF git add $(dirname $1)/Kconfig else echo $i $* fi done sed -i '$d' hw/*/Kconfig for i in hw/*; do if test -d $i && ! test -f $i/Kconfig; then touch $i/Kconfig git add $i/Kconfig fi done Whenever a symbol is referenced from multiple subdirectories, the script prints the list of directories that reference the symbol. These symbols have to be added manually to the Kconfig files. Kconfig.host and hw/Kconfig were created manually. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20190123065618.3520-27-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-04Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20190304' into stagingPeter Maydell
s390x updates: - tcg: support the floating-point extension facility - vfio-ap: support hot(un)plug of vfio-ap device - fixes + cleanups # gpg: Signature made Mon 04 Mar 2019 11:55:39 GMT # gpg: using RSA key C3D0D66DC3624FF6A8C018CEDECF6B93C6F02FAF # gpg: issuer "cohuck@redhat.com" # gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>" [unknown] # gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>" [full] # gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>" [full] # gpg: aka "Cornelia Huck <cohuck@kernel.org>" [unknown] # gpg: aka "Cornelia Huck <cohuck@redhat.com>" [unknown] # Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF * remotes/cohuck/tags/s390x-20190304: (27 commits) s390x: Add floating-point extension facility to "qemu" cpu model s390x/tcg: Handle all rounding modes overwritten by BFP instructions s390x/tcg: Implement rounding mode and XxC for LOAD ROUNDED s390x/tcg: Implement XxC and checks for most FP instructions s390x/tcg: Prepare for IEEE-inexact-exception control (XxC) s390x/tcg: Refactor saving/restoring the bfp rounding mode s390x/tcg: Check for exceptions in SET BFP ROUNDING MODE s390x/tcg: Handle SET FPC AND LOAD FPC 3-bit BFP rounding modes s390x/tcg: Fix simulated-IEEE exceptions s390x/tcg: Refactor SET FPC AND SIGNAL handling s390x/tcg: Hide IEEE underflows in some scenarios s390x/tcg: Fix parts of IEEE exception handling s390x/tcg: Factor out conversion of softfloat exceptions s390x/tcg: Fix rounding from float128 to uint64_t/uint32_t s390x/tcg: Fix TEST DATA CLASS instructions s390x/tcg: Implement LOAD COUNT TO BLOCK BOUNDARY s390x/tcg: Implement LOAD LENGTHENED short HFP to long HFP s390x/tcg: Factor out gen_addi_and_wrap_i64() from get_address() s390x/tcg: Factor out vec_full_reg_offset() s390x/tcg: Clarify terminology in vec_reg_offset() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-04Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
pci, pc, virtio: fixes, cleanups, tests Lots of work on tests: BiosTablesTest UEFI app, vhost-user testing for non-Linux hosts. Misc cleanups and fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 22 Feb 2019 15:51:40 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (26 commits) pci: Sanity test minimum downstream LNKSTA hw/smbios: fix offset of type 3 sku field pci: Move NVIDIA vendor id to the rest of ids virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size virtio-balloon: Use ram_block_discard_range() instead of raw madvise() virtio-balloon: Rework ballon_page() interface virtio-balloon: Corrections to address verification virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate i386/kvm: ignore masked irqs when update msi routes contrib/vhost-user-blk: fix the compilation issue Revert "contrib/vhost-user-blk: fix the compilation issue" pc-dimm: use same mechanism for [get|set]_addr tests/data: introduce "uefi-boot-images" with the "bios-tables-test" ISOs tests/uefi-test-tools: add build scripts tests: introduce "uefi-test-tools" with the BiosTablesTest UEFI app roms: build the EfiRom utility from the roms/edk2 submodule roms: add the edk2 project as a git submodule vhost-user-test: create a temporary directory per TestServer vhost-user-test: small changes to init_hugepagefs vhost-user-test: create a main loop per TestServer ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-04s390x/vfio-ap: Implement hot plug/unplug of vfio-ap deviceTony Krowiak
Introduces hot plug/unplug support for the vfio-ap device. To hot plug a vfio-ap device using the QEMU device_add command: (qemu) device_add vfio-ap,sysfsdev=$path-to-mdev Where $path-to-mdev is the absolute path to the mediated matrix device to which AP resources to be used by the guest have been assigned. A vfio-ap device can be hot plugged only if: 1. A vfio-ap device has not been attached to the virtual machine's ap-bus via the QEMU command line or a prior hot plug action. 2. The guest was started with the CPU model feature for AP enabled (e.g., -cpu host,ap=on) To hot unplug a vfio-ap device using the QEMU device_del command: (qemu) device_del vfio-ap,sysfsdev=$path-to-mdev Where $path-to-mdev is the absolute path to the mediated matrix device specified when the vfio-ap device was attached to the virtual machine's ap-bus. A vfio-ap device can be hot unplugged only if: 1. A vfio-ap device has been attached to the virtual machine's ap-bus via the QEMU command line or a prior hot plug action. 2. The guest was started with the CPU model feature for AP enabled (e.g., -cpu host,ap=on) Please note that a hot plug handler is not necessary for the vfio-ap device because the AP matrix configuration for the guest is performed by the kernel device driver when the vfio-ap device is realized. The vfio-ap device represents a VFIO mediated device created in the host sysfs for use by a guest. The mdev device is configured with an AP matrix (i.e., adapters and domains) via its sysfs attribute interfaces prior to starting the guest or plugging a vfio-ap device in. When the device is realized, a file descriptor is opened on the mdev device which results in a callback to the vfio_ap kernel device driver. The device driver then configures the AP matrix in the guest's SIE state description from the AP matrix assigned via the mdev device's sysfs interfaces. The AP devices will be created for the guest when the AP bus running on the guest subsequently performs its periodic scan for AP devices. The qdev_simple_device_unplug_cb() callback function is used for the same reaons; namely, the vfio_ap kernel device driver will perform the AP resource de-configuration for the guest when the vfio-ap device is unplugged. When the vfio-ap device is unrealized, the mdev device file descriptor is closed which results in a callback to the vfio_ap kernel device driver. The device driver then clears the AP matrix configuration in the guest's SIE state description and resets all of the affected queues. The AP devices created for the guest will be removed when the AP bus running on the guest subsequently performs its periodic scan and finds there are no longer any AP resources assigned to the guest. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Pierre Morel <pmorel@linux.ibm.com> Message-Id: <1550519397-25359-2-git-send-email-akrowiak@linux.ibm.com> [CH: adapt to changed qbus_set_hotplug_handler() signature] Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2019-02-22pci: Move NVIDIA vendor id to the rest of idsAlexey Kardashevskiy
sPAPR code will use it too so move it from VFIO to the common code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190214051440.59167-1-aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-02-21hw/vfio/common: Refactor container initializationEric Auger
We introduce the vfio_init_container_type() helper. It computes the highest usable iommu type and then set the container and the iommu type. Its usage in vfio_connect_container() makes the code ready for addition of new iommu types. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-21vfio/common: Work around kernel overflow bug in DMA unmapAlex Williamson
A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which adds a test for address space wrap-around in the vfio DMA unmap path. Unfortunately due to overflow, the kernel detects an unmap of the last page in the 64-bit address space as a wrap-around. In QEMU, a Q35 guest with VT-d emulation and guest IOMMU enabled will attempt to make such an unmap request during VM system reset, triggering an error: qemu-kvm: VFIO_UNMAP_DMA: -22 qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument) Here the IOVA start address (0xfef00000) and the size parameter (0xffffffff01100000) add to exactly 2^64, triggering the bug. A kernel fix is queued for the Linux v5.0 release to address this. This patch implements a workaround to retry the unmap, excluding the final page of the range when we detect an unmap failing which matches the requirements for this issue. This is expected to be a safe and complete workaround as the VT-d address space does not extend to the full 64-bit space and therefore the last page should never be mapped. This workaround can be removed once all kernels with this bug are sufficiently deprecated. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291 Reported-by: Pei Zhang <pezhang@redhat.com> Debugged-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-02-05hw/vfio/Makefile.objs: Create new CONFIG_* variables for VFIO core and PCIPaolo Bonzini
Make hw/vfio configurable and add new CONFIG_VFIO_* to the default-configs/s390x*-softmmu.mak. This allow a finer-grain selection of the various VFIO backends. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20190202072456.6468-28-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-05vfio: move conditional up to hw/Makefile.objsPaolo Bonzini
Instead of wrapping the entire Makefile.objs with an ifeq/endif, just include the directory only for Linux. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20190202072456.6468-4-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-24trace: forbid use of %m in trace event format stringsDaniel P. Berrangé
The '%m' format instructs glibc's printf()/syslog() implementation to insert the contents of strerror(errno). Since this is a glibc extension it should generally be avoided in QEMU due to need for portability to a variety of platforms. Even though vfio is Linux-only code that could otherwise use "%m", it must still be avoided in trace-events files because several of the backends do not use the format string and so this error information is invisible to them. The errno string value should be given as an explicit trace argument instead, making it accessible to all backends. This also allows it to work correctly with future patches that use the format string with systemtap's simple printf code. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-id: 20190123120016.4538-4-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-01-11qemu/queue.h: typedef QTAILQ headsPaolo Bonzini
This will be needed when we change the QTAILQ head and elem structs to unions. However, it is also consistent with the usage elsewhere in QEMU for other list head structs (see for example FsMountList). Note that most QTAILQs only need their name in order to do backwards walks. Those do not break with the struct->union change, and anyway the change will also remove the need to name heads when doing backwards walks, so those are not touched here. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11vfio: make vfio_address_spaces staticPaolo Bonzini
It is not used outside hw/vfio/common.c, so it does not need to be extern. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-21Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
pci, pc, virtio: fixes, features VTD fixes IR and split irqchip are now the default for Q35 ACPI refactoring hotplug refactoring new names for virtio devices multiple pcie link width/speeds PCI fixes Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Thu 20 Dec 2018 18:26:03 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (44 commits) x86-iommu: turn on IR by default if proper x86-iommu: switch intr_supported to OnOffAuto type q35: set split kernel irqchip as default pci: Adjust PCI config limit based on bus topology spapr_pci: perform unplug via the hotplug handler pci/shpc: perform unplug via the hotplug handler pci: Reuse pci-bridge hotplug handler handlers for pcie-pci-bridge pci/pcie: perform unplug via the hotplug handler pci/pcihp: perform unplug via the hotplug handler pci/pcihp: overwrite hotplug handler recursively from the start pci/pcihp: perform check for bus capability in pre_plug handler s390x/pci: rename hotplug handler callbacks pci/shpc: rename hotplug handler callbacks pci/pcie: rename hotplug handler callbacks hw/i386: Remove deprecated machines pc-0.10 and pc-0.11 hw: acpi: Remove AcpiRsdpDescriptor and fix tests hw: acpi: Export and share the ARM RSDP build hw: arm: Support both legacy and current RSDP build hw: arm: Convert the RSDP build to the buid_append_foo() API hw: arm: Carry RSDP specific data through AcpiRsdpData ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-12-20Clean up includesMarkus Armbruster
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes, with the changes to the following files manually reverted: contrib/libvhost-user/libvhost-user-glib.h contrib/libvhost-user/libvhost-user.c contrib/libvhost-user/libvhost-user.h linux-user/mips64/cpu_loop.c linux-user/mips64/signal.c linux-user/sparc64/cpu_loop.c linux-user/sparc64/signal.c linux-user/x86_64/cpu_loop.c linux-user/x86_64/signal.c target/s390x/gen-features.c tests/migration/s390x/a-b-bios.c tests/test-rcu-simpleq.c tests/test-rcu-tailq.c Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20181204172535.2799-1-armbru@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Acked-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
2018-12-19vfio/pci: Remove PCIe Link Status emulationAlex Williamson
Now that the downstream port will virtually negotiate itself to the link status of the downstream device, we can remove this emulation. It's not clear that it was every terribly useful anyway. Tested-by: Geoffrey McRae <geoff@hostfission.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19pcie: Create enums for link speed and widthAlex Williamson
In preparation for reporting higher virtual link speeds and widths, create enums and macros to help us manage them. Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Tested-by: Geoffrey McRae <geoff@hostfission.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-12vfio-ap: flag as compatible with balloonCornelia Huck
vfio-ap devices do not pin any pages in the host. Therefore, they are compatible with memory ballooning. Flag them as compatible, so both vfio-ap and a balloon can be used simultaneously. Cc: qemu-stable@nongnu.org Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-11-05s390x/vfio-ap: report correct errorCornelia Huck
If ioctl(..., VFIO_DEVICE_RESET) fails, we want to report errno instead of ret (which is always -1 on error). Fixes Coverity issue CID 1396176. Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-10-19vfio: Clean up error reporting after previous commitMarkus Armbruster
The previous commit changed vfio's warning messages from vfio warning: DEV-NAME: Could not frobnicate to warning: vfio DEV-NAME: Could not frobnicate To match this change, change error messages from vfio error: DEV-NAME: On fire to vfio DEV-NAME: On fire Note the loss of "error". If we think marking error messages that way is a good idea, we should mark *all* error messages, i.e. make error_report() print it. Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20181017082702.5581-7-armbru@redhat.com>
2018-10-19vfio: Use warn_report() & friends to report warningsMarkus Armbruster
The vfio code reports warnings like error_report(WARN_PREFIX "Could not frobnicate", DEV-NAME); where WARN_PREFIX is defined so the message comes out as vfio warning: DEV-NAME: Could not frobnicate This usage predates the introduction of warn_report() & friends in commit 97f40301f1d. It's time to convert to that interface. Since these functions already prefix the message with "warning: ", replace WARN_PREFIX by VFIO_MSG_PREFIX, so the messages come out like warning: vfio DEV-NAME: Could not frobnicate The next commit will replace ERR_PREFIX. Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-6-armbru@redhat.com>
2018-10-19error: Fix use of error_prepend() with &error_fatal, &error_abortMarkus Armbruster
From include/qapi/error.h: * Pass an existing error to the caller with the message modified: * error_propagate(errp, err); * error_prepend(errp, "Could not frobnicate '%s': ", name); Fei Li pointed out that doing error_propagate() first doesn't work well when @errp is &error_fatal or &error_abort: the error_prepend() is never reached. Since I doubt fixing the documentation will stop people from getting it wrong, introduce error_propagate_prepend(), in the hope that it lures people away from using its constituents in the wrong order. Update the instructions in error.h accordingly. Convert existing error_prepend() next to error_propagate to error_propagate_prepend(). If any of these get reached with &error_fatal or &error_abort, the error messages improve. I didn't check whether that's the case anywhere. Cc: Fei Li <fli@suse.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-2-armbru@redhat.com>
2018-10-15vfio-pci: make vfio-pci device more QOM conventionalLi Qiang
Define a TYPE_VFIO_PCI and drop DO_UPCAST. Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-15vfio/platform: Make the vfio-platform device non-abstractEric Auger
Up to now the vfio-platform device has been abstract and could not be instantiated. The integration of a new vfio platform device required creating a dummy derived device which only set the compatible string. Following the few vfio-platform device integrations we have seen the actual requested adaptation happens on device tree node creation (sysbus-fdt). Hence remove the abstract setting, and read the list of compatible values from sysfs if not set by a derived device. Update the amd-xgbe and calxeda-xgmac drivers to fill in the number of compatible values, as there can now be more than one. Note that sysbus-fdt does not support the instantiation of the vfio-platform device yet. Signed-off-by: Eric Auger <eric.auger@redhat.com> [geert: Rebase, set user_creatable=true, use compatible values in sysfs instead of user-supplied manufacturer/model options, reword] Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-15hw/vfio/display: add ramfb supportGerd Hoffmann
So we have a boot display when using a vgpu as primary display. ramfb depends on a fw_cfg file. fw_cfg files can not be added and removed at runtime, therefore a ramfb-enabled vfio device can't be hotplugged. Add a nohotplug variant of the vfio-pci device (as child class). Add the ramfb property to the nohotplug variant only. So to enable the vgpu display with boot support use this: -device vfio-pci-nohotplug,display=on,ramfb=on,sysfsdev=... Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-12s390x/vfio: ap: Introduce VFIO AP deviceTony Krowiak
Introduces a VFIO based AP device. The device is defined via the QEMU command line by specifying: -device vfio-ap,sysfsdev=<path-to-mediated-matrix-device> There may be only one vfio-ap device configured for a guest. The mediated matrix device is created by the VFIO AP device driver by writing a UUID to a sysfs attribute file (see docs/vfio-ap.txt). The mediated matrix device will be named after the UUID. Symbolic links to the $uuid are created in many places, so the path to the mediated matrix device $uuid can be specified in any of the following ways: /sys/devices/vfio_ap/matrix/$uuid /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/devices/$uuid /sys/bus/mdev/devices/$uuid /sys/bus/mdev/drivers/vfio_mdev/$uuid When the vfio-ap device is realized, it acquires and opens the VFIO iommu group to which the mediated matrix device is bound. This causes a VFIO group notification event to be signaled. The vfio_ap device driver's group notification handler will get called at which time the device driver will configure the the AP devices to which the guest will be granted access. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Tested-by: Pierre Morel <pmorel@linux.ibm.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Tested-by: Pierre Morel <pmorel@linux.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <20181010170309.12045-6-akrowiak@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> [CH: added missing g_free and device category] Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2018-09-24qemu-error: add {error, warn}_report_once_condCornelia Huck
Add two functions to print an error/warning report once depending on a passed-in condition variable and flip it if printed. This is useful if you want to print a message not once-globally, but e.g. once-per-device. Inspired by warn_once() in hw/vfio/ccw.c, which has been replaced with warn_report_once_cond(). Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20180830145902.27376-2-cohuck@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Function comments reworded] Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-08-23vfio/pci: Fix failure to close file descriptor on errorAlex Williamson
A new error path fails to close the device file descriptor when triggered by a ballooning incompatibility within the group. Fix it. Fixes: 238e91728503 ("vfio/ccw/pci: Allow devices to opt-in for ballooning") Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-23vfio/pci: Handle subsystem realpath() returning NULLAlex Williamson
Fix error reported by Coverity where realpath can return NULL, resulting in a segfault in strcmp(). This should never happen given that we're working through regularly structured sysfs paths, but trivial enough to easily avoid. Fixes: 238e91728503 ("vfio/ccw/pci: Allow devices to opt-in for ballooning") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-21vfio/spapr: Allow backing bigger guest IOMMU pages with smaller physical pagesAlexey Kardashevskiy
At the moment the PPC64/pseries guest only supports 4K/64K/16M IOMMU pages and POWER8 CPU supports the exact same set of page size so so far things worked fine. However POWER9 supports different set of sizes - 4K/64K/2M/1G and the last two - 2M and 1G - are not even allowed in the paravirt interface (RTAS DDW) so we always end up using 64K IOMMU pages, although we could back guest's 16MB IOMMU pages with 2MB pages on the host. This stores the supported host IOMMU page sizes in VFIOContainer and uses this later when creating a new DMA window. This uses the system page size (64k normally, 2M/16M/1G if hugepages used) as the upper limit of the IOMMU pagesize. This changes the type of @pagesize to uint64_t as this is what memory_region_iommu_get_min_page_size() returns and clz64() takes. There should be no behavioral changes on platforms other than pseries. The guest will keep using the IOMMU page size selected by the PHB pagesize property as this only changes the underlying hardware TCE table granularity. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-17vfio/ccw/pci: Allow devices to opt-in for ballooningAlex Williamson
If a vfio assigned device makes use of a physical IOMMU, then memory ballooning is necessarily inhibited due to the page pinning, lack of page level granularity at the IOMMU, and sufficient notifiers to both remove the page on balloon inflation and add it back on deflation. However, not all devices are backed by a physical IOMMU. In the case of mediated devices, if a vendor driver is well synchronized with the guest driver, such that only pages actively used by the guest driver are pinned by the host mdev vendor driver, then there should be no overlap between pages available for the balloon driver and pages actively in use by the device. Under these conditions, ballooning should be safe. vfio-ccw devices are always mediated devices and always operate under the constraints above. Therefore we can consider all vfio-ccw devices as balloon compatible. The situation is far from straightforward with vfio-pci. These devices can be physical devices with physical IOMMU backing or mediated devices where it is unknown whether a physical IOMMU is in use or whether the vendor driver is well synchronized to the working set of the guest driver. The safest approach is therefore to assume all vfio-pci devices are incompatible with ballooning, but allow user opt-in should they have further insight into mediated devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>