path: root/hw/vfio
Age | Commit message | Author
2015-09-23 | vfio/pci: Add emulated PCI IDs | Alex Williamson
Specifying an emulated PCI vendor/device ID can be useful for testing various quirk paths, even though the behavior and functionality of the device with bogus IDs is fully unsupportable. We need to use a uint32_t for the vendor/device IDs, even though the registers themselves are only 16-bit, in order to be able to determine whether the value is valid and user set. The same support is added for subsystem vendor/device ID, though these have the possibility of being useful and supported for more than a testing tool. An emulated platform might want to impose its own subsystem IDs or at least hide the physical subsystem ID. Windows guests will often reinstall drivers due to a change in subsystem IDs, something that VM users may want to avoid. Of course careful attention would be required to ensure that guest drivers do not rely on the subsystem ID as a basis for device driver quirks. All of these options are added using the standard experimental option prefix and should not be considered stable. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
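The point about needing a 32-bit field for a 16-bit register can be sketched as follows. This is a minimal illustration, not the code from the commit: `ID_UNSET`, `classify_id()` and `resolve_id()` are hypothetical names, and the sentinel is simply assumed to be the all-ones 32-bit value. Because every 16-bit pattern is a possible ID, "not set by the user" has to live outside the 16-bit range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "the user did not set this option". */
#define ID_UNSET UINT32_MAX

enum id_override { ID_KEEP_HW, ID_USE_OVERRIDE, ID_INVALID };

/* Classify a user-supplied override for a 16-bit ID register. */
static enum id_override classify_id(uint32_t requested)
{
    if (requested == ID_UNSET) {
        return ID_KEEP_HW;          /* option left at its default */
    }
    if (requested > UINT16_MAX) {
        return ID_INVALID;          /* does not fit the 16-bit register */
    }
    return ID_USE_OVERRIDE;
}

/* Pick the ID the guest should see; returns false if the override is invalid. */
static bool resolve_id(uint16_t hw_id, uint32_t requested, uint16_t *out)
{
    switch (classify_id(requested)) {
    case ID_KEEP_HW:
        *out = hw_id;
        return true;
    case ID_USE_OVERRIDE:
        *out = (uint16_t)requested;
        return true;
    default:
        return false;
    }
}
```

With a plain uint16_t field there would be no spare value left to distinguish "unset" from a legitimate ID.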
2015-09-23 | vfio/pci: Cache vendor and device ID | Alex Williamson
Simplify access to commonly referenced PCI vendor and device ID by caching it on the VFIOPCIDevice struct. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio/pci: Move AMD device specific reset to quirks | Alex Williamson
This is just another quirk, for reset rather than affecting memory regions. Move it to our new quirks file. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio/pci: Remove old config window and mirror quirks | Alex Williamson
These are now unused. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio/pci: Config mirror quirk | Alex Williamson
Re-implement our mirror quirk using the new infrastructure. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio/pci: Config window quirks | Alex Williamson
Config windows make use of an address register and a data register. In VGA cards, these are often used to provide real mode code in the BIOS an easy way to access MMIO registers since the window often resides in an I/O port register. When the MMIO register has a mirror of PCI config space, we need to trap those accesses and redirect them to emulated config space. The previous version of this functionality made use of a single MemoryRegion and single match address. This version uses separate MemoryRegions for each of the address and data registers and allows for multiple match addresses. This is useful for Nvidia cards which have two ranges which index into PCI config space. The previous implementation is left for the follow-on patch for a more reviewable diff. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
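The address/data window with multiple match ranges can be sketched roughly as below. This is a simplified model under stated assumptions, not the QEMU implementation: `struct config_window`, `struct match` and both helpers are hypothetical, plain fields stand in for the two MemoryRegions, and only 32-bit reads are modeled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CONFIG_SIZE 256

/* A window address matches when (addr & ~mask) == base. */
struct match {
    uint32_t base;
    uint32_t mask;
};

struct config_window {
    uint8_t emulated_config[CONFIG_SIZE]; /* stand-in for emulated config space */
    uint32_t address;                     /* last value written to the address register */
    bool window_enabled;                  /* latched address hit one of the matches */
    uint32_t config_offset;               /* offset into config space when enabled */
    const struct match *matches;
    int nr_matches;
};

/* Address register write: latch the address and test it against every match. */
static void window_address_write(struct config_window *w, uint32_t addr)
{
    w->address = addr;
    w->window_enabled = false;
    for (int i = 0; i < w->nr_matches; i++) {
        if ((addr & ~w->matches[i].mask) == w->matches[i].base) {
            w->window_enabled = true;
            w->config_offset = addr & w->matches[i].mask;
            break;
        }
    }
}

/*
 * Data register read: returns true and fills *val from emulated config space
 * when the window currently points at a config mirror; returns false when the
 * access should go to the device instead.
 */
static bool window_data_read(const struct config_window *w, uint32_t *val)
{
    if (!w->window_enabled || w->config_offset + 4 > CONFIG_SIZE) {
        return false;
    }
    memcpy(val, &w->emulated_config[w->config_offset], sizeof(*val));
    return true;
}
```

Keeping the match list as an array is what lets one quirk cover a device with two separate ranges that index into config space.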
2015-09-23 | vfio/pci: Rework RTL8168 quirk | Alex Williamson
Another rework of this quirk, this time to update to the new quirk structure. We can handle the address and data registers with separate MemoryRegions and a quirk specific data structure, making the code much more understandable. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio/pci: Cleanup Nvidia 0x3d0 quirk | Alex Williamson
The Nvidia 0x3d0 quirk makes use of two separate registers and gives us our first chance to make use of separate memory regions for each to simplify the code a bit. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio/pci: Cleanup ATI 0x3c3 quirk | Alex Williamson
This is an easy quirk that really doesn't need a data structure of its own. We can pass vdev as the opaque data and access to the MemoryRegion isn't required. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio/pci: Foundation for new quirk structure | Alex Williamson
VFIOQuirk hosts a single memory region and a fixed set of data fields that try to handle all the quirk cases, but end up making those that don't exactly match really confusing. This patch introduces a struct intended to provide more flexibility and simpler code. VFIOQuirk is stripped to its basics, an opaque data pointer for quirk specific data and a pointer to an array of MemoryRegions with a counter. This still allows us to have common teardown routines, but adds much greater flexibility to support multiple memory regions and quirk specific data structures that are easier to maintain. The existing VFIOQuirk is transformed into VFIOLegacyQuirk, which further patches will eliminate entirely. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
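The stripped-down layout described above (an opaque data pointer plus a counted array of regions, with shared teardown) might look something like this sketch. All names here are hypothetical stand-ins, `struct region` replaces MemoryRegion, and the real structure may differ in its fields.

```c
#include <stdlib.h>

/* Stand-in for a MemoryRegion; only tracks whether it is mapped. */
struct region {
    int mapped;
};

/* Hypothetical minimal quirk: common code only knows about these fields. */
struct quirk {
    void *data;          /* quirk-specific state, opaque to common code */
    size_t nr_mem;       /* number of entries in mem[] */
    struct region *mem;  /* regions owned by this quirk */
};

/* Allocate a quirk with nr_mem regions (nr_mem >= 1) and optional private data. */
static struct quirk *quirk_alloc(size_t nr_mem, size_t data_size)
{
    struct quirk *q = calloc(1, sizeof(*q));

    if (!q) {
        return NULL;
    }
    q->mem = calloc(nr_mem, sizeof(*q->mem));
    q->data = data_size ? calloc(1, data_size) : NULL;
    if (!q->mem || (data_size && !q->data)) {
        free(q->data);
        free(q->mem);
        free(q);
        return NULL;
    }
    q->nr_mem = nr_mem;
    return q;
}

/*
 * Common teardown: works for every quirk because it never looks inside
 * q->data. Returns how many regions were unmapped.
 */
static size_t quirk_teardown(struct quirk *q)
{
    size_t unmapped = 0;

    for (size_t i = 0; i < q->nr_mem; i++) {
        if (q->mem[i].mapped) {
            q->mem[i].mapped = 0;
            unmapped++;
        }
    }
    free(q->data);
    free(q->mem);
    free(q);
    return unmapped;
}
```

The design choice is that each quirk defines its own private struct behind `data`, so common code does not need a fixed set of fields that tries to fit every case.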
2015-09-23 | vfio/pci: Cleanup ROM blacklist quirk | Alex Williamson
Create a vendor:device ID helper that we'll also use as we rework the rest of the quirks. Re-reading the config entries, even if we get more blacklist entries, is trivial overhead and only incurred during device setup. There's no need to typedef the blacklist structure, it's a static private data type used once. The elements get bumped up to uint32_t to avoid future maintenance issues if PCI_ANY_ID gets used for a blacklist entry (avoiding an actual hardware match). Our test loop is also crying out to be simplified as a for loop. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
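A rough sketch of the blacklist shape described above, assuming a wildcard modeled on PCI_ANY_ID. The table entries and the names `ANY_ID`, `id_match()` and `rom_blacklisted()` are illustrative only and are not taken from the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wildcard modeled on PCI_ANY_ID; it must not fit in 16 bits. */
#define ANY_ID UINT32_MAX

/* Static, private, used once: no typedef needed. Entries are illustrative. */
static const struct {
    uint32_t vendor;
    uint32_t device;
} rom_blacklist[] = {
    { 0x14e4, 0x168e },   /* one specific vendor:device pair */
    { 0x1234, ANY_ID },   /* hypothetical: every device from one vendor */
};

/* Compare one table field against the 16-bit ID read from the device. */
static bool id_match(uint32_t entry, uint16_t actual)
{
    return entry == ANY_ID || entry == actual;
}

static bool rom_blacklisted(uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < sizeof(rom_blacklist) / sizeof(rom_blacklist[0]); i++) {
        if (id_match(rom_blacklist[i].vendor, vendor) &&
            id_match(rom_blacklist[i].device, device)) {
            return true;
        }
    }
    return false;
}
```

If the table fields were 16-bit, the wildcard would truncate to 0xffff and become indistinguishable from a literal ID of 0xffff, which is the maintenance trap the wider type avoids.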
2015-09-23 | vfio/pci: Split quirks to a separate file | Alex Williamson
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio/pci: Extract PCI structures to a separate header | Alex Williamson
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio: Change polarity of our no-mmap option | Alex Williamson
The default should be to allow mmap and new drivers shouldn't need to expose an option or set it to other than the allocation default in their initfn. Take advantage of the experimental flag to change this option to the correct polarity. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio/pci: Make interrupt bypass runtime configurable | Alex Williamson
Tracing is more effective when we can completely disable all KVM bypass paths. Make these runtime rather than build-time configurable. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio/pci: Rename MSI/X functions for easier tracing | Alex Williamson
This allows vfio_msi* tracing. The MSI/X interrupt tracing is also pulled out of #ifdef DEBUG_VFIO to avoid a recompile for tracing this path. A few cycles to read the message is hardly anything if we're already in QEMU. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio/pci: Rename INTx functions for easier tracing | Alex Williamson
Rename functions and tracing callbacks so that we can trace vfio_intx* to see all the INTx related activities. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-23 | vfio/pci: Cleanup vfio_early_setup_msix() error path | Alex Williamson
With the addition of the Chelsio quirk we have an error path out of vfio_early_setup_msix() that doesn't free the allocated VFIOMSIXInfo struct. This doesn't introduce a leak as it still gets freed in the vfio_put_device() path, but it's complicated and sloppy to rely on that. Restructure to free the allocated data on error and only link it into the vdev on success. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
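The restructuring described above follows a common pattern: build the object in a local, free it on any error, and publish it to the owner only on success. A minimal sketch, with hypothetical types and a `quirk_fails` flag standing in for whatever check can fail:

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the MSI-X info and the owning device. */
struct msix_info {
    unsigned table_bar;
    unsigned pba_offset;
};

struct device {
    struct msix_info *msix;
};

/* Returns 0 on success, -1 on error. dev->msix is only touched on success. */
static int early_setup(struct device *dev, int quirk_fails)
{
    struct msix_info *msix = calloc(1, sizeof(*msix));

    if (!msix) {
        return -1;
    }

    /* ... fill in msix from the capability here ... */

    if (quirk_fails) {
        free(msix);       /* error path owns and releases the allocation */
        return -1;
    }

    dev->msix = msix;     /* link into the device only once fully set up */
    return 0;
}
```

Callers then never see a half-initialized structure, and the error path no longer depends on a later teardown routine to clean up.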
2015-09-23 | vfio/pci: Cleanup RTL8168 quirk and tracing | Alex Williamson
There's quite a bit of cleanup that can be done to the RTL8168 quirk, as well as the tracing to prevent a spew of uninteresting accesses for anything else the driver might choose to use the window registers for besides the MSI-X table. There should be no functional change, but it's now possible to get compact and useful traces by enabling vfio_rtl8168_quirk*, ex:
  vfio_rtl8168_quirk_write 0000:04:00.0 [address]: 0x1f000
  vfio_rtl8168_quirk_read 0000:04:00.0 [address]: 0x8001f000
  vfio_rtl8168_quirk_read 0000:04:00.0 [data]: 0xfee0100c
  vfio_rtl8168_quirk_write 0000:04:00.0 [address]: 0x1f004
  vfio_rtl8168_quirk_read 0000:04:00.0 [address]: 0x8001f004
  vfio_rtl8168_quirk_read 0000:04:00.0 [data]: 0x0
  vfio_rtl8168_quirk_write 0000:04:00.0 [address]: 0x1f008
  vfio_rtl8168_quirk_read 0000:04:00.0 [address]: 0x8001f008
  vfio_rtl8168_quirk_read 0000:04:00.0 [data]: 0x49b1
  vfio_rtl8168_quirk_write 0000:04:00.0 [address]: 0x1f00c
  vfio_rtl8168_quirk_read 0000:04:00.0 [address]: 0x8001f00c
  vfio_rtl8168_quirk_read 0000:04:00.0 [data]: 0x0
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-09-11 | typofixes - v4 | Veres Lajos
Signed-off-by: Veres Lajos <vlajos@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-09-11 | trivial: remove trailing newline from error_report | John Snow
Minor cleanup. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-09-11 | maint: remove unused include for dirent.h | Daniel P. Berrange
A number of files were including dirent.h but not using any of the functions it provides. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-09-11 | maint: remove / fix many doubled words | Daniel P. Berrange
Many source files have doubled words (eg "the the", "to to", and so on). Most of these can simply be removed, but a couple were actual mis-spellings (eg "to to" instead of "to do"). There was even one triple word score "to to to" :-) Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-07-22 | vfio/pci: Fix bootindex | Alex Williamson
bootindex was incorrectly changed to a device Property during the platform code split, resulting in it no longer working. Remove it. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: qemu-stable@nongnu.org # v2.3+
2015-07-22 | vfio/pci: Fix RTL8168 NIC quirks | Alex Williamson
The RTL8168 quirk correctly describes using bit 31 as a signal to mark a latch/completion, but the code mistakenly uses bit 28. This causes the Realtek driver to spin on this register for quite a while, 20k cycles on Windows 7 v7.092 driver. Then it gets frustrated and tries to set the bit itself and spins for another 20k cycles. For some this still results in a working driver, for others not. About the only thing the code really does in its current form is protect the guest from sneaking in writes to the real hardware MSI-X table. The fix is obviously to use bit 31 as we document that we should. The other problem doesn't seem to affect current drivers, as nobody seems to use these window registers for writes to the MSI-X table, but we need to use the stored data when a write is triggered, not the value of the current write, which only provides the offset. Note that only the Windows drivers from Realtek seem to use these registers; the Microsoft drivers provided with Windows 8.1 do not access them, nor do Linux in-kernel drivers. Link: https://bugs.launchpad.net/qemu/+bug/1384892 Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: qemu-stable@nongnu.org # v2.1+
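Both fixes can be illustrated with a toy model of the address/data window. This is a sketch under assumptions, not the quirk code: the struct, the helper names, the 0xfff offset mask, and the convention that completion is signalled by reading back the address with bit 31 flipped are all assumed for illustration.

```c
#include <stdint.h>

#define LATCH_BIT   (UINT32_C(1) << 31)  /* bit 31, not bit 28 */
#define TABLE_WORDS 1024

struct window {
    uint32_t table[TABLE_WORDS]; /* stand-in for the emulated MSI-X table */
    uint32_t addr;               /* last value written to the address register */
    uint32_t data;               /* last value written to (or fetched for) the data register */
};

/* Data register write: only stores the value for a later write cycle. */
static void data_write(struct window *w, uint32_t val)
{
    w->data = val;
}

/* Address register write: bit 31 set starts a write cycle, clear starts a read. */
static void address_write(struct window *w, uint32_t val)
{
    uint32_t index = (val & 0xfff) / 4;  /* assumed offset field; always < TABLE_WORDS */

    w->addr = val;
    if (val & LATCH_BIT) {
        /* Use the stored data; 'val' itself only carries the offset. */
        w->table[index] = w->data;
    } else {
        w->data = w->table[index];
    }
}

/* Address register read: report completion by flipping bit 31. */
static uint32_t address_read(const struct window *w)
{
    return w->addr ^ LATCH_BIT;
}

static uint32_t data_read(const struct window *w)
{
    return w->data;
}
```

In this model a read cycle written as 0x1f000 reads back as 0x8001f000, the same pattern visible in the vfio_rtl8168_quirk traces elsewhere in this log.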
2015-07-06 | vfio/pci : Add pba_offset PCI quirk for Chelsio T5 devices | Gabriel Laupre
Fix pba_offset initialization value for Chelsio T5 Virtual Function device. The T5 hardware has a bug in it where it reports a Pending Interrupt Bit Array Offset of 0x8000 for its SR-IOV Virtual Functions instead of the 0x1000 that the hardware actually uses internally. As the hardware doesn't return the correct pba_offset value, add a quirk to instead return a hardcoded value of 0x1000 when a Chelsio T5 VF device is detected. This bug has been fixed in Chelsio's next chip series, T6, but there are no plans to respin the T5 ASIC for this bug. It is just documented in the T5 Errata and left at that. Signed-off-by: Gabriel Laupre <glaupre@chelsio.com> Reviewed-by: Bandan Das <bsd@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-07-06 | vfio: Unregister IOMMU notifiers when container is destroyed | Alexey Kardashevskiy
On systems with a guest-visible IOMMU, adding a new memory region onto the PCI bus calls vfio_listener_region_add() for every DMA window. This installs a notifier for IOMMU memory regions. The notifier is supposed to be removed by vfio_listener_region_del(); however, in the case of a mixed PHB (emulated + VFIO devices), when the last VFIO device is unplugged and the container gets destroyed, all existing DMA windows stay alive together with their notifiers, which sit on a linked list whose head was in the destroyed container. This unregisters the IOMMU memory region notifiers when a container is destroyed. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-07-06 | hw/vfio/platform: add irqfd support | Eric Auger
This patch aims at optimizing IRQ handling using the irqfd framework. Instead of handling the eventfds on the user side, they are handled on the kernel side using
- the KVM irqfd framework,
- the VFIO driver virqfd framework.
The virtual IRQ completion is trapped at the interrupt controller. This removes the need for fast/slow path swap. Overall this brings significant performance improvements. Signed-off-by: Alvise Rigo <a.rigo@virtualopensystems.com> Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Vikram Sethi <vikrams@codeaurora.org> Acked-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-07-06 | kvm: rename kvm_irqchip_[add,remove]_irqfd_notifier with gsi suffix | Eric Auger
In anticipation of the introduction of new add/remove functions taking a qemu_irq parameter, let's rename the existing ones with a gsi suffix. Signed-off-by: Eric Auger <eric.auger@linaro.org> Tested-by: Vikram Sethi <vikrams@codeaurora.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-07-06 | vfio: cpu: Use "real" page size API | Peter Crosthwaite
This is system level code, and should only depend on the host page size, not the target page size. Note that HOST_PAGE_SIZE is misleadingly named and is really aligning to both host and target page size. Hence its replacement with REAL_HOST_PAGE_SIZE. Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-07-06 | vfio: fix return type of pread | Paolo Bonzini
size_t is an unsigned type, thus the error case is never reached in the below call to pread. If bytes is negative, it will be seen as a very high positive value. Spotted by Coverity. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
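The signedness issue is easy to reproduce in isolation. The helper below is a hypothetical sketch (`read_exact()` is not a QEMU function) showing the corrected shape: keep the result of pread() in an ssize_t so the negative error return stays visible.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Read exactly len bytes at offset off. Returns 0 on success and -1 on an
 * error or a short read.
 */
static int read_exact(int fd, void *buf, size_t len, off_t off)
{
    /*
     * pread() returns ssize_t. Storing it in a size_t would turn -1 into a
     * huge positive value, so a "bytes < 0" check could never fire.
     */
    ssize_t bytes = pread(fd, buf, len, off);

    if (bytes < 0) {
        return -1;
    }
    return (size_t)bytes == len ? 0 : -1;
}
```

The cast back to size_t is safe only after the negative case has been ruled out.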
2015-06-18 | vfio: fix build error on CentOS 5.7 | Leon Alrae
Include linux/vfio.h after sys/ioctl.h, just like in hw/vfio/common.c. Signed-off-by: Leon Alrae <leon.alrae@imgtec.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-id: 1434544500-22405-1-git-send-email-leon.alrae@imgtec.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-06-11 | hw/vfio/platform: replace g_malloc0_n by g_new0 | Eric Auger
g_malloc0_n() was introduced in glib-2.24 while QEMU currently requires glib-2.22. This may cause a link error on some distributions. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-06-09 | hw/vfio/platform: calxeda xgmac device | Eric Auger
The platform device class has become abstract. This patch introduces a calxeda xgmac device that derives from it. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-06-08 | hw/vfio/platform: add irq assignment | Eric Auger
This patch adds the code required to assign interrupts to a guest. The interrupts are mediated through user handled eventfds only. Signed-off-by: Eric Auger <eric.auger@linaro.org> Tested-by: Vikram Sethi <vikrams@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-06-08 | hw/vfio/platform: vfio-platform skeleton | Eric Auger
Minimal VFIO platform implementation supporting register space user mapping but not IRQ assignment. Signed-off-by: Kim Phillips <kim.phillips@linaro.org> Signed-off-by: Eric Auger <eric.auger@linaro.org> Tested-by: Vikram Sethi <vikrams@codeaurora.org> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-04-30 | exec: move rcu_read_lock/unlock to address_space_translate callers | Paolo Bonzini
Once address_space_translate is called outside the BQL, the returned MemoryRegion might disappear as soon as the RCU read-side critical section ends. Avoid this by moving the critical section to the callers. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1426684909-95030-3-git-send-email-pbonzini@redhat.com>
2015-04-28 | vfio-pci: Reset workaround for AMD Bonaire and Hawaii GPUs | Alex Williamson
Somehow these GPUs manage not to respond to a PCI bus reset, removing our primary mechanism for resetting graphics cards. The result is that these devices typically work well for a single VM boot. If the VM is rebooted or restarted, the guest driver is not able to init the card from the dirty state, resulting in a blue screen for Windows guests. The workaround is to use a device specific reset. This is not 100% reliable though since it depends on the incoming state of the device, but it substantially improves the usability of these devices in a VM. Credit to Alex Deucher <alexander.deucher@amd.com> for his guidance. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-04-28 | vfio-pci: Fix error path sign | Alex Williamson
This is an impossible error path due to the fact that we're reading a kernel provided, rather than user provided link, which will certainly always fit in PATH_MAX. Currently it returns a fixed 26 char path plus %d group number, which typically maxes out at double digits. However, the caller of the initfn certainly expects a less-than zero return value on error, not just a non-zero value. Therefore we should correct the sign here. Reported-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-04-28 | vfio-pci: Further fix BAR size overflow | Alex Williamson
In an analysis by Laszlo, the resulting type of our calculation for the end of the MSI-X table, and thus the start of memory after the table, is uint32_t. We're therefore not correctly preventing the corner case overflow that we intended to fix here where a BAR >=4G could place the MSI-X table to end exactly at the 4G boundary. The MSI-X table offset is defined by the hardware spec to 32bits, so we simply use a cast rather than changing data structure types. This scenario is purely theoretical; typically the MSI-X table is located at the front of the BAR. Reported-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
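The corner case can be shown with two small helpers. These are illustrative only (the function names are hypothetical); the 16-byte entry size is the standard size of an MSI-X table entry.

```c
#include <stdint.h>

#define MSIX_ENTRY_SIZE 16  /* bytes per MSI-X table entry */

/* Computed in 32 bits: wraps to 0 when the table ends exactly at 4G. */
static uint32_t table_end_32(uint32_t table_offset, uint16_t entries)
{
    return table_offset + entries * MSIX_ENTRY_SIZE;
}

/* Widening one operand first keeps the whole sum in 64 bits. */
static uint64_t table_end_64(uint32_t table_offset, uint16_t entries)
{
    return (uint64_t)table_offset + entries * MSIX_ENTRY_SIZE;
}
```

For a table at offset 0xfffff000 with 256 entries, the 32-bit version yields 0 while the 64-bit version yields 0x100000000, so only the latter places the start of the following memory correctly.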
2015-04-26 | memory: Replace io_mem_read/write with memory_region_dispatch_read/write | Peter Maydell
Rather than retaining io_mem_read/write as simple wrappers around the memory_region_dispatch_read/write functions, make the latter public and change all the callers to use them, since we need to touch all the callsites anyway to add MemTxAttrs and MemTxResult support. Delete io_mem_read and io_mem_write entirely. (All the callers currently pass MEMTXATTRS_UNSPECIFIED and convert the return value back to bool or ignore it.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2015-03-10 | vfio: Remove superfluous '\n' around error_report() | Gonglei
Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-03-09 | sPAPR: Implement sPAPRPHBClass EEH callbacks | Gavin Shan
The patch implements sPAPRPHBClass EEH callbacks so that the EEH RTAS requests can be routed to VFIO for further handling. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-02 | vfio-pci: Enable device request notification support | Alex Williamson
Linux v4.0-rc1 vfio-pci introduced a new virtual interrupt to allow the kernel to request a device from the user. When signaled, QEMU will by default attempt to hot-unplug the device. This is a one-shot attempt with the expectation that the kernel will continue to poll for the device if it is not returned. Returning the device when requested is the expected standard model of cooperative usage, but we also add an option to disable this feature. Initially this opt-out is set as an experimental option because we really should honor kernel requests for the device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-03-02 | vfio: allow to disable MMAP per device with -x-mmap=off option | Samuel Pitoiset
Disabling MMAP support uses the slower read/write accesses but allows tracing of all MMIO accesses, which is not good for performance but very useful for reverse engineering PCI drivers. This option allows MMAP to be disabled per device without a compile-time change. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-03-02 | vfio: Make type1 listener symbols static | Alexey Kardashevskiy
They are not used from anywhere but common.c which is where these are defined so make them static. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-03-02 | vfio: Add ioctl number to error report | Alexey Kardashevskiy
This makes the error report more informative. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-02-10 | vfio: Fix debug message compile error | Alexey Kardashevskiy
This fixes a compiler error which occurs if DEBUG_VFIO is defined. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2015-02-10 | vfio: Use vfio type1 v2 IOMMU interface | Alex Williamson
The difference between v1 and v2 is fairly subtle, simply more deterministic behavior for unmaps. The v1 interface allows the user to attempt to unmap sub-regions of previous mappings, returning success with zero size if unable to comply. This was a reflection of the underlying IOMMU API. The v2 interface requires that the user may only unmap fully contained mappings, ie. an unmap cannot intersect or bisect a previous mapping, but may cover multiple mappings. QEMU never made use of the sub-region v1 support anyway, so we can support either v1 or v2. We'll favor v2 since it's newer. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
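The v2 unmap rule stated above can be expressed as a small predicate. This is a user-space illustration of the semantics only, not kernel or QEMU code; `struct mapping` and `unmap_allowed_v2()` are hypothetical, and address overflow is ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One existing DMA mapping covering [iova, iova + size). */
struct mapping {
    uint64_t iova;
    uint64_t size;
};

/*
 * v2 rule: every existing mapping must lie either entirely inside or
 * entirely outside the unmap range. A range that cuts through a mapping
 * is rejected, while a range spanning several whole mappings is fine.
 */
static bool unmap_allowed_v2(const struct mapping *maps, size_t n,
                             uint64_t iova, uint64_t size)
{
    uint64_t end = iova + size;

    for (size_t i = 0; i < n; i++) {
        uint64_t m_start = maps[i].iova;
        uint64_t m_end = m_start + maps[i].size;
        bool overlaps = m_start < end && iova < m_end;
        bool contained = m_start >= iova && m_end <= end;

        if (overlaps && !contained) {
            return false;
        }
    }
    return true;
}
```

A caller that only ever unmaps exactly what it mapped, as the message says QEMU does, satisfies this predicate trivially, which is why either interface version works.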
2015-02-10 | vfio: unmap and free BAR data in instance_finalize | Paolo Bonzini
In the case of VFIO, the unrealize callback is too early to munmap the BARs. The munmap must be delayed until memory accesses are complete. To do this, split vfio_unmap_bars in two. The removal step, now called vfio_unregister_bars, remains in vfio_exitfn. The reclamation step is vfio_unmap_bars and is moved to the instance_finalize callback. Similarly, quirk MemoryRegions have to be removed during vfio_unregister_bars, but freeing the data structure must be delayed to vfio_unmap_bars. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>