aboutsummaryrefslogtreecommitdiff
path: root/hw/ppc/spapr_pci.c
AgeCommit message (Collapse)Author
2015-10-23spapr_pci: Allow VFIO devices to work on the normal PCI host bridgeDavid Gibson
The core VFIO infrastructure more or less allows VFIO devices to work on any normal guest PCI host bridge (PHB) without extra logic. However, the "spapr-pci-host-bridge" device (as opposed to the special "spapr-pci-vfio-host-bridge" device) breaks this by using a partially KVM accelerated implementation of the guest kernel IOMMU which won't work with VFIO devices, without additional kernel support. This patch allows VFIO devices to work on the spapr-pci-host-bridge, by having it switch off KVM TCE acceleration when a VFIO device is added to the PHB (either on startup, or by hotplug). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2015-10-23spapr_pci: Allow PCI host bridge DMA window to be configuredDavid Gibson
At present the PCI host bridge (PHB) for the pseries machine type has a fixed DMA window from 0..1GB (in PCI address space) which is mapped to real memory via the PAPR paravirtualized IOMMU. For better support of VFIO devices, we're going to want to allow for different configurations of the DMA window. Eventually we'll want to allow the guest itself to reconfigure the window via the PAPR dynamic DMA window interface, but as a preliminary this patch allows the user to reconfigure the window with new properties on the PHB device. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2015-09-23sPAPR: Revert don't enable EEH on emulated PCI devicesGavin Shan
This reverts commit 7cb18007 ("sPAPR: Don't enable EEH on emulated PCI devices") as rtas_ibm_set_eeh_option() isn't the right place to check if there has the corresponding PCI device for the input address, which can be PE address, not PCI device address. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23spapr: Support hotplug by specifying DRC countBharata B Rao
Support hotplug identifier type RTAS_LOG_V6_HP_ID_DRC_COUNT that allows hotplugging of DRCs by specifying the DRC count. While we are here, rename spapr_hotplug_req_add_event() to spapr_hotplug_req_add_by_index() spapr_hotplug_req_remove_event() to spapr_hotplug_req_remove_by_index() so that they match with spapr_hotplug_req_add_by_count(). Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23spapr_pci: fix device tree props for MSI/MSI-XMichael Roth
PAPR requires ibm,req#msi and ibm,req#msi-x to be present in the device node to define the number of msi/msi-x interrupts the device supports, respectively. Currently we have ibm,req#msi-x hardcoded to a non-sensical constant that happens to be 2, and are missing ibm,req#msi entirely. The result of that is that msi-x capable devices get limited to 2 msi-x interrupts (which can impact performance), and msi-only devices likely wouldn't work at all. Additionally, if devices expect a minimum that exceeds 2, the guest driver may fail to load entirely. SLOF still owns the generation of these properties at boot-time (although other device properties have since been offloaded to QEMU), but for hotplugged devices we rely on the values generated by QEMU and thus hit the limitations above. Fix this by generating these properties in QEMU as expected by guests. In the future it may make sense to modify SLOF to pass through these values directly as we do with other props since we're duplicating SLOF code. Cc: qemu-ppc@nongnu.org Cc: qemu-stable@nongnu.org Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23sPAPR: Introduce rtas_ldq()Gavin Shan
This introduces rtas_ldq() to load 64-bits parameter from continuous two 4-bytes memory chunk of RTAS parameter buffer, to simplify the code. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23spapr: Make ibm, change-msi respect 3 return valuesSam Bobroff
Currently, rtas_ibm_change_msi() always returns four values even if less are specified. Correct this by only returning the fourth parameter if it was requested. This is specified by PAPR. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-11maint: avoid useless "if (foo) free(foo)" patternMarkus Armbruster
My Coccinelle semantic patch finds a few more, because it also fixes up the equally pointless conditional if (foo) { free(foo); foo = NULL; } Result (feel free to squash it into your patch): Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-07-07sPAPR: Don't enable EEH on emulated PCI devicesGavin Shan
There might have emulated PCI devices, together with VFIO PCI devices under one PHB. The EEH capability shouldn't enabled on emulated PCI devices. The patch returns error when enabling EEH capability on emulated PCI devices by RTAS call "ibm,set-eeh-option". Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07spapr_pci: drop redundant args in spapr_[populate, create]_pci_child_dtNikunj A Dadhania
* phb_index is not being used and if required can be obtained from sphb * use helper to get drc_index in spapr_populate_pci_child_dt() * Check if drc_index is zero Suggested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07spapr_pci: populate ibm,loc-codeNikunj A Dadhania
Each hardware instance has a platform unique location code. The OF device tree that describes a part of a hardware entity must include the “ibm,loc-code” property with a value that represents the location code for that hardware entity. Populate ibm,loc-code. 1) PCI passthru devices need to identify with its own ibm,loc-code available on the host. In failure cases use: vfio_<name>:<phb-index>:<bus>:<slot>.<fn> 2) Emulated devices encode as following: qemu_<name>:<phb-index>:<bus>:<slot>.<fn> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07spapr_pci: enumerate and add PCI device treeNikunj A Dadhania
All the PCI enumeration and device node creation was off-loaded to SLOF. With PCI hotplug support, code needed to be added to add device node. This creates multiple copy of the code one in SLOF and other in hotplug code. To unify this, the patch adds the pci device node creation in Qemu. For backward compatibility, a flag "qemu,phb-enumerated" is added to the phb, suggesting to SLOF to not do device node creation. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> [ Squashed Michael's drc_index changes ] Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07Revert "hw/ppc/spapr_pci.c: Avoid functions not in glib 2.12 ↵Markus Armbruster
(g_hash_table_iter_*)" Since we now require GLib 2.22+ (commit f40685c), we don't have to work around lack of g_hash_table_iter_init() & friends anymore. This reverts commit f8833a37c0c6b22ddd57b45e48cfb0f97dbd5af4. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07spapr_pci: set device node unit address as hexNikunj A Dadhania
Device node names should encode the unit address as hex, while the code was encodind it as integers. Also, use FDT_NAME_MAX macro for allocating and composing the name. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07spapr_pci: encode class code including Prog IF registerNikunj A Dadhania
Current code missed the Prog IF register. All Class Code, Subclass, and Prog IF registers are needed to identify the accurate device type. For example: USB controllers use the PROG IF for denoting: USB FullSpeed, HighSpeed or SuperSpeed. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07spapr_pci: encode missing 64-bit memory address spaceNikunj A Dadhania
The properties reg/assigned-resources need to encode 64-bit memory address space as part of phys.hi dword. 00 if configuration space 01 if IO region, 10 if 32-bit MEM region 11 if 64-bit MEM region Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07spapr: Merge sPAPREnvironment into sPAPRMachineStateDavid Gibson
The code for -machine pseries maintains a global sPAPREnvironment structure which keeps track of general state information about the guest platform. This predates the existence of the MachineState structure, but performs basically the same function. Now that we have the generic MachineState, fold sPAPREnvironment into sPAPRMachineState, the pseries specific subclass of MachineState. This is mostly a matter of search and replace, although a few places which relied on the global spapr variable are changed to find the structure via qdev_get_machine(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-22qerror: Clean up QERR_ macros to expand into a single stringMarkus Armbruster
These macros expand into error class enumeration constant, comma, string. Unclean. Has been that way since commit 13f59ae. The error class is always ERROR_CLASS_GENERIC_ERROR since the previous commit. Clean up as follows: * Prepend every use of a QERR_ macro by ERROR_CLASS_GENERIC_ERROR, and delete it from the QERR_ macro. No change after preprocessing. * Rewrite error_set(ERROR_CLASS_GENERIC_ERROR, ...) into error_setg(...). Again, no change after preprocessing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-06-03spapr_pci: emit hotplug add/remove events during hotplugTyrel Datwyler
This uses extension of existing EPOW interrupt/event mechanism to notify userspace tools like librtas/drmgr to handle in-guest configuration/cleanup operations in response to device_add/device_del. Userspace tools that don't implement this extension will need to be run manually in response/advance of device_add/device_del, respectively. Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03spapr_pci: enable basic hotplug operationsMichael Roth
This enables hotplug of PCI devices to a PHB. Upon hotplug we generate the OF-nodes required by PAPR specification and IEEE 1275-1994 "PCI Bus Binding to Open Firmware" for the device. We associate the corresponding FDT for these nodes with the DRC corresponding to the slot, which will be fetched via ibm,configure-connector RTAS calls by the guest as described by PAPR specification. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03spapr_pci: create DRConnectors for each PCI slot during PHB realizeMichael Roth
These will be used to support hotplug/unplug of PCI devices to the PCI bus associated with a particular PHB. We also set up device-tree properties in each PHBs initial FDT to describe the DRCs associated with them. This advertises to guests that each PHB is DR-capable device with physical hotpluggable slots, each managed by the corresponding DRC. This is necessary for allowing hotplugging of devices to it later via bus rescan or guest rpaphp hotplug module. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridgeMichael Roth
This option enables/disables PCI hotplug for a particular PHB. Also add machine compatibility code to disable it by default for machine types prior to pseries-2.4. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [agraf: move commas for compat fields] Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03spapr_pci: Rework device-tree renderingAlexey Kardashevskiy
This replaces object_child_foreach() and callback with existing SPAPR_PCI_LIOBN() and spapr_tce_find_by_liobn() to make the code easier to read. This is a mechanical patch so no behaviour change is expected. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03spapr_pci: Make find_phb()/find_dev() publicAlexey Kardashevskiy
This makes find_phb()/find_dev() public and changed its names to spapr_pci_find_phb()/spapr_pci_find_dev() as they are going to be used from other parts of QEMU such as VFIO DDW (dynamic DMA window) or VFIO PCI error injection or VFIO EEH handling - in all these cases there are RTAS calls which are addressed to BUID+config_addr in IEEE1275 format. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03spapr_pci: Define default DMA window size as a macroAlexey Kardashevskiy
This gets rid of a magic constant describing the default DMA window size for an emulated PHB. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03spapr_pci: Introduce a liobn number generating macrosAlexey Kardashevskiy
We are going to have multiple DMA windows per PHB and we want them to migrate so we need a predictable way of assigning LIOBNs. This introduces a macro which makes up a LIOBN from fixed prefix, PHB index (unique PHB id) and window number. This introduces a SPAPR_PCI_DMA_WINDOW_NUM() to know the window number from LIOBN. It is used to distinguish the default 32bit windows from dynamic windows and avoid picking default DMA window properties from a wrong TCE table. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03spapr_pci: Fix unsafe signed/unsigned comparisonsDavid Gibson
spapr_pci.c contains a number of expressions of the form (uval == -1) or (uval != -1), where 'uval' is an unsigned value. This mostly works in practice, because as long as the width of uval is greater or equal than that of (int), the -1 will be promoted to the unsigned type, which is the expected outcome. However, at least for the cases where uval is uint32_t, this would break on platforms where sizeof(int) > 4 (and a few such do exist), because then the uint32_t value would be promoted to the larger int type, and never be equal to -1. This patch fixes these errors. The fixes for the (uint32_t) cases are necessary as described above. I've made similar fixes to (uint64_t) and (hwaddr) cases. Those are strictly theoretical, since I don't know of any platforms where sizeof(int) > 8, but hey, it's not that hard so we might as well be strictly C standard compliant. Reported-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09sPAPR: Implement EEH RTAS callsGavin Shan
The emulation for EEH RTAS requests from guest isn't covered by QEMU yet and the patch implements them. The patch defines constants used by EEH RTAS calls and adds callbacks sPAPRPHBClass::{eeh_set_option, eeh_get_state, eeh_reset, eeh_configure}, which are going to be used as follows: * RTAS calls are received in spapr_pci.c, sanity check is done there. * RTAS handlers handle what they can. If there is something it cannot handle and the corresponding sPAPRPHBClass callback is defined, it is called. * Those callbacks are only implemented for VFIO now. They do ioctl() to the IOMMU container fd to complete the calls. Error codes from that ioctl() are transferred back to the guest. [aik: defined RTAS tokens for EEH RTAS calls] Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09pseries: Switch VGA endian on H_SET_MODEDavid Gibson
When the guest switches the interrupt endian mode, which essentially means a global machine endian switch, we want to change the VGA framebuffer endian mode as well in order to be backward compatible with existing guests who don't know about the new endian control register. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09spapr-pci: Enable huge BARsAlexey Kardashevskiy
At the moment sPAPR only supports 512MB window for MMIO BARs. However modern devices might want bigger 64bit BARs. This extends MMIO window from 512MB to 62GB (aligned to SPAPR_PCI_WINDOW_SPACING) and advertises it in 2 records in the PHB "ranges" property. 32bit gets the space from SPAPR_PCI_MEM_WIN_BUS_OFFSET till the end of 4GB, 64bit gets the rest of the space. If no space is left, 64bit range is not advertised. The MMIO space size is set to old value of 0x20000000 by default for pseries machines older than 2.3. The approach changes the device tree which is a guest visible change, however it won't break migration as: 1. we do not support migration to older QEMU versions 2. migration to newer QEMU will migrate the device tree as well and since the new layout only extends the old one and does not change address mappigns, no breakage is expected here too. SLOF change is required to utilize this extension. Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09pseries: Limit PCI host bridge "index" valueDavid Gibson
pseries guests can have large numbers of PCI host bridges. To avoid the user having to specify a number of different configuration values for every one, the device supports an "index" property which is a shorthand setting the various window and configuration addresses from a predefined sensible set. There are some problems with the details at present: * The "index" propery is signed, but negative values will create PCI windows below where we expect, potentially colliding with other devices * No limit is imposed on the "index" property and large values can translate to extremely large window addresses. With PCI passthrough in particular this can mean we exceed various mapping and physical address limits causing the guest host bridge to not work in strange ways. This patch addresses this, by making "index" unsigned, and imposing a limit. Currently the limit allows indices from 0..255 which is probably enough host bridges for the time being. It's fairly easy to extend if we discover we need more. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-11-04hw/ppc/spapr_pci.c: Avoid functions not in glib 2.12 (g_hash_table_iter_*)Peter Maydell
The g_hash_table_iter_* functions for iterating through a hash table are not present in glib 2.12, which is our current minimum requirement. Rewrite the code to use g_hash_table_foreach() instead. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-08spapr_pci: map the MSI window in each PHBGreg Kurz
On sPAPR, virtio devices are connected to the PCI bus and use MSI-X. Commit cc943c36faa192cd4b32af8fe5edb31894017d35 has modified MSI-X so that writes are made using the bus master address space and follow the IOMMU path. Unfortunately, the IOMMU address space address space does not have an MSI window: the notification is silently dropped in unassigned_mem_write instead of reaching the guest... The most visible effect is that all virtio devices are non-functional on sPAPR since then. :( This patch does the following: 1) map the MSI window into the IOMMU address space for each PHB - since each PHB instantiates its own IOMMU address space, we can safely map the window at a fixed address (SPAPR_PCI_MSI_WINDOW) - no real need to keep the MSI window setup in a separate function, the spapr_pci_msi_init() code moves to spapr_phb_realize(). 2) kill the global MSI window as it is not needed in the end Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-08spapr_pci: Fix config space corruptionAlexey Kardashevskiy
When disabling MSI/MSIX via "ibm,change-msi" RTAS call, no check was made if MSI or MSIX is actually supported and the MSI message was reset unconditionally. If this happened on a device which does not support MSI (but does support MSIX, otherwise "ibm,change-msi" would not be called), this device would have PCIDevice::msi_cap field (MSI capability offset) set to zero and writing a vector would actually clear PCI status. This clears MSI message only if MSI or MSIX is present on a device. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHBAlexey Kardashevskiy
Currently SPAPR PHB keeps track of all allocated MSI (here and below MSI stands for both MSI and MSIX) interrupt because XICS used to be unable to reuse interrupts. This is a problem for dynamic MSI reconfiguration which happens when guest reloads a driver or performs PCI hotplug. Another problem is that the existing implementation can enable MSI on 32 devices maximum (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that. This makes use of new XICS ability to reuse interrupts. This reorganizes MSI information storage in sPAPRPHBState. Instead of static array of 32 descriptors (one per a PCI function), this patch adds a GHashTable when @config_addr is a key and (first_irq, num) pair is a value. GHashTable can dynamically grow and shrink so the initial limit of 32 devices is gone. This changes migration stream as @msi_table was a static array while new @msi_devs is a dynamic hash table. This adds temporary array which is used for migration, it is populated in "spapr_pci"::pre_save() callback and expanded into the hash table in post_load() callback. Since the destination side does not know the number of MSI-enabled devices in advance and cannot pre-allocate the temporary array to receive migration state, this makes use of new VMSTATE_STRUCT_VARRAY_ALLOC macro which allocates the array automatically. This resets the MSI configuration space when interrupts are released by the ibm,change-msi RTAS call. This fixed traces to be more informative. This changes vmstate_spapr_pci_msi name from "...lsi" to "...msi" which was incorrect by accident. As the internal representation changed, thus bumps migration version number. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [agraf: drop g_malloc_n usage] Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27spapr: Move interrupt allocator to xicsAlexey Kardashevskiy
The current allocator returns IRQ numbers from a pool and does not support IRQs reuse in any form as it did not keep track of what it previously returned, it only keeps the last returned IRQ. Some use cases such as PCI hot(un)plug may require IRQ release and reallocation. This moves an allocator from SPAPR to XICS. This switches IRQ users to use new API. This uses LSI/MSI flags to know if interrupt is allocated. The interrupt release function will be posted as a separate patch. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27spapr_iommu: Make in-kernel TCE table optionalAlexey Kardashevskiy
POWER KVM supports an KVM_CAP_SPAPR_TCE capability which allows allocating TCE tables in the host kernel memory and handle H_PUT_TCE requests targeted to specific LIOBN (logical bus number) right in the host without switching to QEMU. At the moment this is used for emulated devices only and the handler only puts TCE to the table. If the in-kernel H_PUT_TCE handler finds a LIOBN and corresponding table, it will put a TCE to the table and complete hypercall execution. The user space will not be notified. Upcoming VFIO support is going to use the same sPAPRTCETable device class so KVM_CAP_SPAPR_TCE is going to be used as well. That means that TCE tables for VFIO are going to be allocated in the host as well. However VFIO operates with real IOMMU tables and simple copying of a TCE to the real hardware TCE table will not work as guest physical to host physical address translation is requited. So until the host kernel gets VFIO support for H_PUT_TCE, we better not to register VFIO's TCE in the host. This adds a place holder for KVM_CAP_SPAPR_TCE_VFIO capability. It is not in upstream yet and being discussed so now it is always false which means that in-kernel VFIO acceleration is not supported. This adds a bool @vfio_accel flag to the sPAPRTCETable device telling that sPAPRTCETable should not try allocating TCE table in the host kernel for VFIO. The flag is false now as at the moment there is no VFIO. This adds an vfio_accel parameter to spapr_tce_new_table(), the semantic is the same. Since there is only emulated PCI and VIO now, the flag is set to false. Upcoming VFIO support will set it to true. This is a preparation patch so no change in behaviour is expected Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27spapr: Fix RTAS token numbersAlexey Kardashevskiy
At the moment spapr_rtas_register() allocates a new token number for every new RTAS callback so numbers are not fixed and depend on the number of supported RTAS handlers and the exact order of spapr_rtas_register() calls. These tokens are copied into the device tree and remain the same during the guest lifetime. When we start another guest to receive a migration, it calls spapr_rtas_register() as well. If the number of RTAS handlers or their order is different in QEMU on source and destination sides, the "/rtas" node in the device tree will differ. Since migration overwrites the device tree (as it overwrites the entire RAM), the actual RTAS config on the destination side gets broken. This defines global contant values for every RTAS token which QEMU is using today. This changes spapr_rtas_register() to accept a token number instead of allocating one. This changes all users of spapr_rtas_register(). This changes XICS-KVM not to cache tokens registered with KVM as they constant now. This makes TOKEN_BASE global as RTAS_XXX use TOKEN_BASE as a base. TOKEN_MAX is moved and renamed too and its value is changed to the last token + 1. Boundary checks for token values are adjusted. This reserves token numbers for "os-term" handlers and PCI hotplug which we are working on. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr_pci: Advertise MSI quotaBadari Pulavarty
Hotplug of multiple disks fails due to MSI vector quota check. Number of MSI vectors default to 8 allowing only 4 devices. This happens on RHEL6.5 guest. RHEL7 and SLES11 guests fallback to INTX. One way to workaround the issue is to increase total MSIs, so that MSI quota check allows us to hotplug multiple disks. This sets the quota to the maximum number of interupts XICS has which is 1024 now (XICS_IRQS). This moves XICS_IRQS from spapr.c to xics.h for wider visibility. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> [aik: put XICS_IRQS=1024 instead of 64i, fixed endianness and size] Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr_iommu: Introduce bus_offset in sPAPRTCETableAlexey Kardashevskiy
This adds @bus_offset into sPAPRTCETable to tell where TCE table starts from. It is set to 0 for emulated devices. Dynamic DMA windows will use other offset. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr_iommu: Introduce page_shift in sPAPRTCETableAlexey Kardashevskiy
At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR spec allows other page sizes and we are going to implement them, we need page size to be configrable. This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT with it where it is possible. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr_iommu: Get rid of window_size in sPAPRTCETableAlexey Kardashevskiy
This removes window_size as it is basically a copy of nb_table shifted by SPAPR_TCE_PAGE_SHIFT. As new dynamic DMA windows are going to support windows as big as the entire RAM and this number will be bigger that 32 capacity, we will have to do something about @window_size anyway and removal seems to be the right way to go. This removes dma_window_start/dma_window_size from sPAPRPHBState as they are no longer used. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr_pci: Allow multiple TCE tables per PHBAlexey Kardashevskiy
At the moment sPAPRPHBState contains a @tcet pointer to the only TCE table. However sPAPR spec allows having more than one DMA window. Since the TCE object is already a child of SPAPR PHB object, there is no need to keep an additional pointer to it in sPAPRPHBState so remove it. This changes the way sPAPRPHBState::reset performs reset of sPAPRTCETable objects. This changes the default DMA window properties calculation. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr_pci: spapr_iommu: Make DMA window a subregionAlexey Kardashevskiy
Currently the default DMA window is represented by a single MemoryRegion. However there can be more than just one window so we need a "root" memory region to be separated from the actual DMA window(s). This introduces a "root" IOMMU memory region and adds a subregion for the default DMA 32bit window. Following patches will add other subregion(s). This initializes a default DMA window subregion size to the guest RAM size as this window can be switched into "bypass" mode which implements direct DMA mapping. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr_pci: Introduce a finish_realize() callbackAlexey Kardashevskiy
The spapr-pci PHB initializes IOMMU for emulated devices only. The upcoming VFIO support will do it different. However both emulated and VFIO PHB types share most of the initialization code. For the type specific things a new finish_realize() callback is introduced. This introduces sPAPRPHBClass derived from PCIHostBridgeClass and adds the callback pointer. This implements finish_realize() for emulated devices. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [agraf: Fix compilation] Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr_pci: fix MSI limitAlexey Kardashevskiy
At the moment XICS does not support interrupts reuse so sPAPR PHB implements this. sPAPRPHBState holds array of 32 spapr_pci_msi to describe PCI config address, first MSI and number of MSIs. Once allocated for a device, QEMU tries reusing this config until the number of MSIs changes. Existing SPAPR guests call ibm,change-msi in a loop until the handler returns the requested number of vectors. Recently introduced check for the maximum number of MSI/MSIX vectors supported by a device only works for a device which is new for PHB's MSI cache. If it is already there, the check is not performed which leads to new IRQ block allocation. This happens during PCI hotplug even when the user hot plug the same device which he just hot unplugged. This moves the check earlier. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr_pci: Fix number of returned vectors in ibm, change-msiAlexey Kardashevskiy
Current guest kernels try allocating as many vectors as the quota is. For example, in the case of virtio-net (which has just 3 vectors) the guest requests 4 vectors (that is the quota in the test) and the existing ibm,change-msi handler returns 4. But before it returns, it calls msix_set_message() in a loop and corrupts memory behind the end of msix_table. This limits the number of vectors returned by ibm,change-msi to the maximum supported by the actual device. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: qemu-stable@nongnu.org [agraf: squash in bugfix from aik] Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr-pci: remove io ports workaroundGreg Kurz
In the past, IO space could not be mapped into the memory address space so we introduced a workaround for that. Nowadays it does not look necessary so we can remove the workaround and make sPAPR PCI configuration simplier. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16savevm: Remove all the unneeded version_minimum_id_old (ppc)Juan Quintela
After previous Peter patch, they are redundant. This way we don't assign them except when needed. Once there, there were lots of case where the ".fields" indentation was wrong: .fields = (VMStateField []) { and .fields = (VMStateField []) { Change all the combinations to: .fields = (VMStateField[]){ The biggest problem (appart from aesthetics) was that checkpatch complained when we copy&pasted the code from one place to another. Signed-off-by: Juan Quintela <quintela@redhat.com> Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2014-03-13Merge remote-tracking branch 'remotes/afaerber/tags/ppc-for-2.0' into stagingPeter Maydell
PowerPC queue for 2.0-rc0 * QEMUMachine include cleanup * SLOF update * XICS reset fix * sPAPR PCI host bridge refactorings # gpg: Signature made Thu 13 Mar 2014 02:50:51 GMT using RSA key ID 3E7E013F # gpg: Good signature from "Andreas Färber <afaerber@suse.de>" # gpg: aka "Andreas Färber <afaerber@suse.com>" * remotes/afaerber/tags/ppc-for-2.0: spapr-pci: Convert fprintf() to error_report() spapr-pci: Convert to QOM realize xics-kvm: Fix reset function pseries: Update SLOF firmware image to qemu-slof-20140304 Move QEMUMachine typedef to qemu/typedefs.h Revert "KVM: Split QEMUMachine typedef into separate header" Signed-off-by: Peter Maydell <peter.maydell@linaro.org>