aboutsummaryrefslogtreecommitdiff
path: root/hw/ppc/spapr_pci.c
AgeCommit message (Collapse)Author
2019-08-16Include hw/irq.h a lot lessMarkus Armbruster
In my "build everything" tree, changing hw/irq.h triggers a recompile of some 5400 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). hw/hw.h supposedly includes it for convenience. Several other headers include it just to get qemu_irq and.or qemu_irq_handler. Move the qemu_irq and qemu_irq_handler typedefs from hw/irq.h to qemu/typedefs.h, and then include hw/irq.h only where it's still needed. Touching it now recompiles only some 500 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-13-armbru@redhat.com>
2019-07-02spapr_pci: Unregister listeners before destroying the IOMMU address spaceGreg Kurz
Hot-unplugging a PHB with a VFIO device connected to it crashes QEMU: -device spapr-pci-host-bridge,index=1,id=phb1 \ -device vfio-pci,host=0034:01:00.3,id=vfio0 (qemu) device_del phb1 [ 357.207183] iommu: Removing device 0001:00:00.0 from group 1 [ 360.375523] rpadlpar_io: slot PHB 1 removed qemu-system-ppc64: memory.c:2742: do_address_space_destroy: Assertion `QTAILQ_EMPTY(&as->listeners)' failed. 'as' is the IOMMU address space, which indeed has a listener registered to by vfio_connect_container() when the VFIO device is realized. This listener is supposed to be unregistered by vfio_disconnect_container() when the VFIO device is finalized. Unfortunately, the VFIO device hasn't reached finalize yet at the time the PHB unrealize function is called, and address_space_destroy() gets called with the VFIO listener still being registered. All regions have just been unmapped from the address space. Listeners aren't needed anymore at this point. Remove them before destroying the address space. The VFIO code will try to remove them _again_ at device finalize, but it is okay since memory_listener_unregister() is idempotent. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156110925375.92514.11649846071216864570.stgit@bahia.lan> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> [dwg: Correct spelling error pointed out by aik] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02spapr_pci: Drop useless CONFIG_KVM ifdeferyGreg Kurz
kvm_enabled() expands to (0) when CONFIG_KVM is not defined. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156051052977.224162.17306829691809502082.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02spapr_pci: Fix DRC owner in spapr_dt_pci_bus()Greg Kurz
spapr_dt_drc() scans the aliases of all DRConnector objects and filters the ones that it will use to generate OF properties according to their owner and type. Passing bus->parent_dev _works_ if bus belongs to a PCI bridge, but it is NULL if it is the PHB's root bus. This causes all allocated PCI DRCs to be associated to all PHBs (visible in their "ibm,drc-types" properties). As a consequence, hot unplugging a PHB results in PCI devices from the other PHBs to be unplugged as well, and likely confuses the guest. Use the same logic as in add_drcs() to ensure the correct owner is passed to spapr_dt_drc(). Fixes: 14e714900f6b "spapr: Allow hot plug/unplug of PCI bridges and devices under PCI bridges" Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156084737348.512412.3552825999605902691.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-07-02spapr_pci: Fix potential NULL pointer dereference in spapr_dt_pci_bus()Philippe Mathieu-Daudé
Commit 14e714900f6 refactored the call to spapr_dt_drc(), introducing a potential NULL pointer dereference while accessing bus->parent_dev. A trivial audit show 'bus' is not null in the two places the static function spapr_dt_drc() is called. Since the 'bus' parameter is not NULL in both callers, remove remove the test on if (bus), and add an assert() to silent static analyzers. This fixes: /hw/ppc/spapr_pci.c: 1367 in spapr_dt_pci_bus() >>> CID 1401933: Null pointer dereferences (FORWARD_NULL) >>> Dereferencing null pointer "bus". 1367 ret = spapr_dt_drc(fdt, offset, OBJECT(bus->parent_dev), 1368 SPAPR_DR_CONNECTOR_TYPE_PCI); Fixes: 14e714900f6 Reported-by: Coverity (CID 1401933) Suggested-by: Greg Kurz <groug@kaod.org> Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190613213406.22053-1-philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-06-12Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.1-20190612' ↵Peter Maydell
into staging ppc patch queue 2019-06-12 Next pull request against qemu-4.1. The big thing here is adding support for hot plug of P2P bridges, and PCI devices under P2P bridges on the "pseries" machine (which doesn't use SHPC). Other than that there's just a handful of fixes and small enhancements. # gpg: Signature made Wed 12 Jun 2019 06:47:56 BST # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-4.1-20190612: ppc/xive: Make XIVE generate the proper interrupt types ppc/pnv: activate the "dumpdtb" option on the powernv machine target/ppc: Use tcg_gen_gvec_bitsel spapr: Allow hot plug/unplug of PCI bridges and devices under PCI bridges spapr: Direct all PCI hotplug to host bridge, rather than P2P bridge spapr: Don't use bus number for building DRC ids spapr: Clean up DRC index construction spapr: Clean up spapr_drc_populate_dt() spapr: Clean up dt creation for PCI buses spapr: Clean up device tree construction for PCI devices spapr: Clean up device node name generation for PCI devices target/ppc: Fix lxvw4x, lxvh8x and lxvb16x spapr_pci: Improve error message Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-06-12Include qemu/module.h where needed, drop it from qemu-common.hMarkus Armbruster
Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]
2019-06-12spapr: Allow hot plug/unplug of PCI bridges and devices under PCI bridgesDavid Gibson
The pseries machine type already allows PCI hotplug and unplug via the PAPR mechanism, but only on the root bus of each PHB. This patch extends this to allow PCI to PCI bridges to be hotplugged, and devices to be hotplugged or unplugged under P2P bridges. For now we disallow hot unplugging P2P bridges. I tried doing that, but haven't managed to get it working, I think due to some guest side problems that need further investigation. To do this we dynamically construct DRCs when bridges are hot (or cold) added, which can in turn be used to hotplug devices under the bridge. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-12spapr: Don't use bus number for building DRC idsDavid Gibson
DRC ids are more or less arbitrary, as long as they're consistent. For PCI, we notionally build them from the phb's index along with PCI bus number, slot and function number. Using bus number is broken, however, because it can change if the guest re-enumerates the PCI topology for whatever reason (e.g. due to hotplug of a bridge, which we don't support yet but want to). Fortunately, there's an alternative. Bridges are required to have a unique non-zero "chassis number" that we can use instead. Adjust the code to use that instead. This looks like it would introduce a guest visible breaking change, but in fact it does not because we don't yet ever use non-zero bus numbers. Both chassis and bus number are always 0 for the root bus, so there's no change for the existing cases. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-12spapr: Clean up DRC index constructionDavid Gibson
spapr_pci.c currently has several confusingly similarly named functions for various conversions between representations of DRCs. Make things clearer by renaming things in a more consistent XXX_from_YYY() manner and remove some called-only-once variants in favour of open coding. While we're at it, move this code together in the file to avoid some extra forward references, and split out construction and removal of DRCs for the host bridge into helper functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-12spapr: Clean up spapr_drc_populate_dt()David Gibson
This makes some minor cleanups to spapr_drc_populate_dt(), renaming it to the shorter and more idiomatic spapr_dt_drc() along the way. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-12spapr: Clean up dt creation for PCI busesDavid Gibson
Device nodes for PCI bridges (both host and P2P) describe both the bridge device itself and the bus hanging off it, handling of this is a bit of a mess. spapr_dt_pci_device() has a few things it only adds for non-bridges, but always adds #address-cells and #size-cells which should only appear for bridges. But the walking down the subordinate PCI bus is done in one of its callers spapr_populate_pci_devices_dt(). The PHB dt creation in spapr_populate_pci_dt() open codes some similar logic to the bridge case. This patch consolidates things in a bunch of ways: * Bus specific dt info is now created in spapr_dt_pci_bus() used for both P2P bridges and the host bridge. This includes walking subordinate devices * spapr_dt_pci_device() now calls spapr_dt_pci_bus() when called on a P2P bridge * We do detection of bridges with the is_bridge field of the device class, rather than checking PCI config space directly, for consistency with qemu's core PCI code. * Several things are renamed for brevity and clarity Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-12spapr: Clean up device tree construction for PCI devicesDavid Gibson
spapr_create_pci_child_dt() is a trivial wrapper around spapr_populate_pci_child_dt(), but is the latter's only caller. So fold them together into spapr_dt_pci_device(), which closer matches our modern naming convention. While there, make a number of cleanups to the function itself. This is mostly using more temporary locals to avoid awkwardly long lines, and in some cases avoiding double reads of PCI config space variables. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-12spapr: Clean up device node name generation for PCI devicesDavid Gibson
spapr_populate_pci_child_dt() adds a 'name' property to the device tree node for PCI devices. This is never necessary for a flattened device tree, it is implicit in the name added when the node is constructed. In fact anything we do add to a 'name' property will be overwritten with something derived from the structural name in the guest firmware (but in fact it is exactly the same bytes). So, remove that. In addition, pci_get_node_name() is very simple, so fold it into its (also simple) sole caller spapr_create_pci_child_dt(). While we're there rename pci_find_device_name() to the shorter and more accurate dt_name_from_class(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2019-06-12spapr_pci: Improve error messageGreg Kurz
Every PHB must have a unique index. This is checked at realize but when a duplicate index is detected, an error message mentioning BUIDs is printed. This doesn't help much, especially since BUID is an internal concept that is no longer exposed to the user. Fix the message to mention the index property instead of BUID. As a bonus print a list of indexes already in use. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155915010892.2061314.10485622810149098411.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-05-29pcie: Simplify pci_adjust_config_limit()David Gibson
Since c2077e2c "pci: Adjust PCI config limit based on bus topology", pci_adjust_config_limit() has been used in the config space read and write paths to only permit access to extended config space on buses which permit it. Specifically it prevents access on devices below a vanilla-PCI bus via some combination of bridges, even if both the host bridge and the device itself are PCI-E. It accomplishes this with a somewhat complex call up the chain of bridges to see if any of them prohibit extended config space access. This is overly complex, since we can always know if the bus will support such access at the point it is constructed. This patch simplifies the test by using a flag in the PCIBus instance indicating whether extended configuration space is accessible. It is false for vanilla PCI buses. For PCI-E buses, it is true for root buses and equal to the parent bus's's capability otherwise. For the special case of sPAPR's paravirtualized PCI root bus, which acts mostly like vanilla PCI, but does allow extended config space access, we override the default value of the flag from the host bridge code. This should cause no behavioural change. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20190513061939.3464-4-david@gibson.dropbear.id.au> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-04-26spapr: Drop duplicate PCI swizzle codeGreg Kurz
LSI mapping in spapr currently open-codes standard PCI swizzling. It thus duplicates the code of pci_swizzle_map_irq_fn(). Expose the swizzling formula so that it can be used with a slot number when building the device tree. Simply drop pci_spapr_map_irq() and call pci_swizzle_map_irq_fn() instead. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155448184841.8446.13959787238854054119.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26spapr_pci: Get rid of duplicate code for node name creationGreg Kurz
According to the changelog of 298a971024534, SpaprPhbState::dtbusname was introduced to "make it easier to relate the guest and qemu views of memory to each other", hence its name. Use it when creating the PHB node to avoid code duplication. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155448184292.8446.8225650773162648595.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26spapr: Support NVIDIA V100 GPU with NVLink2Alexey Kardashevskiy
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver implements special regions for such GPUs and emulates an NVLink bridge. NVLink2-enabled POWER9 CPUs also provide address translation services which includes an ATS shootdown (ATSD) register exported via the NVLink bridge device. This adds a quirk to VFIO to map the GPU memory and create an MR; the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses this to get the MR and map it to the system address space. Another quirk does the same for ATSD. This adds additional steps to sPAPR PHB setup: 1. Search for specific GPUs and NPUs, collect findings in sPAPRPHBState::nvgpus, manage system address space mappings; 2. Add device-specific properties such as "ibm,npu", "ibm,gpu", "memory-block", "link-speed" to advertise the NVLink2 function to the guest; 3. Add "mmio-atsd" to vPHB to advertise the ATSD capability; 4. Add new memory blocks (with extra "linux,memory-usable" to prevent the guest OS from accessing the new memory until it is onlined) and npuphb# nodes representing an NPU unit for every vPHB as the GPU driver uses it for link discovery. This allocates space for GPU RAM and ATSD like we do for MMIOs by adding 2 new parameters to the phb_placement() hook. Older machine types set these to zero. This puts new memory nodes in a separate NUMA node to as the GPU RAM needs to be configured equally distant from any other node in the system. Unlike the host setup which assigns numa ids from 255 downwards, this adds new NUMA nodes after the user configures nodes or from 1 if none were configured. This adds requirement similar to EEH - one IOMMU group per vPHB. The reason for this is that ATSD registers belong to a physical NPU so they cannot invalidate translations on GPUs attached to another NPU. It is guaranteed by the host platform as it does not mix NVLink bridges or GPUs from different NPU in the same IOMMU group. If more than one IOMMU group is detected on a vPHB, this disables ATSD support for that vPHB and prints a warning. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for vfio portions] Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20190312082103.130561-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-12spapr_pci: Fix broken naming of PCI busGreg Kurz
Recent commit 5cf0d326a0fe fixed a regression which was preventing the guest to access the extended config space of a PCIe device. This was done by introducing a new PCI bus subtype for PAPR. The original fix was causing PCI busses to be named "spapr-pci-host-bridge-root-bus.N" instead of "pci.N", which was making upper layers unhappy of course. This got worked around by hardcoding the PCI bus name to "pci.0", but this only works for the default PHB. And we're now hitting: # qemu-system-ppc64 \ -device spapr-pci-host-bridge,index=1 \ -device e1000e,bus=pci.0 \ -device e1000e,bus=pci.1 qemu-system-ppc64: -device e1000e,bus=pci.1: Bus 'pci.1' not found David already posted some patches [1] to control PCI extended config space accesses with a new flag in the base PCI bus class instead of subtyping. These patches are a bit more intrusive though, and are targetted for 4.1. When no name is passed to pci_register_bus(), the core device code generates a lowercase name based on the QOM typename. The typename for the base PCI bus class is "PCI", hence the "pci.0", "pci.1" bus names. Rename the type of the PAPR PCI bus to "pci", so that the QOM code can generate proper names. This is a hack but it is enough to fix the regression. And all this will be reworked properly in 4.1. [1] https://patchwork.ozlabs.org/project/qemu-devel/list/?series=100486 Fixes: 5cf0d326a0fe Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155500034416.646888.1307366522340665522.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-09spapr_pci: Fix extended config space accessesGreg Kurz
The PAPR PHB acts as a legacy PCI bus but it allows PCIe extended config space accesses anyway (for pseries-2.9 and newer machine types). Introduce a specific PCI bus subtype to inform the common PCI code about that. Fixes: c2077e2ca0da7 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155414130834.574858.16502276132110219890.stgit@bahia.lan> [dwg: Apply fix so we don't rename the default pci bus, breaking everything] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-19spapr: Remove NULL checks on error_propagate() callsMarkus Armbruster
Patch created mechanically by rerunning: $ spatch --sp-file scripts/coccinelle/error_propagate_null.cocci \ --macro-file scripts/cocci-macro-file.h \ --dir . --in-place Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190318190148.18283-1-armbru@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12spapr: Use CamelCase properlyDavid Gibson
The qemu coding standard is to use CamelCase for type and structure names, and the pseries code follows that... sort of. There are quite a lot of places where we bend the rules in order to preserve the capitalization of internal acronyms like "PHB", "TCE", "DIMM" and most commonly "sPAPR". That was a bad idea - it frequently leads to names ending up with hard to read clusters of capital letters, and means they don't catch the eye as type identifiers, which is kind of the point of the CamelCase convention in the first place. In short, keeping type identifiers look like CamelCase is more important than preserving standard capitalization of internal "words". So, this patch renames a heap of spapr internal type names to a more standard CamelCase. In addition to case changes, we also make some other identifier renames: VIOsPAPR* -> SpaprVio* The reverse word ordering was only ever used to mitigate the capital cluster, so revert to the natural ordering. VIOsPAPRVTYDevice -> SpaprVioVty VIOsPAPRVLANDevice -> SpaprVioVlan Brevity, since the "Device" didn't add useful information sPAPRDRConnector -> SpaprDrc sPAPRDRConnectorClass -> SpaprDrcClass Brevity, and makes it clearer this is the same thing as a "DRC" mentioned in many other places in the code This is 100% a mechanical search-and-replace patch. It will, however, conflict with essentially any and all outstanding patches touching the spapr code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-06qdev: Let the hotplug_handler_unplug() caller delete the deviceDavid Hildenbrand
When unplugging a device, at one point the device will be destroyed via object_unparent(). This will, one the one hand, unrealize the removed device hierarchy, and on the other hand, destroy/free the device hierarchy. When chaining hotplug handlers, we want to overwrite a bus hotplug handler by the machine hotplug handler, to be able to perform some part of the plug/unplug and to forward the calls to the bus hotplug handler. For now, the bus hotplug handler would trigger an object_unparent(), not allowing us to perform some unplug action on a device after we forwarded the call to the bus hotplug handler. The device would be gone at that point. machine_unplug_handler(dev) /* eventually do unplug stuff */ bus_unplug_handler(dev) /* dev is gone, we can't do more unplug stuff */ So move the object_unparent() to the original caller of the unplug. For now, keep the unrealize() at the original places of the object_unparent(). For implicitly chained hotplug handlers (e.g. pc code calling acpi hotplug handlers), the object_unparent() has to be done by the outermost caller. So when calling hotplug_handler_unplug() from inside an unplug handler, nothing is to be done. hotplug_handler_unplug(dev) -> calls machine_unplug_handler() machine_unplug_handler(dev) { /* eventually do unplug stuff */ bus_unplug_handler(dev) -> calls unrealize(dev) /* we can do more unplug stuff but device already unrealized */ } object_unparent(dev) In the long run, every unplug action should be factored out of the unrealize() function into the unplug handler (especially for PCI). Then we can get rid of the additonal unrealize() calls and object_unparent() will properly unrealize the device hierarchy after the device has been unplugged. hotplug_handler_unplug(dev) -> calls machine_unplug_handler() machine_unplug_handler(dev) { /* eventually do unplug stuff */ bus_unplug_handler(dev) -> only unplugs, does not unrealize /* we can do more unplug stuff */ } object_unparent(dev) -> will unrealize The original approach was suggested by Igor Mammedov for the PCI part, but I extended it to all hotplug handlers. I consider this one step into the right direction. To summarize: - object_unparent() on synchronous unplugs is done by common code -- "Caller of hotplug_handler_unplug" - object_unparent() on asynchronous unplugs ("unplug requests") has to be done manually -- "Caller of hotplug_handler_unplug" Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190228122849.4296-2-david@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-02-26spapr: add hotplug hooks for PHB hotplugGreg Kurz
Hotplugging PHBs is a machine-level operation, but PHBs reside on the main system bus, so we register spapr machine as the handler for the main system bus. Provide the usual pre-plug, plug and unplug-request handlers. Move the checking of the PHB index to the pre-plug handler. It is okay to do that and assert in the realize function because the pre-plug handler is always called, even for the oldest machine types we support. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> (Fixed interrupt controller phandle in "interrupt-map" and TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz) Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059672926.1466090.13612804072190051439.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr_pci: add ibm, my-drc-index property for PHB hotplugMichael Roth
This is needed to denote a boot-time PHB as being hot-pluggable. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059672420.1466090.15147504040270659866.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr_pci: provide node start offset via spapr_populate_pci_dt()Michael Roth
PHB hotplug re-uses PHB device tree generation code and passes it to a guest via RTAS. Doing this requires knowledge of where exactly in the device tree the node describing the PHB begins. Provide this via a new optional pointer that can be used to store the PHB node's start offset. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059671912.1466090.10891589403973703473.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr_pci: add PHB unrealizeGreg Kurz
To support PHB hotplug we need to clean up lingering references, memory, child properties, etc. prior to the PHB object being finalized. Generally this will be called as a result of calling object_unparent() on the PHB object, which in turn would normally be called as the result of an unplug() operation. When the PHB is finalized, child objects will be unparented in turn, and finalized if the PHB was the only reference holder. so we don't bother to explicitly unparent child objects of the PHB, with the notable exception of DRCs. This is needed to avoid a QEMU crash when unplugging a PHB and resetting the machine before the guest could handle the event. The DRCs are removed from the QOM tree by pci_unregister_root_bus() and we must make sure we're not leaving stale aliases under the global /dr-connector path. The formula that gives the number of DMA windows is moved to an inline function in the hw/pci-host/spapr.h header because it will have other users. The unrealize function is able to cope with partially realized PHBs. It is hence used to implement proper rollback on the realize error path. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <155059669881.1466090.13515030705986041517.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr/drc: Drop spapr_drc_attach() fdt argumentGreg Kurz
All DRC subtypes have been converted to generate the FDT fragment at configure connector time instead of attach time. The fdt and fdt_offset arguments of spapr_drc_attach() aren't needed anymore. Drop them and make the implementation of the dt_populate() method mandatory. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059667853.1466090.16527852453054217565.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr/pci: Generate FDT fragment at configure connector timeGreg Kurz
Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059667346.1466090.326696113231137772.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17qdev: pass an Object * to qbus_set_hotplug_handler()Michael Roth
Certain devices types, like memory/CPU, are now being handled using a hotplug interface provided by a top-level MachineClass. Hotpluggable host bridges are another such device where it makes sense to use a machine-level hotplug handler. However, unlike those devices, host-bridges have a parent bus (the main system bus), and devices with a parent bus use a different mechanism for registering their hotplug handlers: qbus_set_hotplug_handler(). This interface currently expects a handler to be a subclass of DeviceClass, but this is not the case for MachineClass, which derives directly from ObjectClass. Internally, the interface only requires an ObjectClass, so expose that in qbus_set_hotplug_handler(). Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <154999589921.690774.3640149277362188566.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17spapr_pci: Fix interrupt leak in rtas_ibm_change_msi() error pathGreg Kurz
Now that IRQ allocation has been split in two (first allocate IRQ numbers, then claim them), if the claiming fails, we must release the IRQs. Fixes: 4fe75a8ccd80 "spapr: split the IRQ allocation sequence" Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17spapr: Rename xics to intc in interrupt controller agnostic codeGreg Kurz
All this code is used with both the XICS and XIVE interrupt controllers. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04spapr_pci: Fix endianness in assigned-addresses propertyAlexey Kardashevskiy
reg->phys_hi and assigned->phys_hi are big endian but we do an extra byteswap anyway when copying reg->phys_hi to assigned->phys_hi. To make things slightly more messy, we also add a relocatable bit (b_n()) although in the right endianness. This fixes endianness of assigned->phys_hi. This is unlikely to produce any visible difference though as we should end up there only in the case of PCI hotplug and even then I am not sure if (d->io_regions[i].addr == PCI_BAR_UNMAPPED) == true. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-04spapr/pci: Fix primary bus number for PCI bridgesDavid Hildenbrand
While looking at the s390x implementation, looks like spapr has a similar BUG when building the topology. The primary bus number corresponds always to the bus number of the bus the bridge is attached to. Right now, if we have two bridges attached to the same bus (e.g. root bus) this is however not the case. The first bridge will have primary bus 0, the second bridge primary bus 1, which is wrong. Fix the assignment. While at it, drop setting the PCI_SUBORDINATE_BUS temporarily to 0xff. Setting it temporarily to that value (as discussed e.g. in [1]), is only relevant for a running system that probes the buses. The value is effectively unused for us just doing a DFS. [1] http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09spapr: move spapr_create_phb() to core machine codeGreg Kurz
This function is only used when creating the default PHB. Let's rename it and move it to the core machine code for clarity. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-20spapr_pci: perform unplug via the hotplug handlerDavid Hildenbrand
Introduce and use the "unplug" callback. This is a preparation for multi-stage hotplug handlers, whereby the bus hotplug handler is overwritten by the machine hotplug handler. This handler will then pass control to the bus hotplug handler. So to get this running cleanly, we also have to make sure to go via the hotplug handler chain when actually unplugging a device after an unplug request. Lookup the hotplug handler and call "unplug". Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-08spapr_pci: convert g_malloc() to g_new()Greg Kurz
When allocating an array, it is a recommended coding practice to call g_new(FooType, n) instead of g_malloc(n * sizeof(FooType)) because it takes care to avoid overflow when calculating the size of the allocated block and it returns FooType *, which allows the compiler to perform type checking. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-10-19error: Fix use of error_prepend() with &error_fatal, &error_abortMarkus Armbruster
From include/qapi/error.h: * Pass an existing error to the caller with the message modified: * error_propagate(errp, err); * error_prepend(errp, "Could not frobnicate '%s': ", name); Fei Li pointed out that doing error_propagate() first doesn't work well when @errp is &error_fatal or &error_abort: the error_prepend() is never reached. Since I doubt fixing the documentation will stop people from getting it wrong, introduce error_propagate_prepend(), in the hope that it lures people away from using its constituents in the wrong order. Update the instructions in error.h accordingly. Convert existing error_prepend() next to error_propagate to error_propagate_prepend(). If any of these get reached with &error_fatal or &error_abort, the error messages improve. I didn't check whether that's the case anywhere. Cc: Fei Li <fli@suse.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-2-armbru@redhat.com>
2018-09-25spapr_pci: add an extra 'nr_msis' argument to spapr_populate_pci_dtCédric Le Goater
So that we don't have to call qdev_get_machine() to get the machine class and the sPAPRIrq backend holding the number of MSIs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-09-25spapr: introduce a spapr_irq class 'nr_msis' attributeCédric Le Goater
The number of MSI interrupts a sPAPR machine can allocate is in direct relation with the number of interrupts of the sPAPRIrq backend. Define statically this value at the sPAPRIrq class level and use it for the "ibm,pe-total-#msi" property of the sPAPR PHB. According to the PAPR specs, "ibm,pe-total-#msi" defines the maximum number of MSIs that are available to the PE. We choose to advertise the maximum number of MSIs that are available to the machine for simplicity of the model and to avoid segmenting the MSI interrupt pool which can be easily shared. If the pool limit is reached, it can be extended dynamically. Finally, remove XICS_IRQS_SPAPR which is now unused. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-28spapr_pci: fix potential NULL pointer dereferenceGreg Kurz
Commit 2c88b098e76fd added a call to SPAPR_MACHINE_GET_CLASS(spapr) in spapr_phb_realize() before we check spapr isn't NULL. This causes QEMU to crash when starting a non-pseries machine with a sPAPR PHB. This could be fixed by setting the smc variable after the null check, but it seems more explicit to use a ternary operator to skip the call to SPAPR_MACHINE_GET_CLASS() if spapr is NULL, since spapr_phb_realize() will return immediately in this case. This was reported by Coverity (CID 1395170 and 1395183). Fixes: 2c88b098e76fde0c7fcc0476dd3f80ce58409505 Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21spapr_pci: factorize the use of SPAPR_MACHINE_GET_CLASS()Cédric Le Goater
It should save us some CPU cycles as these routines perform a lot of checks. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21spapr: introduce a fixed IRQ number spaceCédric Le Goater
This proposal introduces a new IRQ number space layout using static numbers for all devices, depending on a device index, and a bitmap allocator for the MSI IRQ numbers which are negotiated by the guest at runtime. As the VIO device model does not have a device index but a "reg" property, we introduce a formula to compute an IRQ number from a "reg" value. It should minimize most of the collisions. The previous layout is kept in pre-3.1 machines raising the 'legacy_irq_allocation' machine class flag. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21spapr: split the IRQ allocation sequenceCédric Le Goater
Today, when a device requests for IRQ number in a sPAPR machine, the spapr_irq_alloc() routine first scans the ICSState status array to find an empty slot and then performs the assignement of the selected numbers. Split this sequence in two distinct routines : spapr_irq_find() for lookups and spapr_irq_claim() for claiming the IRQ numbers. This will ease the introduction of a static layout of IRQ numbers. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-12spapr_pci: Remove unhelpful pagesize warningDavid Gibson
By default, the IOMMU model built into the spapr virtual PCI host bridge supports 4kiB and 64kiB IOMMU page sizes. However this can be overridden which may be desirable to allow larger IOMMU page sizes when running a guest with hugepage backing and passthrough devices. For that reason a warning was printed when the device wasn't configured to allow the pagesize with which guest RAM is backed. Experience has proven, however, that this message is more confusing than useful. Worse it sometimes makes little sense when the host-available page sizes don't match those available on the guest, which can happen with a POWER8 guest running on a POWER9 KVM host. Long term we do want better handling to allow large IOMMU page sizes to be used, but for now this parameter and warning don't really accomplish it. So, remove the message, pending a better solution. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29spapr_pci: fix MSI/MSIX selectionGreg Kurz
In various place we don't correctly check if the device supports MSI or MSI-X. This can cause devices to be advertised with MSI support, even if they only support MSI-X (like virtio-pci-* devices for example): ethernet@0 { ibm,req#msi = <0x1>; <--- wrong! . ibm,loc-code = "qemu_virtio-net-pci:0000:00:00.0"; . ibm,req#msi-x = <0x3>; }; Worse, this can also cause the "ibm,change-msi" RTAS call to corrupt the PCI status and cause migration to fail: qemu-system-ppc64: get_pci_config_device: Bad config data: i=0x6 read: 0 device: 10 cmask: 10 wmask: 0 w1cmask:0 ^^ PCI_STATUS_CAP_LIST bit which is assumed to be constant This patch changes spapr_populate_pci_child_dt() to properly check for MSI support using msi_present(): this ensures that PCIDevice::msi_cap was set by msi_init() and that msi_nr_vectors_allocated() will look at the right place in the config space. Checking PCIDevice::msix_entries_nr is enough for MSI-X but let's add a call to msix_present() there as well for consistency. It also changes rtas_ibm_change_msi() to select the appropriate MSI type in Function 1 instead of always selecting plain MSI. This new behaviour is compliant with LoPAPR 1.1, as described in "Table 71. ibm,change-msi Argument Call Buffer": Function 1: If Number Outputs is equal to 3, request to set to a new number of MSIs (including set to 0). If the “ibm,change-msix-capable” property exists and Number Outputs is equal to 4, request is to set to a new number of MSI or MSI-X (platform choice) interrupts (including set to 0). Since MSI is the the platform default (LoPAPR 6.2.3 MSI Option), let's check for MSI support first. And finally, it checks the input parameters are valid, as described in LoPAPR 1.1 "R1–7.3.10.5.1–3": For the MSI option: The platform must return a Status of -3 (Parameter error) from ibm,change-msi, with no change in interrupt assignments if the PCI configuration address does not support MSI and Function 3 was requested (that is, the “ibm,req#msi” property must exist for the PCI configuration address in order to use Function 3), or does not support MSI-X and Function 4 is requested (that is, the “ibm,req#msi-x” property must exist for the PCI configuration address in order to use Function 4), or if neither MSIs nor MSI-Xs are supported and Function 1 is requested. This ensures that the ret_intr_type variable contains a valid MSI type for this device, and that spapr_msi_setmsg() won't corrupt the PCI status. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-11Merge remote-tracking branch 'origin/master' into HEADMichael S. Tsirkin
Resolve conflicts around apb. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-01-10spapr_pci: use warn_report()Greg Kurz
These two are definitely warnings. Let's use the appropriate API. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15spapr: fix LSI interrupt specifiers in the device treeGreg Kurz
LoPAPR 1.1 B.6.9.1.2 describes the "#interrupt-cells" property of the PowerPC External Interrupt Source Controller node as follows: “#interrupt-cells” Standard property name to define the number of cells in an interrupt- specifier within an interrupt domain. prop-encoded-array: An integer, encoded as with encode-int, that denotes the number of cells required to represent an interrupt specifier in its child nodes. The value of this property for the PowerPC External Interrupt option shall be 2. Thus all interrupt specifiers (as used in the standard “interrupts” property) shall consist of two cells, each containing an integer encoded as with encode-int. The first integer represents the interrupt number the second integer is the trigger code: 0 for edge triggered, 1 for level triggered. This patch fixes the interrupt specifiers in the "interrupt-map" property of the PHB node, that were setting the second cell to 8 (confusion with IRQ_TYPE_LEVEL_LOW ?) instead of 1. VIO devices and RTAS event sources use the same format for interrupt specifiers: while here, we introduce a common helper to handle the encoding details. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> -- v3: - reference public LoPAPR instead of internal PAPR+ in changelog - change helper name to spapr_dt_xics_irq() v2: - drop the erroneous changes to the "interrupts" prop in PCI device nodes - introduce a common helper to encode interrupt specifiers Signed-off-by: David Gibson <david@gibson.dropbear.id.au>