aboutsummaryrefslogtreecommitdiff
path: root/include/hw/ppc
AgeCommit message (Collapse)Author
2019-05-13Clean up ill-advised or unusual header guardsMarkus Armbruster
Leading underscores are ill-advised because such identifiers are reserved. Trailing underscores are merely ugly. Strip both. Our header guards commonly end in _H. Normalize the exceptions. Done with scripts/clean-header-guards.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190315145123.28030-7-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> [Changes to slirp/ dropped, as we're about to spin it off]
2019-04-26ppc/hash64: Rework R and C bit updatesBenjamin Herrenschmidt
With MT-TCG, we are now running translation in a racy way, thus we need to mimic hardware when it comes to updating the R and C bits, by doing byte stores. The current "store_hpte" abstraction is ill suited for this, we replace it with two separate callbacks for setting R and C. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190411080004.8690-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26spapr/rtas: modify spapr_rtas_register() to remove RTAS handlersCédric Le Goater
Removing RTAS handlers will become necessary when the new pseries machine supporting multiple interrupt mode is introduced. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190321144914.19934-9-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-04-26spapr: Support NVIDIA V100 GPU with NVLink2Alexey Kardashevskiy
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver implements special regions for such GPUs and emulates an NVLink bridge. NVLink2-enabled POWER9 CPUs also provide address translation services which includes an ATS shootdown (ATSD) register exported via the NVLink bridge device. This adds a quirk to VFIO to map the GPU memory and create an MR; the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses this to get the MR and map it to the system address space. Another quirk does the same for ATSD. This adds additional steps to sPAPR PHB setup: 1. Search for specific GPUs and NPUs, collect findings in sPAPRPHBState::nvgpus, manage system address space mappings; 2. Add device-specific properties such as "ibm,npu", "ibm,gpu", "memory-block", "link-speed" to advertise the NVLink2 function to the guest; 3. Add "mmio-atsd" to vPHB to advertise the ATSD capability; 4. Add new memory blocks (with extra "linux,memory-usable" to prevent the guest OS from accessing the new memory until it is onlined) and npuphb# nodes representing an NPU unit for every vPHB as the GPU driver uses it for link discovery. This allocates space for GPU RAM and ATSD like we do for MMIOs by adding 2 new parameters to the phb_placement() hook. Older machine types set these to zero. This puts new memory nodes in a separate NUMA node to as the GPU RAM needs to be configured equally distant from any other node in the system. Unlike the host setup which assigns numa ids from 255 downwards, this adds new NUMA nodes after the user configures nodes or from 1 if none were configured. This adds requirement similar to EEH - one IOMMU group per vPHB. The reason for this is that ATSD registers belong to a physical NPU so they cannot invalidate translations on GPUs attached to another NPU. It is guaranteed by the host platform as it does not mix NVLink bridges or GPUs from different NPU in the same IOMMU group. If more than one IOMMU group is detected on a vPHB, this disables ATSD support for that vPHB and prints a warning. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for vfio portions] Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20190312082103.130561-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-29spapr: Simplify handling of host-serial and host-model valuesDavid Gibson
27461d69a0f "ppc: add host-serial and host-model machine attributes (CVE-2019-8934)" introduced 'host-serial' and 'host-model' machine properties for spapr to explicitly control the values advertised to the guest in device tree properties with the same names. The previous behaviour on KVM was to unconditionally populate the device tree with the real host serial number and model, which leaks possibly sensitive information about the host to the guest. To maintain compatibility for old machine types, we allowed those props to be set to "passthrough" to take the value from the host as before. Or they could be set to "none" to explicitly omit the device tree items. Special casing specific values on what's otherwise a user supplied string is very ugly. So, this patch simplifies things by implementing the backwards compatibility in a different way: we have a machine class flag set for the older machines, and we only load the host values into the device tree if A) they're not set by the user and B) we have that flag set. This does mean that the "passthrough" functionality is no longer available with the current machine type. That's ok though: if a user or management layer really wants the information passed through they can read it themselves (OpenStack Nova already does something similar for x86). It also means the user can't explicitly ask for the values to be omitted on the old machine types. I think that's an acceptable trade-off: if you care enough about not leaking the host information you can either move to the new machine type, or use a dummy value for the properties. For the new machine type, this also removes an odd inconsistency between running on a POWER and non-POWER (or non-Linux) hosts: if the host information couldn't be read from where we expect (in the host's device tree as exposed by Linux), we'd fallback to omitting the guest device tree items. While we're there, improve some poorly worded comments, and the help text for the properties. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>
2019-03-12spapr: Use CamelCase properlyDavid Gibson
The qemu coding standard is to use CamelCase for type and structure names, and the pseries code follows that... sort of. There are quite a lot of places where we bend the rules in order to preserve the capitalization of internal acronyms like "PHB", "TCE", "DIMM" and most commonly "sPAPR". That was a bad idea - it frequently leads to names ending up with hard to read clusters of capital letters, and means they don't catch the eye as type identifiers, which is kind of the point of the CamelCase convention in the first place. In short, keeping type identifiers look like CamelCase is more important than preserving standard capitalization of internal "words". So, this patch renames a heap of spapr internal type names to a more standard CamelCase. In addition to case changes, we also make some other identifier renames: VIOsPAPR* -> SpaprVio* The reverse word ordering was only ever used to mitigate the capital cluster, so revert to the natural ordering. VIOsPAPRVTYDevice -> SpaprVioVty VIOsPAPRVLANDevice -> SpaprVioVlan Brevity, since the "Device" didn't add useful information sPAPRDRConnector -> SpaprDrc sPAPRDRConnectorClass -> SpaprDrcClass Brevity, and makes it clearer this is the same thing as a "DRC" mentioned in many other places in the code This is 100% a mechanical search-and-replace patch. It will, however, conflict with essentially any and all outstanding patches touching the spapr code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: POWER9 XSCOM quad supportCédric Le Goater
The POWER9 processor does not support per-core frequency control. The cores are arranged in groups of four, along with their respective L2 and L3 caches, into a structure known as a Quad. The frequency must be managed at the Quad level. Provide a basic Quad model to fake the settings done by the firmware on the Non-Cacheable Unit (NCU). Each core pair (EX) needs a special BAR setting for the TIMA area of XIVE because it resides on the same address on all chips. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190307223548.20516-12-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: extend XSCOM core support for POWER9Cédric Le Goater
Provide a new class attribute to define XSCOM operations per CPU family and add a couple of XSCOM addresses controlling the power management states of the core on POWER9. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190307223548.20516-11-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: add a OCC model for POWER9Cédric Le Goater
The OCC on POWER9 is very similar to the one found on POWER8. Provide the same routines with P9 values for the registers and IRQ number. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190307223548.20516-10-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: add a OCC model classCédric Le Goater
To ease the introduction of the OCC model for POWER9, provide a new class attributes to define XSCOM operations per CPU family and a PSI IRQ number. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190307223548.20516-9-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: add SerIRQ routing registersCédric Le Goater
This is just a simple reminder that SerIRQ routing should be addressed. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190307223548.20516-8-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: add a LPC Controller model for POWER9Cédric Le Goater
The LPC Controller on POWER9 is very similar to the one found on POWER8 but accesses are now done via on MMIOs, without the XSCOM and ECCB logic. The device tree is populated differently so we add a specific POWER9 routine for the purpose. SerIRQ routing is yet to be done. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190307223548.20516-7-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: add a 'dt_isa_nodename' to the chipCédric Le Goater
The ISA bus has a different DT nodename on POWER9. Compute the name when the PnvChip is realized, that is before it is used by the machine to populate the device tree with the ISA devices. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190307223548.20516-6-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: add a LPC Controller class modelCédric Le Goater
It will ease the introduction of the LPC Controller model for POWER9. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190307223548.20516-5-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: add a PSI bridge model for POWER9Cédric Le Goater
The PSI bridge on POWER9 is very similar to POWER8. The BAR is still set through XSCOM but the controls are now entirely done with MMIOs. More interrupts are defined and the interrupt controller interface has changed to XIVE. The POWER9 model is a first example of the usage of the notify() handler of the XiveNotifier interface, linking the PSI XiveSource to its owning device model. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190307223548.20516-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: add a PSI bridge class modelCédric Le Goater
To ease the introduction of the PSI bridge model for POWER9, abstract the POWER chip differences in a PnvPsi class model and introduce a specific Pnv8Psi type for POWER8. POWER8 interface to the interrupt controller is still XICS whereas POWER9 uses the new XIVE model. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190307223548.20516-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12spapr_iommu: Do not replay mappings from just created DMA windowAlexey Kardashevskiy
On sPAPR vfio_listener_region_add() is called in 2 situations: 1. a new listener is registered from vfio_connect_container(); 2. a new IOMMU Memory Region is added from rtas_ibm_create_pe_dma_window(). In both cases vfio_listener_region_add() calls memory_region_iommu_replay() to notify newly registered IOMMU notifiers about existing mappings which is totally desirable for case 1. However for case 2 it is nothing but noop as the window has just been created and has no valid mappings so replaying those does not do anything. It is barely noticeable with usual guests but if the window happens to be really big, such no-op replay might take minutes and trigger RCU stall warnings in the guest. For example, a upcoming GPU RAM memory region mapped at 64TiB (right after SPAPR_PCI_LIMIT) causes a 64bit DMA window to be at least 128TiB which is (128<<40)/0x10000=2.147.483.648 TCEs to replay. This mitigates the problem by adding an "skipping_replay" flag to sPAPRTCETable and defining sPAPR own IOMMU MR replay() hook which does exactly the same thing as the generic one except it returns early if @skipping_replay==true. Another way of fixing this would be delaying replay till the very first H_PUT_TCE but this does not work if in-kernel H_PUT_TCE handler is enabled (a likely case). When "ibm,create-pe-dma-window" is complete, the guest will map only required regions of the huge DMA window. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190307050518.64968-2-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: introduce a new pic_print_info() operation to the chip modelCédric Le Goater
The POWER9 and POWER8 processors have different interrupt controllers, and reporting their state requires calling different helper routines. However, the interrupt presenters are still handled in the higher level pic_print_info() routine because they are not related to the chip. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190306085032.15744-9-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: introduce a new dt_populate() operation to the chip modelCédric Le Goater
The POWER9 and POWER8 processors have a different set of devices and a different device tree layout. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190306085032.15744-8-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: add a XIVE interrupt controller model for POWER9Cédric Le Goater
This is a simple model of the POWER9 XIVE interrupt controller for the PowerNV machine which only addresses the needs of the skiboot firmware. The PowerNV model reuses the common XIVE framework developed for sPAPR as the fundamentals aspects are quite the same. The difference are outlined below. The controller initial BAR configuration is performed using the XSCOM bus from there, MMIO are used for further configuration. The MMIO regions exposed are : - Interrupt controller registers - ESB pages for IPIs and ENDs - Presenter MMIO (Not used) - Thread Interrupt Management Area MMIO, direct and indirect The virtualization controller MMIO region containing the IPI ESB pages and END ESB pages is sub-divided into "sets" which map portions of the VC region to the different ESB pages. These are modeled with custom address spaces and the XiveSource and XiveENDSource objects are sized to the maximum allowed by HW. The memory regions are resized at run-time using the configuration of EDT set translation table provided by the firmware. The XIVE virtualization structure tables (EAT, ENDT, NVTT) are now in the machine RAM and not in the hypervisor anymore. The firmware (skiboot) configures these tables using Virtual Structure Descriptor defining the characteristics of each table : SBE, EAS, END and NVT. These are later used to access the virtual interrupt entries. The internal cache of these tables in the interrupt controller is updated and invalidated using a set of registers. Still to address to complete the model but not fully required is the support for block grouping. Escalation support will be necessary for KVM guests. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190306085032.15744-7-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: change the CPU machine_data presenter type to Object *Cédric Le Goater
The POWER9 PowerNV machine will use a XIVE interrupt presenter type. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190306085032.15744-6-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/pnv: export the xive_router_notify() routineCédric Le Goater
The PowerNV machine with need to encode the block id in the source interrupt number before forwarding the source event notification to the Router. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190306085032.15744-5-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc/xive: export the TIMA memory accessorsCédric Le Goater
The PowerNV machine can perform indirect loads and stores on the TIMA on behalf of another CPU. Give the controller the possibility to call the TIMA memory accessors with a XiveTCTX of its choice. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190306085032.15744-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12ppc: externalize ppc_get_vcpu_by_pir()Cédric Le Goater
We will use it to get the CPU interrupt presenter in XIVE when the TIMA is accessed from the indirect page. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190306085032.15744-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12spapr: Force SPAPR_MEMORY_BLOCK_SIZE to be a hwaddr (64-bit)David Gibson
SPAPR_MEMORY_BLOCK_SIZE is logically a difference in memory addresses, and hence of type hwaddr which is 64-bit. Previously it wasn't marked as such which means that it could be treated as 32-bit. That will work in some circumstances but if multiplied by another 32-bit value it could lead to a 32-bit overflow and an incorrect result. One specific instance of this in spapr_lmb_dt_populate() was spotted by Coverity (CID 1399145). Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12target/ppc/spapr: Add SPAPR_CAP_CCF_ASSISTSuraj Jitindar Singh
Introduce a new spapr_cap SPAPR_CAP_CCF_ASSIST to be used to indicate the requirement for a hw-assisted version of the count cache flush workaround. The count cache flush workaround is a software workaround which can be used to flush the count cache on context switch. Some revisions of hardware may have a hardware accelerated flush, in which case the software flush can be shortened. This cap is used to set the availability of such hardware acceleration for the count cache flush routine. The availability of such hardware acceleration is indicated by the H_CPU_CHAR_BCCTR_FLUSH_ASSIST flag being set in the characteristics returned from the KVM_PPC_GET_CPU_CHAR ioctl. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301031912.28809-2-sjitindarsingh@gmail.com> [dwg: Small style fixes] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12target/ppc/spapr: Add workaround option to SPAPR_CAP_IBSSuraj Jitindar Singh
The spapr_cap SPAPR_CAP_IBS is used to indicate the level of capability for mitigations for indirect branch speculation. Currently the available values are broken (default), fixed-ibs (fixed by serialising indirect branches) and fixed-ccd (fixed by diabling the count cache). Introduce a new value for this capability denoted workaround, meaning that software can work around the issue by flushing the count cache on context switch. This option is available if the hypervisor sets the H_CPU_BEHAV_FLUSH_COUNT_CACHE flag in the cpu behaviours returned from the KVM_PPC_GET_CPU_CHAR ioctl. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301031912.28809-1-sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-03-12target/ppc/spapr: Add SPAPR_CAP_LARGE_DECREMENTERSuraj Jitindar Singh
Add spapr_cap SPAPR_CAP_LARGE_DECREMENTER to be used to control the availability of the large decrementer for a guest. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20190301024317.22137-1-sjitindarsingh@gmail.com> [dwg: Trivial style fix] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr: add hotplug hooks for PHB hotplugGreg Kurz
Hotplugging PHBs is a machine-level operation, but PHBs reside on the main system bus, so we register spapr machine as the handler for the main system bus. Provide the usual pre-plug, plug and unplug-request handlers. Move the checking of the PHB index to the pre-plug handler. It is okay to do that and assert in the realize function because the pre-plug handler is always called, even for the oldest machine types we support. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> (Fixed interrupt controller phandle in "interrupt-map" and TCE table size in "ibm,dma-window" FDT fragment, Greg Kurz) Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059672926.1466090.13612804072190051439.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr: create DR connectors for PHBsMichael Roth
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059670389.1466090.10015601248906623076.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr_irq: Expose the phandle of the interrupt controllerGreg Kurz
This will be used by PHB hotplug in order to create the "interrupt-map" property of the PHB node. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059669374.1466090.12943228478046223856.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr: Expose the name of the interrupt controller nodeGreg Kurz
This will be needed by PHB hotplug in order to access the "phandle" property of the interrupt controller node. Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <155059668867.1466090.6339199751719123386.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26xics: Write source state to KVM at claim timeGreg Kurz
The pseries machine only uses LSIs to support legacy PCI devices. Every PHB claims 4 LSIs at realize time. When using in-kernel XICS (or upcoming in-kernel XIVE), QEMU synchronizes the state of all irqs, including these LSIs, later on at machine reset. In order to support PHB hotplug, we need a way to tell KVM about the LSIs that doesn't require a machine reset. An easy way to do that is to always inform KVM when an interrupt is claimed, which really isn't a performance path. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059668360.1466090.5969630516627776426.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr/drc: Drop spapr_drc_attach() fdt argumentGreg Kurz
All DRC subtypes have been converted to generate the FDT fragment at configure connector time instead of attach time. The fdt and fdt_offset arguments of spapr_drc_attach() aren't needed anymore. Drop them and make the implementation of the dt_populate() method mandatory. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059667853.1466090.16527852453054217565.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr: Generate FDT fragment for CPUs at configure connector timeGreg Kurz
Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059666839.1466090.3833376527523126752.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr: Generate FDT fragment for LMBs at configure connector timeGreg Kurz
Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059666331.1466090.6766540766297333313.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26spapr_drc: Allow FDT fragment to be added laterGreg Kurz
The current logic is to provide the FDT fragment when attaching a device to a DRC. This works perfectly fine for our current hotplug support, but soon we will add support for PHB hotplug which has some constraints, that CPU, PCI and LMB devices don't seem to have. The first constraint is that the "ibm,dma-window" property of the PHB node requires the IOMMU to be configured, ie, spapr_tce_table_enable() has been called, which happens during PHB reset. It is okay in the case of hotplug since the device is reset before the hotplug handler is called. On the contrary with coldplug, the hotplug handler is called first and device is only reset during the initial system reset. Trying to create the FDT fragment on the hotplug path in this case, would result in somthing like this: ibm,dma-window = < 0x80000000 0x00 0x00 0x00 0x00 >; This will cause linux in the guest to panic, by simply removing and re-adding the PHB using the drmgr command: page = alloc_pages_node(nid, GFP_KERNEL, get_order(sz)); if (!page) panic("iommu_init_table: Can't allocate %ld bytes\n", sz); The second and maybe more problematic constraint is that the "interrupt-map" property needs to reference the interrupt controller node using the very same phandle that SLOF has already exposed to the guest. QEMU requires SLOF to call the private KVMPPC_H_UPDATE_DT hcall at some point to know about this phandle. With the latest QEMU and SLOF, this happens when SLOF gets quiesced. This means that if the PHB gets hotplugged after CAS but before SLOF quiesce, then we're sure that the phandle is not known when the hotplug handler is called. The FDT is only needed when the guest first invokes RTAS to configure the connector actually, long after SLOF quiesce. Let's postpone the creation of FDT fragments for PHBs to rtas_ibm_configure_connector(). Since we only need this for PHBs, introduce a new method in the base DRC class for that. DRC subtypes will be converted to use it in subsequent patches. Allow spapr_drc_attach() to be passed a NULL fdt argument if the method is available. When all DRC subtypes have been converted, the fdt argument will eventually disappear. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155059665823.1466090.18358845122627355537.stgit@bahia.lab.toulouse-stg.fr.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26target/ppc/spapr: Set LPCR:HR when using Radix modeBenjamin Herrenschmidt
The HW relies on LPCR:HR along with the PATE to determine whether to use Radix or Hash mode. In fact it uses LPCR:HR more commonly than the PATE. For us, it's also more efficient to do so, especially since unlike the HW we do not maintain a cache of the current PATE and HV PATE in a generic place. Prepare the grounds for that by ensuring that LPCR:HR is set properly on SPAPR machines. Another option would have been to use a callback to get the PATE but this gets messy when implementing bare metal support, it's much simpler (and faster) to use LPCR. Since existing migration streams may not have it, fix it up in spapr_post_load() as well based on the pseudo-PATE entry that we keep. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215170029.15641-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26ppc: add host-serial and host-model machine attributes (CVE-2019-8934)Prasad J Pandit
On ppc hosts, hypervisor shares following system attributes - /proc/device-tree/system-id - /proc/device-tree/model with a guest. This could lead to information leakage and misuse.[*] Add machine attributes to control such system information exposure to a guest. [*] https://wiki.openstack.org/wiki/OSSN/OSSN-0028 Reported-by: Daniel P. Berrangé <berrange@redhat.com> Fix-suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20190218181349.23885-1-ppandit@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-26target/ppc: Add POWER9 external interrupt modelBenjamin Herrenschmidt
Adds support for the Hypervisor directed interrupts in addition to the OS ones. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: - modified the icp_realize() and xive_tctx_realize() to take into account explicitely the POWER9 interrupt model - introduced a specific power9_set_irq for POWER9 ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190215161648.9600-10-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18xics: Drop the KVM ICS classGreg Kurz
The KVM ICS class isn't used anymore. Drop it. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155023084177.1011724.14693955932559990358.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18xics: Handle KVM interrupt presentation from "simple" ICS codeGreg Kurz
We want to use the "simple" ICS type in both KVM and non-KVM setups. Teach the "simple" ICS how to present interrupts to KVM and adapt sPAPR accordingly. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155023082996.1011724.16237920586343905010.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18xics: Explicitely call KVM ICS methods from the common codeGreg Kurz
The pre_save(), post_load() and synchronize_state() methods of the ICSStateClass type are really KVM only things. Make that obvious by dropping the indirections and directly calling the KVM functions instead. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155023081817.1011724.14078777320394028836.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18xics: Drop the KVM ICP classGreg Kurz
The KVM ICP class isn't used anymore. Drop it. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155023081228.1011724.12474992370439652538.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18spapr/irq: Use the base ICP class for KVMGreg Kurz
The base ICP class knows how to interact with KVM. Adapt sPAPR to use it instead of the ICP KVM class. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155023080638.1011724.792095453419098948.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18xics: Handle KVM ICP realize from the common codeGreg Kurz
The realization of KVM ICP currently follows the parent_realize logic, which is a bit overkill here. Also we want to get rid of the KVM ICP class. Explicitely call icp_kvm_realize() from the base ICP realize function. Note that ICPStateClass::parent_realize is retained because powernv needs it. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155023080049.1011724.15423463482790260696.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18xics: Handle KVM ICP reset from the common codeGreg Kurz
The KVM ICP reset handler simply writes the ICP state to KVM. This doesn't need the overkill parent_reset logic we have today. Call icp_set_kvm_state() from the base ICP reset function instead. Since there are no other users for ICPStateClass::parent_reset, and it isn't currently expected to change, drop it as well. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155023079461.1011724.12644984391500635645.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-18xics: Explicitely call KVM ICP methods from the common codeGreg Kurz
The pre_save(), post_load() and synchronize_state() methods of the ICPStateClass type are really KVM only things. Make that obvious by dropping the indirections and directly calling the KVM functions instead. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <155023078871.1011724.3083923389814185598.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17spapr/irq: add an 'nr_irq' parameter to initialize the backend.Cédric Le Goater
When using the 'dual' interrupt mode, the source numbers of both sPAPR IRQ backends are aligned to share a common IRQ number space and to use a similar mapping of the machine qemu_irq array which is indexed by the source number. The XICS IRQ number range initially being [ 0x1000 - 0x2000 ], this requires to change the XICS ICSState offset to 0 and to provision for an extra 4K of source numbers and qemu_irqs which will never be used by the machine when running under the XICS interrupt mode. This is not an optimal solution. Change the init() method to allocate an IRQ number space of the expected size for the XICS sPAPR IRQ backend. It breaks the interrupt signaling when under the 'dual' mode because source numbers have unexpected values but next patch will fix that. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190213210756.27032-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-17xive: Only set source type for LSIsGreg Kurz
MSI is the default and LSI specific code is guarded by the xive_source_irq_is_lsi() helper. The xive_source_irq_set() helper is a nop for MSIs. Simplify the code by turning xive_source_irq_set() into xive_source_irq_set_lsi() and only call it for LSIs. The call to xive_source_irq_set(false) in spapr_xive_irq_free() is also a nop. Just drop it. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Message-Id: <154999584656.690774.18352404495120358613.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>