aboutsummaryrefslogtreecommitdiff
path: root/include/hw/ppc/spapr.h
AgeCommit message (Collapse)Author
2016-10-28spapr: Add DRC count indexed hotplug identifier typeBharata B Rao
Add support for DRC count indexed hotplug ID type which is primarily needed for memory hot unplug. This type allows for specifying the number of DRs that should be plugged/unplugged starting from a given DRC index. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> * updated rtas_event_log_v6_hp to reflect count/index field ordering used in PAPR hotplug ACR Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28spapr_events: add support for dedicated hotplug event sourceMichael Roth
Hotplug events were previously delivered using an EPOW interrupt and were queued by linux guests into a circular buffer. For traditional EPOW events like shutdown/resets, this isn't an issue, but for hotplug events there are cases where this buffer can be exhausted, resulting in the loss of hotplug events, resets, etc. Newer-style hotplug event are delivered using a dedicated event source. We enable this in supported guests by adding standard an additional event source in the guest device-tree via /event-sources, and, if the guest advertises support for the newer-style hotplug events, using the corresponding interrupt to signal the available of hotplug/unplug events. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28spapr: add option vector handling in CAS-generated resetsMichael Roth
In some cases, ibm,client-architecture-support calls can fail. This could happen in the current code for situations where the modified device tree segment exceeds the buffer size provided by the guest via the call parameters. In these cases, QEMU will reset, allowing an opportunity to regenerate the device tree from scratch via boot-time handling. There are potentially other scenarios as well, not currently reachable in the current code, but possible in theory, such as cases where device-tree properties or nodes need to be removed. We currently don't handle either of these properly for option vector capabilities however. Instead of carrying the negotiated capability beyond the reset and creating the boot-time device tree accordingly, we start from scratch, generating the same boot-time device tree as we did prior to the CAS-generated and the same device tree updates as we did before. This could (in theory) cause us to get stuck in a reset loop. This hasn't been observed, but depending on the extensiveness of CAS-induced device tree updates in the future, could eventually become an issue. Address this by pulling capability-related device tree updates resulting from CAS calls into a common routine, spapr_dt_cas_updates(), and adding an sPAPROptionVector* parameter that allows us to test for newly-negotiated capabilities. We invoke it as follows: 1) When ibm,client-architecture-support gets called, we call spapr_dt_cas_updates() with the set of capabilities added since the previous call to ibm,client-architecture-support. For the initial boot, or a system reset generated by something other than the CAS call itself, this set will consist of *all* options supported both the platform and the guest. For calls to ibm,client-architecture-support immediately after a CAS-induced reset, we call spapr_dt_cas_updates() with only the set of capabilities added since the previous call, since the other capabilities will have already been addressed by the boot-time device-tree this time around. In the unlikely event that capabilities are *removed* since the previous CAS, we will generate a CAS-induced reset. In the unlikely event that we cannot fit the device-tree updates into the buffer provided by the guest, well generate a CAS-induced reset. 2) When a CAS update results in the need to reset the machine and include the updates in the boot-time device tree, we call the spapr_dt_cas_updates() using the full set of negotiated capabilities as part of the reset path. At initial boot, or after a reset generated by something other than the CAS call itself, this set will be empty, resulting in what should be the same boot-time device-tree as we generated prior to this patch. For CAS-induced reset, this routine will be called with the full set of capabilities negotiated by the platform/guest in the previous CAS call, which should result in CAS updates from previous call being accounted for in the initial boot-time device tree. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Changed an int -> bool conversion to be more explicit] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28spapr_hcall: use spapr_ovec_* interfaces for CAS optionsMichael Roth
Currently we access individual bytes of an option vector via ldub_phys() to test for the presence of a particular capability within that byte. Currently this is only done for the "dynamic reconfiguration memory" capability bit. If that bit is present, we pass a boolean value to spapr_h_cas_compose_response() to generate a modified device tree segment with the additional properties required to enable this functionality. As more capability bits are added, will would need to modify the code to add additional option vector accesses and extend the param list for spapr_h_cas_compose_response() to include similar boolean values for these parameters. Avoid this by switching to spapr_ovec_* helpers so we can do all the parsing in one shot and then test for these additional bits within spapr_h_cas_compose_response() directly. Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28pseries: Remove spapr_create_fdt_skel()David Gibson
For historical reasons construction of the guest device tree in spapr is divided between spapr_create_fdt_skel() which is called at init time, and spapr_build_fdt() which runs at reset time. Over time, more and more things have needed to be moved to reset time. Previous cleanups mean the only things left in spapr_create_fdt_skel() are the properties of the root node itself. Finish consolidating these two parts of device tree construction, by moving this to the start of spapr_build_fdt(), and removing spapr_create_fdt_skel() entirely. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Move /event-sources construction to spapr_build_fdt()David Gibson
The /event-sources device tree node is built from spapr_create_fdt_skel(). As part of consolidating device tree construction to reset time, this moves it to spapr_build_fdt(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Consolidate construction of /rtas device tree nodeDavid Gibson
For historical reasons construction of the /rtas node in the device tree (amongst others) is split into several places. In particular it's split between spapr_create_fdt_skel(), spapr_build_fdt() and spapr_rtas_device_tree_setup(). In fact, as well as adding the actual RTAS tokens to the device tree, spapr_rtas_device_tree_setup() just adds the ibm,lrdr-capacity property, which despite going in the /rtas node, doesn't have a lot to do with RTAS. This patch consolidates the code constructing /rtas together into a new spapr_dt_rtas() function. spapr_rtas_device_tree_setup() is renamed to spapr_dt_rtas_tokens() and now only adds the token properties. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Consolidate RTAS loadingDavid Gibson
At each system reset, the pseries machine needs to load RTAS (the runtime portion of the guest firmware) into the VM. This means copying the actual RTAS code into guest memory, and also updating the device tree so that the guest OS and boot firmware can locate it. For historical reasons the copy and update to the device tree were in different parts of the code. This cleanup brings them both together in an spapr_load_rtas() function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Make spapr_create_fdt_skel() get information from machine stateDavid Gibson
Currently spapr_create_fdt_skel() takes a bunch of individual parameters for various things it will put in the device tree. Some of these can already be taken directly from sPAPRMachineState. This patch alters it so that all of them can be taken from there, which will allow this code to be moved away from its current caller in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-28pseries: Remove rtas_addr and fdt_addr fields from machinestateDavid Gibson
These values are used only within ppc_spapr_reset(), so just change them to local variables. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2016-10-16spapr_pci: Add a 64-bit MMIO windowDavid Gibson
On real hardware, and under pHyp, the PCI host bridges on Power machines typically advertise two outbound MMIO windows from the guest's physical memory space to PCI memory space: - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space - A 64-bit window which maps onto a large region somewhere high in PCI address space (traditionally this used an identity mapping from guest physical address to PCI address, but that's not always the case) The qemu implementation in spapr-pci-host-bridge, however, only supports a single outbound MMIO window, however. At least some Linux versions expect the two windows however, so we arranged this window to map onto the PCI memory space from 2 GiB..~64 GiB, then advertised it as two contiguous windows, the "32-bit" window from 2G..4G and the "64-bit" window from 4G..~64G. This approach means, however, that the 64G window is not naturally aligned. In turn this limits the size of the largest BAR we can map (which does have to be naturally aligned) to roughly half of the total window. With some large nVidia GPGPU cards which have huge memory BARs, this is starting to be a problem. This patch adds true support for separate 32-bit and 64-bit outbound MMIO windows to the spapr-pci-host-bridge implementation, each of which can be independently configured. The 32-bit window always maps to 2G.. in PCI space, but the PCI address of the 64-bit window can be configured (it defaults to the same as the guest physical address). So as not to break possible existing configurations, as long as a 64-bit window is not specified, a large single window can be specified. This will appear the same way to the guest as the old approach, although it's now implemented by two contiguous memory regions rather than a single one. For now, this only adds the possibility of 64-bit windows. The default configuration still uses the legacy mode. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16spapr_pci: Delegate placement of PCI host bridges to machine typeDavid Gibson
The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB) for a PAPR guest. Unlike on x86, it's routine on Power (both bare metal and PAPR guests) to have numerous independent PHBs, each controlling a separate PCI domain. There are two ways of configuring the spapr-pci-host-bridge device: first it can be done fully manually, specifying the locations and sizes of all the IO windows. This gives the most control, but is very awkward with 6 mandatory parameters. Alternatively just an "index" can be specified which essentially selects from an array of predefined PHB locations. The PHB at index 0 is automatically created as the default PHB. The current set of default locations causes some problems for guests with large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia GPGPU cards via VFIO). Obviously, for migration we can only change the locations on a new machine type, however. This is awkward, because the placement is currently decided within the spapr-pci-host-bridge code, so it breaks abstraction to look inside the machine type version. So, this patch delegates the "default mode" PHB placement from the spapr-pci-host-bridge device back to the machine type via a public method in sPAPRMachineClass. It's still a bit ugly, but it's about the best we can do. For now, this just changes where the calculation is done. It doesn't change the actual location of the host bridges, or any other behaviour. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-06hw/ppc/spapr: Use POWER8 by default for the pseries-2.8 machineThomas Huth
A couple of distributors are compiling their distributions with "-mcpu=power8" for ppc64le these days, so the user sooner or later runs into a crash there when not explicitely specifying the "-cpu POWER8" option to QEMU (which is currently using POWER7 for the "pseries" machine by default). Due to this reason, the linux-user target already switched to POWER8 a while ago (see commit de3f1b98410e0d5b406a0df3a48547b559d18602). Since the softmmu target of course has the same problem, we should switch there to POWER8 for the newer machine types, too. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-15Remove unused function declarationsLadi Prosek
Unused function declarations were found using a simple gcc plugin and manually verified by grepping the sources. Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-08-08spapr: Correctly set query_hotpluggable_cpus hook based on machine versionDavid Gibson
Prior to c8721d3 "spapr: Error out when CPU hotplug is attempted on older pseries machines", attempting to use query-hotpluggable-cpus on pseries-2.6 and earlier machine types would SEGV. That change fixed that, but due to some unexpected interactions in init order and a brown-paper-bag worthy failure to test, it accidentally disabled query-hotpluggable-cpus for all pseries machine types, including the current one which should allow it. In fact, query_hotpluggable_cpus needs to be non-NULL when and only when the dr_cpu_enabled flag in sPAPRMachineClass is set, which makes dr_cpu_enabled itself redundant. This patch removes dr_cpu_enabled, instead directly setting query_hotpluggable_cpus from the machine class_init functions, and using that to determine the availability of CPU hotplug when necessary. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-12Clean up ill-advised or unusual header guardsMarkus Armbruster
Cleaned up with scripts/clean-header-guards.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-07-05spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)Alexey Kardashevskiy
This adds support for Dynamic DMA Windows (DDW) option defined by the SPAPR specification which allows to have additional DMA window(s) The "ddw" property is enabled by default on a PHB but for compatibility the pseries-2.6 machine and older disable it. This also creates a single DMA window for the older machines to maintain backward migration. This implements DDW for PHB with emulated and VFIO devices. The host kernel support is required. The advertised IOMMU page sizes are 4K and 64K; 16M pages are supported but not advertised by default, in order to enable them, the user has to specify "pgsz" property for PHB and enable huge pages for RAM. The existing linux guests try creating one additional huge DMA window with 64K or 16MB pages and map the entire guest RAM to. If succeeded, the guest switches to dma_direct_ops and never calls TCE hypercalls (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM and not waste time on map/unmap later. This adds a "dma64_win_addr" property which is a bus address for the 64bit window and by default set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware uses and this allows having emulated and VFIO devices on the same bus. This adds 4 RTAS handlers: * ibm,query-pe-dma-window * ibm,create-pe-dma-window * ibm,remove-pe-dma-window * ibm,reset-pe-dma-window These are registered from type_init() callback. These RTAS handlers are implemented in a separate file to avoid polluting spapr_iommu.c with PCI. This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs and updates all references to dma_liobn. However this does not add 64bit LIOBN to the migration stream as in fact even 32bit LIOBN is rather pointless there (as it is a PHB property and the management software can/should pass LIOBNs via CLI) but we keep it for the backward migration support. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-01ppc/xics: Replace "icp" with "xics" in most placesBenjamin Herrenschmidt
The "ICP" is a different object than the "XICS". For historical reasons, we have a number of places where we name a variable "icp" while it contains a XICSState pointer. There *is* an ICPState structure too so this makes the code really confusing. This is a mechanical replacement of all those instances to use the name "xics" instead. There should be no functional change. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [spapr_cpu_init has been moved to spapr_cpu_core.c, change there] Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17spapr: CPU hotplug supportBharata B Rao
Set up device tree entries for the hotplugged CPU core and use the exising RTAS event logging infrastructure to send CPU hotplug notification to the guest. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17spapr: convert boot CPUs into CPU core devicesBharata B Rao
Introduce sPAPRMachineClass.dr_cpu_enabled to indicate support for CPU core hotplug. Initialize boot time CPUs as core deivces and prevent topologies that result in partially filled cores. Both of these are done only if CPU core hotplug is supported. Note: An unrelated change in the call to xics_system_init() is done in this patch as it makes sense to use the local variable smt introduced in this patch instead of kvmppc_smt_threads() call here. TODO: We derive sPAPR core type by looking at -cpu <model>. However we don't take care of "compat=" feature yet for boot time as well as hotplug CPUs. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17spapr: Move spapr_cpu_init() to spapr_cpu_core.cBharata B Rao
Start consolidating CPU init related routines in spapr_cpu_core.c. As part of this, move spapr_cpu_init() and its dependencies from spapr.c to spapr_cpu_core.c No functionality change in this patch. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> [dwg: Rename TIMEBASE_FREQ to SPAPR_TIMEBASE_FREQ, since it's now in a public(ish) header] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-17spapr: Abstract CPU core device and type specific core devicesBharata B Rao
Add sPAPR specific abastract CPU core device that is based on generic CPU core device. Use this as base type to create sPAPR CPU specific core devices. TODO: - Add core types for other remaining CPU types - Handle CPU model alias correctly Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-14spapr: Ensure all LMBs are represented in ibm,dynamic-memoryBharata B Rao
Memory hotplug can fail for some combinations of RAM and maxmem when DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends on maximum addressable memory returned by guest and this value is currently being calculated wrongly by the guest kernel routine memory_hotplug_max(). While there is an attempt to fix the guest kernel, this patch works around the problem within QEMU itself. memory_hotplug_max() routine in the guest kernel arrives at max addressable memory by multiplying lmb-size with the lmb-count obtained from ibm,dynamic-memory property. There are two assumptions here: - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM where only hot-pluggable LMBs are present in this property. - The memory area comprising of RAM and hotplug region is contiguous: This needn't be true always for PowerKVM as there can be gap between boot time RAM and hotplug region. To work around this guest kernel bug, ensure that ibm,dynamic-memory has information about all the LMBs (RMA, boot-time LMBs, future hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and hotpluggable region). RMA is represented separately by memory@0 node. Hence mark RMA LMBs and also the LMBs for the gap b/n RAM and hotpluggable region as reserved and as having no valid DRC so that these LMBs are not considered by the guest. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-07spapr_iommu: Add root memory regionAlexey Kardashevskiy
We are going to have multiple DMA windows at different offsets on a PCI bus. For the sake of migration, we will have as many TCE table objects pre-created as many windows supported. So we need a way to map windows dynamically onto a PCI bus when migration of a table is completed but at this stage a TCE table object does not have access to a PHB to ask it to map a DMA window backed by just migrated TCE table. This adds a "root" memory region (UINT64_MAX long) to the TCE object. This new region is mapped on a PCI bus with enabled overlapping as there will be one root MR per TCE table, each of them mapped at 0. The actual IOMMU memory region is a subregion of the root region and a TCE table enables/disables this subregion and maps it at the specific offset inside the root MR which is 1:1 mapping of a PCI address space. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-07spapr_iommu: Migrate full stateAlexey Kardashevskiy
The source guest could have reallocated the default TCE table and migrate bigger/smaller table. This adds reallocation in post_load() if the default table size is different on source and destination. This adds @bus_offset, @page_shift to the migration stream as a subsection so when DDW is added, migration to older machines will still be possible. As @bus_offset and @page_shift are not used yet, this makes no change in behavior. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-07spapr_iommu: Introduce "enabled" state for TCE tableAlexey Kardashevskiy
Currently TCE tables are created once at start and their sizes never change. We are going to change that by introducing a Dynamic DMA windows support where DMA configuration may change during the guest execution. This changes spapr_tce_new_table() to create an empty zero-size IOMMU memory region (IOMMU MR). Only LIOBN is assigned by the time of creation. It still will be called once at the owner object (VIO or PHB) creation. This introduces an "enabled" state for TCE table objects, some helper functions are added: - spapr_tce_table_enable() receives TCE table parameters, stores in sPAPRTCETable and allocates a guest view of the TCE table (in the user space or KVM) and sets the correct size on the IOMMU MR; - spapr_tce_table_disable() disposes the table and resets the IOMMU MR size; it is made public as the following DDW code will be using it. This changes the PHB reset handler to do the default DMA initialization instead of spapr_phb_realize(). This does not make differenct now but later with more than just one DMA window, we will have to remove them all and create the default one on a system reset. No visible change in behaviour is expected except the actual table will be reallocated every reset. We might optimize this later. The other way to implement this would be dynamically create/remove the TCE table QOM objects but this would make migration impossible as the migration code expects all QOM objects to exist at the receiver so we have to have TCE table objects created when migration begins. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-04-05ppc: Rework POWER7 & POWER8 exception modelCédric Le Goater
From: Benjamin Herrenschmidt <benh@kernel.crashing.org> This patch fixes the current AIL implementation for POWER8. The interrupt vector address can be calculated directly from LPCR when the exception is handled. The excp_prefix update becomes useless and we can cleanup the H_SET_MODE hcall. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: Removed LPES0/1 handling for HV vs. !HV Fixed LPCR_ILE case for POWERPC_EXCP_POWER8 ] Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> [dwg: This was written as a cleanup, but it also fixes a real bug where setting an alternative interrupt location would not be correctly migrated] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-02-17pseries: Simplify handling of the hash page table fdDavid Gibson
When migrating the 'pseries' machine type with KVM, we use a special fd to access the hash page table stored within KVM. Usually, this fd is opened at the beginning of migration, and kept open until the migration is complete. However, if there is a guest reset during the migration, the fd can become stale and we need to re-open it. At the moment we use an 'htab_fd_stale' flag in sPAPRMachineState to signal this, which is checked in the migration iterators. But that's rather ugly. It's simpler to just close and invalidate the fd on reset, and lazily re-open it in migration if necessary. This patch implements that change. This requires a small addition to the machine state's instance_init, so that htab_fd is initialized to -1 (telling the migration code it needs to open it) instead of 0, which could be a valid fd. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-01-30spapr: Remove rtas_st_buffer_direct()David Gibson
rtas_st_buffer_direct() is a not particularly useful wrapper around cpu_physical_memory_write(). All the callers are in rtas_ibm_configure_connector, where it's better handled by local helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-01-30spapr: Small fixes to rtas_ibm_get_system_parameter, remove rtas_st_bufferDavid Gibson
rtas_st_buffer() appears in spapr.h as though it were a widely used helper, but in fact it is only used for saving data in a format used by rtas_ibm_get_system_parameter(). This changes it to a local helper more specifically for that function. While we're there fix a couple of small defects in rtas_ibm_get_system_parameter: - For the string value SPLPAR_CHARACTERISTICS, it wasn't including the terminating \0 in the length which it should according to LoPAPR 7.3.16.1 - It now checks that the supplied buffer has at least enough space for the length of the returned data, and returns an error if it does not. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-01-11hw/ppc/spapr: Use XHCI as host controller for new spapr machinesThomas Huth
The OHCI has some bugs and performance issues, so for newer machines it's preferable to use XHCI instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23spapr_iommu: Provide a function to switch a TCE table to allowing VFIODavid Gibson
Because of the way non-VFIO guest IOMMU operations are KVM accelerated, not all TCE tables (guest IOMMU contexts) can support VFIO devices. Currently, this is decided at creation time. To support hotplug of VFIO devices, we need to allow a TCE table which previously didn't allow VFIO devices to be switched so that it can. This patch adds an spapr_tce_set_need_vfio() function to do this, by reallocating the table in userspace if necessary. Currently this doesn't allow the KVM acceleration to be re-enabled if all the VFIO devices are removed. That's an optimization for another time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2015-10-23spapr_iommu: Rename vfio_accel parameterDavid Gibson
The vfio_accel parameter used when creating a new TCE table (guest IOMMU context) has a confusing name. What it really means is whether we need the TCE table created to be able to support VFIO devices. VFIO is relevant, because when available we use in-kernel acceleration of the TCE table, but that may not work with VFIO devices because updates to the table are handled in kernel, bypass qemu and so don't hit qemu's infrastructure for keeping the VFIO host IOMMU state in sync with the guest IOMMU state. Rename the parameter to "need_vfio" throughout. This is a cosmetic change, with no impact on the logic. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2015-09-23ppc/spapr: Implement H_RANDOM hypercall in QEMUThomas Huth
The PAPR interface defines a hypercall to pass high-quality hardware generated random numbers to guests. Recent kernels can already provide this hypercall to the guest if the right hardware random number generator is available. But in case the user wants to use another source like EGD, or QEMU is running with an older kernel, we should also have this call in QEMU, so that guests that do not support virtio-rng yet can get good random numbers, too. This patch now adds a new pseudo-device to QEMU that either directly provides this hypercall to the guest or is able to enable the in-kernel hypercall if available. The in-kernel hypercall can be enabled with the use-kvm property, e.g.: qemu-system-ppc64 -device spapr-rng,use-kvm=true For handling the hypercall in QEMU instead, a "RngBackend" is required since the hypercall should provide "good" random data instead of pseudo-random (like from a "simple" library function like rand() or g_random_int()). Since there are multiple RngBackends available, the user must select an appropriate back-end via the "rng" property of the device, e.g.: qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=gid0 \ -device spapr-rng,rng=gid0 ... See http://wiki.qemu-project.org/Features-Done/VirtIORNG for other example of specifying RngBackends. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23spapr: Support hotplug by specifying DRC countBharata B Rao
Support hotplug identifier type RTAS_LOG_V6_HP_ID_DRC_COUNT that allows hotplugging of DRCs by specifying the DRC count. While we are here, rename spapr_hotplug_req_add_event() to spapr_hotplug_req_add_by_index() spapr_hotplug_req_remove_event() to spapr_hotplug_req_remove_by_index() so that they match with spapr_hotplug_req_add_by_count(). Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23spapr: Support ibm,dynamic-reconfiguration-memoryBharata B Rao
Parse ibm,architecture.vec table obtained from the guest and enable memory node configuration via ibm,dynamic-reconfiguration-memory if guest supports it. This is in preparation to support memory hotplug for sPAPR guests. This changes the way memory node configuration is done. Currently all memory nodes are built upfront. But after this patch, only memory@0 node for RMA is built upfront. Guest kernel boots with just that and rest of the memory nodes (via memory@XXX or ibm,dynamic-reconfiguration-memory) are built when guest does ibm,client-architecture-support call. Note: This patch needs a SLOF enhancement which is already part of SLOF binary in QEMU. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23spapr: Add LMB DR connectorsDavid Gibson
Enable memory hotplug for pseries 2.4 and add LMB DR connectors. With memory hotplug, enforce RAM size, NUMA node memory size and maxmem to be a multiple of SPAPR_MEMORY_BLOCK_SIZE (256M) since that's the granularity in which LMBs are represented and hot-added. LMB DR connectors will be used by the memory hotplug code. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> [spapr_drc_reset implementation] [since this missed the 2.4 cutoff, changing to only enable for 2.5] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23spapr: Initialize hotplug memory address spaceBharata B Rao
Initialize a hotplug memory region under which all the hotplugged memory is accommodated. Also enable memory hotplug by setting CONFIG_MEM_HOTPLUG. Modelled on i386 memory hotplug. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23spapr_drc: don't allow 'empty' DRCs to be unisolated or allocatedMichael Roth
Logical resources start with allocation-state:UNUSABLE / isolation-state:ISOLATED. During hotplug, guests will transition them to allocation-state:USABLE, and then to isolation-state:UNISOLATED. For cases where we cannot transition to allocation-state:USABLE, in this case due to no device/resource being association with the logical DRC, we should return an error -3. For physical DRCs, we default to allocation-state:USABLE and stay there, so in this case we should report an error -3 when the guest attempts to make the isolation-state:ISOLATED transition for a DRC with no device associated. These are as documented in PAPR 2.7, 13.5.3.4. We also ensure allocation-state:USABLE when the guest attempts transition to isolation-state:UNISOLATED to deal with misbehaving guests attempting to bring online an unallocated logical resource. This is as documented in PAPR 2.7, 13.7. Currently we implement no such error logic. Fix this by handling these error cases as PAPR defines. Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23sPAPR: Introduce rtas_ldq()Gavin Shan
This introduces rtas_ldq() to load 64-bits parameter from continuous two 4-bytes memory chunk of RTAS parameter buffer, to simplify the code. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23ppc/spapr: Use qemu_log_mask() for hcall_dprintf()Thomas Huth
To see the output of the hcall_dprintf statements, you currently have to enable the DEBUG_SPAPR_HCALLS macro in include/hw/ppc/spapr.h. This is ugly because a) not every user who wants to debug guest problems can or wants to recompile QEMU to be able to see such issues, and b) since this macro is disabled by default, the code in the hcall_dprintf() brackets tends to bitrot until somebody temporarily enables that macro again. Since the hcall_dprintf statements except one indicate guest problems, let's always use qemu_log_mask(LOG_GUEST_ERROR, ...) for this macro instead. One spot indicated an unimplemented host feature, so this is changed into qemu_log_mask(LOG_UNIMP, ...) instead. Now it's possible to see all those messages by simply adding the CLI parameter "-d guest_errors,unimp", without the need to re-compile the binary. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-07-07spapr: Support ibm, lrdr-capacity device tree propertyBharata B Rao
Add support for ibm,lrdr-capacity since this is needed by the guest kernel to know about the possible hot-pluggable CPUs and Memory. With this, pseries kernels will start reporting correct maxcpus in /sys/devices/system/cpu/possible. Also define the minimum hotpluggable memory size as 256MB. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [agraf: Fix compile error on 32bit hosts] Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07spapr: Add sPAPRMachineClassDavid Gibson
Currently although we have an sPAPRMachineState descended from MachineState we don't have an sPAPRMAchineClass descended from MachineClass. So far it hasn't been needed, but several upcoming features are going to want it, so this patch creates a stub implementation. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07spapr: Remove obsolete entry_point field from sPAPRMachineStateDavid Gibson
The sPAPRMachineState structure includes an entry_point field containing the initial PC value for starting the machine, even though this always has the value 0x100. I think this is a hangover from very early versions which bypassed the firmware when using -kernel. In any case it has no function now, so remove it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07spapr: Remove obsolete ram_limit field from sPAPRMachineStateDavid Gibson
The ram_limit field was imported from sPAPREnvironment where it predates the machine's ram size being available generically from machine->ram_size. Worse, the existing code was inconsistent about where it got the ram size from. Sometimes it used spapr->ram_limit, sometimes the global 'ram_size' and sometimes a local 'ram_size' masking the global. This cleans up the code to consistently use machine->ram_size, eliminating spapr->ram_limit in the process. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07spapr: Merge sPAPREnvironment into sPAPRMachineStateDavid Gibson
The code for -machine pseries maintains a global sPAPREnvironment structure which keeps track of general state information about the guest platform. This predates the existence of the MachineState structure, but performs basically the same function. Now that we have the generic MachineState, fold sPAPREnvironment into sPAPRMachineState, the pseries specific subclass of MachineState. This is mostly a matter of search and replace, although a few places which relied on the global spapr variable are changed to find the structure via qdev_get_machine(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03spapr_events: event-scan RTAS interfaceTyrel Datwyler
We don't actually rely on this interface to surface hotplug events, and instead rely on the similar-but-interrupt-driven check-exception RTAS interface used for EPOW events. However, the existence of this interface is needed to ensure guest kernels initialize the event-reporting interfaces which will in turn be used by userspace tools to handle these events, so we implement this interface here. Since events surfaced by this call are mutually exclusive to those surfaced via check-exception, we also update the RTAS event queue code to accept a boolean to mark/filter for events accordingly. Events of this sort are not currently generated by QEMU, but the interface has been tested by surfacing hotplug events via event-scan in place of check-exception. Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03spapr_events: re-use EPOW event infrastructure for hotplug eventsNathan Fontenot
This extends the data structures currently used to report EPOW events to guests via the check-exception RTAS interfaces to also include event types for hotplug/unplug events. This is currently undocumented and being finalized for inclusion in PAPR specification, but we implement this here as an extension for guest userspace tools to implement (existing guest kernels simply log these events via a sysfs interface that's read by rtas_errd, and current versions of rtas_errd/powerpc-utils already support the use of this mechanism for initiating hotplug operations). We also add support for queues of pending RTAS events, since in the case of hotplug there's chance for multiple events being in-flight at any point in time. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03spapr_rtas: add ibm, configure-connector RTAS interfaceMichael Roth
This interface is used to fetch an OF device-tree nodes that describes a newly-attached device to guest. It is called multiple times to walk the device-tree node and fetch individual properties into a 'workarea'/buffer provided by the guest. The device-tree is generated by QEMU and passed to an sPAPRDRConnector during the initial hotplug operation, and the state of these RTAS calls is tracked by the sPAPRDRConnector. When the last of these properties is successfully fetched, we report as special return value to the guest and transition the device to a 'configured' state on the QEMU/DRC side. See docs/specs/ppc-spapr-hotplug.txt for a complete description of this interface. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-03spapr: add rtas_st_buffer_direct() helperMichael Roth
This is similar to the existing rtas_st_buffer(), but for cases where the guest is not expecting a length-encoded byte array. Namely, for calls where a "work area" buffer is used to pass around arbitrary fields/data. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>