aboutsummaryrefslogtreecommitdiff
path: root/hw/ppc/spapr.c
AgeCommit message (Collapse)Author
2018-06-01hw: Do not include "sysemu/block-backend.h" if it is not necessaryPhilippe Mathieu-Daudé
Remove those unneeded includes to speed up the compilation process a little bit. (Continue 7eceff5b5a1fa cleanup) Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180528232719.4721-13-f4bug@amsat.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-05-29ppc: Rename 2.13 machines to 3.0Peter Maydell
Rename the 2.13 machines to match the number we're going to use for the next release. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-id: 20180522104000.9044-5-peter.maydell@linaro.org
2018-05-10make sure that we aren't overwriting mc->get_hotplug_handler by accidentIgor Mammedov
Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 1525691524-32265-5-git-send-email-imammedo@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-05-07spapr: rename "hotplug memory" terminology to "device memory"David Hildenbrand
Let's make it clear at relevant places that we are dealing with device memory. That it can be used for memory hotplug is just a special case. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-11-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> [ehabkost: rebased series, solved conflicts at spapr.c] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-07pc-dimm: pass in the machine and to the MemoryHotplugStateDavid Hildenbrand
We use the machine internally either way, so let's just pass it in then. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-5-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-07pc-dimm: no need to pass the memory regionDavid Hildenbrand
We can just query it ourselves. When unplugging, we should always be able to the region (as it was previously plugged). E.g. PPC already assumed that and used &error_abort. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-4-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-07machine: make MemoryHotplugState accessible via the machineDavid Hildenbrand
Let's allow to query the MemoryHotplugState directly from the machine. If the pointer is NULL, the machine does not support memory devices. If the pointer is !NULL, the machine supports memory devices and the data structure contains information about the applicable physical guest address space region. This allows us to generically detect if a certain machine has support for memory devices, and to generically manage it (find free address range, plug/unplug a memory region). We will rename "MemoryHotplugState" to something more meaningful ("DeviceMemory") after we completed factoring out the pc-dimm code into MemoryDevice code. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> [ehabkost: rebased series, solved conflicts at spapr.c] [ehabkost: squashed fix to use g_malloc0()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-07pc-dimm: factor out MemoryDevice interfaceDavid Hildenbrand
On the qmp level, we already have the concept of memory devices: "query-memory-devices" Right now, we only support NVDIMM and PCDIMM. We want to map other devices later into the address space of the guest. Such device could e.g. be virtio devices. These devices will have a guest memory range assigned but won't be exposed via e.g. ACPI. We want to make them look like memory device, but not glued to pc-dimm. Especially, it will not always be possible to have TYPE_PC_DIMM as a parent class (e.g. virtio devices). Let's use an interface instead. As a first part, convert handling of - qmp_pc_dimm_device_list - get_plugged_memory_size to our new model. plug/unplug stuff etc. will follow later. A memory device will have to provide the following functions: - get_addr(): Necessary, as the property "addr" can e.g. not be used for virtio devices (already defined). - get_plugged_size(): The amount this device offers to the guest as of now. - get_region_size(): Because this can later on be bigger than the plugged size. - fill_device_info(): Fill MemoryDeviceInfo, e.g. for qmp. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-2-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-04spapr: don't advertise radix GTSE if max-compat-cpu < power9Greg Kurz
On a POWER9 host, if a guest runs in pre POWER9 compat mode, it necessarily uses the hash MMU mode. In this case, we shouldn't advertise radix GTSE in the ibm,arch-vec-5-platform-support DT property as the current code does. The first reason is that it doesn't make sense, and the second one is that causes the CAS-negotiated options subsection to be migrated. This breaks backward migration to QEMU 2.7 and older versions on POWER8 hosts: qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr' qemu-system-ppc64: load of migration failed: No such file or directory This patch hence initialize CPUs a bit earlier so that we can check the requested compat mode, and don't set OV5_MMU_RADIX_GTSE for power8 and older. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-05-04spapr: don't migrate "spapr_option_vector_ov5_cas" to pre 2.8 machinesGreg Kurz
a324d6f16697 "spapr: Support ibm,dynamic-memory-v2 property" added a new feature in the set of CAS-negotiatable options. This causes the CAS-negotiated options subsection to be migrated, even for old machine types that don't know about it, and breaks backward migration to QEMU 2.7 and older versions: qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr' qemu-system-ppc64: load of migration failed: No such file or directory Since this feature only affects boot time behaviour, it should be filtered out when we decide to migrate CAS-negotiated options, like we already do with OV5_FORM1_AFFINITY and OV5_DRCONF_MEMORY. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-05-04spapr: Make a helper to set up cpu entry point stateDavid Gibson
Under PAPR, only the boot CPU is active when the system starts. Other cpus must be explicitly activated using an RTAS call. The entry state for the boot and secondary cpus isn't identical, but it has some things in common. We're going to add a bit more common setup later, too, so to simplify make a helper which sets up the common entry state for both boot and secondary cpu threads. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-05-04spapr: Remove support for explicitly allocated RMAsDavid Gibson
Current POWER cpus allow for a VRMA, a special mapping which describes a guest's view of memory when in real mode (MMU off, from the guest's point of view). Older cpus didn't have that which meant that to support a guest a special host-contiguous region of memory was needed to give the guest its Real Mode Area (RMA). KVM used to provide special calls to allocate a contiguous RMA for those cases. This was useful in the early days of KVM on Power to allow it to be tested on PowerPC 970 chips as used in Macintosh G5 machines. Now, those machines are so old as to be almost irrelevant. The normal qemu deprecation process would require this to be marked deprecated then removed in 2 releases. However, this can only be used with corresponding support in the host kernel - which was dropped years ago (in c17b98cf "KVM: PPC: Book3S HV: Remove code for PPC970 processors" of 2014-12-03 to be precise). Therefore it should be ok to drop this immediately. Just to be clear this only affects *KVM HV* guests with PowerPC 970, and those already require an ancient host kernel. TCG and KVM PR guests with PowerPC 970 should still work. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Thomas Huth <thuth@redhat.com>
2018-04-27spapr: Support ibm,dynamic-memory-v2 propertyBharata B Rao
The new property ibm,dynamic-memory-v2 allows memory to be represented in a more compact manner in device tree. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-04-27spapr: Add ibm,max-associativity-domains propertySerhii Popovych
Now recent kernels (i.e. since linux-stable commit a346137e9142 ("powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes") support this property to mark initially memory-less NUMA nodes as "possible" to allow further memory hot-add to them. Advertise this property for pSeries machines to let guest kernels detect maximum supported node configuration and benefit from kernel side change when hot-add memory to specific, possibly empty before, NUMA node. Signed-off-by: Serhii Popovych <spopovyc@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-04-27target/ppc: Fold slb_nr into PPCHash64OptionsDavid Gibson
The env->slb_nr field gives the size of the SLB (Segment Lookaside Buffer). This is another static-after-initialization parameter of the specific version of the 64-bit hash MMU in the CPU. So, this patch folds the field into PPCHash64Options with the other hash MMU options. This is a bit more complicated that the things previously put in there, because slb_nr was foolishly included in the migration stream. So we need some of the usual dance to handle backwards compatible migration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-04-27target/ppc: Fold ci_large_pages flag into PPCHash64OptionsDavid Gibson
The ci_large_pages boolean in CPUPPCState is only relevant to 64-bit hash MMU machines, indicating whether it's possible to map large (> 4kiB) pages as cache-inhibitied (i.e. for IO, rather than memory). Fold it as another flag into the PPCHash64Options structure. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-04-27target/ppc: Move 1T segment and AMR options to PPCHash64OptionsDavid Gibson
Currently env->mmu_model is a bit of an unholy mess of an enum of distinct MMU types, with various flag bits as well. This makes which bits of the field should be compared pretty confusing. Make a start on cleaning that up by moving two of the flags bits - POWERPC_MMU_1TSEG and POWERPC_MMU_AMR - which are specific to the 64-bit hash MMU into a new flags field in PPCHash64Options structure. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-04-27target/ppc: Pass cpu instead of env to ppc_create_page_sizes_prop()David Gibson
As a rule we prefer to pass PowerPCCPU instead of CPUPPCState, and this change will make some things simpler later on. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>
2018-04-27spapr: drop useless dynamic sysbus device sanity checkGreg Kurz
Since commit 7da79a167aa11, the machine class init function registers dynamic sysbus device types it supports. Passing an unsupported device type on the command line causes QEMU to exit with an error message just after machine init. It is hence not needed to do the same sanity check at machine reset. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-04-27Revert "spapr: Don't allow memory hotplug to memory less nodes"Serhii Popovych
This reverts commit b556854bd8524c26b8be98ab1bfdf0826831e793. Leave change @node type from uint32_t to to int from reverted commit because node < 0 is always false. Note that implementing capability or some trick to detect if guest kernel does not support hot-add to memory: this returns previous behavour where memory added to first non-empty node. Signed-off-by: Serhii Popovych <spopovyc@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-04-27spapr: drop useless sanity check in spapr_irq_alloc*()Greg Kurz
Both spapr_irq_alloc() and spapr_irq_alloc_block() have an errp parameter, but they don't use it if XICS hasn't been initialized yet. This is doubly wrong: - all callers do pass a non-null Error **, ie, they expect an error to be propagated in case of failure - XICS obviously needs to be initialized before anything starts allocating IRQs So this patch turns the check into an assert. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-04-27spapr: Introduce pseries-2.13 machine typeDavid Gibson
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-04-26vl.c: new function serial_max_hds()Peter Maydell
Create a new function serial_max_hds() which returns the number of serial ports defined by the user. This is needed only by spapr. This allows us to remove the MAX_SERIAL_PORTS define. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180420145249.32435-14-peter.maydell@linaro.org
2018-04-26Change references to serial_hds[] to serial_hd()Peter Maydell
Change all the uses of serial_hds[] to go via the new serial_hd() function. Code change produced with: find hw -name '*.[ch]' | xargs sed -i -e 's/serial_hds\[\([^]]*\)\]/serial_hd(\1)/g' Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-id: 20180420145249.32435-8-peter.maydell@linaro.org
2018-04-10spapr: Initialize reserved areas list in FDT in H_CAS handlerAlexey Kardashevskiy
At the moment the device tree produced by the H_CAS handler has no reserved map initialized at all which is not correct as at least one empty record is required to be present as a marker of the end. This does not cause problems now as the only consumer is SLOF which does not look at the reserved map area. However when DTC's "Improve libfdt's memory safety" changeset hits the QEMU upstream, there will be errors reported and crashes observed. This fixes the problem by adding an empty entry to the reserved map, just like create_device_tree() does already. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-03-20Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell
virtio,vhost,pci,pc: features, cleanups SRAT tables for DIMM devices new virtio net flags for speed/duplex post-copy migration support in vhost cleanups in pci Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 20 Mar 2018 14:40:43 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (51 commits) postcopy shared docs libvhost-user: Claim support for postcopy postcopy: Allow shared memory vhost: Huge page align and merge vhost+postcopy: Wire up POSTCOPY_END notify vhost-user: Add VHOST_USER_POSTCOPY_END message libvhost-user: mprotect & madvises for postcopy vhost+postcopy: Call wakeups vhost+postcopy: Add vhost waker postcopy: postcopy_notify_shared_wake postcopy: helper for waking shared vhost+postcopy: Resolve client address postcopy-ram: add a stub for postcopy_request_shared_page vhost+postcopy: Helper to send requests to source for shared pages vhost+postcopy: Stash RAMBlock and offset vhost+postcopy: Send address back to qemu libvhost-user+postcopy: Register new regions with the ufd migration/ram: ramblock_recv_bitmap_test_byte_offset postcopy+vhost-user: Split set_mem_table for postcopy vhost+postcopy: Transmit 'listen' to slave ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # scripts/update-linux-headers.sh
2018-03-20pc-dimm: make qmp_pc_dimm_device_list() sort devices by addressHaozhong Zhang
Make qmp_pc_dimm_device_list() return sorted by start address list of devices so that it could be reused in places that would need sorted list*. Reuse existing pc_dimm_built_list() to get sorted list. While at it hide recursive callbacks from callers, so that: qmp_pc_dimm_device_list(qdev_get_machine(), &list); could be replaced with simpler: list = qmp_pc_dimm_device_list(); * follow up patch will use it in build_srat() Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> for ppc part Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-03-18hw/ppc/spapr: Allow "spapr-vlan" as NIC model name beside "ibmveth"Thomas Huth
With the new "--nic" command line parameter option, the "old" way of specifying a NIC model via the nd_table[] is becoming more prominent again. But for the pseries "spapr-vlan" device, there is a confusing discrepancy between the model name that is used for "--device" (i.e. "spapr-vlan") and the model name that has to be used for "--net nic" or the new "--nic" parameter (i.e. "ibmveth"). Since "spapr-vlan" is the "real" name of the device, let's allow "spapr-vlan" to be used as model name for the nd_table[] entries, too. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-03-13ppc/spapr, vfio: Turn off MSIX emulation for VFIO devicesAlexey Kardashevskiy
This adds a possibility for the platform to tell VFIO not to emulate MSIX so MMIO memory regions do not get split into chunks in flatview and the entire page can be registered as a KVM memory slot and make direct MMIO access possible for the guest. This enables the entire MSIX BAR mapping to the guest for the pseries platform in order to achieve the maximum MMIO preformance for certain devices. Tested on: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02) Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-03-06hw/ppc/spapr,e500: Use new property "stdout-path" for boot consoleNikunj A Dadhania
Linux kernel commit 2a9d832cc9aae21ea827520fef635b6c49a06c6d (of: Add bindings for chosen node, stdout-path) deprecated chosen property "linux,stdout-path" and "stdout". Introduce the new property "stdout-path" and continue supporting the older property to remain compatible with existing/older firmware. This older property can be deprecated after 5 years. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-03-06ppc/spapr-caps: Define the pseries-2.12-sxxm machine typeSuraj Jitindar Singh
The sxxm (speculative execution exploit mitigation) machine type is a variant of the 2.12 machine type with workarounds for speculative execution vulnerabilities enabled by default. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-03-06spapr: harden code that depends on VSMTGreg Kurz
VSMT must be set in order to compute VCPU ids. This means that the following functions must not be called before spapr_set_vsmt_mode() was called: - spapr_vcpu_id() - spapr_is_thread0_in_vcore() - xics_max_server_number() We had a recent regression where the latter would be called before VSMT was set, and broke migration of some old machine types. This patch adds assert() in the above functions to avoid problems in the future. Also, since VSMT is really a CPU related thing, spapr_set_vsmt_mode() is now called from spapr_init_cpus(), just before the first VSMT user. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-03-06spapr: register dummy ICPs laterGreg Kurz
Some older machine types create more ICPs than needed. We hence need to register up to xics_max_server_number() dummy ICPs to accomodate the migration of these machine types. Recent VSMT rework changed xics_max_server_number() to return DIV_ROUND_UP(max_cpus * spapr->vsmt, smp_threads) instead of DIV_ROUND_UP(max_cpus * kvmppc_smt_threads(), smp_threads); The change is okay but it requires spapr->vsmt to be set, which isn't the case with the current code. This causes the formula to return zero and we don't create dummy ICPs. This breaks migration of older guests as reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1549087 The dummy ICP workaround doesn't really have a dependency on XICS itself. But it does depend on proper VCPU id numbering and it must be applied before creating vCPUs (ie, creating real ICPs). So this patch moves the workaround to spapr_init_cpus(), which already assumes VSMT to be set. Fixes: 72194664c8a1 ("spapr: use spapr->vsmt to compute VCPU ids") Reported-by: Lukas Doktor <ldoktor@redhat.com> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-03-06spapr: fix missing CPU core nodes in DT when running with TCGGreg Kurz
Commit 5d0fb1508e2d "spapr: consolidate the VCPU id numbering logic in a single place" introduced a helper to detect thread0 of a virtual core based on its VCPU id. This is used to create CPU core nodes in the DT, but it is broken in TCG. $ qemu-system-ppc64 -nographic -accel tcg -machine dumpdtb=dtb.bin \ -smp cores=16,maxcpus=16,threads=1 $ dtc -f -O dts dtb.bin | grep POWER8 PowerPC,POWER8@0 { PowerPC,POWER8@8 { instead of the expected 16 cores that we get with KVM: $ dtc -f -O dts dtb.bin | grep POWER8 PowerPC,POWER8@0 { PowerPC,POWER8@8 { PowerPC,POWER8@10 { PowerPC,POWER8@18 { PowerPC,POWER8@20 { PowerPC,POWER8@28 { PowerPC,POWER8@30 { PowerPC,POWER8@38 { PowerPC,POWER8@40 { PowerPC,POWER8@48 { PowerPC,POWER8@50 { PowerPC,POWER8@58 { PowerPC,POWER8@60 { PowerPC,POWER8@68 { PowerPC,POWER8@70 { PowerPC,POWER8@78 { This happens because spapr_get_vcpu_id() maps VCPU ids to cs->cpu_index in TCG mode. This confuses the code in spapr_is_thread0_in_vcore(), since it assumes thread0 VCPU ids to have a spapr->vsmt spacing. spapr_get_vcpu_id(cpu) % spapr->vsmt == 0 Actually, there's no real reason to expose cs->cpu_index instead of the VCPU id, since we also generate it with TCG. Also we already set it explicitly in spapr_set_vcpu_id(), so there's no real reason either to call kvm_arch_vcpu_id() with KVM. This patch unifies spapr_get_vcpu_id() to always return the computed VCPU id both in TCG and KVM. This is one step forward towards KVM<->TCG migration. Fixes: 5d0fb1508e2d Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-16spapr: consolidate the VCPU id numbering logic in a single placeGreg Kurz
Several places in the code need to calculate a VCPU id: (cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads (core_id / smp_threads) * spapr->vsmt (1 user) index * spapr->vsmt (2 users) or guess that the VCPU id of a given VCPU is the first thread of a virtual core: index % spapr->vsmt != 0 Even if the numbering logic isn't that complex, it is rather fragile to have these assumptions open-coded in several places. FWIW this was proved with recent issues related to VSMT. This patch moves the VCPU id formula to a single function to be called everywhere the code needs to compute one. It also adds an helper to guess if a VCPU is the first thread of a VCORE. Signed-off-by: Greg Kurz <groug@kaod.org> [dwg: Rename spapr_is_vcore() to spapr_is_thread0_in_vcore() for clarity] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-16spapr: rename spapr_vcpu_id() to spapr_get_vcpu_id()Greg Kurz
The spapr_vcpu_id() function is an accessor actually. Let's rename it for symmetry with the recently added spapr_set_vcpu_id() helper. The motivation behind this is that a later patch will consolidate the VCPU id formula in a function and spapr_vcpu_id looks like an appropriate name. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-16spapr: move VCPU calculation to core machine codeGreg Kurz
The VCPU ids are currently computed and assigned to each individual CPU threads in spapr_cpu_core_realize(). But the numbering logic of VCPU ids is actually a machine-level concept, and many places in hw/ppc/spapr.c also have to compute VCPU ids out of CPU indexes. The current formula used in spapr_cpu_core_realize() is: vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i where: cc->core_id is a multiple of smp_threads cpu_index = cc->core_id + i 0 <= i < smp_threads So we have: cpu_index % smp_threads == i cc->core_id / smp_threads == cpu_index / smp_threads hence: vcpu_id = (cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads; This formula was used before VSMT at the time VCPU ids where computed at the target emulation level. It has the advantage of being useable to derive a VPCU id out of a CPU index only. It is fitted for all the places where the machine code has to compute a VCPU id. This patch introduces an accessor to set the VCPU id in a PowerPCCPU object using the above formula. It is a first step to consolidate all the VCPU id logic in a single place. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-16spapr: use spapr->vsmt to compute VCPU idsGreg Kurz
Since the introduction of VSMT in 2.11, the spacing of VCPU ids between cores is controllable through a machine property instead of being only dictated by the SMT mode of the host: cpu->vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i Until recently, the machine code would try to change the SMT mode of the host to be equal to VSMT or exit. This allowed the rest of the code to assume that kvmppc_smt_threads() == spapr->vsmt is always true. Recent commit "8904e5a75005 spapr: Adjust default VSMT value for better migration compatibility" relaxed the rule. If the VSMT mode cannot be set in KVM for some reasons, but the requested CPU topology is compatible with the current SMT mode, then we let the guest run with kvmppc_smt_threads() != spapr->vsmt. This breaks quite a few places in the code, in particular when calculating DRC indexes. This is what happens on a POWER host with subcores-per-core=2 (ie, supports up to SMT4) when passing the following topology: -smp threads=4,maxcpus=16 \ -device host-spapr-cpu-core,core-id=4,id=core1 \ -device host-spapr-cpu-core,core-id=8,id=core2 qemu-system-ppc64: warning: Failed to set KVM's VSMT mode to 8 (errno -22) This is expected since KVM is limited to SMT4, but the guest is started anyway because this topology can run on SMT4 even with a VSMT8 spacing. But when we look at the DT, things get nastier: cpus { ... ibm,drc-indexes = <0x4 0x10000000 0x10000004 0x10000008 0x1000000c>; This means that we have the following association: CPU core device | DRC | VCPU id -----------------+------------+--------- boot core | 0x10000000 | 0 core1 | 0x10000004 | 4 core2 | 0x10000008 | 8 core3 | 0x1000000c | 12 But since the spacing of VCPU ids is 8, the DRC for core1 points to a VCPU that doesn't exist, the DRC for core2 points to the first VCPU of core1 and and so on... ... PowerPC,POWER8@0 { ... ibm,my-drc-index = <0x10000000>; ... }; PowerPC,POWER8@8 { ... ibm,my-drc-index = <0x10000008>; ... }; PowerPC,POWER8@10 { ... No ibm,my-drc-index property for this core since 0x10000010 doesn't exist in ibm,drc-indexes above. ... }; }; ... interrupt-controller { ... ibm,interrupt-server-ranges = <0x0 0x10>; With a spacing of 8, the highest VCPU id for the given topology should be: 16 * 8 / 4 = 32 and not 16 ... linux,phandle = <0x7e7323b8>; interrupt-controller; }; And CPU hot-plug/unplug is broken: (qemu) device_del core1 pseries-hotplug-cpu: Cannot find CPU (drc index 10000004) to remove (qemu) device_del core2 cpu 4 (hwid 8) Ready to die... cpu 5 (hwid 9) Ready to die... cpu 6 (hwid 10) Ready to die... cpu 7 (hwid 11) Ready to die... These are the VCPU ids of core1 actually (qemu) device_add host-spapr-cpu-core,core-id=12,id=core3 (qemu) device_del core3 pseries-hotplug-cpu: Cannot find CPU (drc index 1000000c) to remove This patches all the code in hw/ppc/spapr.c to assume the VSMT spacing when manipulating VCPU ids. Fixes: 8904e5a75005 Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-10spapr: set vsmt to MAX(8, smp_threads)Laurent Vivier
We ignore silently the value of smp_threads when we set the default VSMT value, and if smp_threads is greater than VSMT kernel is going into trouble later. Fixes: 8904e5a750 ("spapr: Adjust default VSMT value for better migration compatibility") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-10hw/ppc: rename functions in commentsDaniel Henrique Barboza
Commit bcb5ce08cf ("spapr: Rename machine init functions for clarity") renamed ppc_spapr_reset to spapr_machine_reset and ppc_spapr_init to spapr_machine_init. Let's also rename the references in comments. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-09Include qmp-commands.h exactly where neededMarkus Armbruster
Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-7-armbru@redhat.com> [OSX breakage fixed]
2018-01-29target/ppc/spapr_caps: Add new tristate cap safe_indirect_branchSuraj Jitindar Singh
Add new tristate cap cap-ibs to represent the indirect branch serialisation capability. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29target/ppc/spapr_caps: Add new tristate cap safe_bounds_checkSuraj Jitindar Singh
Add new tristate cap cap-sbbc to represent the speculation barrier bounds checking capability. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29target/ppc/spapr_caps: Add new tristate cap safe_cacheSuraj Jitindar Singh
Add new tristate cap cap-cfpc to represent the cache flush on privilege change capability. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-20spapr: fix device tree properties when using compatibility modeGreg Kurz
Commit 51f84465dd98 changed the compatility mode setting logic: - machine reset only sets compatibility mode for the boot CPU - compatibility mode is set for other CPUs when they are put online by the guest with the "start-cpu" RTAS call This causes a regression for machines started with max-compat-cpu: the device tree nodes related to secondary CPU cores contain wrong "cpu-version" and "ibm,pa-features" values, as shown below. Guest started on a POWER8 host with: -smp cores=2 -machine pseries,max-cpu-compat=compat7 ibm,pa-features = [18 00 f6 3f c7 c0 80 f0 80 00 00 00 00 00 00 00 00 00 80 00 80 00 80 00 00 00]; cpu-version = <0x4d0200>; ^^^ second CPU core ibm,pa-features = <0x600f63f 0xc70080c0>; cpu-version = <0xf000003>; ^^^ boot CPU core The second core is advertised in raw POWER8 mode. This happens because CAS assumes all CPUs to have the same compatibility mode. Since the boot CPU already has the requested compatibility mode, the CAS code does not set it for the secondary one, and exposes the bogus device tree properties in in the CAS response to the guest. A similar situation is observed when hot-plugging a CPU core. The related device tree properties are generated and exposed to guest with the "ibm,configure-connector" RTAS before "start-cpu" is called. The CPU core is advertised to the guest in raw mode as well. It both cases, it boils down to the fact that "start-cpu" happens too late. This can be fixed globally by propagating the compatibility mode of the boot CPU to the other CPUs during reset. For this to work, the compatibility mode of the boot CPU must be set before the machine code actually resets all CPUs. It is not needed to set the compatibility mode in "start-cpu" anymore, so the code is dropped. Fixes: 51f84465dd98 Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-20spapr: drop duplicate variable in spapr_core_plug()Greg Kurz
A variable is already defined at the begining of the function to hold a pointer to the CPU core object: sPAPRCPUCore *core = SPAPR_CPU_CORE(OBJECT(dev)); No need to define it again in the pre-2.10 compatibility code snipplet. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-19possible_cpus: add CPUArchId::type fieldIgor Mammedov
Remove dependency of possible_cpus on 1st CPU instance, which decouples configuration data from CPU instances that are created using that data. Also later it would be used for enabling early cpu to numa node configuration at runtime qmp_query_hotpluggable_cpus() should provide a list of available cpu slots at early stage, before machine_init() is called and the 1st cpu is created, so that mgmt might be able to call it and use output to set numa mapping. Use MachineClass::possible_cpu_arch_ids() callback to set cpu type info, along with the rest of possible cpu properties, to let machine define which cpu type* will be used. * for SPAPR it will be a spapr core type and for ARM/s390x/x86 a respective descendant of CPUClass. Move parse_numa_opts() in vl.c after cpu_model is parsed into cpu_type so that possible_cpu_arch_ids() would know which cpu_type to use during layout initialization. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1515597770-268979-1-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19spapr: Allow only supported dynamic sysbus devicesEduardo Habkost
TYPE_SPAPR_PCI_HOST_BRIDGE is the only dynamic sysbus device not rejected by ppc_spapr_reset(), so it can be the only entry on the allowed list. Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Alexander Graf <agraf@suse.de> Cc: qemu-ppc@nongnu.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20171125151610.20547-5-ehabkost@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-19machine: Replace has_dynamic_sysbus with list of allowed devicesEduardo Habkost
The existing has_dynamic_sysbus flag makes the machine accept every user-creatable sysbus device type on the command-line. Replace it with a list of allowed device types, so machines can easily accept some sysbus devices while rejecting others. To keep exactly the same behavior as before, the existing has_dynamic_sysbus=true assignments are replaced with a TYPE_SYS_BUS_DEVICE entry on the allowed list. Other patches will replace the TYPE_SYS_BUS_DEVICE entries with more specific lists of devices. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Marcel Apfelbaum <marcel@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Alexander Graf <agraf@suse.de> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Anthony Perard <anthony.perard@citrix.com> Cc: qemu-arm@nongnu.org Cc: qemu-ppc@nongnu.org Cc: xen-devel@lists.xenproject.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20171125151610.20547-2-ehabkost@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-01-17spapr: Adjust default VSMT value for better migration compatibilityDavid Gibson
fa98fbfc "PC: KVM: Support machine option to set VSMT mode" introduced the "vsmt" parameter for the pseries machine type, which controls the spacing of the vcpu ids of thread 0 for each virtual core. This was done to bring some consistency and stability to how that was done, while still allowing backwards compatibility for migration and otherwise. The default value we used for vsmt was set to the max of the host's advertised default number of threads and the number of vthreads per vcore in the guest. This was done to continue running without extra parameters on older KVM versions which don't allow the VSMT value to be changed. Unfortunately, even that smaller than before leakage of host configuration into guest visible configuration still breaks things. Specifically a guest with 4 (or less) vthread/vcore will get a different vsmt value when running on a POWER8 (vsmt==8) and POWER9 (vsmt==4) host. That means the vcpu ids don't line up so you can't migrate between them, though you should be able to. Long term we really want to make vsmt == smp_threads for sufficiently new machine types. However, that means that qemu will then require a sufficiently recent KVM (one which supports changing VSMT) - that's still not widely enough deployed to be really comfortable to do. In the meantime we need some default that will work as often as possible. This patch changes that default to 8 in all circumstances. This does change guest visible behaviour (including for existing machine versions) for many cases - just not the most common/important case. Following is case by case justification for why this is still the least worst option. Note that any of the old behaviours can still be duplicated after this patch, it's just that it requires manual intervention by setting the vsmt property on the command line. KVM HV on POWER8 host: This is the overwhelmingly common case in production setups, and is unchanged by design. POWER8 hosts will advertise a default VSMT mode of 8, and > 8 vthreads/vcore isn't permitted KVM HV on POWER7 host: Will break, but POWER7s allowing KVM were never released to the public. KVM HV on POWER9 host: Not yet released to the public, breaking this now will reduce other breakage later. KVM HV on PowerPC 970: Will theoretically break it, but it was barely supported to begin with and already required various user visible hacks to work. Also so old that I just don't care. TCG: This is the nastiest one; it means migration of TCG guests (without manual vsmt setting) will break. Since TCG is rarely used in production I think this is worth it for the other benefits. It does also remove one more barrier to TCG<->KVM migration which could be interesting for debugging applications. KVM PR: As with TCG, this will break migration of existing configurations, without adding extra manual vsmt options. As with TCG, it is rare in production so I think the benefits outweigh breakages. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>