aboutsummaryrefslogtreecommitdiff
path: root/hw/ppc/spapr.c
AgeCommit message (Collapse)Author
2017-06-30spapr: fix migration of ICPState objects from/to older QEMUGreg Kurz
Commit 5bc8d26de20c ("spapr: allocate the ICPState object from under sPAPRCPUCore") moved ICPState objects from the machine to CPU cores. This is an improvement since we no longer allocate ICPState objects that will never be used. But it has the side-effect of breaking migration of older machine types from older QEMU versions. This patch allows spapr to register dummy "icp/server" entries to vmstate. These entries use a dedicated VMStateDescription that can swallow and discard state of an incoming migration stream, and that don't send anything on outgoing migration. As for real ICPState objects, the instance_id is the cpu_index of the corresponding vCPU, which happens to be equal to the generated instance_id of older machine types. The machine can unregister/register these entries when CPUs are dynamically plugged/unplugged. This is only available for pseries-2.9 and older machines, thanks to a compat property. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-30spapr: Fix migration of Radix guestsBharata B Rao
Fix migration of radix guests by ensuring that we issue KVM_PPC_CONFIGURE_V3_MMU for radix case post migration. Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-30spapr: Add a "no HPT" encoding to HTAB migration streamBharata B Rao
Add a "no HPT" encoding (using value -1) to the HTAB migration stream (in the place of HPT size) when the guest doesn't allocate HPT. This will help the target side to match target HPT with the source HPT and thus enable successful migration. Suggested-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-30ppc: Rework CPU compatibility testing across migrationDavid Gibson
Migrating between different CPU versions is a bit complicated for ppc. A long time ago, we ensured identical CPU versions at either end by checking the PVR had the same value. However, this breaks under KVM HV, because we always have to use the host's PVR - it's not virtualized. That would mean we couldn't migrate between hosts with different PVRs, even if the CPUs are close enough to compatible in practice (sometimes identical cores with different surrounding logic have different PVRs, so this happens in practice quite often). So, we removed the PVR check, but instead checked that several flags indicating supported instructions matched. This turns out to be a bad idea, because those instruction masks are not architected information, but essentially a TCG implementation detail. So changes to qemu internal CPU modelling can break migration - this happened between qemu-2.6 and qemu-2.7. That was addressed by 146c11f1 "target-ppc: Allow eventual removal of old migration mistakes". Now, verification of CPU compatibility across a migration basically doesn't happen. We simply ignore the PVR of the incoming migration, and hope the cpu on the destination is close enough to work. Now that we've cleaned up handling of processor compatibility modes for pseries machine type, we can do better. For new machine types (pseries-2.10+) We allow migration if: * The source and destination PVRs are for the same type of CPU, as determined by CPU class's pvr_match function OR * When the source was in a compatibility mode, and the destination CPU supports the same compatibility mode For older machine types we retain the existing behaviour - current CAS code will usually set a compat mode which would break backwards migration if we made them use the new behaviour. [Fixed from an earlier version by Greg Kurz]. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
2017-06-30pseries: Reset CPU compatibility modeDavid Gibson
Currently, the CPU compatibility mode is set when the cpu is initialized, then again when the guest negotiates features. This means if a guest negotiates a compatibility mode, then reboots, that compatibility mode will be retained across the reset. Usually that will get overridden when features are negotiated on the next boot, but it's still not really correct. This patch moves the initial set up of the compatibility mode from cpu init to reset time. The mode *is* retained if the reboot was caused by the feature negotiation (it might be important in that case, though it's unlikely). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Tested-by: Andrea Bolognani <abologna@redhat.com>
2017-06-30pseries: Move CPU compatibility property to machineDavid Gibson
Server class POWER CPUs have a "compat" property, which is used to set the backwards compatibility mode for the processor. However, this only makes sense for machine types which don't give the guest access to hypervisor privilege - otherwise the compatibility level is under the guest's control. To reflect this, this removes the CPU 'compat' property and instead creates a 'max-cpu-compat' property on the pseries machine. Strictly speaking this breaks compatibility, but AFAIK the 'compat' option was never (directly) used with -device or device_add. The option was used with -cpu. So, to maintain compatibility, this patch adds a hack to the cpu option parsing to strip out any compat options supplied with -cpu and set them on the machine property instead of the now deprecated cpu property. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Tested-by: Andrea Bolognani <abologna@redhat.com>
2017-06-28migration: move skip_section_footersPeter Xu
Move it into MigrationState, revert its meaning and renaming it to send_section_footer, with a property bound to it. Same trick is played like previous patches. Removing savevm_skip_section_footers(). Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-9-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-06-28migration: move skip_configuration outPeter Xu
It was in SaveState but now moved to MigrationState altogether, reverted its meaning, then renamed to "send_configuration". Again, using HW_COMPAT_2_3 for old PC/SPAPR machines, and accel_register_prop() for xen_init(). Removing savevm_skip_configuration(). Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-8-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-06-28migration: move global_state.optional outPeter Xu
Put it into MigrationState then we can use the properties to specify whether to enable storing global state. Removing global_state_set_optional() since now we can use HW_COMPAT_2_3 for x86/power, and AccelClass.global_props for Xen. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-6-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-06-20pc-dimm: use get_uint() for dimm propertiesMarc-André Lureau
TYPE_PC_DIMM's property PC_DIMM_ADDR_PROP is defined with DEFINE_PROP_UINT64(). TYPE_PC_DIMM's property PC_DIMM_NODE_PROP is defined with DEFINE_PROP_UINT32(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170607163635.17635-22-marcandre.lureau@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2017-06-13Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170613' ↵Peter Maydell
into staging migration/next for 20170613 # gpg: Signature made Tue 13 Jun 2017 10:01:45 BST # gpg: using RSA key 0xF487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20170613: migration: Move migration.h to migration/ migration: Move remaining exported functions to migration/misc.h migration: create global_state.c migration: ram_control_* are implemented in qemu_file migration: Commands are only used inside migration.c migration: Move constants to savevm.h migration: Move dump_vmsate_json_to_file() to misc.h migration: Split registration functions from vmstate.h migration: Move self_announce_delay() to misc.h migration: Remove MigrationState from migration_channel_incomming() ram: Now POSTCOPY_ACTIVE is the same that STATUS_ACTIVE ram: Print block stats also in the complete case migration: Don't try to set *errp directly migration: isolate return path on src Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-13migration: Move remaining exported functions to migration/misc.hJuan Quintela
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com>
2017-06-13migration: create global_state.cJuan Quintela
It don't belong anywhere else, just the global state where everybody can stick other things. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2017-06-13migration: Split registration functions from vmstate.hJuan Quintela
They are indpendent, and nowadays almost every device register things with qdev->vmsd. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com>
2017-06-09xics: introduce macros for ICP/ICS link propertiesGreg Kurz
These properties are part of the XICS API. They deserve to appear explicitely in the XICS header file. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-08hw/ppc/spapr: Adjust firmware name for PCI bridgesThomas Huth
SLOF uses "pci" as name for PCI bridges nodes in the device tree instead of "pci-bridges", so booting via bootindex from a device behind a PCI bridge currently does not work since QEMU passes the wrong name in the "qemu,boot-list" property. Fix it by changing the name of the PCI bridge nodes to "pci" instead. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1459170 Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-08spapr: Change DRC attach & detach methods to functionsDavid Gibson
DRC objects have attach & detach methods, but there's only one implementation. Although there are some differences in its behaviour for different DRC types, the overall structure is the same, so while we might want different method implementations for some parts, we're unlikely to want them for the top-level functions. So, replace them with direct function calls. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-08spapr: Don't misuse DR-indicator in spapr_recover_pending_dimm_state()David Gibson
With some combinations of migration and hotplug we can lost temporary state indicating how many DRCs (guest side hotplug handles) are still connected to a DIMM object in the process of removal. When we hit that situation spapr_recover_pending_dimm_state() is used to scan more extensively and work out the right number. It does this using drc->indicator state to determine what state of disconnection the DRC is in. However, this is not safe, because the indicator state is guest settable - in fact it's more-or-less a purely guest->host notification mechanism which should have no bearing on the internals of hotplug state management. So, replace the test for this with a test on drc->dev, which is a purely qemu side managed variable, and updated the same BQL critical section as the indicator state. This does introduce an off-by-one change, because the indicator state was updated before the call to spapr_lmb_release() on the current DRC, whereas drc->dev is updated afterwards. That's corrected by always decrementing the nr_lmbs value instead of only doing so in the case where we didn't have to recover information. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-08spapr: fix memory leak in spapr_memory_pre_plug()Greg Kurz
The string returned by object_property_get_str() is dynamically allocated. (Spotted by Coverity, CID 1375942) Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-06Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170606' ↵Peter Maydell
into staging ppc patch queue 2017-06-06 Accumulated patches for ppc targets and the pseries machine type. The big thing in this batch is a start on a substantial cleanup of the pseries hotplug mechanisms, which were pretty confusing. For now these shouldn't cause substantial behavioural changes, but I am hoping these lead to clearer code and eventually to fixes for the bugs we have in hotplug handling, particularly when hotplug and migration are combined. The remaining patches are mostly bugfixes. # gpg: Signature made Tue 06 Jun 2017 03:48:50 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.10-20170606: spapr: Remove some non-useful properties on DRC objects spapr: Eliminate spapr_drc_get_type_str() spapr: Move configure-connector state into DRC spapr: Clean up spapr_dr_connector_by_*() spapr: Introduce DRC subclasses spapr/drc: don't migrate DRC of cold-plugged CPUs and LMBs spapr: Allow boot from vhost-*-scsi backends ppc/pnv: check the return value of fdt_setprop() spapr_nvram: Check return value from blk_getlength() target/ppc: Fixup set_spr error in h_register_process_table target-ppc: Fix openpic timer read register offset spapr: Make DRC get_index and get_type methods into plain functions spapr: Abolish DRC set_configured method spapr: Abolish DRC get_fdt method spapr: Move DRC RTAS calls into spapr_drc.c migration: Mark CPU states dirty before incoming migration/loadvm migration: remove register_savevm() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-06spapr: Move configure-connector state into DRCDavid Gibson
Currently the sPAPRMachineState contains a list of sPAPRConfigureConnector structures which store intermediate state for the ibm,configure-connector RTAS call. This was an attempt to separate this state from the core of the DRC state. However the configure connector process is intimately tied to the DRC model, so there's really no point trying to have two levels of interface here. Moving the configure-connector state into its corresponding DRC allows removal of a number of helpers for maintaining the anciliary list. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06spapr: Clean up spapr_dr_connector_by_*()David Gibson
* Change names to something less ludicrously verbose * Now that we have QOM subclasses for the different DRC types, use a QOM typename instead of a PAPR type value parameter The latter allows removal of the get_type_shift() helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06spapr: Introduce DRC subclassesDavid Gibson
Currently we only have a single QOM type for all DRCs, but lots of places where we switch behaviour based on the DRC's PAPR defined type. This is a poor use of our existing type system. So, instead create QOM subclasses for each PAPR defined DRC type. We also introduce intermediate subclasses for physical and logical DRCs, a division which will be useful later on. Instead of being stored in the DRC object itself, the PAPR type is now stored in the class structure. There are still many places where we switch directly on the PAPR type value, but this at least provides the basis to start to remove those. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06spapr: Allow boot from vhost-*-scsi backendsFelipe Franciosi
The current implementation of spapr_get_fw_dev_path() doesn't take into consideration vhost-*-scsi devices. This makes said devices unbootable on PPC as SLOF is unable to work out the path to scan boot disks. This makes VMs bootable on spapr when using vhost-*-scsi by implementing a disk path for VHostSCSICommon (which currently includes both vhost-user-scsi and vhost-scsi). Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Signed-off-by: Mike Cui <cui@nutanix.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-06spapr: Make DRC get_index and get_type methods into plain functionsDavid Gibson
These two methods only have one implementation, and the spec they're implementing means any other implementation is unlikely, verging on impossible. So replace them with simple functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-06-05spapr: cleanup spapr_fixup_cpu_numa_dt() usageIgor Mammedov
even though spapr_fixup_cpu_numa_dt() has no effect on FDT if numa is disabled, don't call it uselessly. It makes it obvious at call sites that function is needed only when numa is enabled. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-7-git-send-email-imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05numa: move numa_node from CPUState into target specific classesIgor Mammedov
Move vcpu's associated numa_node field out of generic CPUState into inherited classes that actually care about cpu<->numa mapping, i.e: ARMCPU, PowerPCCPU, X86CPU. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-6-git-send-email-imammedo@redhat.com> [ehabkost: s/CPU is belonging to/CPU belongs to/ on comments] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05numa: consolidate cpu_preplug fixups/checks for pc/arm/spaprIgor Mammedov
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1496161442-96665-2-git-send-email-imammedo@redhat.com> [ehabkost: Fix indentation] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-25hw/ppc/spapr.c: recover pending LMB unplug info in spapr_lmb_releaseDaniel Henrique Barboza
When a LMB hot unplug starts, the current DRC LMB status is stored at spapr->pending_dimm_unplugs QTAILQ. This queue isn't migrated, thus if a migration occurs in the middle of a LMB unplug the spapr_lmb_release callback will lost track of the LMB unplug progress. This patch implements a new recover function spapr_recover_pending_dimm_state that is used inside spapr_lmb_release to recover this DRC LMB release status that is lost during the migration. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> [dwg: Minor stylistic changes, simplify error handling] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25hw/ppc: removing drc->detach_cb and drc->detach_cb_opaqueDaniel Henrique Barboza
The pointer drc->detach_cb is being used as a way of informing the detach() function inside spapr_drc.c which cb to execute. This information can also be retrieved simply by checking drc->type and choosing the right callback based on it. In this context, detach_cb is redundant information that must be managed. After the previous spapr_lmb_release change, no detach_cb_opaques are being used by any of the three callbacks functions. This is yet another information that is now unused and, on top of that, can't be migrated either. This patch makes the following changes: - removal of detach_cb_opaque. the 'opaque' argument was removed from the callbacks and from the detach() function of sPAPRConnectorClass. The attribute detach_cb_opaque of sPAPRConnector was removed. - removal of detach_cb from the detach() call. The function pointer detach_cb of sPAPRConnector was removed. detach() now uses a switch(drc->type) to execute the apropriate callback. To achieve this, spapr_core_release, spapr_lmb_release and spapr_phb_remove_pci_device_cb callbacks were made public to be visible inside detach(). Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-25hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineStateDavid Gibson
The LMB DRC release callback, spapr_lmb_release(), uses an opaque parameter, a sPAPRDIMMState struct that stores the current LMBs that are allocated to a DIMM (nr_lmbs). After each call to this callback, the nr_lmbs is decremented by one and, when it reaches zero, the callback proceeds with the qdev calls to hot unplug the LMB. Using drc->detach_cb_opaque is problematic because it can't be migrated in the future DRC migration work. This patch makes the following changes to eliminate the usage of this opaque callback inside spapr_lmb_release: - sPAPRDIMMState was moved from spapr.c and added to spapr.h. A new attribute called 'addr' was added to it. This is used as an unique identifier to associate a sPAPRDIMMState to a PCDIMM element. - sPAPRMachineState now hosts a new QTAILQ called 'pending_dimm_unplugs'. This queue of sPAPRDIMMState elements will store the DIMM state of DIMMs that are currently going under an unplug process. - spapr_lmb_release() will now retrieve the nr_lmbs value by getting the correspondent sPAPRDIMMState. A helper function called spapr_dimm_get_address was created to fetch the address of a PCDIMM device inside spapr_lmb_release. When nr_lmbs reaches zero and the callback proceeds with the qdev hot unplug calls, the sPAPRDIMMState struct is removed from spapr->pending_dimm_unplugs. After these changes, the opaque argument for spapr_lmb_release is now unused and is passed as NULL inside spapr_del_lmbs. This and the other opaque arguments can now be safely removed from the code. As an additional cleanup made by this patch, the spapr_del_lmbs function was merged with spapr_memory_unplug_request. The former was being called only by the latter and both were small enough to fit one single function. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> [dwg: Minor stylistic cleanups] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24spapr: add pre_plug function for memoryLaurent Vivier
This allows to manage errors before the memory has started to be hotplugged. We already have the function for the CPU cores. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> [dwg: Fixed a couple of style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24pseries: Restore support for total vcpus not a multiple of threads-per-core ↵David Gibson
for old machine types As of pseries-2.7 and later, we require the total number of guest vcpus to be a multiple of the threads-per-core. pseries-2.6 and earlier machine types, however, are supposed to allow this for the sake of migration from old qemu versions which allowed this. Unfortunately, 8149e29 "pseries: Enforce homogeneous threads-per-core" broke this by not considering the old machine type case. This fixes it by only applying the check when the machine type supports hotpluggable cpus. By not-entirely-coincidence, that corresponds to the same time when we started enforcing total threads being a multiple of threads-per-core. Fixes: 8149e2992f7811355cc34721b79d69d1a3a667dd Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>
2017-05-24spapr: fix error reporting in xics_system_init()Greg Kurz
If the user explicitely asked for kernel-irqchip support and "xics-kvm" initialization fails, we shouldn't fallback to emulated "xics" as we do now. It is also awkward to print an error message when we have an errp pointer argument. Let's use the errp argument to report the error and let the caller decide. This simplifies the code as we don't need a local Error * here. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24spapr: ensure core_slot isn't NULL in spapr_core_unplug()Greg Kurz
If we go that far on the path of hot-removing a core and we find out that the core-id is invalid, then we have a serious bug. Let's make it explicit with an assert() instead of dereferencing a NULL pointer. This fixes Coverity issue CID 1375404. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24spapr: Consolidate HPT freeing code into a routineBharata B Rao
Consolidate the code that frees HPT into a separate routine spapr_free_hpt() as the same chunk of code is called from two places. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24spapr: sanitize error handling in spapr_ics_create()Greg Kurz
The spapr_ics_create() function handles errors in a rather convoluted way, with two local Error * variables. Moreover, failing to parent the ICS object to the machine should be considered as a bug but it is currently ignored. This patch addresses both issues. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-24ppc/xics: simplify prototype of xics_spapr_init()Greg Kurz
This function only does hypercall and RTAS-call registration, and thus never returns an error. This patch adapt the prototype to reflect that. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-05-15Merge remote-tracking branch 'ehabkost/tags/x86-and-machine-pull-request' ↵Stefan Hajnoczi
into staging x86 and machine queue, 2017-05-11 Highlights: * New "-numa cpu" option * NUMA distance configuration * migration/i386 vmstatification # gpg: Signature made Thu 11 May 2017 08:16:07 PM BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # gpg: Note: This key has expired! # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * ehabkost/tags/x86-and-machine-pull-request: (29 commits) migration/i386: Remove support for pre-0.12 formats vmstatification: i386 FPReg migration/i386: Remove old non-softfloat 64bit FP support tests: check -numa node,cpu=props_list usecase numa: add '-numa cpu,...' option for property based node mapping numa: remove node_cpu bitmaps as they are no longer used numa: use possible_cpus for not mapped CPUs check machine: call machine init from wrapper numa: remove no longer need numa_post_machine_init() tests: numa: add case for QMP command query-cpus QMP: include CpuInstanceProperties into query_cpus output output virt-arm: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() spapr: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() pc: get numa node mapping from possible_cpus instead of numa_get_node_for_cpu() numa: do default mapping based on possible_cpus instead of node_cpu bitmaps numa: mirror cpu to node mapping in MachineState::possible_cpus numa: add check that board supports cpu_index to node mapping virt-arm: add node-id property to CPU pc: add node-id property to CPU spapr: add node-id property to sPAPR core ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-05-11spapr: get numa node mapping from possible_cpus instead of ↵Igor Mammedov
numa_get_node_for_cpu() it's safe to remove thread node_id != core node_id error branch as machine_set_cpu_numa_node() also does mismatch check and is called even before any CPU is created. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1494415802-227633-10-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11spapr: add node-id property to sPAPR coreIgor Mammedov
it will allow switching from cpu_index to core based numa mapping in follow up patches. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1494415802-227633-3-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11numa: move source of default CPUs to NUMA node mapping into boardsIgor Mammedov
Originally CPU threads were by default assigned in round-robin fashion. However it was causing issues in guest since CPU threads from the same socket/core could be placed on different NUMA nodes. Commit fb43b73b (pc: fix default VCPU to NUMA node mapping) fixed it by grouping threads within a socket on the same node introducing cpu_index_to_socket_id() callback and commit 20bb648d (spapr: Fix default NUMA node allocation for threads) reused callback to fix similar issues for SPAPR machine even though socket doesn't make much sense there. As result QEMU ended up having 3 default distribution rules used by 3 targets /virt-arm, spapr, pc/. In effort of moving NUMA mapping for CPUs into possible_cpus, generalize default mapping in numa.c by making boards decide on default mapping and let them explicitly tell generic numa code to which node a CPU thread belongs to by replacing cpu_index_to_socket_id() with @cpu_index_to_instance_props() which provides default node_id assigned by board to specified cpu_index. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1494415802-227633-2-git-send-email-imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11numa: equally distribute memory on nodesLaurent Vivier
When there are more nodes than available memory to put the minimum allowed memory by node, all the memory is put on the last node. This is because we put (ram_size / nb_numa_nodes) & ~((1 << mc->numa_mem_align_shift) - 1); on each node, and in this case the value is 0. This is particularly true with pseries, as the memory must be aligned to 256MB. To avoid this problem, this patch uses an error diffusion algorithm [1] to distribute equally the memory on nodes. We introduce numa_auto_assign_ram() function in MachineClass to keep compatibility between machine type versions. The legacy function is used with pseries-2.9, pc-q35-2.9 and pc-i440fx-2.9 (and previous), the new one with all others. Example: qemu-system-ppc64 -S -nographic -nodefaults -monitor stdio -m 1G -smp 8 \ -numa node -numa node -numa node \ -numa node -numa node -numa node Before: (qemu) info numa 6 nodes node 0 cpus: 0 6 node 0 size: 0 MB node 1 cpus: 1 7 node 1 size: 0 MB node 2 cpus: 2 node 2 size: 0 MB node 3 cpus: 3 node 3 size: 0 MB node 4 cpus: 4 node 4 size: 0 MB node 5 cpus: 5 node 5 size: 1024 MB After: (qemu) info numa 6 nodes node 0 cpus: 0 6 node 0 size: 0 MB node 1 cpus: 1 7 node 1 size: 256 MB node 2 cpus: 2 node 2 size: 0 MB node 3 cpus: 3 node 3 size: 256 MB node 4 cpus: 4 node 4 size: 256 MB node 5 cpus: 5 node 5 size: 256 MB [1] https://en.wikipedia.org/wiki/Error_diffusion Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170502162955.1610-2-lvivier@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> [ehabkost: s/ram_size/size/ at numa_default_auto_assign_ram()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-05-11spapr: Don't accidentally advertise HTM support on POWER9David Gibson
Logic in spapr_populate_pa_features() enables the bit advertising Hardware Transactional Memory (HTM) in the guest's device tree only when KVM advertises its availability with the KVM_CAP_PPC_HTM feature. However, this assumes that the HTM bit is off in the base template used for the device tree value. That is true for POWER8, but not for POWER9. It looks like that was accidentally changed in 9fb4541 "spapr: Enable ISA 3.0 MMU mode selection via CAS". Fixes: 9fb4541f5803f8d2ba116b12113386e26482ba30 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com>
2017-05-11target/ppc: Enable RADIX mmu mode for pseries TCG guestSuraj Jitindar Singh
Now that we have added all the infrastructure we can enable a pseries TCG guest to use radix. In order to do this we have to add the appropriate bits to the ibm,arch-vec-5-platform-support vector to represent that we support both hash and radix mmu models. A radix guest can now be booted in pseries tcg mode by specifying: -cpu POWER9 Note that we assume hash, that is we allocate a hpt, until a guest tells us otherwise via a H_REGISTER_PROCESS_TABLE call with radix specified - in which case we free the hpt. If we were right and the guest is hash then there's nothing for us to do. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26spapr: remove the 'nr_servers' field from the machineCédric Le Goater
xics_system_init() does not need 'nr_servers' anymore as it is only used to define the 'interrupt-controller' node in the device tree. So let's just compute the value when calling spapr_dt_xics(). This also gives us an opportunity to simplify the xics_system_init() routine and introduce a specific spapr_ics_create() helper to create the sPAPR ICS object. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26spapr: allocate the ICPState object from under sPAPRCPUCoreCédric Le Goater
Today, all the ICPs are created before the CPUs, stored in an array under the sPAPR machine and linked to the CPU when the core threads are realized. This modeling brings some complexity when a lookup in the array is required and it can be simplified by allocating the ICPs when the CPUs are. This is the purpose of this proposal which introduces a new 'icp_type' field under the machine and creates the ICP objects of the right type (KVM or not) before the PowerPCCPU object are. This change allows more cleanups : the removal of the icps array under the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id() helper. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26spapr: move the IRQ server number mapping under the machineCédric Le Goater
This is the second step to abstract the IRQ 'server' number of the XICS layer. Now that the prereq cleanups have been done in the previous patch, we can move down the 'cpu_dt_id' to 'cpu_index' mapping in the sPAPR machine handler. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26target-ppc/kvm: Enable in-kernel TCE acceleration for multi-tceAlexey Kardashevskiy
This enables in-kernel handling of H_PUT_TCE_INDIRECT and H_STUFF_TCE hypercalls. The host kernel support is there since v4.6, in particular d3695aa4f452 ("KVM: PPC: Add support for multiple-TCE hcalls"). H_PUT_TCE is already accelerated and does not need any special enablement. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26spapr: Workaround for broken radix guestsSam Bobroff
For a little while around 4.9, Linux kernels that saw the radix bit in ibm,pa-features would attempt to set up the MMU as if they were a hypervisor, even if they were a guest, which would cause them to crash. Work around this by detecting pre-ISA 3.0 guests by their lack of that bit in option vector 1, and then removing the radix bit from ibm,pa-features. Note: This now requires regeneration of that node after CAS negotiation. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>