aboutsummaryrefslogtreecommitdiff
path: root/include/hw/ppc/spapr.h
AgeCommit message (Collapse)Author
2019-01-09spapr: move the qemu_irq array under the machineCédric Le Goater
The qemu_irq array is now allocated at the machine level using a sPAPR IRQ set_irq handler depending on the chosen interrupt mode. The use of this handler is slightly inefficient today but it will become necessary when the 'dual' interrupt mode is introduced. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09ppc/spapr: Receive and store device tree blob from SLOFAlexey Kardashevskiy
SLOF receives a device tree and updates it with various properties before switching to the guest kernel and QEMU is not aware of any changes made by SLOF. Since there is no real RTAS (QEMU implements it), it makes sense to pass the SLOF final device tree to QEMU to let it implement RTAS related tasks better, such as PCI host bus adapter hotplug. Specifially, now QEMU can find out the actual XICS phandle (for PHB hotplug) and the RTAS linux,rtas-entry/base properties (for firmware assisted NMI - FWNMI). This stores the initial DT blob in the sPAPR machine and replaces it in the KVMPPC_H_UPDATE_DT (new private hypercall) handler. This adds an @update_dt_enabled machine property to allow backward migration. SLOF already has a hypercall since https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183 This makes use of the new fdt_check_full() helper. In order to allow the configure script to pick the correct DTC version, this adjusts the DTC presense test. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-09spapr: Add H-Call H_HOME_NODE_ASSOCIATIVITYLaurent Vivier
H_HOME_NODE_ASSOCIATIVITY H-Call returns the associativity domain designation associated with the identifier input parameter This fixes a crash when we try to hotplug a CPU in memory-less and CPU-less numa node. In this case, the kernel tries to online the node, but without the information provided by this h-call, the node id, it cannot and the CPU is started while the node is not onlined. It also removes the warning message from the kernel: VPHN is not supported. Disabling polling.. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: introduce an 'ic-mode' machine optionCédric Le Goater
This option is used to select the interrupt controller mode (XICS or XIVE) with which the machine will operate. XICS being the default mode for now. When running a machine with the XIVE interrupt mode backend, the guest OS is required to have support for the XIVE exploitation mode. In the case of legacy OS, the mode selected by CAS should be XICS and the OS should fail to boot. However, QEMU could possibly detect it, terminate the boot process and reset to stop in the SLOF firmware. This is not yet handled. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: add an extra OV5 field to the sPAPR IRQ backendCédric Le Goater
The interrupt modes supported by the hypervisor are advertised to the guest with new bits definitions of the option vector 5 of property "ibm,arch-vec-5-platform-support. The byte 23 bits 0-1 of the OV5 are defined as follow : 0b00 PAPR 2.7 and earlier (Legacy systems) 0b01 XIVE Exploitation mode only 0b10 Either available If the client/guest selects the XIVE interrupt mode, it informs the hypervisor by returning the value 0b01 in byte 23 bits 0-1. A 0b00 value indicates the use of the XICS interrupt mode (Legacy systems). The sPAPR IRQ backend is extended with these definitions and the values are directly used to populate the "ibm,arch-vec-5-platform-support" property. The interrupt mode is advertised under TCG and under KVM. Although a KVM XIVE device is not yet available, the machine can still operate with kernel_irqchip=off. However, we apply a restriction on the CPU which is required to be a POWER9 when a XIVE interrupt controller is in use. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: add hcalls support for the XIVE exploitation interrupt modeCédric Le Goater
The different XIVE virtualization structures (sources and event queues) are configured with a set of Hypervisor calls : - H_INT_GET_SOURCE_INFO used to obtain the address of the MMIO page of the Event State Buffer (ESB) entry associated with the source. - H_INT_SET_SOURCE_CONFIG assigns a source to a "target". - H_INT_GET_SOURCE_CONFIG determines which "target" and "priority" is assigned to a source - H_INT_GET_QUEUE_INFO returns the address of the notification management page associated with the specified "target" and "priority". - H_INT_SET_QUEUE_CONFIG sets or resets the event queue for a given "target" and "priority". It is also used to set the notification configuration associated with the queue, only unconditional notification is supported for the moment. Reset is performed with a queue size of 0 and queueing is disabled in that case. - H_INT_GET_QUEUE_CONFIG returns the queue settings for a given "target" and "priority". - H_INT_RESET resets all of the guest's internal interrupt structures to their initial state, losing all configuration set via the hcalls H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG. - H_INT_SYNC issue a synchronisation on a source to make sure all notifications have reached their queue. Calls that still need to be addressed : H_INT_SET_OS_REPORTING_LINE H_INT_GET_OS_REPORTING_LINE See the code for more documentation on each hcall. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Folded in fix for field accessors] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: introduce a new machine IRQ backend for XIVECédric Le Goater
The XIVE IRQ backend uses the same layout as the new XICS backend but covers the full range of the IRQ number space. The IRQ numbers for the CPU IPIs are allocated at the bottom of this space, below 4K, to preserve compatibility with XICS which does not use that range. This should be enough given that the maximum number of CPUs is 1024 for the sPAPR machine under QEMU. For the record, the biggest POWER8 or POWER9 system has a maximum of 1536 HW threads (16 sockets, 192 cores, SMT8). Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-12-21spapr: export and rename the xics_max_server_number() routineCédric Le Goater
The XIVE sPAPR IRQ backend will use it to define the number of ENDs of the IC controller. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-11-08ppc/spapr_caps: Add SPAPR_CAP_NESTED_KVM_HVSuraj Jitindar Singh
Add the spapr cap SPAPR_CAP_NESTED_KVM_HV to be used to control the availability of nested kvm-hv to the level 1 (L1) guest. Assuming a hypervisor with support enabled an L1 guest can be allowed to use the kvm-hv module (and thus run it's own kvm-hv guests) by setting: -machine pseries,cap-nested-hv=true or disabled with: -machine pseries,cap-nested-hv=false Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-11-08hw/ppc/spapr_rng: Introduce CONFIG_SPAPR_RNG switch for spapr_rng.cThomas Huth
The spapr-rng device is suboptimal when compared to virtio-rng, so users might want to disable it in their builds. Thus let's introduce a proper CONFIG switch to allow us to compile QEMU without this device. The function spapr_rng_populate_dt is required for linking, so move it to a different location. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21spapr: introduce a IRQ controller backend to the machineCédric Le Goater
This proposal moves all the related IRQ routines of the sPAPR machine behind a sPAPR IRQ backend interface 'spapr_irq' to prepare for future changes. First of which will be to increase the size of the IRQ number space, then, will follow a new backend for the POWER9 XIVE IRQ controller. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-08-21spapr: introduce a fixed IRQ number spaceCédric Le Goater
This proposal introduces a new IRQ number space layout using static numbers for all devices, depending on a device index, and a bitmap allocator for the MSI IRQ numbers which are negotiated by the guest at runtime. As the VIO device model does not have a device index but a "reg" property, we introduce a formula to compute an IRQ number from a "reg" value. It should minimize most of the collisions. The previous layout is kept in pre-3.1 machines raising the 'legacy_irq_allocation' machine class flag. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-07-02hw/ppc: Use the IEC binary prefix definitionsPhilippe Mathieu-Daudé
It eases code review, unit is explicit. Patch generated using: $ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/ and modified manually. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20180625124238.25339-33-f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-22spapr: Use maximum page size capability to simplify memory backend checkingDavid Gibson
The way we used to handle KVM allowable guest pagesizes for PAPR guests required some convoluted checking of memory attached to the guest. The allowable pagesizes advertised to the guest cpus depended on the memory which was attached at boot, but then we needed to ensure that any memory later hotplugged didn't change which pagesizes were allowed. Now that we have an explicit machine option to control the allowable maximum pagesize we can simplify this. We just check all memory backends against that declared pagesize. We check base and cold-plugged memory at reset time, and hotplugged memory at pre_plug() time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-22spapr: Maximum (HPT) pagesize propertyDavid Gibson
The way the POWER Hash Page Table (HPT) MMU is virtualized by KVM HV means that every page that the guest puts in the pagetables must be truly physically contiguous, not just GPA-contiguous. In effect this means that an HPT guest can't use any pagesizes greater than the host page size used to back its memory. At present we handle this by changing what we advertise to the guest based on the backing pagesizes. This is pretty bad, because it means the guest sees a different environment depending on what should be host configuration details. As a start on fixing this, we add a new capability parameter to the pseries machine type which gives the maximum allowed pagesizes for an HPT guest. For now we just create and validate the parameter without making it do anything. For backwards compatibility, on older machine types we set it to the max available page size for the host. For the 3.0 machine type, we fix it to 16, the intention being to only allow HPT pagesizes up to 64kiB by default in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-21spapr: remove unused spapr_irq routinesCédric Le Goater
spapr_irq_alloc_block and spapr_irq_alloc() are now deprecated. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21spapr: split the IRQ allocation sequenceCédric Le Goater
Today, when a device requests for IRQ number in a sPAPR machine, the spapr_irq_alloc() routine first scans the ICSState status array to find an empty slot and then performs the assignement of the selected numbers. Split this sequence in two distinct routines : spapr_irq_find() for lookups and spapr_irq_claim() for claiming the IRQ numbers. This will ease the introduction of a static layout of IRQ numbers. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-21spapr: Add cpu_apply hook to capabilitiesDavid Gibson
spapr capabilities have an apply hook to actually activate (or deactivate) the feature in the system at reset time. However, a number of capabilities affect the setup of cpus, and need to be applied to each of them - including hotplugged cpus for extra complication. To make this simpler, add an optional cpu_apply hook that is called from spapr_cpu_reset(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>
2018-06-21spapr: Compute effective capability values earlierDavid Gibson
Previously, the effective values of the various spapr capability flags were only determined at machine reset time. That was a lazy way of making sure it was after cpu initialization so it could use the cpu object to inform the defaults. But we've now improved the compat checking code so that we don't need to instantiate the cpus to use it. That lets us move the resolution of the capability defaults much earlier. This is going to be necessary for some future capabilities. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>
2018-05-07spapr: rename "hotplug memory" terminology to "device memory"David Hildenbrand
Let's make it clear at relevant places that we are dealing with device memory. That it can be used for memory hotplug is just a special case. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-11-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> [ehabkost: rebased series, solved conflicts at spapr.c] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-05-07machine: make MemoryHotplugState accessible via the machineDavid Hildenbrand
Let's allow to query the MemoryHotplugState directly from the machine. If the pointer is NULL, the machine does not support memory devices. If the pointer is !NULL, the machine supports memory devices and the data structure contains information about the applicable physical guest address space region. This allows us to generically detect if a certain machine has support for memory devices, and to generically manage it (find free address range, plug/unplug a memory region). We will rename "MemoryHotplugState" to something more meaningful ("DeviceMemory") after we completed factoring out the pc-dimm code into MemoryDevice code. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180423165126.15441-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> [ehabkost: rebased series, solved conflicts at spapr.c] [ehabkost: squashed fix to use g_malloc0()] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2018-03-06ppc/spapr-caps: Convert cap-ibs to custom spapr-capSuraj Jitindar Singh
Convert cap-ibs (indirect branch speculation) to a custom spapr-cap type. All tristate caps have now been converted to custom spapr-caps, so remove the remaining support for them. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> [dwg: Don't explicitly list "?"/help option, trust convention] [dwg: Fold tristate removal into here, to not break bisect] [dwg: Fix minor style problems] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-16spapr: rename spapr_vcpu_id() to spapr_get_vcpu_id()Greg Kurz
The spapr_vcpu_id() function is an accessor actually. Let's rename it for symmetry with the recently added spapr_set_vcpu_id() helper. The motivation behind this is that a later patch will consolidate the VCPU id formula in a function and spapr_vcpu_id looks like an appropriate name. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-02-16spapr: move VCPU calculation to core machine codeGreg Kurz
The VCPU ids are currently computed and assigned to each individual CPU threads in spapr_cpu_core_realize(). But the numbering logic of VCPU ids is actually a machine-level concept, and many places in hw/ppc/spapr.c also have to compute VCPU ids out of CPU indexes. The current formula used in spapr_cpu_core_realize() is: vcpu_id = (cc->core_id * spapr->vsmt / smp_threads) + i where: cc->core_id is a multiple of smp_threads cpu_index = cc->core_id + i 0 <= i < smp_threads So we have: cpu_index % smp_threads == i cc->core_id / smp_threads == cpu_index / smp_threads hence: vcpu_id = (cpu_index / smp_threads) * spapr->vsmt + cpu_index % smp_threads; This formula was used before VSMT at the time VCPU ids where computed at the target emulation level. It has the advantage of being useable to derive a VPCU id out of a CPU index only. It is fitted for all the places where the machine code has to compute a VCPU id. This patch introduces an accessor to set the VCPU id in a PowerPCCPU object using the above formula. It is a first step to consolidate all the VCPU id logic in a single place. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29target/ppc/spapr: Add H-Call H_GET_CPU_CHARACTERISTICSSuraj Jitindar Singh
The new H-Call H_GET_CPU_CHARACTERISTICS is used by the guest to query behaviours and available characteristics of the cpu. Implement the handler for this new H-Call which formulates its response based on the setting of the spapr_caps cap-cfpc, cap-sbbc and cap-ibs. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29target/ppc/spapr_caps: Add new tristate cap safe_indirect_branchSuraj Jitindar Singh
Add new tristate cap cap-ibs to represent the indirect branch serialisation capability. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29target/ppc/spapr_caps: Add new tristate cap safe_bounds_checkSuraj Jitindar Singh
Add new tristate cap cap-sbbc to represent the speculation barrier bounds checking capability. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29target/ppc/spapr_caps: Add new tristate cap safe_cacheSuraj Jitindar Singh
Add new tristate cap cap-cfpc to represent the cache flush on privilege change capability. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29target/ppc/spapr_caps: Add support for tristate spapr_capabilitiesSuraj Jitindar Singh
spapr_caps are used to represent the level of support for various capabilities related to the spapr machine type. Currently there is only support for boolean capabilities. Add support for tristate capabilities by implementing their get/set functions. These capabilities can have the values 0, 1 or 2 corresponding to broken, workaround and fixed. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-29target/ppc/kvm: Add cap_ppc_safe_[cache/bounds_check/indirect_branch]Suraj Jitindar Singh
Add three new kvm capabilities used to represent the level of host support for three corresponding workarounds. Host support for each of the capabilities is queried through the new ioctl KVM_PPC_GET_CPU_CHAR which returns four uint64 quantities. The first two, character and behaviour, represent the available characteristics of the cpu and the behaviour of the cpu respectively. The second two, c_mask and b_mask, represent the mask of known bits for the character and beheviour dwords respectively. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Correct some compile errors due to name change in final kernel patch version] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-17hw/ppc/spapr_caps: Rework spapr_caps to use uint8 internal representationSuraj Jitindar Singh
Currently spapr_caps are tied to boolean values (on or off). This patch reworks the caps so that they can have any uint8 value. This allows more capabilities with various values to be represented in the same way internally. Capabilities are numbered in ascending order. The internal representation of capability values is an array of uint8s in the sPAPRMachineState, indexed by capability number. Capabilities can have their own name, description, options, getter and setter functions, type and allow functions. They also each have their own section in the migration stream. Capabilities are only migrated if they were explictly set on the command line, with the assumption that otherwise the default will match. On migration we ensure that the capability value on the destination is greater than or equal to the capability value from the source. So long at this remains the case then the migration is considered compatible and allowed to continue. This patch implements generic getter and setter functions for boolean capabilities. It also converts the existings cap-htm, cap-vsx and cap-dfp capabilities to this new format. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-01-17spapr: Handle Decimal Floating Point (DFP) as an optional capabilityDavid Gibson
Decimal Floating Point has been available on POWER7 and later (server) cpus. However, it can be disabled on the hypervisor, meaning that it's not available to guests. We currently handle this by conditionally advertising DFP support in the device tree depending on whether the guest CPU model supports it - which can also depend on what's allowed in the host for -cpu host. That can lead to confusion on migration, since host properties are silently affecting guest visible properties. This patch handles it by treating it as an optional capability for the pseries machine type. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17spapr: Handle VMX/VSX presence as an spapr capability flagDavid Gibson
We currently have some conditionals in the spapr device tree code to decide whether or not to advertise the availability of the VMX (aka Altivec) and VSX vector extensions to the guest, based on whether the guest cpu has those features. This can lead to confusion and subtle failures on migration, since it makes a guest visible change based only on host capabilities. We now have a better mechanism for this, in spapr capabilities flags, which explicitly depend on user options rather than host capabilities. Rework the advertisement of VSX and VMX based on a new VSX capability. We no longer bother with a conditional for VMX support, because every CPU that's ever been supported by the pseries machine type supports VMX. NOTE: Some userspace distributions (e.g. RHEL7.4) already rely on availability of VSX in libc, so using cap-vsx=off may lead to a fatal SIGILL in init. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17spapr: Validate capabilities on migrationDavid Gibson
Now that the "pseries" machine type implements optional capabilities (well, one so far) there's the possibility of having different capabilities available at either end of a migration. Although arguably a user error, it would be nice to catch this situation and fail as gracefully as we can. This adds code to migrate the capabilities flags. These aren't pulled directly into the destination's configuration since what the user has specified on the destination command line should take precedence. However, they are checked against the destination capabilities. If the source was using a capability which is absent on the destination, we fail the migration, since that could easily cause a guest crash or other bad behaviour. If the source lacked a capability which is present on the destination we warn, but allow the migration to proceed. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17spapr: Treat Hardware Transactional Memory (HTM) as an optional capabilityDavid Gibson
This adds an spapr capability bit for Hardware Transactional Memory. It is enabled by default for pseries-2.11 and earlier machine types. with POWER8 or later CPUs (as it must be, since earlier qemu versions would implicitly allow it). However it is disabled by default for the latest pseries-2.12 machine type. This means that with the latest machine type, HTM will not be available, regardless of CPU, unless it is explicitly enabled on the command line. That change is made on the basis that: * This way running with -M pseries,accel=tcg will start with whatever cpu and will provide the same guest visible model as with accel=kvm. - More specifically, this means existing make check tests don't have to be modified to use cap-htm=off in order to run with TCG * We hope to add a new "HTM without suspend" feature in the not too distant future which could work on both POWER8 and POWER9 cpus, and could be enabled by default. * Best guesses suggest that future POWER cpus may well only support the HTM-without-suspend model, not the (frankly, horribly overcomplicated) POWER8 style HTM with suspend. * Anecdotal evidence suggests problems with HTM being enabled when it wasn't wanted are more common than being missing when it was. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-01-17spapr: Capabilities infrastructureDavid Gibson
Because PAPR is a paravirtual environment access to certain CPU (or other) facilities can be blocked by the hypervisor. PAPR provides ways to advertise in the device tree whether or not those features are available to the guest. In some places we automatically determine whether to make a feature available based on whether our host can support it, in most cases this is based on limitations in the available KVM implementation. Although we correctly advertise this to the guest, it means that host factors might make changes to the guest visible environment which is bad: as well as generaly reducing reproducibility, it means that a migration between different host environments can easily go bad. We've mostly gotten away with it because the environments considered mature enough to be well supported (basically, KVM on POWER8) have had consistent feature availability. But, it's still not right and some limitations on POWER9 is going to make it more of an issue in future. This introduces an infrastructure for defining "sPAPR capabilities". These are set by default based on the machine version, masked by the capabilities of the chosen cpu, but can be overriden with machine properties. The intention is at reset time we verify that the requested capabilities can be supported on the host (considering TCG, KVM and/or host cpu limitations). If not we simply fail, rather than silently modifying the advertised featureset to the guest. This does mean that certain configurations that "worked" may now fail, but such configurations were already more subtly broken. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>
2017-12-15spapr: fix LSI interrupt specifiers in the device treeGreg Kurz
LoPAPR 1.1 B.6.9.1.2 describes the "#interrupt-cells" property of the PowerPC External Interrupt Source Controller node as follows: “#interrupt-cells” Standard property name to define the number of cells in an interrupt- specifier within an interrupt domain. prop-encoded-array: An integer, encoded as with encode-int, that denotes the number of cells required to represent an interrupt specifier in its child nodes. The value of this property for the PowerPC External Interrupt option shall be 2. Thus all interrupt specifiers (as used in the standard “interrupts” property) shall consist of two cells, each containing an integer encoded as with encode-int. The first integer represents the interrupt number the second integer is the trigger code: 0 for edge triggered, 1 for level triggered. This patch fixes the interrupt specifiers in the "interrupt-map" property of the PHB node, that were setting the second cell to 8 (confusion with IRQ_TYPE_LEVEL_LOW ?) instead of 1. VIO devices and RTAS event sources use the same format for interrupt specifiers: while here, we introduce a common helper to handle the encoding details. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> -- v3: - reference public LoPAPR instead of internal PAPR+ in changelog - change helper name to spapr_dt_xics_irq() v2: - drop the erroneous changes to the "interrupts" prop in PCI device nodes - introduce a common helper to encode interrupt specifiers Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15spapr: introduce a spapr_qirq() helperCédric Le Goater
xics_get_qirq() is only used by the sPAPR machine. Let's move it there and change its name to reflect its scope. It will be useful for XIVE support which will use its own set of qirqs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-12-15spapr: move the IRQ allocation routines under the machineCédric Le Goater
Also change the prototype to use a sPAPRMachineState and prefix them with spapr_irq_. It will let us synchronise the IRQ allocation with the XIVE interrupt mode when available. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17ppc: spapr: use generic cpu_model parsingIgor Mammedov
use generic cpu_model parsing introduced by (6063d4c0f vl.c: convert cpu_model to cpu type and set of global properties before machine_init()) it allows to: * replace sPAPRMachineClass::tcg_default_cpu with MachineClass::default_cpu_type * drop cpu_parse_cpu_model() from hw/ppc/spapr.c and reuse one in vl.c * simplify spapr_get_cpu_core_type() by removing not needed anymore recurrsion since alias look up happens earlier at vl.c and spapr_get_cpu_core_type() works only with resulted from that cpu type. * spapr no more needs to parse/depend on being phased out MachineState::cpu_model, all tha parsing done by generic code and target specific callback. Signed-off-by: Igor Mammedov <imammedo@redhat.com> [dwg: Correct minor compile error] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-10-17ppc: move '-cpu foo,compat=xxx' parsing into ppc_cpu_parse_featurestr()Igor Mammedov
there is a dedicated callback CPUClass::parse_features which purpose is to convert -cpu features into a set of global properties AND deal with compat/legacy features that couldn't be directly translated into CPU's properties. Create ppc variant of it (ppc_cpu_parse_featurestr) and move 'compat=val' handling from spapr_cpu_core.c into it. That removes a dependency of board/core code on cpu_model parsing and would let to reuse common -cpu parsing introduced by 6063d4c0 Set "max-cpu-compat" property only if it exists, in practice it should limit 'compat' hack to spapr machine and allow to avoid including machine/spapr headers in target/ppc/cpu.c Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08PPC: KVM: Support machine option to set VSMT modeSam Bobroff
KVM now allows writing to KVM_CAP_PPC_SMT which has previously been read only. Doing so causes KVM to act, for that VM, as if the host's SMT mode was the given value. This is particularly important on Power 9 systems because their default value is 1, but they are able to support values up to 8. This patch introduces a way to control this capability via a new machine property called VSMT ("Virtual SMT"). If the value is not set on the command line a default is chosen that is, when possible, compatible with legacy systems. Note that the intialization of KVM_CAP_PPC_SMT has changed slightly because it has changed (in KVM) from a global capability to a VM-specific one. This won't cause a problem on older KVMs because VM capabilities fall back to global ones. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08ppc: spapr: Make VCPU ID handling private to SPAPRSam Bobroff
The concept of a VCPU ID that differs from the CPU's index (cpu->cpu_index) exists only within SPAPR machines so, move the functions ppc_get_vcpu_id() and ppc_get_cpu_by_vcpu_id() into spapr.c and rename them appropriately. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-09-08hw/ppc: clear pending_events on machine resetDaniel Henrique Barboza
The sPAPR machine isn't clearing up the pending events QTAILQ on machine reboot. This allows for unprocessed hotplug/epow events to persist in the queue after reset and, when reasserting the IRQs in check_exception later on, these will be being processed by the OS. This patch implements a new function called 'spapr_clear_pending_events' that clears up the pending_events QTAILQ. This helper is then called inside ppc_spapr_reset to clear up the events queue, preventing old/deprecated events from persisting after a reset. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-17pseries: Use smaller default hash page tables when guest can resizeDavid Gibson
We've now implemented a PAPR extension allowing PAPR guest to resize their hash page table (HPT) during runtime. This patch makes use of that facility to allocate smaller HPTs by default. Specifically when a guest is aware of the HPT resize facility, qemu sizes the HPT to the initial memory size, rather than the maximum memory size on the assumption that the guest will resize its HPT if necessary for hot plugged memory. When the initial memory size is much smaller than the maximum memory size (a common configuration with e.g. oVirt / RHEV) then this can save significant memory on the HPT. If the guest does *not* advertise HPT resize awareness when it makes the ibm,client-architecture-support call, qemu resizes the HPT for maxmimum memory size (unless it's been configured not to allow such guests at all). For now we make that reallocation assuming the guest has not yet used the HPT at all. That's true in practice, but not, strictly, an architectural or PAPR requirement. If we need to in future we can fix this by having the client-architecture-support call reboot the guest with the revised HPT size (the client-architecture-support call is explicitly permitted to trigger a reboot in this way). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
2017-07-17pseries: Implement HPT resizingDavid Gibson
This patch implements hypercalls allowing a PAPR guest to resize its own hash page table. This will eventually allow for more flexible memory hotplug. The implementation is partially asynchronous, handled in a special thread running the hpt_prepare_thread() function. The state of a pending resize is stored in SPAPR_MACHINE->pending_hpt. The H_RESIZE_HPT_PREPARE hypercall will kick off creation of a new HPT, or, if one is already in progress, monitor it for completion. If there is an existing HPT resize in progress that doesn't match the size specified in the call, it will cancel it, replacing it with a new one matching the given size. The H_RESIZE_HPT_COMMIT completes transition to a resized HPT, and can only be called successfully once H_RESIZE_HPT_PREPARE has successfully completed initialization of a new HPT. The guest must ensure that there are no concurrent accesses to the existing HPT while this is called (this effectively means stop_machine() for Linux guests). For now H_RESIZE_HPT_COMMIT goes through the whole old HPT, rehashing each HPTE into the new HPT. This can have quite high latency, but it seems to be of the order of typical migration downtime latencies for HPTs of size up to ~2GiB (which would be used in a 256GiB guest). In future we probably want to move more of the rehashing to the "prepare" phase, by having H_ENTER and other hcalls update both current and pending HPTs. That's a project for another day, but should be possible without any changes to the guest interface. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-17pseries: Stubs for HPT resizingDavid Gibson
This introduces stub implementations of the H_RESIZE_HPT_PREPARE and H_RESIZE_HPT_COMMIT hypercalls which we hope to add in a PAPR extension to allow run time resizing of a guest's hash page table. It also adds a new machine property for controlling whether this new facility is available. For now we only allow resizing with TCG, allowing it with KVM will require kernel changes as well. Finally, it adds a new string to the hypertas property in the device tree, advertising to the guest the availability of the HPT resizing hypercalls. This is a tentative suggested value, and would need to be standardized by PAPR before being merged. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2017-07-17spapr: Minor cleanups to events handlingDavid Gibson
The rtas_error_log structure is marked packed, which strongly suggests its precise layout is important to match an external interface. Along with that one could expect it to have a fixed endianness to match the same interface. That used to be the case - matching the layout of PAPR RTAS event format and requiring BE fields. Now, however, it's only used embedded within sPAPREventLogEntry with the fields in native order, since they're processed internally. Clear that up by removing the nested structure in sPAPREventLogEntry. struct rtas_error_log is moved back to spapr_events.c where it is used as a temporary to help convert the fields in sPAPREventLogEntry to the correct in memory format when delivering an event to the guest. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-17spapr: migrate pending_events of spapr stateDaniel Henrique Barboza
In racing situations between hotplug events and migration operation, a rtas hotplug event could have not yet be delivered to the source guest when migration is started. In this case the pending_events of spapr state need be transmitted to the target so that the hotplug event can be finished on the target. To achieve the minimal VMSD possible to migrate the pending_events list, this patch makes the changes in spapr_events.c: - 'log_type' of sPAPREventLogEntry struct deleted. This information can be derived by inspecting the rtas_error_log summary field. A new function called 'spapr_event_log_entry_type' was added to retrieve the type of a given sPAPREventLogEntry. - sPAPREventLogEntry, epow_log_full and hp_log_full were redesigned. The only data we're going to migrate in the VMSD is the event log data itself, which can be divided in two parts: a rtas_error_log header and an extended event log field. The rtas_error_log header contains information about the size of the extended log field, which can be used inside VMSD as the size parameter of the VBUFFER_ALOC field that will store it. To allow this use, the header.extended_length field must be exposed inline to the VMSD instead of embedded into a 'data' field that holds everything. With this in mind, the following changes were done: * a new 'header' field was added to sPAPREventLogEntry. This field holds a a struct rtas_error_log inline. * the declaration of the 'rtas_error_log' struct was moved to spapr.h to be visible to the VMSD macros. * 'data' field of sPAPREventLogEntry was renamed to 'extended_log' and now holds only the contents of the extended event log. * 'struct rtas_error_log hdr' were taken away from both epow_log_full and hp_log_full. This information is now available at the header field of sPAPREventLogEntry. * epow_log_full and hp_log_full were renamed to epow_extended_log and hp_extended_log respectively. This rename makes it clearer to understand the new purpose of both structures: hold the information of an extended event log field. * spapr_powerdown_req and spapr_hotplug_req_event now creates a sPAPREventLogEntry structure that contains the full rtas log entry. * rtas_event_log_queue and rtas_event_log_dequeue now receives a sPAPREventLogEntry pointer as a parameter instead of a void pointer. - the endianess of the sPAPREventLogEntry header is now native instead of be32. We can use the fields in native endianess internally and write them in be32 in the guest physical memory inside 'check_exception'. This allows the VMSD inside spapr.c to read the correct size of the entended_log field. - inside spapr.c, pending_events is put in a subsection in the spapr state VMSD to make sure migration across different versions is not broken. A small change in rtas_event_log_queue and rtas_event_log_dequeue were also made: instead of calling qdev_get_machine(), both functions now receive a pointer to the sPAPRMachineState. This pointer is already available in the callers of these functions and we don't need to waste resources calling qdev() again. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-07-14memory/iommu: introduce IOMMUMemoryRegionClassAlexey Kardashevskiy
This finishes QOM'fication of IOMMUMemoryRegion by introducing a IOMMUMemoryRegionClass. This also provides a fastpath analog for IOMMU_MEMORY_REGION_GET_CLASS(). This makes IOMMUMemoryRegion an abstract class. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20170711035620.4232-3-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>