aboutsummaryrefslogtreecommitdiff
path: root/target-ppc/kvm.c
AgeCommit message (Collapse)Author
2016-03-24spapr/target-ppc/kvm: Only add hcall-instructions if KVM supports itAlexey Kardashevskiy
ePAPR defines "hcall-instructions" device-tree property which contains code to call hypercalls in ePAPR paravirtualized guests. In general pseries guests won't use this property, instead using the PAPR defined hypercall interface. However, this property has been re-used to implement a hack to allow PR KVM to run (slightly modified) guests in some situations where it otherwise wouldn't be able to (because the system's L0 hypervisor doesn't forward the PAPR hypercalls to the PR KVM kernel). Hence, this property is always present in the device tree for pseries guests. All KVM guests use it at least to read features via the KVM_HC_FEATURES hypercall. The property is populated by the code returned from the KVM's KVM_PPC_GET_PVINFO ioctl; if not implemented in the KVM, QEMU supplies code which will fail all hypercall attempts. If QEMU does not create the property, and the guest kernel is compiled with CONFIG_EPAPR_PARAVIRT (which is normally the case), there is exactly the same stub at @epapr_hypercall_start already. Rather than maintaining this fairly useless stub implementation, it makes more sense not to create the property in the device tree in the first place if the host kernel does not implement it. This changes kvmppc_get_hypercall() to return 1 if the host kernel does not implement KVM_CAP_PPC_GET_PVINFO. The caller can use it to decide on whether to create the property or not. This changes the pseries machine to not create the property if KVM does not implement KVM_PPC_GET_PVINFO. In practice this means that from now on the property will not be created if either HV KVM or TCG is used. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [reworded commit message for clarity --dwg] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-18target-ppc: Document TOCTTOU in hugepage supportMarkus Armbruster
The code to find the minimum page size is is vulnerable to TOCTTOU. Added in commit 2d103aa "target-ppc: fix hugepage support when using memory-backend-file" (v2.4.0). Since I can't fix it myself right now, add a FIXME comment. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1458066895-20632-2-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2016-03-16target-ppc: Add helpers for updating a CPU's SDR1 and external HPTDavid Gibson
When a Power cpu with 64-bit hash MMU has it's hash page table (HPT) pointer updated by a write to the SDR1 register we need to update some derived variables. Likewise, when the cpu is configured for an external HPT (one not in the guest memory space) some derived variables need to be updated. Currently the logic for this is (partially) duplicated in ppc_store_sdr1() and in spapr_cpu_reset(). In future we're going to need it in some other places, so make some common helpers for this update. In addition the new ppc_hash64_set_external_hpt() helper also updates SDR1 in KVM - it's not updated by the normal runtime KVM <-> qemu CPU synchronization. In a sense this belongs logically in the ppc_hash64_set_sdr1() helper, but that is called from kvm_arch_get_registers() so can't itself call cpu_synchronize_state() without infinite recursion. In practice this doesn't matter because the only other caller is TCG specific. Currently there aren't situations where updating SDR1 at runtime in KVM matters, but there are going to be in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com>
2016-03-16target-ppc: Split out SREGS get/put functionsDavid Gibson
Currently the getting and setting of Power MMU registers (sregs) take up large inline chunks of the kvm_arch_get_registers() and kvm_arch_put_registers() functions. Especially since there are two variants (for Book-E and Book-S CPUs), only one of which will be used in practice, this is pretty hard to read. This patch splits these out into helper functions for clarity. No functional change is expected. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
2016-02-25ppc/kvm: Tell the user what might be wrong when using bad CPU types with kvm-hvThomas Huth
Using a CPU type that does not match the host is not possible when using the kvm-hv kernel module - the PVR is checked in the kernel function kvm_arch_vcpu_ioctl_set_sregs_hv() and rejected with -EINVAL if it does not match the host. However, when the user tries to specify a non-matching CPU type, QEMU currently only reports "kvm_init_vcpu failed: Invalid argument", and this is of course not very helpful for the user to solve the problem. So this patch adds a more descriptive error message that tells the user to specify "-cpu host" instead. Signed-off-by: Thomas Huth <thuth@redhat.com> [Removed melodramatic '!' :)] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-02-25ppc/kvm: Use error_report() instead of cpu_abort() for user-triggerable errorsThomas Huth
Setting the KVM_CAP_PPC_PAPR capability can fail if either the KVM kernel module does not support it, or if the specified vCPU type is not a 64-bit Book3-S CPU type. For example, the user can trigger it easily with "-M pseries -cpu G2leLS" when using the kvm-pr kernel module. So the error should not be reported with cpu_abort() since this function is rather meant for reporting programming errors than reporting user-triggerable errors (it prints out all CPU registers and then calls abort() to kills the program - two things that the normal user does not expect here) . So let's use error_report() with exit(1) here instead. A similar problem exists in the code that sets the KVM_CAP_PPC_EPR capability, so while we're at it, fix that, too. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-01-30target-ppc: Rework ppc_store_slbDavid Gibson
ppc_store_slb updates the SLB for PPC cpus with 64-bit hash MMUs. Currently it takes two parameters, which contain values encoded as the register arguments to the slbmte instruction, one register contains the ESID portion of the SLBE and also the slot number, the other contains the VSID portion of the SLBE. We're shortly going to want to do some SLB updates from other code where it is more convenient to supply the slot number and ESID separately, so rework this function and its callers to work this way. As a bonus, this slightly simplifies the emulation of segment registers for when running a 32-bit OS on a 64-bit CPU. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Alexander Graf <agraf@suse.de>
2016-01-30target-ppc: Convert mmu-hash{32,64}.[ch] from CPUPPCState to PowerPCCPUDavid Gibson
Like a lot of places these files include a mixture of functions taking both the older CPUPPCState *env and newer PowerPCCPU *cpu. Move a step closer to cleaning this up by standardizing on PowerPCCPU, except for the helper_* functions which are called with the CPUPPCState * from tcg. Callers and some related functions are updated as well, the boundaries of what's changed here are a bit arbitrary. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Alexander Graf <agraf@suse.de>
2016-01-30target-ppc: kvm: fix floating point registers sync on little-endian hostsGreg Kurz
On VSX capable CPUs, the 32 FP registers are mapped to the high-bits of the 32 first VSX registers. So if you have: VSR31 = (uint128) 0x0102030405060708090a0b0c0d0e0f00 then FPR31 = (uint64) 0x0102030405060708 The kernel stores the VSX registers in the fp_state struct following the host endian element ordering. On big-endian: fp_state.fpr[31][0] = 0x0102030405060708 fp_state.fpr[31][1] = 0x090a0b0c0d0e0f00 On little-endian: fp_state.fpr[31][0] = 0x090a0b0c0d0e0f00 fp_state.fpr[31][1] = 0x0102030405060708 The KVM_GET_ONE_REG and KVM_SET_ONE_REG ioctls preserve this ordering, but QEMU considers it as big-endian and always copies element [0] to the fpr[] array and element [1] to the vsr[] array. This does not work with little-endian hosts, and you will get: (qemu) p $f31 0x90a0b0c0d0e0f00 instead of: (qemu) p $f31 0x102030405060708 This patch fixes the element ordering for little-endian hosts. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-01-29ppc: Clean up includesPeter Maydell
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1453832250-766-6-git-send-email-peter.maydell@linaro.org
2016-01-11target-ppc: Define kvmppc_read_int_dt()Sukadev Bhattiprolu
Extract code from the function kvmppc_read_int_cpu_dt() that actually reads the file into a separate function, so it can be called from other places. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23ppc/spapr: Add "ibm,pa-features" property to the device-treeBenjamin Herrenschmidt
LoPAPR defines a "ibm,pa-features" per-CPU device tree property which describes extended features of the Processor Architecture. This adds the property to the device tree. At the moment this is the copy of what pHyp advertises except "I=1 (cache inhibited) Large Pages" which is enabled for TCG and disabled when running under HV KVM host with 4K system page size. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [aik: rebased, changed commit log, moved ci_large_pages initialization, renamed pa_features arrays] Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23ppc: Add mmu_model defines for arch 2.03 and 2.07Benjamin Herrenschmidt
This removes unused POWERPC_MMU_2_06a/POWERPC_MMU_2_06d. This replaces POWERPC_MMU_64B with POWERPC_MMU_2_03 for POWER5+ to be more explicit about the version of the PowerISA supported. This defines POWERPC_MMU_2_07 and uses it for the POWER8 CPU family. This will not have an immediate effect now but it will in the following patch. This should cause no behavioural change. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [aik: rebased, changed commit log] Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-10-23spapr_iommu: Rename vfio_accel parameterDavid Gibson
The vfio_accel parameter used when creating a new TCE table (guest IOMMU context) has a confusing name. What it really means is whether we need the TCE table created to be able to support VFIO devices. VFIO is relevant, because when available we use in-kernel acceleration of the TCE table, but that may not work with VFIO devices because updates to the table are handled in kernel, bypass qemu and so don't hit qemu's infrastructure for keeping the VFIO host IOMMU state in sync with the guest IOMMU state. Rename the parameter to "need_vfio" throughout. This is a cosmetic change, with no impact on the logic. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2015-10-19kvm: Pass PCI device pointer to MSI routing functionsPavel Fedin
In-kernel ITS emulation on ARM64 will require to supply requester IDs. These IDs can now be retrieved from the device pointer using new pci_requester_id() function. This patch adds pci_dev pointer to KVM GSI routing functions and makes callers passing it. x86 architecture does not use requester IDs, but hw/i386/kvm/pci-assign.c also made passing PCI device pointer instead of NULL for consistency with the rest of the code. Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Message-Id: <ce081423ba2394a4efc30f30708fca07656bc500.1444916432.git.p.fedin@samsung.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-09qdev: Protect device-list-properties against broken devicesMarkus Armbruster
Several devices don't survive object_unref(object_new(T)): they crash or hang during cleanup, or they leave dangling pointers behind. This breaks at least device-list-properties, because qmp_device_list_properties() needs to create a device to find its properties. Broken in commit f4eb32b "qmp: show QOM properties in device-list-properties", v2.1. Example reproducer: $ qemu-system-aarch64 -nodefaults -display none -machine none -S -qmp stdio {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 2}, "package": ""}, "capabilities": []}} { "execute": "qmp_capabilities" } {"return": {}} { "execute": "device-list-properties", "arguments": { "typename": "pxa2xx-pcmcia" } } qemu-system-aarch64: /home/armbru/work/qemu/memory.c:1307: memory_region_finalize: Assertion `((&mr->subregions)->tqh_first == ((void *)0))' failed. Aborted (core dumped) [Exit 134 (SIGABRT)] Unfortunately, I can't fix the problems in these devices right now. Instead, add DeviceClass member cannot_destroy_with_object_finalize_yet to mark them: * Hang during cleanup (didn't debug, so I can't say why): "realview_pci", "versatile_pci". * Dangling pointer in cpus: most CPUs, plus "allwinner-a10", "digic", "fsl,imx25", "fsl,imx31", "xlnx,zynqmp", because they create such CPUs * Assert kvm_enabled(): "host-x86_64-cpu", host-i386-cpu", "host-powerpc64-cpu", "host-embedded-powerpc-cpu", "host-powerpc-cpu" (the powerpc ones can't currently reach the assertion, because the CPUs are only registered when KVM is enabled, but the assertion is arguably in the wrong place all the same) Make qmp_device_list_properties() fail cleanly when the device is so marked. This improves device-list-properties from "crashes, hangs or leaves dangling pointers behind" to "fails". Not a complete fix, just a better-than-nothing work-around. In the above reproducer, device-list-properties now fails with "Can't list properties of device 'pxa2xx-pcmcia'". This also protects -device FOO,help, which uses the same machinery since commit ef52358 "qdev-monitor: include QOM properties in -device FOO, help output", v2.2. Example reproducer: $ qemu-system-aarch64 -machine none -device pxa2xx-pcmcia,help Before: qemu-system-aarch64: .../memory.c:1307: memory_region_finalize: Assertion `((&mr->subregions)->tqh_first == ((void *)0))' failed. After: Can't list properties of device 'pxa2xx-pcmcia' Cc: "Andreas Färber" <afaerber@suse.de> Cc: "Edgar E. Iglesias" <edgar.iglesias@gmail.com> Cc: Alexander Graf <agraf@suse.de> Cc: Anthony Green <green@moxielogic.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Cc: Blue Swirl <blauwirbel@gmail.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Jia Liu <proljc@gmail.com> Cc: Leon Alrae <leon.alrae@imgtec.com> Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Walle <michael@walle.cc> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Richard Henderson <rth@twiddle.net> Cc: qemu-ppc@nongnu.org Cc: qemu-stable@nongnu.org Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <1443689999-12182-10-git-send-email-armbru@redhat.com>
2015-10-08target-ppc: Remove unnecessary variableShraddha Barke
Compress lines and remove the variable. Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2015-09-23ppc/spapr: Implement H_RANDOM hypercall in QEMUThomas Huth
The PAPR interface defines a hypercall to pass high-quality hardware generated random numbers to guests. Recent kernels can already provide this hypercall to the guest if the right hardware random number generator is available. But in case the user wants to use another source like EGD, or QEMU is running with an older kernel, we should also have this call in QEMU, so that guests that do not support virtio-rng yet can get good random numbers, too. This patch now adds a new pseudo-device to QEMU that either directly provides this hypercall to the guest or is able to enable the in-kernel hypercall if available. The in-kernel hypercall can be enabled with the use-kvm property, e.g.: qemu-system-ppc64 -device spapr-rng,use-kvm=true For handling the hypercall in QEMU instead, a "RngBackend" is required since the hypercall should provide "good" random data instead of pseudo-random (like from a "simple" library function like rand() or g_random_int()). Since there are multiple RngBackends available, the user must select an appropriate back-end via the "rng" property of the device, e.g.: qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=gid0 \ -device spapr-rng,rng=gid0 ... See http://wiki.qemu-project.org/Features-Done/VirtIORNG for other example of specifying RngBackends. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23spapr: Enable in-kernel H_SET_MODE handlingAlexey Kardashevskiy
For setting debug watchpoints, sPAPR guests use H_SET_MODE hypercall. The existing QEMU H_SET_MODE handler does not support this but the KVM handler in HV KVM does. However it is not enabled. This enables the in-kernel H_SET_MODE handler which handles: - Completed Instruction Address Breakpoint Register - Watch point 0 registers. The rest is still handled in QEMU. Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-07-07target-ppc: fix hugepage support when using memory-backend-fileMichael Roth
Current PPC code relies on -mem-path being used in order for hugepage support to be detected. With the introduction of MemoryBackendFile we can now handle this via: -object memory-file-backend,mem-path=...,id=hugemem0 \ -numa node,id=mem0,memdev=hugemem0 Management tools like libvirt treat the 2 approaches as interchangeable in some cases, which can lead to user-visible regressions even for previously supported guest configurations. Fix these by also iterating through any configured memory backends that may be backed by hugepages. Since the old code assumed hugepages always backed the entirety of guest memory, play it safe an pick the minimum across the max pages sizes for all backends, even ones that aren't backed by hugepages. Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-01kvm: First step to push iothread lock out of inner run loopJan Kiszka
This opens the path to get rid of the iothread lock on vmexits in KVM mode. On x86, the in-kernel irqchips has to be used because we otherwise need to synchronize APIC and other per-cpu state accesses that could be changed concurrently. Regarding pre/post-run callbacks, s390x and ARM should be fine without specific locking as the callbacks are empty. MIPS and POWER require locking for the pre-run callback. For the handle_exit callback, it is non-empty in x86, POWER and s390. Some POWER cases could do without the locking, but it is left in place for now. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1434646046-27150-7-git-send-email-pbonzini@redhat.com>
2015-06-04Merge remote-tracking branch 'remotes/agraf/tags/signed-ppc-for-upstream' ↵Peter Maydell
into staging Patch queue for ppc - 2015-06-03 Highlights this time around: - sPAPR: endian fixes, speedups, bug fixes, hotplug basics - add default ram size capability for machines (sPAPR defaults to 512MB now) # gpg: Signature made Wed Jun 3 22:59:09 2015 BST using RSA key ID 03FEDC60 # gpg: Good signature from "Alexander Graf <agraf@suse.de>" # gpg: aka "Alexander Graf <alex@csgraf.de>" * remotes/agraf/tags/signed-ppc-for-upstream: (40 commits) softmmu: support up to 12 MMU modes tcg: add TCG_TARGET_TLB_DISPLACEMENT_BITS tci: do not use CPUArchState in tcg-target.h Add David Gibson for sPAPR in MAINTAINERS file pseries: Enable in-kernel H_LOGICAL_CI_{LOAD, STORE} implementations spapr: override default ram size to 512MB machine: add default_ram_size to machine class spapr_pci: emit hotplug add/remove events during hotplug spapr_pci: enable basic hotplug operations pci: make pci_bar useable outside pci.c spapr_pci: create DRConnectors for each PCI slot during PHB realize spapr_pci: add dynamic-reconfiguration option for spapr-pci-host-bridge spapr_drc: add spapr_drc_populate_dt() spapr_events: event-scan RTAS interface spapr_events: re-use EPOW event infrastructure for hotplug events spapr_rtas: add ibm, configure-connector RTAS interface spapr: add rtas_st_buffer_direct() helper spapr_rtas: add get-sensor-state RTAS interface spapr_rtas: add set-indicator RTAS interface spapr_rtas: add get/set-power-level RTAS interfaces ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-06-03pseries: Enable in-kernel H_LOGICAL_CI_{LOAD, STORE} implementationsDavid Gibson
qemu currently implements the hypercalls H_LOGICAL_CI_LOAD and H_LOGICAL_CI_STORE as PAPR extensions. These are used by the SLOF firmware for IO, because performing cache inhibited MMIO accesses with the MMU off (real mode) is very awkward on POWER. This approach breaks when SLOF needs to access IO devices implemented within KVM instead of in qemu. The simplest example would be virtio-blk using an iothread, because the iothread / dataplane mechanism relies on an in-kernel implementation of the virtio queue notification MMIO. To fix this, an in-kernel implementation of these hypercalls has been made, (kernel commit 99342cf "kvmppc: Implement H_LOGICAL_CI_{LOAD,STORE} in KVM" however, the hypercalls still need to be enabled from qemu. This performs the necessary calls to do so. It would be nice to provide some warning if we encounter a problematic device with a kernel which doesn't support the new calls. Unfortunately, I can't see a way to detect this case which won't either warn in far too many cases that will probably work, or which is horribly invasive. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-06-02kvm: introduce kvm_arch_msi_data_to_gsiEric Auger
On ARM the MSI data corresponds to the shared peripheral interrupt (SPI) ID. This latter equals to the SPI index + 32. to retrieve the SPI index, matching the gsi, an architecture specific function is introduced. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-04-30kvm: add support for memory transaction attributesPaolo Bonzini
Let kvm_arch_post_run convert fields in the kvm_run struct to MemTxAttrs. These are then passed to address_space_rw. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-11kvm: add machine state to kvm_arch_initMarcel Apfelbaum
Needed to query machine's properties. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-12kvm: extend kvm_irqchip_add_msi_route to work on s390Frank Blaschka
on s390 MSI-X irqs are presented as thin or adapter interrupts for this we have to reorganize the routing entry to contain valid information for the adapter interrupt code on s390. To minimize impact on existing code we introduce an architecture function to fixup the routing entry. Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2015-01-07target-ppc: explicitly save page table headers in big endianCédric Le Goater
Currently, when the page tables are saved, the kvm_get_htab_header structs and the ptes are assumed being big endian and dumped as a indistinct blob in the statefile. This is no longer true when the host is little endian and this breaks restoration. This patch unfolds the kvmppc_save_htab routine to write explicitly the kvm_get_htab_header structs in big endian. The ptes are left untouched. Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-11-04target-ppc: kvm: Fix memory overflow issue about strncat()Chen Gang
strncat() will append additional '\0' to destination buffer, so need additional 1 byte for it, or may cause memory overflow, just like other area within QEMU have done. And can use g_strdup_printf() instead of strncat(), which may be more easier understanding. Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-08PPC: KVM: Use vm check_extension for pv hcallAlexander Graf
To find out whether we support the KVM hypercall interface we need to ask KVM on the VM level rather than the global KVM level, because Book3S HV KVM does not support it and we play conservative when both HV and PR are loaded. So instead, use the VM helper that falls back to global KVM enumeration. That should cover all cases. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-08ppc: Add hw breakpoint watchpoint supportBharat Bhushan
This patch adds hardware breakpoint and hardware watchpoint support for ppc. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then qemu can overwrite them. When QEMU is not able to handle debug exception then we inject program exception to guest. Yes program exception NOT debug exception and the reason is: 1) QEMU and guest not sharing debug resources 2) For software breakpoint QEMU uses a ehpriv-1 instruction; So there cannot be any reason that we are in qemu with exit reason KVM_EXIT_DEBUG for guest set debug exception, only possibility is guest executed ehpriv-1 privilege instruction and that's why we are injecting program exception. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-08ppc: Add software breakpoint supportBharat Bhushan
This patch allow insert/remove software breakpoint. When QEMU is not able to handle debug exception then we inject program exception to guest because for software breakpoint QEMU uses a ehpriv-1 instruction; So there cannot be any reason that we are in qemu with exit reason KVM_EXIT_DEBUG for guest set debug exception, only possibility is guest executed ehpriv-1 privilege instruction and that's why we are injecting program exception. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com> [agraf: make deflect comment booke/book3s agnostic] Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-08ppc: synchronize excp_vectors for injecting exceptionBharat Bhushan
This patch synchronizes env->excp_vectors[] with env->iovr[]. This is required for using the existing interrupt injection mechanism for kvm. Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-08ppc: debug stub: Get trap instruction opcode from KVMBharat Bhushan
Get trap instruction opcode from KVM and this opcode will be used for setting software breakpoint in following patch Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-09-08spapr: add uuid/host details to device treeNikunj A Dadhania
Useful for identifying the guest/host uniquely within the guest. Adding following properties to the guest root node. vm,uuid - uuid of the guest host-model - Host model number host-serial - Host machine serial number hypervisor type - Tells its "kvm" Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-07-15spapr: Move RMA memory region registration codeAlexey Kardashevskiy
PPC970 does not support VRMA (virtual RMA) so real memory required for SLOF to execute must be allocated by the KVM_ALLOCATE_RMA ioctl. Later this memory is used as a part of the guest RAM area. The RMA allocating code also registers a memory region for this piece of RAM. We are going to simplify memory regions layout: RMA memory region will be a subregion in the RAM memory region, both starting from zero. This way we will not have to take care of start address alignment for the piece of RAM next to the RMA. This moves memory region business closer to the RAM memory region creation/allocation code. As this is a mechanical patch, no change in behaviour is expected. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [agraf: fix compilation on non-kvm systems] Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-27spapr_iommu: Make in-kernel TCE table optionalAlexey Kardashevskiy
POWER KVM supports an KVM_CAP_SPAPR_TCE capability which allows allocating TCE tables in the host kernel memory and handle H_PUT_TCE requests targeted to specific LIOBN (logical bus number) right in the host without switching to QEMU. At the moment this is used for emulated devices only and the handler only puts TCE to the table. If the in-kernel H_PUT_TCE handler finds a LIOBN and corresponding table, it will put a TCE to the table and complete hypercall execution. The user space will not be notified. Upcoming VFIO support is going to use the same sPAPRTCETable device class so KVM_CAP_SPAPR_TCE is going to be used as well. That means that TCE tables for VFIO are going to be allocated in the host as well. However VFIO operates with real IOMMU tables and simple copying of a TCE to the real hardware TCE table will not work as guest physical to host physical address translation is requited. So until the host kernel gets VFIO support for H_PUT_TCE, we better not to register VFIO's TCE in the host. This adds a place holder for KVM_CAP_SPAPR_TCE_VFIO capability. It is not in upstream yet and being discussed so now it is always false which means that in-kernel VFIO acceleration is not supported. This adds a bool @vfio_accel flag to the sPAPRTCETable device telling that sPAPRTCETable should not try allocating TCE table in the host kernel for VFIO. The flag is false now as at the moment there is no VFIO. This adds an vfio_accel parameter to spapr_tce_new_table(), the semantic is the same. Since there is only emulated PCI and VIO now, the flag is set to false. Upcoming VFIO support will set it to true. This is a preparation patch so no change in behaviour is expected Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16PPC: KVM: Make pv hcall endian agnosticAlexander Graf
There were a few revisions of the Linux kernel that incorrectly swapped the hcall instructions when they saw ePAPR compliant hypercalls. We already have fixups for those in place when running with PR KVM, but HV KVM and systems that don't implement hypercalls at all are still broken because they fall back to the QEMU implementation of fallback hypercalls. So let's make the fallback hypercall instruction path endian agnostic. This only really works well for 64bit guests, but I don't think there are any 32bit systems left that don't implement real pv hcall support, so we'll never get into this code path. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16KVM: target-ppc: Enable TM state migrationAlexey Kardashevskiy
This adds migration support for registers saved before Transactional Memory (TM) transaction started. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Tom Musta <tommusta@gmail.com> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16KVM: PPC: Expose fixup hcall capabilityAlexander Graf
New kvm versions expose a PPC_FIXUP_HCALL capability. Make it visible to machine code so we can take decisions based on it. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr_iommu: Get rid of window_size in sPAPRTCETableAlexey Kardashevskiy
This removes window_size as it is basically a copy of nb_table shifted by SPAPR_TCE_PAGE_SHIFT. As new dynamic DMA windows are going to support windows as big as the entire RAM and this number will be bigger that 32 capacity, we will have to do something about @window_size anyway and removal seems to be the right way to go. This removes dma_window_start/dma_window_size from sPAPRPHBState as they are no longer used. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr_iommu: Enable multiple TCE requestsAlexey Kardashevskiy
Currently only single TCE entry per request is supported (H_PUT_TCE). However PAPR+ specification allows multiple entry requests such as H_PUT_TCE_INDIRECT and H_STUFF_TCE. Having less transitions to the host kernel via ioctls, support of these calls can accelerate IOMMU operations. This implements H_STUFF_TCE and H_PUT_TCE_INDIRECT. This advertises "multi-tce" capability to the guest if the host kernel supports it (KVM_CAP_SPAPR_MULTITCE) or guest is running in TCG mode. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16KVM: PPC: Enable compatibility modeAlexey Kardashevskiy
The host kernel implements a KVM_REG_PPC_ARCH_COMPAT register which this uses to enable a compatibility mode if any chosen. This sets the KVM_REG_PPC_ARCH_COMPAT register in KVM. ppc_set_compat() signals the caller if the mode cannot be enabled by the host kernel. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [agraf: fix TCG compat setting] Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16spapr: Add support for time base offset migrationAlexey Kardashevskiy
This allows guests to have a different timebase origin from the host. This is needed for migration, where a guest can migrate from one host to another and the two hosts might have a different timebase origin. However, the timebase seen by the guest must not go backwards, and should go forwards only by a small amount corresponding to the time taken for the migration. This is only supported for recent POWER hardware which has the TBU40 (timebase upper 40 bits) register. That includes POWER6, 7, 8 but not 970. This adds kvm_access_one_reg() to access a special register which is not in env->spr. This requires kvm_set_one_reg/kvm_get_one_reg patch. The feature must be present in the host kernel. This bumps vmstate_spapr::version_id and enables new vmstate_ppc_timebase only for it. Since the vmstate_spapr::minimum_version_id remains unchanged, migration from older QEMU is supported but without vmstate_ppc_timebase. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16KVM: PPC: Don't secretly add 1T segment feature to CPUAlexander Graf
When we select a CPU type that does not support 1TB segments, we should not expose 1TB just because KVM supports 1TB segments. User configuration always wins over feature availability. Signed-off-by: Alexander Graf <agraf@suse.de>
2014-06-16target-ppc: Create versionless CPU class per family if KVMAlexey Kardashevskiy
At the moment generic version-less CPUs are supported via hardcoded aliases. For example, POWER7 is an alias for POWER7_v2.1. So when QEMU is started with -cpu POWER7, the POWER7_v2.1 class instance is created. This approach works for TCG and KVMs other than HV KVM. HV KVM cannot emulate PVR value so the guest always sees the real PVR. HV KVM will not allow setting PVR other that the host PVR because of that (the kernel patch for it is on its way). So in most cases it is impossible to run QEMU with -cpu POWER7 unless the host PVR is exactly the same as the one from the alias (which is now POWER7_v2.3). It was decided that under HV KVM QEMU should use -cpu host. Using "host" CPU type creates a problem for management tools such as libvirt because they want to know in advance if the destination guest can possibly run on the destination. Since the "host" type is really not a type and will always work with any KVM, there is no way for libvirt to know if the migration will success. This registers additional CPU class derived from the host CPU family. The name for it is taken from @desc field of the CPU family class. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-13kvm: reset state from the CPU's reset methodPaolo Bonzini
Now that we have a CPU object with a reset method, it is better to keep the KVM reset close to the CPU reset. Using qemu_register_reset as we do now keeps them far apart. With this patch, PPC no longer calls the kvm_arch_ function, so it can get removed there. Other arches call it from their CPU reset handler, and the function gets an ARMCPU/X86CPU/S390CPU. Note that ARM- and s390-specific functions are called kvm_arm_* and kvm_s390_*, while x86-specific functions are called kvm_arch_*. That follows the convention used by the different architectures. Changing that is the topic of a separate patch. Reviewed-by: Gleb Natapov <gnatapov@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-30ppc: use kvm_vcpu_enable_cap()Cornelia Huck
Convert existing users of KVM_ENABLE_CAP to new helper. Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-03-13exec: Change cpu_abort() argument to CPUStateAndreas Färber
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13cpu: Move exception_index field from CPU_COMMON to CPUStateAndreas Färber
Signed-off-by: Andreas Färber <afaerber@suse.de>