path: root/hw/ppc/spapr_hcall.c
Age  Commit message  Author
2017-05-11  target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE  (Suraj Jitindar Singh)
The UPRT and GTSE bits are set when a guest calls H_REGISTER_PROCESS_TABLE to determine how address translation is performed. Currently these bits in the LPCR are only set for the cpu which handles the H_CALL, however they need to be set for all cpus of that guest, as address translation cannot be performed differently on a per-cpu basis. Update the H_CALL handler to set these bits in the LPCR correctly for all cpus of the guest. Note it is the responsibility of the guest to ensure that any secondary cpus are suspended when the H_CALL is made, and thus we can safely update these values here. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
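A minimal sketch of what such a broadcast update can look like, assuming QEMU's CPU_FOREACH iteration; the helper name spapr_set_all_lpcrs() and the direct SPR write are illustrative rather than the committed code (under KVM the new LPCR value would additionally have to be pushed to the kernel).

```c
/* Illustrative only: apply an LPCR bit update to every vCPU of the guest. */
static void spapr_set_all_lpcrs(target_ulong value, target_ulong mask)
{
    CPUState *cs;

    CPU_FOREACH(cs) {
        PowerPCCPU *cpu = POWERPC_CPU(cs);
        CPUPPCState *env = &cpu->env;

        env->spr[SPR_LPCR] = (env->spr[SPR_LPCR] & ~mask) | (value & mask);
    }
}

/* From the H_REGISTER_PROCESS_TABLE handler, once the arguments have been
 * validated:
 *     spapr_set_all_lpcrs(LPCR_UPRT | LPCR_GTSE, LPCR_UPRT | LPCR_GTSE);
 */
```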
2017-04-26  spapr: Workaround for broken radix guests  (Sam Bobroff)
For a little while around 4.9, Linux kernels that saw the radix bit in ibm,pa-features would attempt to set up the MMU as if they were a hypervisor, even if they were a guest, which would cause them to crash. Work around this by detecting pre-ISA 3.0 guests by their lack of that bit in option vector 1, and then removing the radix bit from ibm,pa-features. Note: This now requires regeneration of that node after CAS negotiation. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26  spapr: Enable ISA 3.0 MMU mode selection via CAS  (Sam Bobroff)
Add the new node, /chosen/ibm,arch-vec-5-platform-support to the device tree. This allows the guest to determine which modes are supported by the hypervisor. Update the option vector processing in h_client_architecture_support() to handle the new MMU bits. This allows guests to request hash or radix mode and QEMU to create the guest's HPT at this time if it is necessary but hasn't yet been done. QEMU will terminate the guest if it requests an unavailable mode, as required by the architecture. Extend the ibm,pa-features node with the new ISA 3.0 values and set the radix bit if KVM supports radix mode. This probably won't be used directly by guests to determine the availability of radix mode (that is indicated by the new node added above) but the architecture requires that it be set when the hardware supports it. If QEMU is using KVM, and KVM is capable of running in radix mode, guests can be run in real-mode without allocating a HPT (because KVM will use a minimal RPT). So in this case, we avoid creating the HPT at reset time and later (during CAS) create it if it is necessary. ISA 3.0 guests will now begin to call h_register_process_table(), which has been added previously. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Strip some unneeded prefix from error messages] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-04-26  target/ppc: Implement H_REGISTER_PROCESS_TABLE H_CALL  (Suraj Jitindar Singh)
The H_REGISTER_PROCESS_TABLE H_CALL is used by a guest to indicate to the hypervisor where in memory its process table is and how translation should be performed using this process table. Provide the implementation of this H_CALL for a guest. We first check for invalid flags, then parse the flags to determine the operation, and then check the other parameters for valid values based on the operation (register new table/deregister table/maintain registration). The process table is then stored in the appropriate location and registered with the hypervisor (if running under KVM), and the LPCR_[UPRT/GTSE] bits are updated as required. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> [dwg: Correct missing prototype and uninitialized variable] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
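A condensed sketch of the validation flow described above, assuming the standard spapr hypercall handler signature of that era; the flag constants, the patb_entry encoding, and the omitted KVM registration step are illustrative, not the committed implementation.

```c
/* Illustrative flag bits; the real values come from the architecture. */
#define FLAG_RADIX_TABLE   0x04   /* process table is a radix table   */
#define FLAG_REGISTER      0x10   /* register a new table             */
#define FLAG_DEREGISTER    0x08   /* deregister the current table     */
#define FLAG_GTSE          0x01   /* guest translation shootdown enable */

static target_ulong h_register_process_table(PowerPCCPU *cpu,
                                             sPAPRMachineState *spapr,
                                             target_ulong opcode,
                                             target_ulong *args)
{
    target_ulong flags = args[0];
    target_ulong proc_tbl = args[1];
    target_ulong page_size = args[2];
    target_ulong table_size = args[3];

    /* 1. Reject flag bits we don't know about. */
    if (flags & ~(FLAG_RADIX_TABLE | FLAG_REGISTER | FLAG_DEREGISTER |
                  FLAG_GTSE)) {
        return H_PARAMETER;
    }
    /* 2. Register and deregister are mutually exclusive operations. */
    if ((flags & FLAG_REGISTER) && (flags & FLAG_DEREGISTER)) {
        return H_PARAMETER;
    }
    /* 3. Validate the table address/size for the chosen operation, store
     *    the process-table entry (and tell KVM about it when applicable),
     *    then update LPCR_UPRT/LPCR_GTSE on all CPUs as requested. */
    spapr->patb_entry = proc_tbl | table_size;  /* illustrative encoding */
    (void)page_size;
    return H_SUCCESS;
}
```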
2017-04-26  target/ppc: Add new H-CALL shells for in memory table translation  (Suraj Jitindar Singh)
The use of the new in memory tables introduced in ISAv3.00 for translation, also referred to as process tables, requires the introduction of 3 new H-CALLs; H_REGISTER_PROCESS_TABLE, H_CLEAN_SLB, and H_INVALIDATE_PID. Add shells for each of these and register them as the hypercall handlers. Currently they all log an unimplemented hypercall and return H_FUNCTION. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
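Roughly what such a shell looks like, following the pattern already used by other handlers in spapr_hcall.c; the example uses H_CLEAN_SLB, and the registration snippet assumes the file's existing hypercall_register_types() hook.

```c
/* Unimplemented-hypercall shell: log the call and fail with H_FUNCTION. */
static target_ulong h_clean_slb(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                                target_ulong opcode, target_ulong *args)
{
    qemu_log_mask(LOG_UNIMP, "Unimplemented SPAPR hcall 0x" TARGET_FMT_lx "\n",
                  opcode);
    return H_FUNCTION;
}

static void hypercall_register_types(void)
{
    /* ... existing registrations ... */
    spapr_register_hypercall(H_CLEAN_SLB, h_clean_slb);
}
```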
2017-03-01  target/ppc: Manage external HPT via virtual hypervisor  (David Gibson)
The pseries machine type implements the behaviour of a PAPR compliant hypervisor, without actually executing such a hypervisor on the virtual CPU. To do this we need some hooks in the CPU code to make hypervisor facilities get redirected to the machine instead of emulated internally. For hypercalls this is managed through the cpu->vhyp field, which points to a QOM interface with a method implementing the hypercall. For the hashed page table (HPT) - also a hypervisor resource - we use an older hack. CPUPPCState has an 'external_htab' field which when non-NULL indicates that the HPT is stored in qemu memory, rather than within the guest's address space. For consistency - and to make some future extensions easier - this merges the external HPT mechanism into the vhyp mechanism. Methods are added to vhyp for the basic operations the core hash MMU code needs: map_hptes() and unmap_hptes() for reading the HPT, store_hpte() for updating it and hpt_mask() to retrieve its size. To match this, the pseries machine now sets these vhyp fields in its existing vhyp class, rather than reaching into the cpu object to set the external_htab field. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
2017-03-01  target/ppc: Eliminate htab_base and htab_mask variables  (David Gibson)
CPUPPCState includes fields htab_base and htab_mask which store the base address (GPA) and size (as a mask) of the guest's hashed page table (HPT). These are set when the SDR1 register is updated. Keeping these in sync with SDR1 is actually a little bit fiddly, and probably not useful for performance, since keeping them expands the size of CPUPPCState. It also makes some upcoming changes harder to implement. This patch removes these fields, in favour of calculating them directly from the SDR1 contents when necessary. This does make a change to the behaviour of attempting to write a bad value (invalid HPT size) to SDR1 with an mtspr instruction. Previously, the bad value would be stored in SDR1 and could be retrieved with a later mfspr, but the HPT size as used by the softmmu would be clamped to the allowed values. Now, writing a bad value is treated as a no-op. An error message is printed in both new and old versions. I'm not sure which behaviour, if either, matches real hardware. I don't think it matters that much, since it's pretty clear that if an OS writes a bad value to SDR1, it's not going to boot. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
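A sketch of the on-demand calculation this commit moves to, using the usual target/ppc SPR and SDR1 field names; treat the exact shift arithmetic as illustrative (HTABSIZE encodes the HPT size above the 256KiB minimum).

```c
/* Derive the HPT base and mask straight from SDR1, instead of caching
 * them in CPUPPCState. */
static inline hwaddr ppc_hash64_hpt_base(PowerPCCPU *cpu)
{
    return cpu->env.spr[SPR_SDR1] & SDR_64_HTABORG;
}

static inline hwaddr ppc_hash64_hpt_mask(PowerPCCPU *cpu)
{
    uint64_t htabsize = cpu->env.spr[SPR_SDR1] & SDR_64_HTABSIZE;

    /* HTABSIZE = n means the HPT is 2^(18 + n) bytes, i.e. 2^(11 + n)
     * PTEGs of 128 bytes each; the mask indexes PTEGs. */
    return (1ULL << (htabsize + 18 - 7)) - 1;
}
```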
2017-03-01  target/ppc: Cleanup HPTE accessors for 64-bit hash MMU  (David Gibson)
Accesses to the hashed page table (HPT) are complicated by the fact that the HPT could be in one of three places:
1) Within guest memory - when we're emulating a full guest CPU at the hardware level (e.g. powernv, mac99, g3beige)
2) Within qemu, but outside guest memory - when we're emulating user and supervisor instructions within TCG, but instead of emulating the CPU's hypervisor mode, we just emulate a hypervisor's behaviour (pseries in TCG or KVM-PR)
3) Within the host kernel - a pseries machine using KVM-HV acceleration. Mostly accesses to the HPT are handled by KVM, but there are a few cases where qemu needs to access it via a special fd for the purpose.
In order to batch accesses to the fd in case (3), we use a somewhat awkward ppc_hash64_start_access() / ppc_hash64_stop_access() pair, which for case (3) reads / releases several HPTEs from the kernel as a batch (usually a whole PTEG). For cases (1) & (2) it just returns an address value. The actual HPTE load helpers then need to interpret the returned token differently in the 3 cases. This patch keeps the same basic structure, but simplifies the details. First, start_access() / stop_access() are renamed to map_hptes() and unmap_hptes() to make their operation more obvious. Second, map_hptes() now always returns a qemu pointer, which can always be used in the same way by the load_hpte() helpers. In case (1) it comes from address_space_map(), in case (2) directly from qemu's HPT buffer, and in case (3) from a temporary buffer read from the KVM fd. While we're at it, make things a bit more consistent in terms of types and variable names: avoid variables named 'index' (it shadows index(3) which can lead to confusing results), use 'hwaddr ptex' for HPTE indices and uint64_t for each of the HPTE words, and use ptex throughout the call stack instead of pte_offset in some places (we still need that at the bottom layer, but nowhere else). Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-03-01  pseries: Minor cleanups to HPT management hypercalls  (David Gibson)
* Standardize on 'ptex' instead of 'pte_index' for HPTE index variables for consistency and brevity
* Avoid variables named 'index'; shadowing index(3) from libc can lead to surprising bugs if the variable is removed, because compiler errors might not appear for remaining references
* Clarify index calculations in h_enter() - we have two cases, H_EXACT where the exact HPTE slot is given, and !H_EXACT where we search for an empty slot within the hash bucket. Make the calculation more consistent between the cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
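An illustrative version of the two slot-selection cases in h_enter(); the helper is invented for this sketch, while H_EXACT, HPTES_PER_GROUP and HPTE64_V_VALID are the existing names.

```c
/* Pick a slot within an 8-entry PTEG for h_enter().  'pteg0' holds the
 * first doubleword (pte0) of each HPTE in the group.  Returns the slot,
 * or -1 if the caller should fail with H_PTEG_FULL. */
static int hpte_pick_slot(const uint64_t *pteg0, target_ulong flags,
                          int requested_slot)
{
    int slot;

    if (likely((flags & H_EXACT) == 0)) {
        /* !H_EXACT: take the first invalid entry in the hash bucket. */
        for (slot = 0; slot < HPTES_PER_GROUP; slot++) {
            if (!(pteg0[slot] & HPTE64_V_VALID)) {
                return slot;
            }
        }
        return -1;
    }

    /* H_EXACT: the guest named the slot; it must currently be invalid. */
    return (pteg0[requested_slot] & HPTE64_V_VALID) ? -1 : requested_slot;
}
```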
2017-01-31  ppc: Add ppc_set_compat_all()  (David Gibson)
Once a compatibility mode is negotiated with the guest, h_client_architecture_support() uses run_on_cpu() to update each CPU to the new mode. We're going to want this logic somewhere else shortly, so make a helper function to do this global update. We put it in target-ppc/compat.c - it makes as much sense at the CPU level as it does at the machine level. We also move the cpu_synchronize_state() into ppc_set_compat(), since it doesn't really make any sense to call that without synchronizing state. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
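A simplified sketch of such a helper; the committed version dispatches the per-CPU work through run_on_cpu(), which this plain loop elides, and the exact signature may differ.

```c
/* Apply the negotiated compatibility PVR to every vCPU, stopping at the
 * first failure. */
void ppc_set_compat_all(uint32_t compat_pvr, Error **errp)
{
    CPUState *cs;

    CPU_FOREACH(cs) {
        Error *local_err = NULL;

        ppc_set_compat(POWERPC_CPU(cs), compat_pvr, &local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            return;
        }
    }
}
```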
2017-01-31  pseries: Rewrite CAS PVR compatibility logic  (David Gibson)
During boot, PAPR guests negotiate CPU model support with the ibm,client-architecture-support mechanism. The logic to implement this in qemu is very convoluted. This rewrites it to be cleaner, using the new ppc_check_compat() call. The new logic for choosing a compatibility mode is:
1. Usually, use the most recent compatibility mode that is a) supported by the guest, b) supported by the CPU, and c) no later than the maximum allowed (if specified)
2. If no suitable compatibility mode was found, the guest *does* support this CPU explicitly, and no maximum compatibility mode is specified, then use "raw" mode for the current CPU
3. Otherwise, fail the boot.
This differs from the results of the old code: the old code preferred using "raw" mode to a compatibility mode, whereas the new code prefers a compatibility mode if available. Using compatibility mode preferentially means that we're more likely to be able to migrate the guest to a similar but not identical host. Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-01-31  ppc/spapr: implement H_SIGNAL_SYS_RESET  (Nicholas Piggin)
The H_SIGNAL_SYS_RESET hcall allows a guest CPU to raise a system reset exception on CPUs within the same guest -- all CPUs, all-but-self, or a specific CPU (including self). This has not made its way to a PAPR release yet, but we have an hcall number assigned.
H_SIGNAL_SYS_RESET = 0x380
Syntax: hcall(uint64 H_SIGNAL_SYS_RESET, int64 target);
Generate a system reset NMI on the threads indicated by target. Values for target:
  -1 = target all online threads including the caller
  -2 = target all online threads except for the caller
  All other negative values: reserved
  Positive values: The thread to be targeted, obtained from the value of the "ibm,ppc-interrupt-server#s" property of the CPU in the OF device tree.
Semantics:
  - Invalid target: return H_Parameter.
  - Otherwise: Generate a system reset NMI on target thread(s), return H_Success.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
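A sketch of the target decoding under the semantics listed above; do_system_reset_on_cpu() is an assumed callback name, and the lookup by cpu_dt_id stands in for matching the "ibm,ppc-interrupt-server#s" value.

```c
static target_ulong h_signal_sys_reset(PowerPCCPU *cpu,
                                       sPAPRMachineState *spapr,
                                       target_ulong opcode, target_ulong *args)
{
    target_long target = args[0];
    CPUState *cs;

    if (target < 0) {
        if (target < -2) {
            return H_PARAMETER;          /* other negative values: reserved */
        }
        CPU_FOREACH(cs) {
            if (target == -2 && cs == CPU(cpu)) {
                continue;                /* -2: all online threads but self */
            }
            run_on_cpu(cs, do_system_reset_on_cpu, RUN_ON_CPU_NULL);
        }
        return H_SUCCESS;
    }

    /* Positive values name one thread by its interrupt server number. */
    CPU_FOREACH(cs) {
        if (POWERPC_CPU(cs)->cpu_dt_id == target) {
            run_on_cpu(cs, do_system_reset_on_cpu, RUN_ON_CPU_NULL);
            return H_SUCCESS;
        }
    }
    return H_PARAMETER;
}
```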
2017-01-31  ppc: Rename cpu_version to compat_pvr  (David Gibson)
The 'cpu_version' field in PowerPCCPU is badly named. It's named after the 'cpu-version' device tree property where it is advertised, but that meaning may not be obvious in most places it appears. Worse, it doesn't even really correspond to that device tree property. The property contains either the processor's PVR, or, if the CPU is running in a compatibility mode, a special "logical PVR" representing which mode. Rename the cpu_version field, and a number of related variables to compat_pvr to make this clearer. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Thomas Huth <thuth@redhat.com>
2017-01-31  pseries: Make cpu_update during CAS unconditional  (David Gibson)
spapr_h_cas_compose_response() includes a cpu_update parameter which controls whether it includes updated information on the CPUs in the device tree fragment returned from the ibm,client-architecture-support (CAS) call. Providing the updated information is essential when CAS has negotiated compatibility options which require different cpu information to be presented to the guest. However, it should be safe to provide in other cases (it will just override the existing data in the device tree with identical data). This simplifies the code by removing the parameter and always providing the cpu update information. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2017-01-19  kvm: move cpu synchronization code  (Vincent Palatin)
Move the generic cpu_synchronize_ functions to the common hw_accel.h header, in order to prepare for the addition of a second hardware accelerator. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Vincent Palatin <vpalatin@chromium.org> Message-Id: <f5c3cffe8d520011df1c2e5437bb814989b48332.1484045952.git.vpalatin@chromium.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-31  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-mttcg' into staging  (Peter Maydell)
Base patches for MTTCG enablement.
# gpg: Signature made Mon 31 Oct 2016 14:01:41 GMT
# gpg: using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream-mttcg:
  tcg: move locking for tb_invalidate_phys_page_range up
  *_run_on_cpu: introduce run_on_cpu_data type
  cpus: re-factor out handle_icount_deadline
  tcg: cpus rm tcg_exec_all()
  tcg: move tcg_exec_all and helpers above thread fn
  target-arm/arm-powerctl: wake up sleeping CPUs
  tcg: protect translation related stuff with tb_lock.
  translate-all: Add assert_(memory|tb)_lock annotations
  linux-user/elfload: ensure mmap_lock() held while setting up
  tcg: comment on which functions have to be called with tb_lock held
  cpu-exec: include cpu_index in CPU_LOG_EXEC messages
  translate-all: add DEBUG_LOCKING asserts
  translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
  cpus: make all_vcpus_paused() return bool
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-31  *_run_on_cpu: introduce run_on_cpu_data type  (Paolo Bonzini)
This changes the *_run_on_cpu APIs (and helpers) to pass data in a run_on_cpu_data type instead of a plain void *. This is because we sometimes want to pass a target address (target_ulong) and this fails on 32 bit hosts emulating 64 bit guests. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20161027151030.20863-24-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-28  spapr: add option vector handling in CAS-generated resets  (Michael Roth)
In some cases, ibm,client-architecture-support calls can fail. This could happen in the current code for situations where the modified device tree segment exceeds the buffer size provided by the guest via the call parameters. In these cases, QEMU will reset, allowing an opportunity to regenerate the device tree from scratch via boot-time handling. There are potentially other scenarios as well, not currently reachable in the current code, but possible in theory, such as cases where device-tree properties or nodes need to be removed. We currently don't handle either of these properly for option vector capabilities however. Instead of carrying the negotiated capability beyond the reset and creating the boot-time device tree accordingly, we start from scratch, generating the same boot-time device tree as we did prior to the CAS-generated reset and the same device tree updates as we did before. This could (in theory) cause us to get stuck in a reset loop. This hasn't been observed, but depending on the extensiveness of CAS-induced device tree updates in the future, could eventually become an issue. Address this by pulling capability-related device tree updates resulting from CAS calls into a common routine, spapr_dt_cas_updates(), and adding an sPAPROptionVector* parameter that allows us to test for newly-negotiated capabilities. We invoke it as follows:
1) When ibm,client-architecture-support gets called, we call spapr_dt_cas_updates() with the set of capabilities added since the previous call to ibm,client-architecture-support. For the initial boot, or a system reset generated by something other than the CAS call itself, this set will consist of *all* options supported by both the platform and the guest. For calls to ibm,client-architecture-support immediately after a CAS-induced reset, we call spapr_dt_cas_updates() with only the set of capabilities added since the previous call, since the other capabilities will have already been addressed by the boot-time device-tree this time around. In the unlikely event that capabilities are *removed* since the previous CAS, we will generate a CAS-induced reset. In the unlikely event that we cannot fit the device-tree updates into the buffer provided by the guest, we'll generate a CAS-induced reset.
2) When a CAS update results in the need to reset the machine and include the updates in the boot-time device tree, we call spapr_dt_cas_updates() using the full set of negotiated capabilities as part of the reset path. At initial boot, or after a reset generated by something other than the CAS call itself, this set will be empty, resulting in what should be the same boot-time device-tree as we generated prior to this patch. For a CAS-induced reset, this routine will be called with the full set of capabilities negotiated by the platform/guest in the previous CAS call, which should result in CAS updates from the previous call being accounted for in the initial boot-time device tree.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Changed an int -> bool conversion to be more explicit] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28  spapr_hcall: use spapr_ovec_* interfaces for CAS options  (Michael Roth)
Currently we access individual bytes of an option vector via ldub_phys() to test for the presence of a particular capability within that byte. Currently this is only done for the "dynamic reconfiguration memory" capability bit. If that bit is present, we pass a boolean value to spapr_h_cas_compose_response() to generate a modified device tree segment with the additional properties required to enable this functionality. As more capability bits are added, we would need to modify the code to add additional option vector accesses and extend the parameter list of spapr_h_cas_compose_response() to include similar boolean values for these capabilities. Avoid this by switching to the spapr_ovec_* helpers so we can do all the parsing in one shot and then test for these additional bits within spapr_h_cas_compose_response() directly. Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-27  cpus: pass CPUState to run_on_cpu helpers  (Alex Bennée)
CPUState is a fairly common pointer to pass to these helpers. This means if you need other arguments for the async_run_on_cpu case you end up having to do a g_malloc to stuff additional data into the routine. For the current users this isn't a massive deal but for MTTCG this gets cumbersome when the only other parameter is often an address. This adds the typedef run_on_cpu_func for helper functions which has an explicit CPUState * passed as the first parameter. All the users of run_on_cpu and async_run_on_cpu have had their helpers updated to use CPUState where available. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> [Sergey Fedorov: - eliminate more CPUState in user data; - remove unnecessary user data passing; - fix target-s390x/kvm.c and target-s390x/misc_helper.c] Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts) Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> (s390 parts) Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <1470158864-17651-3-git-send-email-alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-09-23  target-ppc: tlbie/tlbivax should have global effect  (Nikunj A Dadhania)
tlbie (BookS) and tlbivax (BookE), plus the H_CALLs (pseries), should have a global effect. This introduces a TLB_NEED_GLOBAL_FLUSH flag. During lazy tlb flush, after taking care of pending local flushes, check whether a broadcast flush (at a context synchronizing event such as ptesync/tlbsync, etc.) is needed. Depending on the bitmask state of tlb_need_flush, the tlb is flushed from other cpus if needed and the flags are cleared. Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Use 'true' instead of '1' for call to check_tlb_flush()] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-09-23  target-ppc: add flag in check_tlb_flush()  (Nikunj A Dadhania)
We flush the qemu TLB lazily. check_tlb_flush is called whenever we hit a context synchronizing event or instruction that requires a pending flush to be performed. However, we fail to handle broadcast TLB flush operations. In order to fix that efficiently, we want to differentiate whether check_tlb_flush() needs to only apply pending local flushes (isync instructions, interrupts, ...) or also global pending flush operations. The latter is only needed when executing instructions that are defined architecturally as synchronizing global TLB flush operations. This in our case is ptesync on BookS and tlbsync on BookE along with the paravirtualized hypervisor calls. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> [dwg: Changed gen_check_tlb_flush() to also take a bool, and fixed some spelling errors in commit message] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
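A condensed sketch of the local/global split in check_tlb_flush(); the flag names follow the description, but the flush plumbing is simplified (tlb_flush() is shown in its modern single-argument form, and the broadcast is written as a plain loop over all CPUs).

```c
static inline void check_tlb_flush(CPUPPCState *env, bool global)
{
    CPUState *cs = CPU(ppc_env_get_cpu(env));

    if (env->tlb_need_flush & TLB_NEED_LOCAL_FLUSH) {
        tlb_flush(cs);                            /* pending local flush */
        env->tlb_need_flush &= ~TLB_NEED_LOCAL_FLUSH;
    }

    /* Only architecturally global synchronization points (ptesync, tlbsync,
     * the paravirt hcalls) are allowed to pay for the broadcast flush. */
    if (global && (env->tlb_need_flush & TLB_NEED_GLOBAL_FLUSH)) {
        CPUState *other;

        CPU_FOREACH(other) {
            tlb_flush(other);
        }
        env->tlb_need_flush &= ~TLB_NEED_GLOBAL_FLUSH;
    }
}
```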
2016-07-05  ppc: simplify ppc_hash64_hpte_page_shift_noslb()  (Cédric Le Goater)
The segment page shift parameter is never used. Let's remove it. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-22  powerpc/mm: Update the WIMG check during H_ENTER  (Aneesh Kumar K.V)
Support for a 0 value for memory coherence is optional, and with ppc64 we can always enable memory coherence. The Linux kernel did that during the development of the 4.7 kernel. But that resulted in failure of the H_ENTER hcall in QEMU due to the check below. The mentioned change was reverted in the kernel, and the kernel right now enables memory coherence only if cache inhibited is not set. Nevertheless, update the QEMU WIMG flag check to cover the case where we enable memory coherence along with the cache inhibited flag. In order to handle both older and newer kernel versions, consider both cache inhibited and (cache inhibited | memory coherence) as valid values for the WIMG flags. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
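A sketch of the relaxed check written as a small validator, using the mmu-hash64.h HPTE bit names; the helper itself and its use of is_ram_address() are illustrative.

```c
/* Validate the WIMG bits of an HPTE being inserted by h_enter(). */
static bool hpte_wimg_is_valid(sPAPRMachineState *spapr, hwaddr raddr,
                               target_ulong ptel)
{
    target_ulong wimg = ptel & (HPTE64_R_W | HPTE64_R_I | HPTE64_R_M);

    if (is_ram_address(spapr, raddr)) {
        /* Regular RAM must be coherent (M) and not cache-inhibited. */
        return wimg == HPTE64_R_M;
    }

    /* I/O: accept cache-inhibited, with or without memory coherence, so
     * that both older and newer Linux kernels pass the check. */
    return wimg == HPTE64_R_I || wimg == (HPTE64_R_I | HPTE64_R_M);
}
```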
2016-06-14  ppc: Add PowerISA 2.07 compatibility mode  (Thomas Huth)
Make sure that guests can use the PowerISA 2.07 CPU sPAPR compatibility mode when they request it and the target CPU supports it. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-14  ppc: Split pcr_mask settings into supported bits and the register mask  (Thomas Huth)
The current pcr_mask values are ambiguous: Should these be the mask that defines valid bits in the PCR register? Or should these rather indicate which compatibility levels are possible? Anyway, POWER6 and POWER7 should certainly not use the same values here. So let's introduce an additional variable "pcr_supported" here which is used to indicate the valid compatibility levels, and use pcr_mask to signal the valid bits in the PCR register. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-06-14  ppc/spapr: Refactor h_client_architecture_support() CPU parsing code  (Thomas Huth)
The h_client_architecture_support() function has become quite big and nested already. So factor out the code that takes care of the sPAPR compatibility PVRs (which will be modified by the following patches). Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-30  ppc: Do some batching of TCG tlb flushes  (Benjamin Herrenschmidt)
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction. However, those instructions often come in bursts of 3 or more (a context switch will favor a series of slbie's over an slbia, for example, if the SLB has fewer than a certain number of entries in it; tlbie's can also happen in a series, and with PAPR, H_BULK_REMOVE can remove up to 4 entries at a time). Doing a tlb_flush() each time is a waste of time. We end up doing a memset of the whole TLB, reloading it for the next instruction, memset'ing again, etc... Those instructions don't have to take effect immediately. For slbie, they can wait for the next context synchronizing event. For tlbie, the next tlbsync. This implements batching by keeping a flag that indicates that we have a TLB in need of flushing. We check it on interrupts, rfi's, isync's and tlbsync and flush the TLB if needed. This reduces the number of tlb_flush() calls on a boot to the ubuntu installer's first dialog screen from roughly 360K down to 36K. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: added a 'CPUPPCState *' variable in h_remove() and h_bulk_remove() ] Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: removed spurious whitespace change, use 0/1 not true/false consistently, since tlb_need_flush has int type] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-05-19  cpu: move exec-all.h inclusion out of cpu.h  (Paolo Bonzini)
exec-all.h contains TCG-specific definitions. It is not needed outside TCG-specific files such as translate.c, exec.c or *helper.c. One generic function had snuck into include/exec/exec-all.h; move it to include/qom/cpu.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19  hw: explicitly include qemu/log.h  (Paolo Bonzini)
Move the inclusion out of hw/hw.h, most files do not need it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19  dma: do not depend on kvm_enabled()  (Paolo Bonzini)
Memory barriers are needed also by Xen and, when the ioeventfd bugs are fixed, by TCG as well. sysemu/kvm.h is not anymore needed in sysemu/dma.h, move it to the actual users. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-05  ppc: Rework POWER7 & POWER8 exception model  (Cédric Le Goater)
From: Benjamin Herrenschmidt <benh@kernel.crashing.org> This patch fixes the current AIL implementation for POWER8. The interrupt vector address can be calculated directly from LPCR when the exception is handled. The excp_prefix update becomes useless and we can cleanup the H_SET_MODE hcall. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [clg: Removed LPES0/1 handling for HV vs. !HV Fixed LPCR_ILE case for POWERPC_EXCP_POWER8 ] Signed-off-by: Cédric Le Goater <clg@fr.ibm.com> [dwg: This was written as a cleanup, but it also fixes a real bug where setting an alternative interrupt location would not be correctly migrated] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-03-22  include/qemu/osdep.h: Don't include qapi/error.h  (Markus Armbruster)
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the Error typedef. Since then, we've moved to include qemu/osdep.h everywhere. Its file comment explains: "To avoid getting into possible circular include dependencies, this file should not include any other QEMU headers, with the exceptions of config-host.h, compiler.h, os-posix.h and os-win32.h, all of which are doing a similar job to this file and are under similar constraints." qapi/error.h doesn't do a similar job, and it doesn't adhere to similar constraints: it includes qapi-types.h. That's in excess of 100KiB of crap most .c files don't actually need. Add the typedef to qemu/typedefs.h, and include that instead of qapi/error.h. Include qapi/error.h in .c files that need it and don't get it now. Include qapi-types.h in qom/object.h for uint16List. Update scripts/clean-includes accordingly. Update it further to match reality: replace config.h by config-target.h, add sysemu/os-posix.h, sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h comment quoted above similarly. This reduces the number of objects depending on qapi/error.h from "all of them" to less than a third. Unfortunately, the number depending on qapi-types.h shrinks only a little. More work is needed for that one. Signed-off-by: Markus Armbruster <armbru@redhat.com> [Fix compilation without the spice devel packages. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-16  target-ppc: Eliminate kvmppc_kern_htab global  (David Gibson)
fa48b43 "target-ppc: Remove hack for ppc_hash64_load_hpte*() with HV KVM" purports to remove a hack in the handling of hash page tables (HPTs) managed by KVM instead of qemu. However, it actually went in the wrong direction. That patch requires anything looking for an external HPT (that is one not managed by the guest itself) to check both env->external_htab (for a qemu managed HPT) and kvmppc_kern_htab (for a KVM managed HPT). That's a problem because kvmppc_kern_htab is local to mmu-hash64.c, but some places which need to check for an external HPT are outside that, such as kvm_arch_get_registers(). The latter was subtly broken by the earlier patch such that gdbstub can no longer access memory. Basically a KVM managed HPT is much more like a qemu managed HPT than it is like a guest managed HPT, so the original "hack" was actually on the right track. This partially reverts fa48b43, so we again mark a KVM managed external HPT by putting a special but non-NULL value in env->external_htab. It then goes further, using that marker to eliminate the kvmppc_kern_htab global entirely. The ppc_hash64_set_external_hpt() helper function is extended to set that marker if passed a NULL value (if you're setting an external HPT, but don't have an actual HPT to set, the assumption is that it must be a KVM managed HPT). This also has some flow-on changes to the HPT access helpers, required by the above changes. Reported-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
2016-02-25  hw/ppc/spapr: Implement the h_page_init hypercall  (Thomas Huth)
This hypercall either initializes a page with zeros, or copies another page. According to LoPAPR, the i-cache of the page should also be flushed if using H_ICACHE_INVALIDATE or H_ICACHE_SYNCHRONIZE, and the d-cache should be synchronized to the RAM if the H_ICACHE_SYNCHRONIZE flag is used. For this, two new functions are introduced, kvmppc_dcbst_range() and kvmppc_icbi_range(), which use the corresponding assembler instructions to flush the caches if running with KVM on Power. If the code runs with TCG instead, the code only uses tb_flush(), assuming that this will be enough for synchronization. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
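A rough sketch of the handler flow described above; the H_* flag names exist in spapr.h, while kvmppc_dcbst_range()/kvmppc_icbi_range() are the new helpers this commit introduces, whose exact signatures are assumed here.

```c
static target_ulong h_page_init(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                                target_ulong opcode, target_ulong *args)
{
    target_ulong flags = args[0];
    hwaddr dst = args[1];
    hwaddr src = args[2];
    hwaddr len = TARGET_PAGE_SIZE;
    uint8_t *pdst;

    if (dst & ~TARGET_PAGE_MASK) {
        return H_PARAMETER;              /* destination must be page aligned */
    }

    pdst = cpu_physical_memory_map(dst, &len, true);
    if (!pdst || len != TARGET_PAGE_SIZE) {
        if (pdst) {
            cpu_physical_memory_unmap(pdst, len, true, 0);
        }
        return H_PARAMETER;
    }

    if (flags & H_COPY_PAGE) {
        cpu_physical_memory_read(src, pdst, TARGET_PAGE_SIZE);
    } else if (flags & H_ZERO_PAGE) {
        memset(pdst, 0, TARGET_PAGE_SIZE);
    }

    if (flags & (H_ICACHE_SYNCHRONIZE | H_ICACHE_INVALIDATE)) {
        if (kvm_enabled()) {
            /* Flush d-cache to RAM and invalidate the i-cache for the page. */
            kvmppc_dcbst_range(cpu, pdst, TARGET_PAGE_SIZE);
            kvmppc_icbi_range(cpu, pdst, TARGET_PAGE_SIZE);
        } else {
            tb_flush(CPU(cpu));          /* TCG: drop stale translations */
        }
    }

    cpu_physical_memory_unmap(pdst, TARGET_PAGE_SIZE, true, TARGET_PAGE_SIZE);
    return H_SUCCESS;
}
```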
2016-02-17  hw/ppc/spapr: Implement the h_set_xdabr hypercall  (Thomas Huth)
The H_SET_XDABR hypercall is similar to H_SET_DABR, but also sets the extended DABR (DABRX) register. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-02-17  hw/ppc/spapr: Implement h_set_dabr  (Thomas Huth)
According to LoPAPR, h_set_dabr should simply set DABRX to 3 (if the register is available), and load the parameter into DABR. If DABRX is not available, the hypervisor has to check the "Breakpoint Translation" bit of the DABR register first. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
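A minimal sketch of that rule; the Breakpoint Translation bit mask shown is illustrative, and the DABRX value of 3 enables matching in both problem and privileged state as LoPAPR describes.

```c
static target_ulong h_set_dabr(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                               target_ulong opcode, target_ulong *args)
{
    CPUPPCState *env = &cpu->env;

    if (env->spr_cb[SPR_DABRX].name == NULL) {
        /* No DABRX on this CPU: only DABR values whose Breakpoint
         * Translation bit is set can be honoured. */
        if (!(args[0] & 0x4)) {          /* illustrative BT bit mask */
            return H_RESERVED_DABR;
        }
    } else {
        env->spr[SPR_DABRX] = 0x3;       /* match in problem + privileged state */
    }

    env->spr[SPR_DABR] = args[0];
    return H_SUCCESS;
}
```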
2016-02-17  hw/ppc/spapr: Add h_set_sprg0 hypercall  (Thomas Huth)
This is a very simple hypercall that only sets up the SPRG0 register for the guest (since writing to SPRG0 was only permitted to the hypervisor in older versions of the PowerISA). Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
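The handler is small enough that a sketch is nearly the whole story; cpu_synchronize_state() is assumed to be needed to keep the register state coherent when running under KVM.

```c
static target_ulong h_set_sprg0(PowerPCCPU *cpu, sPAPRMachineState *spapr,
                                target_ulong opcode, target_ulong *args)
{
    cpu_synchronize_state(CPU(cpu));     /* pull registers in from KVM */
    cpu->env.spr[SPR_SPRG0] = args[0];
    return H_SUCCESS;
}
```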
2016-01-30  target-ppc: Helper to determine page size information from hpte alone  (David Gibson)
h_enter() in the spapr code needs to know the page size of the HPTE it's about to insert. Unlike other paths that do this, it doesn't have access to the SLB, so at the moment it determines this with some open-coded tests which assume POWER7 or POWER8 page size encodings. To make this more flexible add ppc_hash64_hpte_page_shift_noslb() to determine both the "base" page size per segment, and the individual effective page size from an HPTE alone. This means that the spapr code should now be able to handle any page size listed in the env->sps table. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Alexander Graf <agraf@suse.de>
2016-01-30  target-ppc: Add new TLB invalidate by HPTE call for hash64 MMUs  (David Gibson)
When HPTEs are removed or modified by hypercalls on spapr, we need to invalidate the relevant pages in the qemu TLB. Currently we do that by doing some complicated calculations to work out the right encoding for the tlbie instruction, then passing that to ppc_tlb_invalidate_one()... which totally ignores the argument and flushes the whole tlb. Avoid that by adding a new flush-by-hpte helper in mmu-hash64.c. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Alexander Graf <agraf@suse.de>
2016-01-30  target-ppc: Convert mmu-hash{32,64}.[ch] from CPUPPCState to PowerPCCPU  (David Gibson)
Like a lot of places these files include a mixture of functions taking both the older CPUPPCState *env and newer PowerPCCPU *cpu. Move a step closer to cleaning this up by standardizing on PowerPCCPU, except for the helper_* functions which are called with the CPUPPCState * from tcg. Callers and some related functions are updated as well, the boundaries of what's changed here are a bit arbitrary. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Alexander Graf <agraf@suse.de>
2016-01-30  pseries: Allow TCG h_enter to work with hotplugged memory  (David Gibson)
The implementation of the H_ENTER hypercall for PAPR guests needs to enforce correct access attributes on the inserted HPTE. This means determining if the HPTE's real address is a regular RAM address (which requires attributes for coherent access) or an IO address (which requires attributes for cache-inhibited access). At the moment this check is implemented with (raddr < machine->ram_size), but that only handles addresses in the base RAM area, not any hotplugged RAM. This patch corrects the problem with a new helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-01-30  ppc: Clean up error handling in ppc_set_compat()  (David Gibson)
Current ppc_set_compat() returns -1 for errors, and also (unconditionally) reports an error message. The caller in h_client_architecture_support() may then report it again using an outdated fprintf(). Clean this up by using the modern error reporting mechanisms. Also add strerror(errno) to the error message. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Markus Armbruster <armbru@redhat.com>
2016-01-30  spapr: Remove abuse of rtas_ld() in h_client_architecture_support  (David Gibson)
h_client_architecture_support() uses rtas_ld() for general purpose memory access, despite the fact that it's not an RTAS routine at all and rtas_ld makes things more awkward. Clean this up by replacing rtas_ld() calls with appropriate ldXX_phys() calls. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-01-29  ppc: Clean up includes  (Peter Maydell)
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1453832250-766-6-git-send-email-peter.maydell@linaro.org
2015-09-23  spapr: Support ibm,dynamic-reconfiguration-memory  (Bharata B Rao)
Parse the ibm,architecture.vec table obtained from the guest and enable memory node configuration via ibm,dynamic-reconfiguration-memory if the guest supports it. This is in preparation for supporting memory hotplug for sPAPR guests. This changes the way memory node configuration is done. Currently all memory nodes are built upfront. But after this patch, only the memory@0 node for the RMA is built upfront. The guest kernel boots with just that, and the rest of the memory nodes (via memory@XXX or ibm,dynamic-reconfiguration-memory) are built when the guest makes the ibm,client-architecture-support call. Note: This patch needs a SLOF enhancement which is already part of the SLOF binary in QEMU. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2015-09-23  ppc/spapr: Use qemu_log_mask() for hcall_dprintf()  (Thomas Huth)
To see the output of the hcall_dprintf statements, you currently have to enable the DEBUG_SPAPR_HCALLS macro in include/hw/ppc/spapr.h. This is ugly because a) not every user who wants to debug guest problems can or wants to recompile QEMU to be able to see such issues, and b) since this macro is disabled by default, the code in the hcall_dprintf() brackets tends to bitrot until somebody temporarily enables that macro again. Since all the hcall_dprintf statements except one indicate guest problems, let's always use qemu_log_mask(LOG_GUEST_ERROR, ...) for this macro instead. One spot indicated an unimplemented host feature, so this is changed into qemu_log_mask(LOG_UNIMP, ...) instead. Now it's possible to see all those messages by simply adding the CLI parameter "-d guest_errors,unimp", without the need to re-compile the binary. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
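Roughly what the reworked macro looks like (exact formatting may differ from the committed version):

```c
#define hcall_dprintf(fmt, ...) \
    do { \
        qemu_log_mask(LOG_GUEST_ERROR, "%s: " fmt, __func__, ## __VA_ARGS__); \
    } while (0)
```

Guest-triggered warnings then appear when QEMU is started with "-d guest_errors" (or "-d guest_errors,unimp" to also see the unimplemented-feature message), with no rebuild required.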
2015-07-07  spapr: Remove obsolete ram_limit field from sPAPRMachineState  (David Gibson)
The ram_limit field was imported from sPAPREnvironment where it predates the machine's ram size being available generically from machine->ram_size. Worse, the existing code was inconsistent about where it got the ram size from. Sometimes it used spapr->ram_limit, sometimes the global 'ram_size' and sometimes a local 'ram_size' masking the global. This cleans up the code to consistently use machine->ram_size, eliminating spapr->ram_limit in the process. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-07-07  spapr: Merge sPAPREnvironment into sPAPRMachineState  (David Gibson)
The code for -machine pseries maintains a global sPAPREnvironment structure which keeps track of general state information about the guest platform. This predates the existence of the MachineState structure, but performs basically the same function. Now that we have the generic MachineState, fold sPAPREnvironment into sPAPRMachineState, the pseries specific subclass of MachineState. This is mostly a matter of search and replace, although a few places which relied on the global spapr variable are changed to find the structure via qdev_get_machine(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-03-09  pseries: Switch VGA endian on H_SET_MODE  (David Gibson)
When the guest switches the interrupt endian mode, which essentially means a global machine endian switch, we want to change the VGA framebuffer endian mode as well in order to be backward compatible with existing guests who don't know about the new endian control register. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>