aboutsummaryrefslogtreecommitdiff
path: root/target/i386/kvm.c
AgeCommit message (Collapse)Author
2020-09-11target/i386/kvm: Add missing fallthrough commentThomas Huth
Let's make this file compilable with -Werror=implicit-fallthrough : Looking at the code, it seems like the fallthrough is intended here, so we should add the corresponding "/* fallthrough */" comment here. Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200911125301.413081-1-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-09-10target/i386/kvm: Rename host_tsx_blacklisted() as host_tsx_broken()Philippe Mathieu-Daudé
In order to use inclusive terminology, rename host_tsx_blacklisted() as host_tsx_broken(). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200910070131.435543-7-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-09-02x86-iommu: Rename QOM type macrosEduardo Habkost
Some QOM macros were using a X86_IOMMU_DEVICE prefix, and others were using a X86_IOMMU prefix. Rename all of them to use the same X86_IOMMU_DEVICE prefix. This will make future conversion to OBJECT_DECLARE* easier. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20200825192110.3528606-47-ehabkost@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-07-23KVM: fix CPU reset wrt HF2_GIF_MASKVitaly Kuznetsov
HF2_GIF_MASK is set in env->hflags2 unconditionally on CPU reset (see x86_cpu_reset()) but when calling KVM_SET_NESTED_STATE, KVM_STATE_NESTED_GIF_SET is only valid for nSVM as e.g. nVMX code looks like if (kvm_state->hdr.vmx.vmxon_pa == -1ull) { if (kvm_state->flags & ~KVM_STATE_NESTED_EVMCS) return -EINVAL; } Also, when adjusting the environment after KVM_GET_NESTED_STATE we need not reset HF2_GIF_MASK on VMX as e.g. x86_cpu_pending_interrupt() expects it to be set. Alternatively, we could've made env->hflags2 SVM-only. Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Fixes: b16c0e20c742 ("KVM: add support for AMD nested live migration") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200723142701.2521161-1-vkuznets@redhat.com> Tested-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-07-10KVM: x86: believe what KVM says about WAITPKGPaolo Bonzini
Currently, QEMU is overriding KVM_GET_SUPPORTED_CPUID's answer for the WAITPKG bit depending on the "-overcommit cpu-pm" setting. This is a bad idea because it does not even check if the host supports it, but it can be done in x86_cpu_realizefn just like we do for the MONITOR bit. This patch moves it there, while making it conditional on host support for the related UMWAIT MSR. Cc: qemu-stable@nongnu.org Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10KVM: add support for AMD nested live migrationPaolo Bonzini
Support for nested guest live migration is part of Linux 5.8, add the corresponding code to QEMU. The migration format consists of a few flags, is an opaque 4k blob. The blob is in VMCB format (the control area represents the L1 VMCB control fields, the save area represents the pre-vmentry state; KVM does not use the host save area since the AMD manual allows that) but QEMU does not really care about that. However, the flags need to be copied to hflags/hflags2 and back. In addition, support for retrieving and setting the AMD nested virtualization states allows the L1 guest to be reset while running a nested guest, but a small bug in CPU reset needs to be fixed for that to work. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-26kvm: i386: allow TSC to differ by NTP correction bounds without TSC scalingMarcelo Tosatti
The Linux TSC calibration procedure is subject to small variations (its common to see +-1 kHz difference between reboots on a given CPU, for example). So migrating a guest between two hosts with identical processor can fail, in case of a small variation in calibrated TSC between them. Allow a conservative 250ppm error between host TSC and VM TSC frequencies, rather than requiring an exact match. NTP daemon in the guest can correct this difference. Also change migration to accept this bound. KVM_SET_TSC_KHZ depends on a kernel interface change. Without this change, the behaviour remains the same: in case of a different frequency between host and VM, KVM_SET_TSC_KHZ will fail and QEMU will exit. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20200616165805.GA324612@fuller.cnet> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10target/i386: define a new MSR based feature word - FEAT_PERF_CAPABILITIESLike Xu
The Perfmon and Debug Capability MSR named IA32_PERF_CAPABILITIES is a feature-enumerating MSR, which only enumerates the feature full-width write (via bit 13) by now which indicates the processor supports IA32_A_PMCx interface for updating bits 32 and above of IA32_PMCx. The existence of MSR IA32_PERF_CAPABILITIES is enumerated by CPUID.1:ECX[15]. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20200529074347.124619-5-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10i386/kvm: fix a use-after-free when vcpu plug/unplugPan Nengyuan
When we hotplug vcpus, cpu_update_state is added to vm_change_state_head in kvm_arch_init_vcpu(). But it forgot to delete in kvm_arch_destroy_vcpu() after unplug. Then it will cause a use-after-free access. This patch delete it in kvm_arch_destroy_vcpu() to fix that. Reproducer: virsh setvcpus vm1 4 --live virsh setvcpus vm1 2 --live virsh suspend vm1 virsh resume vm1 The UAF stack: ==qemu-system-x86_64==28233==ERROR: AddressSanitizer: heap-use-after-free on address 0x62e00002e798 at pc 0x5573c6917d9e bp 0x7fff07139e50 sp 0x7fff07139e40 WRITE of size 1 at 0x62e00002e798 thread T0 #0 0x5573c6917d9d in cpu_update_state /mnt/sdb/qemu/target/i386/kvm.c:742 #1 0x5573c699121a in vm_state_notify /mnt/sdb/qemu/vl.c:1290 #2 0x5573c636287e in vm_prepare_start /mnt/sdb/qemu/cpus.c:2144 #3 0x5573c6362927 in vm_start /mnt/sdb/qemu/cpus.c:2150 #4 0x5573c71e8304 in qmp_cont /mnt/sdb/qemu/monitor/qmp-cmds.c:173 #5 0x5573c727cb1e in qmp_marshal_cont qapi/qapi-commands-misc.c:835 #6 0x5573c7694c7a in do_qmp_dispatch /mnt/sdb/qemu/qapi/qmp-dispatch.c:132 #7 0x5573c7694c7a in qmp_dispatch /mnt/sdb/qemu/qapi/qmp-dispatch.c:175 #8 0x5573c71d9110 in monitor_qmp_dispatch /mnt/sdb/qemu/monitor/qmp.c:145 #9 0x5573c71dad4f in monitor_qmp_bh_dispatcher /mnt/sdb/qemu/monitor/qmp.c:234 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200513132630.13412-1-pannengyuan@huawei.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10i386/cpu: Store LAPIC bus frequency in CPU structureLiran Alon
No functional change. This information will be used by following patches. Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Message-Id: <20200312165431.82118-15-liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-14KVM: Move hwpoison page related functions into kvm-all.cDongjiu Geng
kvm_hwpoison_page_add() and kvm_unpoison_all() will both be used by X86 and ARM platforms, so moving them into "accel/kvm/kvm-all.c" to avoid duplicate code. For architectures that don't use the poison-list functionality the reset handler will harmlessly do nothing, so let's register the kvm_unpoison_all() function in the generic kvm_init() function. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com> Acked-by: Xiang Zheng <zhengxiang9@huawei.com> Message-id: 20200512030609.19593-8-gengdongjiu@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-04-02target/i386: do not set unsupported VMX secondary execution controlsVitaly Kuznetsov
Commit 048c95163b4 ("target/i386: work around KVM_GET_MSRS bug for secondary execution controls") added a workaround for KVM pre-dating commit 6defc591846d ("KVM: nVMX: include conditional controls in /dev/kvm KVM_GET_MSRS") which wasn't setting certain available controls. The workaround uses generic CPUID feature bits to set missing VMX controls. It was found that in some cases it is possible to observe hosts which have certain CPUID features but lack the corresponding VMX control. In particular, it was reported that Azure VMs have RDSEED but lack VMX_SECONDARY_EXEC_RDSEED_EXITING; attempts to enable this feature bit result in QEMU abort. Resolve the issue but not applying the workaround when we don't have to. As there is no good way to find out if KVM has the fix itself, use 95c5c7c77c ("KVM: nVMX: list VMX MSRs in KVM_GET_MSR_INDEX_LIST") instead as these [are supposed to] come together. Fixes: 048c95163b4 ("target/i386: work around KVM_GET_MSRS bug for secondary execution controls") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200331162752.1209928-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12target/i386: check for availability of MSR_IA32_UCODE_REV as an emulated MSRPaolo Bonzini
Even though MSR_IA32_UCODE_REV has been available long before Linux 5.6, which added it to the emulated MSR list, a bug caused the microcode version to revert to 0x100000000 on INIT. As a result, processors other than the bootstrap processor would not see the host microcode revision; some Windows version complain loudly about this and crash with a fairly explicit MICROCODE REVISION MISMATCH error. [If running 5.6 prereleases, the kernel fix "KVM: x86: do not reset microcode version on INIT or RESET" should also be applied.] Reported-by: Alex Williamson <alex.williamson@redhat.com> Message-id: <20200211175516.10716-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24accel: Replace current_machine->accelerator by current_accel() wrapperPhilippe Mathieu-Daudé
We actually want to access the accelerator, not the machine, so use the current_accel() wrapper instead. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200121110349.25842-10-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24target/i386: kvm: initialize microcode revision from KVMPaolo Bonzini
KVM can return the host microcode revision as a feature MSR. Use it as the default value for -cpu host. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1579544504-3616-4-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24target/i386: kvm: initialize feature MSRs very earlyPaolo Bonzini
Some read-only MSRs affect the behavior of ioctls such as KVM_SET_NESTED_STATE. We can initialize them once and for all right after the CPU is realized, since they will never be modified by the guest. Reported-by: Qingua Cheng <qcheng@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1579544504-3616-2-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-07x86: Check for machine state object class before typecasting itMichal Privoznik
In ed9e923c3c ("x86: move SMM property to X86MachineState", 2019-12-17) In v4.2.0-246-ged9e923c3c the SMM property was moved from PC machine class to x86 machine class. Makes sense, but the change was too aggressive: in target/i386/kvm.c:kvm_arch_init() it altered check which sets SMRAM if given machine has SMM enabled. The line that detects whether given machine object is class of PC_MACHINE was removed from the check. This makes qemu try to enable SMRAM for all machine types, which is not what we want. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Fixes: ed9e923c3c ("x86: move SMM property to X86MachineState", 2019-12-17) Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <7cc91bab3191bfd7e071bdd3fdf7fe2a2991deb0.1577692822.git.mprivozn@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-18target/i386: remove unused pci-assign codesEiichi Tsukata
Legacy PCI device assignment has been already removed in commit ab37bfc7d641 ("pci-assign: Remove"), but some codes remain unused. CC: qemu-trivial@nongnu.org Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Message-Id: <20191209072932.313056-1-devel@etsukata.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-17x86: move more x86-generic functions out of PC filesPaolo Bonzini
These are needed by microvm too, so move them outside of PC-specific files. With this patch, microvm.c need not include pc.h anymore. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-17x86: move SMM property to X86MachineStatePaolo Bonzini
Add it to microvm as well, it is a generic property of the x86 architecture. Suggested-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-17kvm: introduce kvm_kernel_irqchip_* functionsPaolo Bonzini
The KVMState struct is opaque, so provide accessors for the fields that will be moved from current_machine to the accelerator. For now they just forward to the machine object, but this will change. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-17kvm: convert "-machine kvm_shadow_mem" to an accelerator propertyPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-06target/i386: disable VMX features if nested=0Yang Zhong
If kvm does not support VMX feature by nested=0, the kvm_vmx_basic can't get the right value from MSR_IA32_VMX_BASIC register, which make qemu coredump when qemu do KVM_SET_MSRS. The coredump info: error: failed to set MSR 0x480 to 0x0 kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20191206071111.12128-1-yang.zhong@intel.com> Reported-by: Catherine Ho <catherine.hecx@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-21target/i386: add support for MSR_IA32_TSX_CTRLPaolo Bonzini
The MSR_IA32_TSX_CTRL MSR can be used to hide TSX (also known as the Trusty Side-channel Extension). By virtualizing the MSR, KVM guests can disable TSX and avoid paying the price of mitigating TSX-based attacks on microarchitectural side channels. Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-23target/i386: Add support for save/load IA32_UMWAIT_CONTROL MSRTao Xu
UMWAIT and TPAUSE instructions use 32bits IA32_UMWAIT_CONTROL at MSR index E1H to determines the maximum time in TSC-quanta that the processor can reside in either C0.1 or C0.2. This patch is to Add support for save/load IA32_UMWAIT_CONTROL MSR in guest. Co-developed-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20191011074103.30393-3-tao3.xu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-23x86/cpu: Add support for UMONITOR/UMWAIT/TPAUSETao Xu
UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions. This patch adds support for user wait instructions in KVM. Availability of the user wait instructions is indicated by the presence of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5]. User wait instructions may be executed at any privilege level, and use IA32_UMWAIT_CONTROL MSR to set the maximum time. The patch enable the umonitor, umwait and tpause features in KVM. Because umwait and tpause can put a (psysical) CPU into a power saving state, by default we dont't expose it to kvm and enable it only when guest CPUID has it. And use QEMU command-line "-overcommit cpu-pm=on" (enable_cpu_pm is enabled), a VM can use UMONITOR, UMWAIT and TPAUSE instructions. If the instruction causes a delay, the amount of time delayed is called here the physical delay. The physical delay is first computed by determining the virtual delay (the time to delay relative to the VM’s timestamp counter). Otherwise, UMONITOR, UMWAIT and TPAUSE cause an invalid-opcode exception(#UD). The release document ref below link: https://software.intel.com/sites/default/files/\ managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf Co-developed-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20191011074103.30393-2-tao3.xu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22i386/kvm: add NoNonArchitecturalCoreSharing Hyper-V enlightenmentVitaly Kuznetsov
Hyper-V TLFS specifies this enlightenment as: "NoNonArchitecturalCoreSharing - Indicates that a virtual processor will never share a physical core with another virtual processor, except for virtual processors that are reported as sibling SMT threads. This can be used as an optimization to avoid the performance overhead of STIBP". However, STIBP is not the only implication. It was found that Hyper-V on KVM doesn't pass MD_CLEAR bit to its guests if it doesn't see NoNonArchitecturalCoreSharing bit. KVM reports NoNonArchitecturalCoreSharing in KVM_GET_SUPPORTED_HV_CPUID to indicate that SMT on the host is impossible (not supported of forcefully disabled). Implement NoNonArchitecturalCoreSharing support in QEMU as tristate: 'off' - the feature is disabled (default) 'on' - the feature is enabled. This is only safe if vCPUS are properly pinned and correct topology is exposed. As CPU pinning is done outside of QEMU the enablement decision will be made on a higher level. 'auto' - copy KVM setting. As during live migration SMT settings on the source and destination host may differ this requires us to add a migration blocker. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20191018163908.10246-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22target/i386: log MCE guest and host addressesMario Smarduch
Patch logs MCE AO, AR messages injected to guest or taken by QEMU itself. We print the QEMU address for guest MCEs, helps on hypervisors that have another source of MCE logging like mce log, and when they go missing. For example we found these QEMU logs: September 26th 2019, 17:36:02.309 Droplet-153258224: Guest MCE Memory Error at qemu addr 0x7f8ce14f5000 and guest 3d6f5000 addr of type BUS_MCEERR_AR injected qemu-system-x86_64 amsN ams3nodeNNNN September 27th 2019, 06:25:03.234 Droplet-153258224: Guest MCE Memory Error at qemu addr 0x7f8ce14f5000 and guest 3d6f5000 addr of type BUS_MCEERR_AR injected qemu-system-x86_64 amsN ams3nodeNNNN The first log had a corresponding mce log entry, the second didnt (logging thresholds) we can infer from second entry same PA and mce type. Signed-off-by: Mario Smarduch <msmarduch@digitalocean.com> Message-Id: <20191009164459.8209-3-msmarduch@digitalocean.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-15i386: Omit all-zeroes entries from KVM CPUID tableEduardo Habkost
KVM has a 80-entry limit at KVM_SET_CPUID2. With the introduction of CPUID[0x1F], it is now possible to hit this limit with unusual CPU configurations, e.g.: $ ./x86_64-softmmu/qemu-system-x86_64 \ -smp 1,dies=2,maxcpus=2 \ -cpu EPYC,check=off,enforce=off \ -machine accel=kvm qemu-system-x86_64: kvm_init_vcpu failed: Argument list too long This happens because QEMU adds a lot of all-zeroes CPUID entries for unused CPUID leaves. In the example above, we end up creating 48 all-zeroes CPUID entries. KVM already returns all-zeroes when emulating the CPUID instruction if an entry is missing, so the all-zeroes entries are redundant. Skip those entries. This reduces the CPUID table size by half while keeping CPUID output unchanged. Reported-by: Yumei Huang <yuhuang@redhat.com> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1741508 Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190822225210.32541-1-ehabkost@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-10-04target/i386/kvm: Silence warning from Valgrind about uninitialized bytesThomas Huth
When I run QEMU with KVM under Valgrind, I currently get this warning: Syscall param ioctl(generic) points to uninitialised byte(s) at 0x95BA45B: ioctl (in /usr/lib64/libc-2.28.so) by 0x429DC3: kvm_ioctl (kvm-all.c:2365) by 0x51B249: kvm_arch_get_supported_msr_feature (kvm.c:469) by 0x4C2A49: x86_cpu_get_supported_feature_word (cpu.c:3765) by 0x4C4116: x86_cpu_expand_features (cpu.c:5065) by 0x4C7F8D: x86_cpu_realizefn (cpu.c:5242) by 0x5961F3: device_set_realized (qdev.c:835) by 0x7038F6: property_set_bool (object.c:2080) by 0x707EFE: object_property_set_qobject (qom-qobject.c:26) by 0x705814: object_property_set_bool (object.c:1338) by 0x498435: pc_new_cpu (pc.c:1549) by 0x49C67D: pc_cpus_init (pc.c:1681) Address 0x1ffeffee74 is on thread 1's stack in frame #2, created by kvm_arch_get_supported_msr_feature (kvm.c:445) It's harmless, but a little bit annoying, so silence it by properly initializing the whole structure with zeroes. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-04target/i386: work around KVM_GET_MSRS bug for secondary execution controlsPaolo Bonzini
Some secondary controls are automatically enabled/disabled based on the CPUID values that are set for the guest. However, they are still available at a global level and therefore should be present when KVM_GET_MSRS is sent to /dev/kvm. Unfortunately KVM forgot to include those, so fix that. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-04target/i386: add VMX featuresPaolo Bonzini
Add code to convert the VMX feature words back into MSR values, allowing the user to enable/disable VMX features as they wish. The same infrastructure enables support for limiting VMX features in named CPU models. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-04target/i386: expand feature words to 64 bitsPaolo Bonzini
VMX requires 64-bit feature words for the IA32_VMX_EPT_VPID_CAP and IA32_VMX_BASIC MSRs. (The VMX control MSRs are 64-bit wide but actually have only 32 bits of information). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-16hw/i386/pc: Extract e820 memory layout codePhilippe Mathieu-Daudé
Suggested-by: Samuel Ortiz <sameo@linux.intel.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190818225414.22590-3-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-16i386/kvm: support guest access CORE cstateWanpeng Li
Allow guest reads CORE cstate when exposing host CPU power management capabilities to the guest. PKG cstate is restricted to avoid a guest to get the whole package information in multi-tenant scenario. Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1563154124-18579-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-20x86: Intel AVX512_BF16 feature enablingJing Liu
Intel CooperLake cpu adds AVX512_BF16 instruction, defining as CPUID.(EAX=7,ECX=1):EAX[bit 05]. The patch adds a property for setting the subleaf of CPUID leaf 7 in case that people would like to specify it. The release spec link as follows, https://software.intel.com/sites/default/files/managed/c5/15/\ architecture-instruction-set-extensions-programming-reference.pdf Signed-off-by: Jing Liu <jing2.liu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-20i386/kvm: initialize struct at full before ioctl callAndrey Shinkevich
Not the whole structure is initialized before passing it to the KVM. Reduce the number of Valgrind reports. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-Id: <1564502498-805893-4-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-20target-i386: kvm: 'kvm_get_supported_msrs' cleanupLi Qiang
Function 'kvm_get_supported_msrs' is only called once now, get rid of the static variable 'kvm_supported_msrs'. Signed-off-by: Li Qiang <liq3ea@163.com> Message-Id: <20190725151639.21693-1-liq3ea@163.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-20kvm: i386: halt poll control MSR supportMarcelo Tosatti
Add support for halt poll control MSR: save/restore, migration and new feature name. The purpose of this MSR is to allow the guest to disable host halt poll. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20190603230408.GA7938@amt.cnet> [Do not enable by default, as pointed out by Mark Kanda. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-16sysemu: Split sysemu/runstate.h off sysemu/sysemu.hMarkus Armbruster
sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]
2019-08-16Include qemu/main-loop.h lessMarkus Armbruster
In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16Include sysemu/reset.h a lot lessMarkus Armbruster
In my "build everything" tree, changing sysemu/reset.h triggers a recompile of some 2600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). The main culprit is hw/hw.h, which supposedly includes it for convenience. Include sysemu/reset.h only where it's needed. Touching it now recompiles less than 200 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-9-armbru@redhat.com>
2019-07-24i386/kvm: Do not sync nested state during runtimeJan Kiszka
Writing the nested state e.g. after a vmport access can invalidate important parts of the kernel-internal state, and it is not needed as well. So leave this out from KVM_PUT_RUNTIME_STATE. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Message-Id: <bdd53f40-4e60-f3ae-7ec6-162198214953@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-19target/i386: skip KVM_GET/SET_NESTED_STATE if VMX disabled, or for SVMPaolo Bonzini
Do not allocate env->nested_state unless we later need to migrate the nested virtualization state. With this change, nested_state_needed() will return false if the VMX flag is not included in the virtual machine. KVM_GET/SET_NESTED_STATE is also disabled for SVM which is safer (we know that at least the NPT root and paging mode have to be saved/loaded), and thus the corresponding subsection can go away as well. Inspired by a patch from Liran Alon. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-19target/i386: kvm: Demand nested migration kernel capabilities only when vCPU ↵Liran Alon
may have enabled VMX Previous to this change, a vCPU exposed with VMX running on a kernel without KVM_CAP_NESTED_STATE or KVM_CAP_EXCEPTION_PAYLOAD resulted in adding a migration blocker. This was because when the code was written it was thought there is no way to reliably know if a vCPU is utilising VMX or not at runtime. However, it turns out that this can be known to some extent: In order for a vCPU to enter VMX operation it must have CR4.VMXE set. Since it was set, CR4.VMXE must remain set as long as the vCPU is in VMX operation. This is because CR4.VMXE is one of the bits set in MSR_IA32_VMX_CR4_FIXED1. There is one exception to the above statement when vCPU enters SMM mode. When a vCPU enters SMM mode, it temporarily exits VMX operation and may also reset CR4.VMXE during execution in SMM mode. When the vCPU exits SMM mode, vCPU state is restored to be in VMX operation and CR4.VMXE is restored to its original state of being set. Therefore, when the vCPU is not in SMM mode, we can infer whether VMX is being used by examining CR4.VMXE. Otherwise, we cannot know for certain but assume the worse that vCPU may utilise VMX. Summaring all the above, a vCPU may have enabled VMX in case CR4.VMXE is set or vCPU is in SMM mode. Therefore, remove migration blocker and check before migration (cpu_pre_save()) if the vCPU may have enabled VMX. If true, only then require relevant kernel capabilities. While at it, demand KVM_CAP_EXCEPTION_PAYLOAD only when the vCPU is in guest-mode and there is a pending/injected exception. Otherwise, this kernel capability is not required for proper migration. Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Maran Wilson <maran.wilson@oracle.com> Tested-by: Maran Wilson <maran.wilson@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-08Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell
Bugfixes. # gpg: Signature made Fri 05 Jul 2019 21:21:52 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: ioapic: use irq number instead of vector in ioapic_eoi_broadcast hw/i386: Fix linker error when ISAPC is disabled Makefile: generate header file with the list of devices enabled target/i386: kvm: Fix when nested state is needed for migration minikconf: do not include variables from MINIKCONF_ARGS in config-all-devices.mak target/i386: fix feature check in hyperv-stub.c ioapic: clear irq_eoi when updating the ioapic redirect table entry intel_iommu: Fix unexpected unmaps during global unmap intel_iommu: Fix incorrect "end" for vtd_address_space_unmap i386/kvm: Fix build with -m32 checkpatch: do not warn for multiline parenthesized returned value pc: fix possible NULL pointer dereference in pc_machine_get_device_memory_region_size() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-07-05i386/kvm: Fix build with -m32Max Reitz
find_next_bit() takes a pointer of type "const unsigned long *", but the first argument passed here is a "uint64_t *". These types are incompatible when compiling qemu with -m32. Just use ctz64() instead. Fixes: c686193072a47032d83cb4e131dc49ae30f9e5d Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190624193913.28343-1-mreitz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-05target/i386: Add CPUID.1F generation support for multi-dies PCMachineLike Xu
The CPUID.1F as Intel V2 Extended Topology Enumeration Leaf would be exposed if guests want to emulate multiple software-visible die within each package. Per Intel's SDM, the 0x1f is a superset of 0xb, thus they can be generated by almost same code as 0xb except die_offset setting. If the number of dies per package is greater than 1, the cpuid_min_level would be adjusted to 0x1f regardless of whether the host supports CPUID.1F. Likewise, the CPUID.1F wouldn't be exposed if env->nr_dies < 2. Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20190620054525.37188-2-like.xu@linux.intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-06-21target/i386: kvm: Add nested migration blocker only when kernel lacks ↵Liran Alon
required capabilities Previous commits have added support for migration of nested virtualization workloads. This was done by utilising two new KVM capabilities: KVM_CAP_NESTED_STATE and KVM_CAP_EXCEPTION_PAYLOAD. Both which are required in order to correctly migrate such workloads. Therefore, change code to add a migration blocker for vCPUs exposed with Intel VMX or AMD SVM in case one of these kernel capabilities is missing. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Maran Wilson <maran.wilson@oracle.com> Message-Id: <20190619162140.133674-11-liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-06-21target/i386: kvm: Add support for KVM_CAP_EXCEPTION_PAYLOADLiran Alon
Kernel commit c4f55198c7c2 ("kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD") introduced a new KVM capability which allows userspace to correctly distinguish between pending and injected exceptions. This distinguish is important in case of nested virtualization scenarios because a L2 pending exception can still be intercepted by the L1 hypervisor while a L2 injected exception cannot. Furthermore, when an exception is attempted to be injected by QEMU, QEMU should specify the exception payload (CR2 in case of #PF or DR6 in case of #DB) instead of having the payload already delivered in the respective vCPU register. Because in case exception is injected to L2 guest and is intercepted by L1 hypervisor, then payload needs to be reported to L1 intercept (VMExit handler) while still preserving respective vCPU register unchanged. This commit adds support for QEMU to properly utilise this new KVM capability (KVM_CAP_EXCEPTION_PAYLOAD). Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Message-Id: <20190619162140.133674-10-liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>