aboutsummaryrefslogtreecommitdiff
path: root/target/i386/kvm.c
AgeCommit message (Collapse)Author
2020-12-16i386: move kvm accel files into kvm/Claudio Fontana
Signed-off-by: Claudio Fontana <cfontana@suse.de> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20201212155530.23098-2-cfontana@suse.de> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-10target/i386: Support up to 32768 CPUs without IRQ remappingDavid Woodhouse
The IOAPIC has an 'Extended Destination ID' field in its RTE, which maps to bits 11-4 of the MSI address. Since those address bits fall within a given 4KiB page they were historically non-trivial to use on real hardware. The Intel IOMMU uses the lowest bit to indicate a remappable format MSI, and then the remaining 7 bits are part of the index. Where the remappable format bit isn't set, we can actually use the other seven to allow external (IOAPIC and MSI) interrupts to reach up to 32768 CPUs instead of just the 255 permitted on bare metal. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <78097f9218300e63e751e077a0a5ca029b56ba46.camel@infradead.org> [Fix UBSAN warning. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
2020-11-16kvm/i386: Set proper nested state format for SVMTom Lendacky
Currently, the nested state format is hardcoded to VMX. This will result in kvm_put_nested_state() returning an error because the KVM SVM support checks for the nested state to be KVM_STATE_NESTED_FORMAT_SVM. As a result, kvm_arch_put_registers() errors out early. Update the setting of the format based on the virtualization feature: VMX - KVM_STATE_NESTED_FORMAT_VMX SVM - KVM_STATE_NESTED_FORMAT_SVM Also, fix the code formatting while at it. Fixes: b16c0e20c7 ("KVM: add support for AMD nested live migration") Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Message-Id: <fe53d00fe0d884e812960781284cd48ae9206acc.1605546140.git.thomas.lendacky@amd.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-14i386/kvm: Delete kvm_allows_irq0_override()Eduardo Habkost
As IRQ routing is always available on x86, kvm_allows_irq0_override() will always return true, so we don't need the function anymore. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200922201922.2153598-4-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-10-14i386/kvm: Remove IRQ routing support checksEduardo Habkost
KVM_CAP_IRQ_ROUTING is always available on x86, so replace checks for kvm_has_gsi_routing() and KVM_CAP_IRQ_ROUTING with asserts. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200922201922.2153598-3-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-10-14i386/kvm: Require KVM_CAP_IRQ_ROUTINGEduardo Habkost
KVM_CAP_IRQ_ROUTING is available since 2009 (Linux v2.6.30), so it's safe to just make it a requirement on x86. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200922201922.2153598-2-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-10-14i386/kvm: fix FEATURE_HYPERV_EDX value in hyperv_passthrough caseZhenyu Wang
Fix typo to use correct edx value for FEATURE_HYPERV_EDX when hyperv_passthrough is enabled. Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Message-Id: <20190820103030.12515-1-zhenyuw@linux.intel.com> Fixes: e48ddcc6ce13 ("i386/kvm: implement 'hv-passthrough' mode") Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-10-04target-i386: post memory failure event to QMPzhenwei pi
Post memory failure event through QMP to handle hardware memory corrupted event. Rather than simply printing to the log, QEMU could report more effective message to the client. For example, if a guest receives an MCE, evacuating the host could be a good idea. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Message-Id: <20200930100440.1060708-4-pizhenwei@bytedance.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-30target/i386: kvm: do not use kvm_check_extension to find paravirtual ↵Paolo Bonzini
capabilities Paravirtualized features have been listed in KVM_GET_SUPPORTED_CPUID since Linux 2.6.35 (commit 84478c829d0f, "KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID", 2010-05-19). It has been more than 10 years, so remove the fallback code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-30target/i386: always create kvmclock deviceVitaly Kuznetsov
QEMU's kvmclock device is only created when KVM PV feature bits for kvmclock (KVM_FEATURE_CLOCKSOURCE/KVM_FEATURE_CLOCKSOURCE2) are exposed to the guest. With 'kvm=off' cpu flag the device is not created and we don't call KVM_GET_CLOCK/KVM_SET_CLOCK upon migration. It was reported that without these call at least Hyper-V TSC page clocksouce (which can be enabled independently) gets broken after migration. Switch to creating kvmclock QEMU device unconditionally, it seems to always make sense to call KVM_GET_CLOCK/KVM_SET_CLOCK on migration. Use KVM_CAP_ADJUST_CLOCK check instead of CPUID feature bits. Reported-by: Antoine Damhet <antoine.damhet@blade-group.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200922151934.899555-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-30target/i386: Fix VM migration when interrupt based APF is enabledVitaly Kuznetsov
VM with interrupt based APF enabled fails to migrate: qemu-system-x86_64: error: failed to set MSR 0x4b564d02 to 0xf3 We have two issues: 1) There is a typo in kvm_put_msrs() and we write async_pf_int_msr to MSR_KVM_ASYNC_PF_EN (instead of MSR_KVM_ASYNC_PF_INT) 2) We restore MSR_KVM_ASYNC_PF_EN before MSR_KVM_ASYNC_PF_INT is set and this violates the check in KVM. Re-order MSR_KVM_ASYNC_PF_EN/MSR_KVM_ASYNC_PF_INT setting (and kvm_get_msrs() for consistency) and fix the typo. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200917102316.814804-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-30target/i386: support KVM_FEATURE_ASYNC_PF_INTVitaly Kuznetsov
Linux-5.8 introduced interrupt based mechanism for 'page ready' events delivery and disabled the old, #PF based one (see commit 2635b5c4a0e4 "KVM: x86: interrupt based APF 'page ready' event delivery"). Linux guest switches to using in in 5.9 (see commit b1d405751cd5 "KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery"). The feature has a new KVM_FEATURE_ASYNC_PF_INT bit assigned and the interrupt vector is set in MSR_KVM_ASYNC_PF_INT MSR. Support this in QEMU. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200908141206.357450-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-18i386/kvm: correct the meaning of '0xffffffff' value for hv-spinlocksVitaly Kuznetsov
Hyper-V TLFS prior to version 6.0 had a mistake in it: special value '0xffffffff' for CPUID 0x40000004.EBX was called 'never to retry', this looked weird (like why it's not '0' which supposedly have the same effect?) but nobody raised the question. In TLFS version 6.0 the mistake was corrected to 'never notify' which sounds logical. Fix QEMU accordingly. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200515114847.74523-1-vkuznets@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-09-11target/i386/kvm: Add missing fallthrough commentThomas Huth
Let's make this file compilable with -Werror=implicit-fallthrough : Looking at the code, it seems like the fallthrough is intended here, so we should add the corresponding "/* fallthrough */" comment here. Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200911125301.413081-1-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-09-10target/i386/kvm: Rename host_tsx_blacklisted() as host_tsx_broken()Philippe Mathieu-Daudé
In order to use inclusive terminology, rename host_tsx_blacklisted() as host_tsx_broken(). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200910070131.435543-7-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-09-02x86-iommu: Rename QOM type macrosEduardo Habkost
Some QOM macros were using a X86_IOMMU_DEVICE prefix, and others were using a X86_IOMMU prefix. Rename all of them to use the same X86_IOMMU_DEVICE prefix. This will make future conversion to OBJECT_DECLARE* easier. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20200825192110.3528606-47-ehabkost@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-07-23KVM: fix CPU reset wrt HF2_GIF_MASKVitaly Kuznetsov
HF2_GIF_MASK is set in env->hflags2 unconditionally on CPU reset (see x86_cpu_reset()) but when calling KVM_SET_NESTED_STATE, KVM_STATE_NESTED_GIF_SET is only valid for nSVM as e.g. nVMX code looks like if (kvm_state->hdr.vmx.vmxon_pa == -1ull) { if (kvm_state->flags & ~KVM_STATE_NESTED_EVMCS) return -EINVAL; } Also, when adjusting the environment after KVM_GET_NESTED_STATE we need not reset HF2_GIF_MASK on VMX as e.g. x86_cpu_pending_interrupt() expects it to be set. Alternatively, we could've made env->hflags2 SVM-only. Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Fixes: b16c0e20c742 ("KVM: add support for AMD nested live migration") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200723142701.2521161-1-vkuznets@redhat.com> Tested-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-07-10KVM: x86: believe what KVM says about WAITPKGPaolo Bonzini
Currently, QEMU is overriding KVM_GET_SUPPORTED_CPUID's answer for the WAITPKG bit depending on the "-overcommit cpu-pm" setting. This is a bad idea because it does not even check if the host supports it, but it can be done in x86_cpu_realizefn just like we do for the MONITOR bit. This patch moves it there, while making it conditional on host support for the related UMWAIT MSR. Cc: qemu-stable@nongnu.org Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10KVM: add support for AMD nested live migrationPaolo Bonzini
Support for nested guest live migration is part of Linux 5.8, add the corresponding code to QEMU. The migration format consists of a few flags, is an opaque 4k blob. The blob is in VMCB format (the control area represents the L1 VMCB control fields, the save area represents the pre-vmentry state; KVM does not use the host save area since the AMD manual allows that) but QEMU does not really care about that. However, the flags need to be copied to hflags/hflags2 and back. In addition, support for retrieving and setting the AMD nested virtualization states allows the L1 guest to be reset while running a nested guest, but a small bug in CPU reset needs to be fixed for that to work. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-26kvm: i386: allow TSC to differ by NTP correction bounds without TSC scalingMarcelo Tosatti
The Linux TSC calibration procedure is subject to small variations (its common to see +-1 kHz difference between reboots on a given CPU, for example). So migrating a guest between two hosts with identical processor can fail, in case of a small variation in calibrated TSC between them. Allow a conservative 250ppm error between host TSC and VM TSC frequencies, rather than requiring an exact match. NTP daemon in the guest can correct this difference. Also change migration to accept this bound. KVM_SET_TSC_KHZ depends on a kernel interface change. Without this change, the behaviour remains the same: in case of a different frequency between host and VM, KVM_SET_TSC_KHZ will fail and QEMU will exit. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Message-Id: <20200616165805.GA324612@fuller.cnet> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10target/i386: define a new MSR based feature word - FEAT_PERF_CAPABILITIESLike Xu
The Perfmon and Debug Capability MSR named IA32_PERF_CAPABILITIES is a feature-enumerating MSR, which only enumerates the feature full-width write (via bit 13) by now which indicates the processor supports IA32_A_PMCx interface for updating bits 32 and above of IA32_PMCx. The existence of MSR IA32_PERF_CAPABILITIES is enumerated by CPUID.1:ECX[15]. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20200529074347.124619-5-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10i386/kvm: fix a use-after-free when vcpu plug/unplugPan Nengyuan
When we hotplug vcpus, cpu_update_state is added to vm_change_state_head in kvm_arch_init_vcpu(). But it forgot to delete in kvm_arch_destroy_vcpu() after unplug. Then it will cause a use-after-free access. This patch delete it in kvm_arch_destroy_vcpu() to fix that. Reproducer: virsh setvcpus vm1 4 --live virsh setvcpus vm1 2 --live virsh suspend vm1 virsh resume vm1 The UAF stack: ==qemu-system-x86_64==28233==ERROR: AddressSanitizer: heap-use-after-free on address 0x62e00002e798 at pc 0x5573c6917d9e bp 0x7fff07139e50 sp 0x7fff07139e40 WRITE of size 1 at 0x62e00002e798 thread T0 #0 0x5573c6917d9d in cpu_update_state /mnt/sdb/qemu/target/i386/kvm.c:742 #1 0x5573c699121a in vm_state_notify /mnt/sdb/qemu/vl.c:1290 #2 0x5573c636287e in vm_prepare_start /mnt/sdb/qemu/cpus.c:2144 #3 0x5573c6362927 in vm_start /mnt/sdb/qemu/cpus.c:2150 #4 0x5573c71e8304 in qmp_cont /mnt/sdb/qemu/monitor/qmp-cmds.c:173 #5 0x5573c727cb1e in qmp_marshal_cont qapi/qapi-commands-misc.c:835 #6 0x5573c7694c7a in do_qmp_dispatch /mnt/sdb/qemu/qapi/qmp-dispatch.c:132 #7 0x5573c7694c7a in qmp_dispatch /mnt/sdb/qemu/qapi/qmp-dispatch.c:175 #8 0x5573c71d9110 in monitor_qmp_dispatch /mnt/sdb/qemu/monitor/qmp.c:145 #9 0x5573c71dad4f in monitor_qmp_bh_dispatcher /mnt/sdb/qemu/monitor/qmp.c:234 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200513132630.13412-1-pannengyuan@huawei.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-10i386/cpu: Store LAPIC bus frequency in CPU structureLiran Alon
No functional change. This information will be used by following patches. Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Message-Id: <20200312165431.82118-15-liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-14KVM: Move hwpoison page related functions into kvm-all.cDongjiu Geng
kvm_hwpoison_page_add() and kvm_unpoison_all() will both be used by X86 and ARM platforms, so moving them into "accel/kvm/kvm-all.c" to avoid duplicate code. For architectures that don't use the poison-list functionality the reset handler will harmlessly do nothing, so let's register the kvm_unpoison_all() function in the generic kvm_init() function. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: Xiang Zheng <zhengxiang9@huawei.com> Acked-by: Xiang Zheng <zhengxiang9@huawei.com> Message-id: 20200512030609.19593-8-gengdongjiu@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-04-02target/i386: do not set unsupported VMX secondary execution controlsVitaly Kuznetsov
Commit 048c95163b4 ("target/i386: work around KVM_GET_MSRS bug for secondary execution controls") added a workaround for KVM pre-dating commit 6defc591846d ("KVM: nVMX: include conditional controls in /dev/kvm KVM_GET_MSRS") which wasn't setting certain available controls. The workaround uses generic CPUID feature bits to set missing VMX controls. It was found that in some cases it is possible to observe hosts which have certain CPUID features but lack the corresponding VMX control. In particular, it was reported that Azure VMs have RDSEED but lack VMX_SECONDARY_EXEC_RDSEED_EXITING; attempts to enable this feature bit result in QEMU abort. Resolve the issue but not applying the workaround when we don't have to. As there is no good way to find out if KVM has the fix itself, use 95c5c7c77c ("KVM: nVMX: list VMX MSRs in KVM_GET_MSR_INDEX_LIST") instead as these [are supposed to] come together. Fixes: 048c95163b4 ("target/i386: work around KVM_GET_MSRS bug for secondary execution controls") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200331162752.1209928-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12target/i386: check for availability of MSR_IA32_UCODE_REV as an emulated MSRPaolo Bonzini
Even though MSR_IA32_UCODE_REV has been available long before Linux 5.6, which added it to the emulated MSR list, a bug caused the microcode version to revert to 0x100000000 on INIT. As a result, processors other than the bootstrap processor would not see the host microcode revision; some Windows version complain loudly about this and crash with a fairly explicit MICROCODE REVISION MISMATCH error. [If running 5.6 prereleases, the kernel fix "KVM: x86: do not reset microcode version on INIT or RESET" should also be applied.] Reported-by: Alex Williamson <alex.williamson@redhat.com> Message-id: <20200211175516.10716-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24accel: Replace current_machine->accelerator by current_accel() wrapperPhilippe Mathieu-Daudé
We actually want to access the accelerator, not the machine, so use the current_accel() wrapper instead. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200121110349.25842-10-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24target/i386: kvm: initialize microcode revision from KVMPaolo Bonzini
KVM can return the host microcode revision as a feature MSR. Use it as the default value for -cpu host. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1579544504-3616-4-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24target/i386: kvm: initialize feature MSRs very earlyPaolo Bonzini
Some read-only MSRs affect the behavior of ioctls such as KVM_SET_NESTED_STATE. We can initialize them once and for all right after the CPU is realized, since they will never be modified by the guest. Reported-by: Qingua Cheng <qcheng@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1579544504-3616-2-git-send-email-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-07x86: Check for machine state object class before typecasting itMichal Privoznik
In ed9e923c3c ("x86: move SMM property to X86MachineState", 2019-12-17) In v4.2.0-246-ged9e923c3c the SMM property was moved from PC machine class to x86 machine class. Makes sense, but the change was too aggressive: in target/i386/kvm.c:kvm_arch_init() it altered check which sets SMRAM if given machine has SMM enabled. The line that detects whether given machine object is class of PC_MACHINE was removed from the check. This makes qemu try to enable SMRAM for all machine types, which is not what we want. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Fixes: ed9e923c3c ("x86: move SMM property to X86MachineState", 2019-12-17) Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <7cc91bab3191bfd7e071bdd3fdf7fe2a2991deb0.1577692822.git.mprivozn@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-18target/i386: remove unused pci-assign codesEiichi Tsukata
Legacy PCI device assignment has been already removed in commit ab37bfc7d641 ("pci-assign: Remove"), but some codes remain unused. CC: qemu-trivial@nongnu.org Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Message-Id: <20191209072932.313056-1-devel@etsukata.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-17x86: move more x86-generic functions out of PC filesPaolo Bonzini
These are needed by microvm too, so move them outside of PC-specific files. With this patch, microvm.c need not include pc.h anymore. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-17x86: move SMM property to X86MachineStatePaolo Bonzini
Add it to microvm as well, it is a generic property of the x86 architecture. Suggested-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-17kvm: introduce kvm_kernel_irqchip_* functionsPaolo Bonzini
The KVMState struct is opaque, so provide accessors for the fields that will be moved from current_machine to the accelerator. For now they just forward to the machine object, but this will change. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-17kvm: convert "-machine kvm_shadow_mem" to an accelerator propertyPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-12-06target/i386: disable VMX features if nested=0Yang Zhong
If kvm does not support VMX feature by nested=0, the kvm_vmx_basic can't get the right value from MSR_IA32_VMX_BASIC register, which make qemu coredump when qemu do KVM_SET_MSRS. The coredump info: error: failed to set MSR 0x480 to 0x0 kvm_put_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20191206071111.12128-1-yang.zhong@intel.com> Reported-by: Catherine Ho <catherine.hecx@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-11-21target/i386: add support for MSR_IA32_TSX_CTRLPaolo Bonzini
The MSR_IA32_TSX_CTRL MSR can be used to hide TSX (also known as the Trusty Side-channel Extension). By virtualizing the MSR, KVM guests can disable TSX and avoid paying the price of mitigating TSX-based attacks on microarchitectural side channels. Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-23target/i386: Add support for save/load IA32_UMWAIT_CONTROL MSRTao Xu
UMWAIT and TPAUSE instructions use 32bits IA32_UMWAIT_CONTROL at MSR index E1H to determines the maximum time in TSC-quanta that the processor can reside in either C0.1 or C0.2. This patch is to Add support for save/load IA32_UMWAIT_CONTROL MSR in guest. Co-developed-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20191011074103.30393-3-tao3.xu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-23x86/cpu: Add support for UMONITOR/UMWAIT/TPAUSETao Xu
UMONITOR, UMWAIT and TPAUSE are a set of user wait instructions. This patch adds support for user wait instructions in KVM. Availability of the user wait instructions is indicated by the presence of the CPUID feature flag WAITPKG CPUID.0x07.0x0:ECX[5]. User wait instructions may be executed at any privilege level, and use IA32_UMWAIT_CONTROL MSR to set the maximum time. The patch enable the umonitor, umwait and tpause features in KVM. Because umwait and tpause can put a (psysical) CPU into a power saving state, by default we dont't expose it to kvm and enable it only when guest CPUID has it. And use QEMU command-line "-overcommit cpu-pm=on" (enable_cpu_pm is enabled), a VM can use UMONITOR, UMWAIT and TPAUSE instructions. If the instruction causes a delay, the amount of time delayed is called here the physical delay. The physical delay is first computed by determining the virtual delay (the time to delay relative to the VM’s timestamp counter). Otherwise, UMONITOR, UMWAIT and TPAUSE cause an invalid-opcode exception(#UD). The release document ref below link: https://software.intel.com/sites/default/files/\ managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf Co-developed-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Jingqi Liu <jingqi.liu@intel.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20191011074103.30393-2-tao3.xu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22i386/kvm: add NoNonArchitecturalCoreSharing Hyper-V enlightenmentVitaly Kuznetsov
Hyper-V TLFS specifies this enlightenment as: "NoNonArchitecturalCoreSharing - Indicates that a virtual processor will never share a physical core with another virtual processor, except for virtual processors that are reported as sibling SMT threads. This can be used as an optimization to avoid the performance overhead of STIBP". However, STIBP is not the only implication. It was found that Hyper-V on KVM doesn't pass MD_CLEAR bit to its guests if it doesn't see NoNonArchitecturalCoreSharing bit. KVM reports NoNonArchitecturalCoreSharing in KVM_GET_SUPPORTED_HV_CPUID to indicate that SMT on the host is impossible (not supported of forcefully disabled). Implement NoNonArchitecturalCoreSharing support in QEMU as tristate: 'off' - the feature is disabled (default) 'on' - the feature is enabled. This is only safe if vCPUS are properly pinned and correct topology is exposed. As CPU pinning is done outside of QEMU the enablement decision will be made on a higher level. 'auto' - copy KVM setting. As during live migration SMT settings on the source and destination host may differ this requires us to add a migration blocker. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20191018163908.10246-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-22target/i386: log MCE guest and host addressesMario Smarduch
Patch logs MCE AO, AR messages injected to guest or taken by QEMU itself. We print the QEMU address for guest MCEs, helps on hypervisors that have another source of MCE logging like mce log, and when they go missing. For example we found these QEMU logs: September 26th 2019, 17:36:02.309 Droplet-153258224: Guest MCE Memory Error at qemu addr 0x7f8ce14f5000 and guest 3d6f5000 addr of type BUS_MCEERR_AR injected qemu-system-x86_64 amsN ams3nodeNNNN September 27th 2019, 06:25:03.234 Droplet-153258224: Guest MCE Memory Error at qemu addr 0x7f8ce14f5000 and guest 3d6f5000 addr of type BUS_MCEERR_AR injected qemu-system-x86_64 amsN ams3nodeNNNN The first log had a corresponding mce log entry, the second didnt (logging thresholds) we can infer from second entry same PA and mce type. Signed-off-by: Mario Smarduch <msmarduch@digitalocean.com> Message-Id: <20191009164459.8209-3-msmarduch@digitalocean.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-15i386: Omit all-zeroes entries from KVM CPUID tableEduardo Habkost
KVM has a 80-entry limit at KVM_SET_CPUID2. With the introduction of CPUID[0x1F], it is now possible to hit this limit with unusual CPU configurations, e.g.: $ ./x86_64-softmmu/qemu-system-x86_64 \ -smp 1,dies=2,maxcpus=2 \ -cpu EPYC,check=off,enforce=off \ -machine accel=kvm qemu-system-x86_64: kvm_init_vcpu failed: Argument list too long This happens because QEMU adds a lot of all-zeroes CPUID entries for unused CPUID leaves. In the example above, we end up creating 48 all-zeroes CPUID entries. KVM already returns all-zeroes when emulating the CPUID instruction if an entry is missing, so the all-zeroes entries are redundant. Skip those entries. This reduces the CPUID table size by half while keeping CPUID output unchanged. Reported-by: Yumei Huang <yuhuang@redhat.com> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1741508 Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190822225210.32541-1-ehabkost@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-10-04target/i386/kvm: Silence warning from Valgrind about uninitialized bytesThomas Huth
When I run QEMU with KVM under Valgrind, I currently get this warning: Syscall param ioctl(generic) points to uninitialised byte(s) at 0x95BA45B: ioctl (in /usr/lib64/libc-2.28.so) by 0x429DC3: kvm_ioctl (kvm-all.c:2365) by 0x51B249: kvm_arch_get_supported_msr_feature (kvm.c:469) by 0x4C2A49: x86_cpu_get_supported_feature_word (cpu.c:3765) by 0x4C4116: x86_cpu_expand_features (cpu.c:5065) by 0x4C7F8D: x86_cpu_realizefn (cpu.c:5242) by 0x5961F3: device_set_realized (qdev.c:835) by 0x7038F6: property_set_bool (object.c:2080) by 0x707EFE: object_property_set_qobject (qom-qobject.c:26) by 0x705814: object_property_set_bool (object.c:1338) by 0x498435: pc_new_cpu (pc.c:1549) by 0x49C67D: pc_cpus_init (pc.c:1681) Address 0x1ffeffee74 is on thread 1's stack in frame #2, created by kvm_arch_get_supported_msr_feature (kvm.c:445) It's harmless, but a little bit annoying, so silence it by properly initializing the whole structure with zeroes. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-04target/i386: work around KVM_GET_MSRS bug for secondary execution controlsPaolo Bonzini
Some secondary controls are automatically enabled/disabled based on the CPUID values that are set for the guest. However, they are still available at a global level and therefore should be present when KVM_GET_MSRS is sent to /dev/kvm. Unfortunately KVM forgot to include those, so fix that. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-04target/i386: add VMX featuresPaolo Bonzini
Add code to convert the VMX feature words back into MSR values, allowing the user to enable/disable VMX features as they wish. The same infrastructure enables support for limiting VMX features in named CPU models. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-04target/i386: expand feature words to 64 bitsPaolo Bonzini
VMX requires 64-bit feature words for the IA32_VMX_EPT_VPID_CAP and IA32_VMX_BASIC MSRs. (The VMX control MSRs are 64-bit wide but actually have only 32 bits of information). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-16hw/i386/pc: Extract e820 memory layout codePhilippe Mathieu-Daudé
Suggested-by: Samuel Ortiz <sameo@linux.intel.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190818225414.22590-3-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-16i386/kvm: support guest access CORE cstateWanpeng Li
Allow guest reads CORE cstate when exposing host CPU power management capabilities to the guest. PKG cstate is restricted to avoid a guest to get the whole package information in multi-tenant scenario. Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1563154124-18579-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-20x86: Intel AVX512_BF16 feature enablingJing Liu
Intel CooperLake cpu adds AVX512_BF16 instruction, defining as CPUID.(EAX=7,ECX=1):EAX[bit 05]. The patch adds a property for setting the subleaf of CPUID leaf 7 in case that people would like to specify it. The release spec link as follows, https://software.intel.com/sites/default/files/managed/c5/15/\ architecture-instruction-set-extensions-programming-reference.pdf Signed-off-by: Jing Liu <jing2.liu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-20i386/kvm: initialize struct at full before ioctl callAndrey Shinkevich
Not the whole structure is initialized before passing it to the KVM. Reduce the number of Valgrind reports. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-Id: <1564502498-805893-4-git-send-email-andrey.shinkevich@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>