aboutsummaryrefslogtreecommitdiff
path: root/target/i386
AgeCommit message (Collapse)Author
2024-05-25target/i386: clean up repeated string operationsPaolo Bonzini
Do not bother generating inline wrappers for gen_repz and gen_repz2; use s->prefix to separate REPZ from REPNZ in the case of SCAS and CMPS. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: introduce gen_lea_ss_ofsPaolo Bonzini
Generalize gen_stack_A0() to include an initial add and to use an arbitrary destination. This is a common pattern and it is not a huge burden to add the extra arguments to the only caller of gen_stack_A0(). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: use mo_stacksize morePaolo Bonzini
Use mo_stacksize for all stack accesses, including when a 64-bit code segment is impossible and the code is therefore checking only for SS32(s). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: inline gen_add_A0_ds_segPaolo Bonzini
It is only used in MONITOR, where a direct call of gen_lea_v_seg is simpler, and in XLAT. Inline it in the latter. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: split gen_ldst_modrm for load and storePaolo Bonzini
The is_store argument of gen_ldst_modrm has only ever been passed a constant. Just split the function in two. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: reg in gen_ldst_modrm is always OR_TMP0Paolo Bonzini
Values other than OR_TMP0 were only ever used by MOV and MOVNTI opcodes. Now that these have been converted to the new decoder, remove the argument. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: raze the gen_eob* junglePaolo Bonzini
Make gen_eob take the DISAS_* constant as an argument, so that it is not necessary to have wrappers around it. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: assert that gen_update_eip_cur and gen_update_eip_next are the ↵Paolo Bonzini
same in tb_stop This is an invariant now that there are no calls to gen_eob_inhibit_irq() outside tb_stop. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: avoid calling gen_eob_inhibit_irq before tb_stopPaolo Bonzini
sti only has one exit, so it does not need to generate the end-of-translation code inline. It can be deferred to tb_stop. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: avoid calling gen_eob_syscall before tb_stopPaolo Bonzini
syscall and sysret only have one exit, so they do not need to generate the end-of-translation code inline. It can be deferred to tb_stop. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: document and group DISAS_* constantsPaolo Bonzini
Place DISAS_* constants that update cpu_eip first, and the "jump" ones last. Add comments explaining the differences and usage. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: set CC_OP in helpers if they want CC_OP_EFLAGSPaolo Bonzini
Mark cc_op as clean and do not spill it at the end of the translation block. Technically this is a tiny bit less efficient, but: * it results in translations that are a tiny bit smaller * for most of these instructions, it is not unlikely that they are close to the end of the basic block, in which case cc_op would not be overwritten * anyway the cost is probably dwarfed by that of computing flags. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: cpu_load_eflags already sets cc_opPaolo Bonzini
No need to set it again at the end of the translation block, cc_op_dirty can be set to false. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: remove unnecessary gen_update_cc_op before gen_eob*Paolo Bonzini
This is already handled in gen_eob(). Before adding another DISAS_* case, remove the double calls. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: cleanup eob handling of RSMPaolo Bonzini
gen_helper_rsm cannot generate an exception, and reloads the flags. So there's no need to spill cc_op and update cpu_eip, but on the other hand cc_op must be reset to CC_OP_EFLAGS before returning. It all works by chance, because by spilling cc_op before the call to the helper, it becomes non-dirty and gen_eob will not overwrite the CC_OP_EFLAGS value that is placed there by the helper. But let's clean it up. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: no single-step exception after MOV or POP SSPaolo Bonzini
Intel SDM 18.3.1.4 "If an occurrence of the MOV or POP instruction loads the SS register executes with EFLAGS.TF = 1, no single-step debug exception occurs following the MOV or POP instruction." Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-25target/i386: disable jmp_opt if EFLAGS.RF is 1Paolo Bonzini
If EFLAGS.RF is 1, special processing in gen_eob_worker() is needed and therefore goto_tb cannot be used. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22target-i386: hyper-v: Correct kvm_hv_handle_exit return valuedonsheng
This bug fix addresses the incorrect return value of kvm_hv_handle_exit for KVM_EXIT_HYPERV_SYNIC, which should be EXCP_INTERRUPT. Handling of KVM_EXIT_HYPERV_SYNIC in QEMU needs to be synchronous. This means that async_synic_update should run in the current QEMU vCPU thread before returning to KVM, returning EXCP_INTERRUPT to guarantee this. Returning 0 can cause async_synic_update to run asynchronously. One problem (kvm-unit-tests's hyperv_synic test fails with timeout error) caused by this bug: When a guest VM writes to the HV_X64_MSR_SCONTROL MSR to enable Hyper-V SynIC, a VM exit is triggered and processed by the kvm_hv_handle_exit function of the QEMU vCPU. This function then calls the async_synic_update function to set synic->sctl_enabled to true. A true value of synic->sctl_enabled is required before creating SINT routes using the hyperv_sint_route_new() function. If kvm_hv_handle_exit returns 0 for KVM_EXIT_HYPERV_SYNIC, the current QEMU vCPU thread may return to KVM and enter the guest VM before running async_synic_update. In such case, the hyperv_synic test’s subsequent call to synic_ctl(HV_TEST_DEV_SINT_ROUTE_CREATE, ...) immediately after writing to HV_X64_MSR_SCONTROL can cause QEMU’s hyperv_sint_route_new() function to return prematurely (because synic->sctl_enabled is false). If the SINT route is not created successfully, the SINT interrupt will not be fired, resulting in a timeout error in the hyperv_synic test. Fixes: 267e071bd6d6 (“hyperv: make overlay pages for SynIC”) Suggested-by: Chao Gao <chao.gao@intel.com> Signed-off-by: Dongsheng Zhang <dongsheng.x.zhang@intel.com> Message-ID: <20240521200114.11588-1-dongsheng.x.zhang@intel.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386/cpu: Use CPUCacheInfo.share_level to encode CPUID[0x8000001D].EAX[bits ↵Zhao Liu
25:14] CPUID[0x8000001D].EAX[bits 25:14] NumSharingCache: number of logical processors sharing cache. The number of logical processors sharing this cache is NumSharingCache + 1. After cache models have topology information, we can use CPUCacheInfo.share_level to decide which topology level to be encoded into CPUID[0x8000001D].EAX[bits 25:14]. Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-22-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386/cpu: Use CPUCacheInfo.share_level to encode CPUID[4]Zhao Liu
CPUID[4].EAX[bits 25:14] is used to represent the cache topology for Intel CPUs. After cache models have topology information, we can use CPUCacheInfo.share_level to decide which topology level to be encoded into CPUID[4].EAX[bits 25:14]. And since with the helper max_processor_ids_for_cache(), the filed CPUID[4].EAX[bits 25:14] (original virable "num_apic_ids") is parsed based on cpu topology levels, which are verified when parsing -smp, it's no need to check this value by "assert(num_apic_ids > 0)" again, so remove this assert(). Additionally, wrap the encoding of CPUID[4].EAX[bits 31:26] into a helper to make the code cleaner. Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-21-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386: Add cache topology info in CPUCacheInfoZhao Liu
Currently, by default, the cache topology is encoded as: 1. i/d cache is shared in one core. 2. L2 cache is shared in one core. 3. L3 cache is shared in one die. This default general setting has caused a misunderstanding, that is, the cache topology is completely equated with a specific cpu topology, such as the connection between L2 cache and core level, and the connection between L3 cache and die level. In fact, the settings of these topologies depend on the specific platform and are not static. For example, on Alder Lake-P, every four Atom cores share the same L2 cache. Thus, we should explicitly define the corresponding cache topology for different cache models to increase scalability. Except legacy_l2_cache_cpuid2 (its default topo level is CPU_TOPO_LEVEL_UNKNOW), explicitly set the corresponding topology level for all other cache models. In order to be compatible with the existing cache topology, set the CPU_TOPO_LEVEL_CORE level for the i/d cache, set the CPU_TOPO_LEVEL_CORE level for L2 cache, and set the CPU_TOPO_LEVEL_DIE level for L3 cache. The field for CPUID[4].EAX[bits 25:14] or CPUID[0x8000001D].EAX[bits 25:14] will be set based on CPUCacheInfo.share_level. Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Yongwei Ma <yongwei.ma@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20240424154929.1487382-20-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386/cpu: Introduce module-id to X86CPUZhao Liu
Introduce module-id to be consistent with the module-id field in CpuInstanceProperties. Following the legacy smp check rules, also add the module_id validity into x86_cpu_pre_plug(). Tested-by: Yongwei Ma <yongwei.ma@intel.com> Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com> Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-17-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386: Expose module level in CPUID[0x1F]Zhao Liu
Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms") is able to handle platforms with Module level enumerated via CPUID.1F. Expose the module level in CPUID[0x1F] if the machine has more than 1 modules. Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-15-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386: Support modules_per_die in X86CPUTopoInfoZhao Liu
Support module level in i386 cpu topology structure "X86CPUTopoInfo". Since x86 does not yet support the "modules" parameter in "-smp", X86CPUTopoInfo.modules_per_die is currently always 1. Therefore, the module level width in APIC ID, which can be calculated by "apicid_bitwidth_for_count(topo_info->modules_per_die)", is always 0 for now, so we can directly add APIC ID related helpers to support module level parsing. In addition, update topology structure in test-x86-topo.c. Tested-by: Yongwei Ma <yongwei.ma@intel.com> Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com> Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-14-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386: Introduce module level cpu topology to CPUX86StateZhao Liu
Intel CPUs implement module level on hybrid client products (e.g., ADL-N, MTL, etc) and E-core server products. A module contains a set of cores that share certain resources (in current products, the resource usually includes L2 cache, as well as module scoped features and MSRs). Module level support is the prerequisite for L2 cache topology on module level. With module level, we can implement the Guest's CPU topology and future cache topology to be consistent with the Host's on Intel hybrid client/E-core server platforms. Tested-by: Yongwei Ma <yongwei.ma@intel.com> Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com> Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-13-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386/cpu: Decouple CPUID[0x1F] subleaf with specific topology levelZhao Liu
At present, the subleaf 0x02 of CPUID[0x1F] is bound to the "die" level. In fact, the specific topology level exposed in 0x1F depends on the platform's support for extension levels (module, tile and die). To help expose "module" level in 0x1F, decouple CPUID[0x1F] subleaf with specific topology level. Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240424154929.1487382-12-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386: Split topology types of CPUID[0x1F] from the definitions of CPUID[0xB]Zhao Liu
CPUID[0xB] defines SMT, Core and Invalid types, and this leaf is shared by Intel and AMD CPUs. But for extended topology levels, Intel CPU (in CPUID[0x1F]) and AMD CPU (in CPUID[0x80000026]) have the different definitions with different enumeration values. Though CPUID[0x80000026] hasn't been implemented in QEMU, to avoid possible misunderstanding, split topology types of CPUID[0x1F] from the definitions of CPUID[0xB] and introduce CPUID[0x1F]-specific topology types. Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Yongwei Ma <yongwei.ma@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-11-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386/cpu: Introduce bitmap to cache available CPU topology levelsZhao Liu
Currently, QEMU checks the specify number of topology domains to detect if there's extended topology levels (e.g., checking nr_dies). With this bitmap, the extended CPU topology (the levels other than SMT, core and package) could be easier to detect without touching the topology details. This is also in preparation for the follow-up to decouple CPUID[0x1F] subleaf with specific topology level. Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240424154929.1487382-10-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386/cpu: Consolidate the use of topo_info in cpu_x86_cpuid()Zhao Liu
In cpu_x86_cpuid(), there are many variables in representing the cpu topology, e.g., topo_info, cs->nr_cores and cs->nr_threads. Since the names of cs->nr_cores and cs->nr_threads do not accurately represent its meaning, the use of cs->nr_cores or cs->nr_threads is prone to confusion and mistakes. And the structure X86CPUTopoInfo names its members clearly, thus the variable "topo_info" should be preferred. In addition, in cpu_x86_cpuid(), to uniformly use the topology variable, replace env->dies with topo_info.dies_per_pkg as well. Suggested-by: Robert Hoo <robert.hu@linux.intel.com> Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-9-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386/cpu: Use APIC ID info get NumSharingCache for ↵Zhao Liu
CPUID[0x8000001D].EAX[bits 25:14] The commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information for cpuid 0x8000001D") adds the cache topology for AMD CPU by encoding the number of sharing threads directly. From AMD's APM, NumSharingCache (CPUID[0x8000001D].EAX[bits 25:14]) means [1]: The number of logical processors sharing this cache is the value of this field incremented by 1. To determine which logical processors are sharing a cache, determine a Share Id for each processor as follows: ShareId = LocalApicId >> log2(NumSharingCache+1) Logical processors with the same ShareId then share a cache. If NumSharingCache+1 is not a power of two, round it up to the next power of two. From the description above, the calculation of this field should be same as CPUID[4].EAX[bits 25:14] for Intel CPUs. So also use the offsets of APIC ID to calculate this field. [1]: APM, vol.3, appendix.E.4.15 Function 8000_001Dh--Cache Topology Information Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240424154929.1487382-8-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]Zhao Liu
Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the nearest power-of-2 integer. The nearest power-of-2 integer can be calculated by pow2ceil() or by using APIC ID offset/width (like L3 topology using 1 << die_offset [3]). But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] are associated with APIC ID. For example, in linux kernel, the field "num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. And for another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not matched with actual core numbers and it's calculated by: "(1 << (pkg_offset - core_offset)) - 1". Therefore the topology information of APIC ID should be preferred to calculate nearest power-of-2 integer for CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]: 1. d/i cache is shared in a core, 1 << core_offset should be used instead of "cs->nr_threads" in encode_cache_cpuid4() for CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14]. 2. L2 cache is supposed to be shared in a core as for now, thereby 1 << core_offset should also be used instead of "cs->nr_threads" in encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14]. 3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be calculated with the bit width between the package and SMT levels in the APIC ID (1 << (pkg_offset - core_offset) - 1). In addition, use APIC ID bits calculations to replace "pow2ceil()" for cache_info_passthrough case. [1]: efb3934adf9e ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec") [2]: d7caf13b5fcf ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache") [3]: d65af288a84d ("i386: Update new x86_apicid parsing rules with die_offset support") Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently") Suggested-by: Robert Hoo <robert.hu@linux.intel.com> Tested-by: Yongwei Ma <yongwei.ma@intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Message-ID: <20240424154929.1487382-7-zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22i386/cpu: Fix i/d-cache topology to core level for Intel CPUZhao Liu
For i-cache and d-cache, current QEMU hardcodes the maximum IDs for CPUs sharing cache (CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14]) to 0, and this means i-cache and d-cache are shared in the SMT level. This is correct if there's single thread per core, but is wrong for the hyper threading case (one core contains multiple threads) since the i-cache and d-cache are shared in the core level other than SMT level. For AMD CPU, commit 8f4202fb1080 ("i386: Populate AMD Processor Cache Information for cpuid 0x8000001D") has already introduced i/d cache topology as core level by default. Therefore, in order to be compatible with both multi-threaded and single-threaded situations, we should set i-cache and d-cache be shared at the core level by default. This fix changes the default i/d cache topology from per-thread to per-core. Potentially, this change in L1 cache topology may affect the performance of the VM if the user does not specifically specify the topology or bind the vCPU. However, the way to achieve optimal performance should be to create a reasonable topology and set the appropriate vCPU affinity without relying on QEMU's default topology structure. Fixes: 7e3482f82480 ("i386: Helpers to encode cache information consistently") Suggested-by: Robert Hoo <robert.hu@linux.intel.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Tested-by: Yongwei Ma <yongwei.ma@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20240424154929.1487382-6-zhao1.liu@intel.com> [Add compat property. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22target/i386: add control bits support for LAMBinbin Wu
LAM uses CR3[61] and CR3[62] to configure/enable LAM on user pointers. LAM uses CR4[28] to configure/enable LAM on supervisor pointers. For CR3 LAM bits, no additional handling needed: - TCG LAM is not supported for TCG of target-i386. helper_write_crN() and helper_vmrun() check max physical address bits before calling cpu_x86_update_cr3(), no change needed, i.e. CR3 LAM bits are not allowed to be set in TCG. - gdbstub x86_cpu_gdb_write_register() will call cpu_x86_update_cr3() to update cr3. Allow gdb to set the LAM bit(s) to CR3, if vcpu doesn't support LAM, KVM_SET_SREGS will fail as other reserved bits. For CR4 LAM bit, its reservation depends on vcpu supporting LAM feature or not. - TCG LAM is not supported for TCG of target-i386. helper_write_crN() and helper_vmrun() check CR4 reserved bit before calling cpu_x86_update_cr4(), i.e. CR4 LAM bit is not allowed to be set in TCG. - gdbstub x86_cpu_gdb_write_register() will call cpu_x86_update_cr4() to update cr4. Mask out LAM bit on CR4 if vcpu doesn't support LAM. - x86_cpu_reset_hold() doesn't need special handling. Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Tested-by: Xuelian Guo <xuelian.guo@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240112060042.19925-3-binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22target/i386: add support for LAM in CPUID enumerationRobert Hoo
Linear Address Masking (LAM) is a new Intel CPU feature, which allows software to use of the untranslated address bits for metadata. The bit definition: CPUID.(EAX=7,ECX=1):EAX[26] Add CPUID definition for LAM. Note LAM feature is not supported for TCG of target-i386, LAM CPIUD bit will not be added to TCG_7_1_EAX_FEATURES. More info can be found in Intel ISE Chapter "LINEAR ADDRESS MASKING(LAM)" https://cdrdv2.intel.com/v1/dl/getContent/671368 Signed-off-by: Robert Hoo <robert.hu@linux.intel.com> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Tested-by: Xuelian Guo <xuelian.guo@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240112060042.19925-2-binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22target/i386: clean up AAM/AADPaolo Bonzini
The 32-bit AAM/AAD opcodes are using helpers that read and write flags and env->regs[R_EAX]. Clean them up so that the table correctly includes AX as a 16-bit input and output. No real reason to do it to be honest, but they are nice one-output helpers and it removes the masking of env->regs[R_EAX] that generic load/writeback code already does. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240522123912.608497-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-22target/i386: generate simpler code for ROL/ROR with immediate countPaolo Bonzini
gen_rot_carry and gen_rot_overflow are meant to be called with count == NULL if the count cannot be zero. However this is not done in gen_ROL and gen_ROR, and writing everywhere "can_be_zero ? count : NULL" is burdensome and less readable. Just pass can_be_zero as a separate argument. gen_RCL and gen_RCR use a conditional branch to skip the computation if count is zero, so they can pass false unconditionally to gen_rot_overflow. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240522123914.608516-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-15target/i386: Use translator_ldub for everythingRichard Henderson
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-05-15accel/tcg: Provide default implementation of disas_logRichard Henderson
Almost all of the disas_log implementations are identical. Unify them within translator_loop. Drop extra Priv/Virt logging from target/riscv. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-05-10i386: select correct components for no-board buildPaolo Bonzini
The local APIC is a part of the CPU and has callbacks that are invoked from multiple accelerators. The IOAPIC on the other hand is optional, but ioapic_eoi_broadcast is used by common x86 code to implement the IOAPIC's implicit EOI mode. Add a stub in case the IOAPIC device is not included but the APIC is. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240509170044.190795-13-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-10target/i386: fix feature dependency for WAITPKGPaolo Bonzini
The VMX feature bit depends on general availability of WAITPKG, not the other way round. Fixes: 33cc88261c3 ("target/i386: add support for VMX_SECONDARY_EXEC_ENABLE_USER_WAIT_PAUSE", 2023-08-28) Cc: qemu-stable@nongnu.org Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-10target/i386: move prefetch and multi-byte UD/NOP to new decoderPaolo Bonzini
These are trivial to add, and moving them to the new decoder fixes some corner cases: raising #UD instead of an instruction fetch page fault for the undefined opcodes, and incorrectly rejecting 0F 18 prefetches with register operands (which are treated as reserved NOPs). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-10target/i386: rdpkru/wrpkru are no-prefix instructionsPaolo Bonzini
Reject 0x66/0xf3/0xf2 in front of them. Cc: qemu-stable@nongnu.org Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-10target/i386: fix operand size for DATA16 REX.W POPCNTPaolo Bonzini
According to the manual, 32-bit vs 64-bit is governed by REX.W and REX ignores the 0x66 prefix. This can be confirmed with this program: #include <stdio.h> int main() { int x = 0x12340000; int y; asm("popcntl %1, %0" : "=r" (y) : "r" (x)); printf("%x\n", y); asm("mov $-1, %0; .byte 0x66; popcntl %1, %0" : "+r" (y) : "r" (x)); printf("%x\n", y); asm("mov $-1, %0; .byte 0x66; popcntq %q1, %q0" : "+r" (y) : "r" (x)); printf("%x\n", y); } which prints 5/ffff0000/5 on real hardware and 5/ffff0000/ffff0000 on QEMU. Cc: qemu-stable@nongnu.org Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-10target/i386: remove PCOMMIT from TCG, deprecate propertyPaolo Bonzini
The PCOMMIT instruction was never included in any physical processor. TCG implements it as a no-op instruction, but its utility is debatable to say the least. Drop it from the decoder since it is only available with "-cpu max", which does not guarantee migration compatibility across versions, and deprecate the property just in case someone is using it as "pcommit=off". Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-09misc: Use QEMU header path relative to include/ directoryPhilippe Mathieu-Daudé
QEMU headers are relative to the include/ directory, not to the project root directory. Remove "include/". See also: https://www.qemu.org/docs/master/devel/style.html#include-directives Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240507142737.95735-1-philmd@linaro.org>
2024-05-07target/i386: remove duplicate prefix decodingPaolo Bonzini
Now that a bulk of opcodes go through the new decoder, it is sensible to do some cleanup. Go immediately through disas_insn_new and only jump back after parsing the prefixes. disas_insn() now only contains the three sigsetjmp cases, and they are more easily managed if they are inlined into i386_tr_translate_insn. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-07target/i386: split legacy decoder into a separate functionPaolo Bonzini
Split the bits that have some duplication with disas_insn_new, from those that should be the main topic of the conversion. This is the first step towards removing duplicate decoding of prefixes between disas_insn and disas_insn_new. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-07target/i386: decode x87 instructions in a separate functionPaolo Bonzini
These are unlikely to be converted to the table-based decoding soon (perhaps there could be generic ESC decoding in decode-new.c.inc for the Mod/RM byte, but not operand decoding), so keep them separate from the remaining legacy-decoded instructions. Acked-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-07target/i386: remove now-converted opcodes from old decoderPaolo Bonzini
Send all converted opcodes to disas_insn_new() directly from the big decoding switch statement; once more, the debugging/bisecting logic disappears. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-05-07target/i386: port extensions of one-byte opcodes to new decoderPaolo Bonzini
A few two-byte opcodes are simple extensions of existing one-byte opcodes; they are easy to decode and need no change to emit.c.inc. Port them to the new decoder. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>