path: root/target/i386/tcg
2023-03-01  target/i386: Replace `tb_pc()` with `tb->pc`  (Anton Johansson)

Signed-off-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230227135202.9710-23-anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-01  target/i386: Replace `TARGET_TB_PCREL` with `CF_PCREL`  (Anton Johansson)

Signed-off-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230227135202.9710-8-anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-02-28  accel/tcg: Add 'size' param to probe_access_full  (Richard Henderson)

Change to match the recent change to probe_access_flags. All existing callers updated to supply 0, so no change in behaviour.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-02-27  target/i386: Fix BZHI instruction  (Richard Henderson)

We did not correctly handle N >= operand size.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1374
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230114233206.3118472-1-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
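For reference, the corrected behaviour as a standalone C model of 64-bit BZHI, sketched from the SDM description; bzhi64 and its operand names are illustrative, not QEMU code:

    #include <stdbool.h>
    #include <stdint.h>

    static uint64_t bzhi64(uint64_t src, uint8_t n, bool *cf)
    {
        *cf = n > 63;                       /* CF flags an out-of-range index */
        if (n < 64) {
            src &= (UINT64_C(1) << n) - 1;  /* clear bits [63:n] */
        }
        return src;                         /* n >= 64: source passes through */
    }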
2023-02-16  target/i386: Fix 32-bit AD[CO]X insns in 64-bit mode  (Richard Henderson)

Failure to truncate the inputs results in garbage for the carry-out.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1373
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230115012103.3131796-1-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
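The fix amounts to the following, shown as a hedged C model (names are illustrative): both inputs of the 32-bit form must be truncated before the wide addition, otherwise stale high bits land in bit 32, which is exactly where the carry-out is read:

    #include <stdint.h>

    static uint32_t adcx32(uint64_t dst, uint64_t src, unsigned cf_in,
                           unsigned *cf_out)
    {
        uint64_t sum = (uint64_t)(uint32_t)dst     /* truncate */
                     + (uint64_t)(uint32_t)src     /* truncate */
                     + cf_in;
        *cf_out = (unsigned)(sum >> 32);           /* clean carry-out */
        return (uint32_t)sum;
    }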
2023-02-11  target/i386: fix ADOX followed by ADCX  (Paolo Bonzini)

When ADCX is followed by ADOX or vice versa, the second instruction's carry comes from EFLAGS and the condition codes use the CC_OP_ADCOX operation. Retrieving the carry from EFLAGS is handled by this bit of gen_ADCOX:

    tcg_gen_extract_tl(carry_in, cpu_cc_src,
        ctz32(cc_op == CC_OP_ADCX ? CC_C : CC_O), 1);

Unfortunately, in this case cc_op has been overwritten by the previous "if" statement to CC_OP_ADCOX. This works by chance when the first instruction is ADCX; however, if the first instruction is ADOX, ADCX will incorrectly take its carry from OF instead of CF.

Fix this by moving the computation of the new cc_op to the end of the function. The included exhaustive test case fails without this patch and passes afterwards.

Because ADCX/ADOX need not be invoked through the VEX prefix, this regression bisects to commit 16fc5726a6e2 ("target/i386: reimplement 0x0f 0x38, add AVX", 2022-10-18). However, the mistake happened a little earlier, when BMI instructions were rewritten using the new decoder framework.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1471
Reported-by: Paul Jolly <https://gitlab.com/myitcv>
Fixes: 1d0b926150e5 ("target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder", 2022-10-18)
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
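A minimal user-level probe of this failure mode (not the exhaustive test from the commit) might look as follows, assuming an x86-64 build running on a CPU or QEMU with ADX support:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t a = 0, b = 0;
        asm("xor %%eax, %%eax\n\t"   /* clear CF and OF */
            "stc\n\t"                /* set CF; OF stays 0 */
            "adox %1, %0\n\t"        /* consumes OF: a += b + 0 */
            "adcx %1, %0"            /* must consume CF: a += b + 1 */
            : "+r"(a)
            : "r"(b)
            : "eax", "cc");
        assert(a == 1);              /* fails if ADCX read OF instead of CF */
        return 0;
    }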
2023-02-11  target/i386: Fix C flag for BLSI, BLSMSK, BLSR  (Richard Henderson)

We forgot to set cc_src, which is used for computing C.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1370
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230114180601.2993644-1-richard.henderson@linaro.org>
Cc: qemu-stable@nongnu.org
Fixes: 1d0b926150e5 ("target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
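The architectural C-flag definitions being implemented, as a C sketch of the SDM pseudo-code (function names are illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    static uint64_t blsi(uint64_t src, bool *cf)
    {
        *cf = src != 0;           /* C is set unless the source was zero */
        return src & -src;        /* isolate lowest set bit */
    }

    static uint64_t blsmsk(uint64_t src, bool *cf)
    {
        *cf = src == 0;
        return src ^ (src - 1);   /* mask up to and including lowest set bit */
    }

    static uint64_t blsr(uint64_t src, bool *cf)
    {
        *cf = src == 0;
        return src & (src - 1);   /* clear lowest set bit */
    }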
2023-02-11  target/i386: Fix BEXTR instruction  (Richard Henderson)

There were two problems here: not limiting the input to operand bits, and not correctly handling large extraction length.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1372
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230114230542.3116013-3-richard.henderson@linaro.org>
Cc: qemu-stable@nongnu.org
Fixes: 1d0b926150e5 ("target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
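Both problems are visible in a C reference model of 64-bit BEXTR, sketched from the SDM (the function name is illustrative): START and LEN each occupy one byte of the control operand and may both exceed the operand size, so neither can be fed blindly to a C shift:

    #include <stdint.h>

    static uint64_t bextr64(uint64_t src, uint64_t ctrl)
    {
        unsigned start = ctrl & 0xff;
        unsigned len = (ctrl >> 8) & 0xff;

        if (start >= 64 || len == 0) {
            return 0;                          /* nothing to extract */
        }
        src >>= start;
        if (len < 64) {
            src &= (UINT64_C(1) << len) - 1;   /* a large len keeps all bits */
        }
        return src;
    }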
2023-02-04  target/i386: Inline cmpxchg16b  (Richard Henderson)

Use tcg_gen_atomic_cmpxchg_i128 for the atomic case, and tcg_gen_qemu_ld/st_i128 otherwise.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-02-04  target/i386: Inline cmpxchg8b  (Richard Henderson)

Use tcg_gen_atomic_cmpxchg_i64 for the atomic case, and tcg_gen_nonatomic_cmpxchg_i64 otherwise.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
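The semantics being inlined, as a plain C sketch (CMPXCHG16B is the same shape with RDX:RAX and RCX:RBX over 128 bits); names are illustrative and atomicity is not modeled here:

    #include <stdbool.h>
    #include <stdint.h>

    /* returns the new ZF value */
    static bool cmpxchg8b(uint64_t *mem, uint64_t *edx_eax, uint64_t ecx_ebx)
    {
        uint64_t old = *mem;
        if (old == *edx_eax) {
            *mem = ecx_ebx;       /* equal: store ECX:EBX, ZF = 1 */
            return true;
        }
        *edx_eax = old;           /* not equal: load old value, ZF = 0 */
        return false;
    }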
2023-02-04  target/i386: Split out gen_cmpxchg8b, gen_cmpxchg16b  (Richard Henderson)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-01-11  target/i386: fix operand size of unary SSE operations  (Paolo Bonzini)

VRCPSS, VRSQRTSS and VCVTSx2Sx have a 32-bit or 64-bit memory operand, which is represented in the decoding tables by X86_VEX_REPScalar. Add it to the tables, and make validate_vex() handle the case of an instruction that is in exception type 4 without the REP prefix and exception type 5 with it; this is the case of VRCP and VRSQRT.

Reported-by: yongwoo <https://gitlab.com/yongwoo36>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1377
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11  i386: Emit correct error code for 64-bit IDT entry  (Joe Richey)

When in 64-bit mode, IDT entries are 16 bytes, so `intno * 16` is used for base/limit/offset calculations. However, even in 64-bit mode, the exception error code still uses bits [3,16) for the invalid interrupt index. This means the error code should still be `intno * 8 + 2` even in 64-bit mode.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1382
Signed-off-by: Joe Richey <joerichey@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
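The two calculations side by side, in an illustrative helper (not the QEMU code): the descriptor offset scales with the entry size, while the pushed error code always encodes index * 8 plus bit 1, the IDT flag:

    #include <stdint.h>

    static void idt_calc(int intno, int lm /* 64-bit mode? */,
                         uint32_t *desc_off, uint32_t *error_code)
    {
        *desc_off = intno * (lm ? 16 : 8);  /* entry size differs by mode */
        *error_code = intno * 8 + 2;        /* index in bits [3,16), IDT bit set */
    }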
2022-12-01  target/i386: Always completely initialize TranslateFault  (Richard Henderson)

In get_physical_address, the canonical address check failed to set TranslateFault.stage2, which resulted in an uninitialized read from the struct when reporting the fault in x86_cpu_tlb_fill. Adjust all error paths to use structure assignment so that the entire struct is always initialized.

Reported-by: Daniel Hoffman <dhoff749@gmail.com>
Fixes: 9bbcf372193a ("target/i386: Reorg GET_HPHYS")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221201074522.178498-1-richard.henderson@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1324
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
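Why structure assignment closes the hole, in a generic C sketch (the types are illustrative, not QEMU's): a compound-literal assignment zero-initializes every member that is not named, so no field can be left holding stack garbage:

    #include <stdbool.h>

    typedef struct {
        int exception_index;
        int error_code;
        bool stage2;
    } Fault;

    static void set_fault(Fault *f, int excp, int code)
    {
        /* all members assigned; stage2 becomes false, never garbage */
        *f = (Fault){ .exception_index = excp, .error_code = code };
    }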
2022-12-01  target/i386: allow MMX instructions with CR4.OSFXSR=0  (Paolo Bonzini)

MMX state is saved/restored by FSAVE/FRSTOR so the instructions are not illegal opcodes even if CR4.OSFXSR=0. Make sure that validate_vex takes into account the prefix and only checks HF_OSFXSR_MASK in the presence of an SSE instruction.

Fixes: 20581aadec5e ("target/i386: validate VEX prefixes via the instructions' exception classes", 2022-10-18)
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1350
Reported-by: Helge Konetzka (@hejko on gitlab.com)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-15  target/i386: hardcode R_EAX as destination register for LAHF/SAHF  (Paolo Bonzini)

When translating code that is using LAHF and SAHF in combination with the REX prefix, the instructions should not use any other register than AH; however, QEMU selects SPL (SP being register 4, just like AH) if the REX prefix is present. To fix this, use deposit directly without going through gen_op_mov_v_reg and gen_op_mov_reg_v.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/130
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
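What "use deposit directly" buys, as a self-contained C model: AH is bits 15:8 of RAX regardless of prefixes, which is a fixed-offset deposit/extract (mirroring TCG's tcg_gen_deposit_tl/tcg_gen_extract_tl) rather than a prefix-dependent byte-register lookup:

    #include <stdint.h>

    static uint64_t lahf(uint64_t rax, uint8_t flags_byte)
    {
        /* deposit flags_byte at offset 8, length 8 */
        return (rax & ~UINT64_C(0xff00)) | ((uint64_t)flags_byte << 8);
    }

    static uint8_t sahf_src(uint64_t rax)
    {
        return (uint8_t)(rax >> 8);    /* always AH, never SPL */
    }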
2022-11-15  target/i386: fix cmpxchg with 32-bit register destination  (Paolo Bonzini)

Unlike the memory case, where "the destination operand receives a write cycle without regard to the result of the comparison", rm must not be touched at all if the write fails, including not zero-extending it on 64-bit processors. This is not how the movcond currently works, because it is always followed by a gen_op_mov_reg_v to rm.

To fix it, introduce a new function that is similar to gen_op_mov_reg_v but writes to a TCG temporary.

Considering that gen_extu(ot, oldv) is not needed in the memory case either, the two cases for register and memory destinations are different enough that one might as well fuse the two "if (mod == 3)" into one. So do that too.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/508
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[rth: Add a test case ]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
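A user-level probe of the register-destination case could look like this, assuming x86-64 inline asm (this is not the test case added by the commit): on a failed compare, even the high 32 bits of the destination register must survive:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t rm = 0xdeadbeef00000001ull;  /* high bits must survive */
        uint32_t eax = 42, newv = 7;          /* 42 != 1, so the store fails */
        asm("cmpxchgl %2, %k0"
            : "+r"(rm), "+a"(eax)
            : "r"(newv)
            : "cc");
        assert(rm == 0xdeadbeef00000001ull);  /* untouched, not zero-extended */
        assert(eax == 1);                     /* EAX receives the old value */
        return 0;
    }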
2022-11-03  Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging  (Stefan Hajnoczi)

* bug fixes
* reduced memory footprint for IPI virtualization on Intel processors
* asynchronous teardown support (Linux only)

# gpg: Signature made Wed 02 Nov 2022 07:40:25 EDT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
  target/i386: Fix test for paging enabled
  util/log: Close per-thread log file on thread termination
  target/i386: Set maximum APIC ID to KVM prior to vCPU creation
  os-posix: asynchronous teardown for shutdown on Linux
  target/i386: Fix calculation of LOCK NEG eflags

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-11-02  target/i386: Fix test for paging enabled  (Richard Henderson)

If CR0.PG is unset, pg_mode will be zero, but it will also be zero for non-PAE/non-PSE page tables with CR0.WP=0. Restore the correct test for paging enabled.

Fixes: 98281984a37 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1269
Reported-by: Andreas Gustafsson <gson@gson.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221102091232.1092552-1-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-01  target/i386: Expand eflags updates inline  (Richard Henderson)

The helpers for reset_rf, cli, sti, clac, stac are completely trivial; implement them inline. Drop some nearby #if 0 code.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
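What the inlined updates amount to, as a C sketch over an illustrative 32-bit eflags word (the masks are the standard x86 bit positions): each former helper is a single AND or OR.

    #include <stdint.h>

    #define IF_MASK 0x00000200u   /* bit 9 */
    #define RF_MASK 0x00010000u   /* bit 16 */
    #define AC_MASK 0x00040000u   /* bit 18 */

    static inline void cli(uint32_t *fl)      { *fl &= ~IF_MASK; }
    static inline void sti(uint32_t *fl)      { *fl |=  IF_MASK; }
    static inline void clac(uint32_t *fl)     { *fl &= ~AC_MASK; }
    static inline void stac(uint32_t *fl)     { *fl |=  AC_MASK; }
    static inline void reset_rf(uint32_t *fl) { *fl &= ~RF_MASK; }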
2022-11-01  accel/tcg: Remove will_exit argument from cpu_restore_state  (Richard Henderson)

The value passed is always true, and if the target's synchronize_from_tb hook is non-trivial, not exiting may be erroneous.

Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-10-31  target/i386: Fix calculation of LOCK NEG eflags  (Qi Hu)

After:

    lock negl -0x14(%rbp)
    pushf
    pop %rax

%rax will contain the wrong value because the "lock neg" calculates the wrong eflags. Simple test:

    #include <assert.h>

    int main()
    {
        __volatile__ unsigned test = 0x2363a;
        __volatile__ char cond = 0;
        asm("lock negl %0 \n\t"
            "sets %1"
            : "=m"(test), "=r"(cond));
        assert(cond & 1);
        return 0;
    }

Reported-by: Jinyang Shen <shenjinyang@loongson.cn>
Co-Developed-by: Xuehai Chen <chenxuehai@loongson.cn>
Signed-off-by: Xuehai Chen <chenxuehai@loongson.cn>
Signed-off-by: Qi Hu <huqi@loongson.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-26  target/i386: Convert to tcg_ops restore_state_to_opc  (Richard Henderson)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-10-22  target/i386: implement FMA instructions  (Paolo Bonzini)

The only issue with FMA instructions is that there are _a lot_ of them (30 opcodes, each of which comes in up to 4 versions depending on VEX.W and VEX.L; a total of 96 possibilities). However, they can be implemented with only 6 helpers, two for scalar operations and four for packed operations. (Scalar versions do not do any merging; they only affect the bottom 32 or 64 bits of the output operand. Therefore, there are no separate XMM and YMM variants of the scalar helpers.)

First, we can reduce the number of helpers to one third by passing four operands (one output and three inputs); the reordering of which operands go to the multiply and which go to the add is done in emit.c.

Second, the different instructions also dispatch to the same softfloat function, so the flags for float32_muladd and float64_muladd are passed in the helper as int arguments, with a little extra complication to handle FMADDSUB and FMSUBADD.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
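How four opcode variants can funnel into one softfloat entry point, sketched against QEMU's float32_muladd (the wrapper name and the dispatch are illustrative, not the commit's code):

    /* assumes QEMU's "fpu/softfloat.h" for float32_muladd and the
     * float_muladd_* flags */
    static float32 do_fmadd32(float32 a, float32 b, float32 c,
                              int flags, float_status *s)
    {
        /* flags: 0 for FMADD, float_muladd_negate_c for FMSUB,
         * float_muladd_negate_product for FNMADD, both for FNMSUB */
        return float32_muladd(a, b, c, flags, s);
    }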
2022-10-20  target/i386: implement F16C instructions  (Paolo Bonzini)

F16C only consists of two instructions, which are a bit peculiar nevertheless.

First, they access only the low half of an XMM or YMM register for the packed-half operand; the exact size still depends on the VEX.L flag. This is similar to the existing avx_movx flag, but not exactly, because avx_movx is hardcoded to affect operand 2. To this end I added a "ph" format name; it's possible to reuse this approach for the VPMOVSX and VPMOVZX instructions, though that would also require adding two more formats for the low-quarter and low-eighth of an operand.

Second, VCVTPS2PH is somewhat weird because it *stores* the result of the instruction into memory rather than loading it.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-20  target/i386: introduce function to set rounding mode from FPCW or MXCSR bits  (Paolo Bonzini)

VROUND, FSTCW and STMXCSR all have to perform the same conversion from x86 rounding modes to softfloat constants. Since the ISA is consistent on the meaning of the two-bit rounding modes, extract the common code into a wrapper for set_float_rounding_mode.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
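The shared two-bit mapping, sketched against QEMU's softfloat constants (the wrapper name is illustrative; FPCW.RC and MXCSR.RC encode rounding identically):

    /* assumes QEMU's "fpu/softfloat.h" */
    static void x86_set_rounding(unsigned rc, float_status *s)
    {
        static const FloatRoundMode map[4] = {
            float_round_nearest_even,  /* 00 */
            float_round_down,          /* 01: toward -inf */
            float_round_up,            /* 10: toward +inf */
            float_round_to_zero,       /* 11: truncate */
        };
        set_float_rounding_mode(map[rc & 3], s);
    }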
2022-10-20  target/i386: decode-new: avoid out-of-bounds access to xmm_regs[-1]  (Paolo Bonzini)

If the destination is a memory operand, op->n is -1. Going through the tcg_gen_gvec_dup_imm path is both useless (the value has been stored by the gen_* function already) and wrong because of the out-of-bounds access.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: remove old SSE decoder  (Paolo Bonzini)

With all SSE (and AVX!) instructions now implemented in disas_insn_new, it's possible to remove gen_sse, as well as the helpers for instructions that now use gvec.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: move 3DNow to the new decoder  (Paolo Bonzini)

This adds another kind of weirdness when you thought you had seen it all: an opcode byte that comes _after_ the address, not before. It's not worth adding a new X86_SPECIAL_* constant for it, but it's actually not unlike VCMP; so, forgive me for exploiting the similarity and just deciding to dispatch to the right gen_helper_* call in a single code generation function.

In fact, the old decoder had a bug where s->rip_offset should have been set to 1 for 3DNow! instructions, and it's fixed now.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: implement VLDMXCSR/VSTMXCSR  (Paolo Bonzini)

These are exactly the same as the non-VEX version, but one has to be careful that only VEX.L=0 is allowed.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: implement XSAVE and XRSTOR of AVX registers  (Paolo Bonzini)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: reimplement 0x0f 0x28-0x2f, add AVX  (Paolo Bonzini)

Here the code is a bit uglier due to the truncation and extension of registers to and from 32-bit. There is also a mistake in the manual with respect to the size of the memory operand of CVTPS2PI and CVTTPS2PI, reported by Ricky Zhou.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: reimplement 0x0f 0x10-0x17, add AVX  (Paolo Bonzini)

These are mostly moves, and yet are a total pain. The main issue is that:

1) some instructions are selected by mod==11 (register operand) vs. mod==00/01/10 (memory operand)

2) stores to memory are two-operand operations, while the 3-register and load-from-memory versions operate on the entire contents of the destination; this makes it easier to separate the gen_* function for the store case

3) it's inefficient to load into xmm_T0 only to move the value out again, so the gen_* function for the load case is separated too

The manual also has various mistakes in the operands here, for example the store case of MOVHPS operates on a 128-bit source (albeit discarding the bottom 64 bits) and therefore should be Mq,Vdq rather than Mq,Vq. Likewise for the destination and source of MOVHLPS.

VUNPCK?PS and VUNPCK?PD are the same as VUNPCK?DQ and VUNPCK?QDQ, but encoded as prefixes rather than separate operands. The helpers can be reused however.

For MOVSLDUP, MOVSHDUP and MOVDDUP I chose to reimplement them as helpers. I named the helper for MOVDDUP "movdldup" in preparation for possible future introduction of MOVDHDUP and to clarify the similarity with MOVSLDUP.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: reimplement 0x0f 0xc2, 0xc4-0xc6, add AVX  (Paolo Bonzini)

Nothing special going on here, for once.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: reimplement 0x0f 0x38, add AVX  (Paolo Bonzini)

There are several special cases here:

1) extending moves have different widths for the helpers vs. for the memory loads, and the width for memory loads depends on VEX.L too. This is represented by X86_SPECIAL_AVXExtMov.

2) some instructions, such as variable-width shifts, select the vector element size via REX.W.

3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group, and they have (among other things) two output operands.

4) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be extended to support 2-operand blends. The 2-operand variant actually came a few years earlier, but it is clearer to implement them in the opposite order.

X86_TYPE_WM, introduced earlier for unaligned loads, is reused for helpers that accept a Reg* but have a M argument.

These three-byte opcodes also include AVX new instructions, for which the helpers were originally implemented by Paul Brook <paul@nowt.org>.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: Use tcg gvec ops for pmovmskb  (Richard Henderson)

As pmovmskb is used by strlen et al, this is the third highest overhead SSE operation at 0.8%.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[Reorganize to generate code for any vector size. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
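The operation itself is simple, which is why gvec pays off; as a C reference model (illustrative, not the QEMU code), PMOVMSKB gathers the sign bit of every byte:

    #include <stdint.h>

    static uint32_t pmovmskb(const uint8_t *bytes, int n /* 8 or 16 */)
    {
        uint32_t mask = 0;
        for (int i = 0; i < n; i++) {
            mask |= (uint32_t)(bytes[i] >> 7) << i;   /* top bit of byte i */
        }
        return mask;
    }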
2022-10-18  target/i386: reimplement 0x0f 0x3a, add AVX  (Paolo Bonzini)

The more complicated operations here are insertions and extractions. Otherwise, there are just more entries than usual because the PS/PD/SS/SD variations are encoded in the opcode rather than in the prefixes.

These three-byte opcodes also include AVX new instructions, whose implementation in the helpers was originally done by Paul Brook <paul@nowt.org>.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, add AVX  (Paolo Bonzini)

The more complicated ones here are d6-d7, e6-e7, f7. The others are trivial.

For LDDQU, using gen_load_sse directly might corrupt the register if the second part of the load fails. Therefore, add a custom X86_TYPE_WM value; like X86_TYPE_W it does call gen_load(), but it also rejects a value of 11 in the ModRM field like X86_TYPE_M.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: reimplement 0x0f 0x70-0x77, add AVX  (Paolo Bonzini)

This includes shifts by immediate, which use bits 3-5 of the ModRM byte as an opcode extension. With the exception of 128-bit shifts, they are implemented using gvec.

This also covers VZEROALL and VZEROUPPER, which use the same opcode as EMMS. If we wanted to optimize out gen_clear_ymmh, this would be one of the starting points. The implementation of the VZEROALL and VZEROUPPER helpers is by Paul Brook.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: reimplement 0x0f 0x78-0x7f, add AVX  (Paolo Bonzini)

These are a mixed batch, including the first two horizontal (66 and F2 only) operations, more moves, and SSE4a extract/insert.

Because SSE4a is pretty rare, I chose to leave the helpers as they are, but it is possible to unify them by loading index and length from the source XMM register and generating deposit or extract TCG ops.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: reimplement 0x0f 0x50-0x5f, add AVX  (Paolo Bonzini)

These are mostly floating-point SSE operations. The odd ones out are MOVMSK and CVTxx2yy; the others are straightforward.

Unary operations are a bit special in AVX because they have 2 operands for PD/PS operands (VEX.vvvv must be 1111b), and 3 operands for SD/SS. They are handled using X86_OP_GROUP3 for compactness.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: reimplement 0x0f 0xd8-0xdf, 0xe8-0xef, 0xf8-0xff, add AVX  (Paolo Bonzini)

These are more simple integer instructions present in both MMX and SSE/AVX, with no holes that were later occupied by newer instructions.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: reimplement 0x0f 0x60-0x6f, add AVX  (Paolo Bonzini)

These are both MMX and SSE/AVX instructions, except for vmovdqu. In both cases the inputs and output are in s->ptr{0,1,2}, so the only difference between MMX, SSE, and AVX is which helper to call.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: Introduce 256-bit vector helpers  (Paolo Bonzini)

The new implementation of SSE will cover AVX from the get go, because all the work for the helper functions is already done. We just need to build them.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: provide 3-operand versions of unary scalar helpers  (Paolo Bonzini)

Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the old SSE decoder to be compatible with new-style helpers. The affected instructions are CVTSx2Sx, ROUNDSx, RSQRTSx, SQRTSx, RCPSx.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: extend helpers to support VEX.V 3- and 4-operand encodings  (Paolo Bonzini)

Add to the helpers all the operands that are needed to implement AVX. Extracted from a patch by Paul Brook <paul@nowt.org>.

Message-Id: <20220424220204.2493824-26-paul@nowt.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder  (Paolo Bonzini)

Because these are the only VEX instructions that QEMU supports, the new decoder is entered on the first byte of a valid VEX prefix, and VEX decoding only needs to be done in decode-new.c.inc.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: validate SSE prefixes directly in the decoding table  (Paolo Bonzini)

Many SSE and AVX instructions are only valid with specific prefixes (none, 66, F3, F2). Introduce a direct way to encode this in the decoding table to avoid using decode groups too much.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: validate VEX prefixes via the instructions' exception classes  (Paolo Bonzini)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18  target/i386: add AVX_EN hflag  (Paul Brook)

Add a new hflag bit to determine whether AVX instructions are allowed.

Signed-off-by: Paul Brook <paul@nowt.org>
Message-Id: <20220424220204.2493824-4-paul@nowt.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>