aboutsummaryrefslogtreecommitdiff
path: root/tcg/optimize.c
AgeCommit message (Collapse)Author
2021-10-27tcg/optimize: Split out finish_foldingRichard Henderson
Copy z_mask into OptContext, for writeback to the first output within the new function. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Return true from tcg_opt_gen_{mov,movi}Richard Henderson
This will allow callers to tail call to these functions and return true indicating processing complete. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Change fail return for do_constant_folding_cond*Richard Henderson
Return -1 instead of 2 for failure, so that we can use comparisons against 0 for all cases. Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Drop nb_oargs, nb_iargs localsRichard Henderson
Rather than try to keep these up-to-date across folding, re-read nb_oargs at the end, after re-reading the opcode. A couple of asserts need dropping, but that will take care of itself as we split the function further. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Split out fold_callRichard Henderson
Calls are special in that they have a variable number of arguments, and need to be able to clobber globals. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Split out copy_propagateRichard Henderson
Continue splitting tcg_optimize. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Split out init_argumentsRichard Henderson
There was no real reason for calls to have separate code here. Unify init for calls vs non-calls using the call path, which handles TCG_CALL_DUMMY_ARG. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Move prev_mb into OptContextRichard Henderson
This will expose the variable to subroutines that will be broken out of tcg_optimize. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Change tcg_opt_gen_{mov,movi} interfaceRichard Henderson
Adjust the interface to take the OptContext parameter instead of TCGContext or both. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Remove do_default labelRichard Henderson
Break the final cleanup clause out of the main switch statement. When fully folding an opcode to mov/movi, use "continue" to process the next opcode, else break to fall into the final cleanup. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Split out OptContextRichard Henderson
Provide what will become a larger context for splitting the very large tcg_optimize function. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-27tcg/optimize: Rename "mask" to "z_mask"Richard Henderson
Prepare for tracking different masks by renaming this one. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Luis Pires <luis.pires@eldorado.org.br> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-05tcg: Rename TCGMemOpIdx to MemOpIdxRichard Henderson
We're about to move this out of tcg.h, so rename it as we did when moving MemOp. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-29tcg: Handle new bswap flags during optimizeRichard Henderson
Notice when the input is known to be zero-extended and force the TCG_BSWAP_IZ flag on. Honor the TCG_BSWAP_OS bit during constant folding. Propagate the input to the output mask. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-06-19tcg: Add tcg_call_flagsRichard Henderson
We're going to change how to look up the call flags from a TCGop, so extract it as a helper. Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13tcg: Remove movi and dupi opcodesRichard Henderson
These are now completely covered by mov from a TYPE_CONST temporary. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13tcg: Convert tcg_gen_dupi_vec to TCG_CONSTRichard Henderson
Because we now store uint64_t in TCGTemp, we can now always store the full 64-bit duplicate immediate. So remove the difference between 32- and 64-bit hosts. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13tcg/optimize: Use tcg_constant_internal with constant foldingRichard Henderson
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13tcg/optimize: Adjust TempOptInfo allocationRichard Henderson
Do not allocate a large block for indexing. Instead, allocate for each temporary as they are seen. In general, this will use less memory, if we consider that most TBs do not touch every target register. This also allows us to allocate TempOptInfo for new temps created during optimization. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13tcg/optimize: Improve find_better_copyRichard Henderson
Prefer TEMP_CONST over anything else. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13tcg: Introduce TYPE_CONST temporariesRichard Henderson
These will hold a single constant for the duration of the TB. They are hashed, so that each value has one temp across the TB. Not used yet, this is all infrastructure. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13tcg: Expand TempOptInfo to 64-bitsRichard Henderson
This propagates the extended value of TCGTemp.val that we did before. In addition, it will be required for vector constants. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13tcg: Rename struct tcg_temp_info to TempOptInfoRichard Henderson
Fix this name vs our coding style. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-13tcg: Consolidate 3 bits into enum TCGTempKindRichard Henderson
The temp_fixed, temp_global, temp_local bits are all related. Combine them into a single enumeration. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-07tcg: Introduce INDEX_op_qemu_st8_i32Richard Henderson
Enable this on i386 to restrict the set of input registers for an 8-bit store, as required by the architecture. This removes the last use of scratch registers for user-only mode. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-12-18tcg/optimize: Add fallthrough annotationsThomas Huth
To be able to compile this file with -Werror=implicit-fallthrough, we need to add some fallthrough annotations to the case statements that might fall through. Unfortunately, the typical "/* fallthrough */" comments do not work here as expected since some case labels are wrapped in macros and the compiler fails to match the comments in this case. But using __attribute__((fallthrough)) seems to work fine, so let's use that instead (by introducing a new QEMU_FALLTHROUGH macro in our compiler.h header file). Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20201211152426.350966-11-thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2020-11-04tcg: Revert "tcg/optimize: Flush data at labels not TCG_OPF_BB_END"Richard Henderson
This reverts commit cd0372c515c4732d8bd3777cdd995c139c7ed7ea. The patch is incorrect in that it retains copies between globals and non-local temps, and non-local temps still die at the end of the BB. Failing test case for hppa: .globl _start _start: cmpiclr,= 0x24,%r19,%r0 cmpiclr,<> 0x2f,%r19,%r19 ---- 00010057 0001005b movi_i32 tmp0,$0x24 sub_i32 tmp1,tmp0,r19 mov_i32 tmp2,tmp0 mov_i32 tmp3,r19 movi_i32 tmp1,$0x0 ---- 0001005b 0001005f brcond_i32 tmp2,tmp3,eq,$L1 movi_i32 tmp0,$0x2f sub_i32 tmp1,tmp0,r19 mov_i32 tmp2,tmp0 mov_i32 tmp3,r19 movi_i32 tmp1,$0x0 mov_i32 r19,tmp1 setcond_i32 psw_n,tmp2,tmp3,ne set_label $L1 In this case, both copies of "mov_i32 tmp3,r19" are removed. The second because opt thought it was redundant. The first is removed later by liveness because tmp3 is known to be dead. This leaves the setcond_i32 with an uninitialized input. Revert the entire patch for 5.2, and a proper optimization across the branch may be considered for the next development cycle. Reported-by: qemu@igor2.repo.hu Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-10-27tcg/optimize: Flush data at labels not TCG_OPF_BB_ENDRichard Henderson
We can easily propagate temp values through the entire extended basic block (in this case, the set of blocks connected by fallthru), simply by not discarding the register state at the branch. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-10-08tcg/optimize: Fold dup2_vecRichard Henderson
When the two arguments are identical, this can be reduced to dup_vec or to mov_vec from a tcg_constant_vec. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2020-01-15tcg: Search includes from the project root source directoryPhilippe Mathieu-Daudé
We currently search both the root and the tcg/ directories for tcg files: $ git grep '#include "tcg/' | wc -l 28 $ git grep '#include "tcg[^/]' | wc -l 94 To simplify the preprocessor search path, unify by expliciting the tcg/ directory. Patch created mechanically by running: $ for x in \ tcg.h tcg-mo.h tcg-op.h tcg-opc.h \ tcg-op-gvec.h tcg-gvec-desc.h; do \ sed -i "s,#include \"$x\",#include \"tcg/$x\"," \ $(git grep -l "#include \"$x\""); \ done Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts) Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200101112303.20724-2-philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03tcg: TCGMemOp is now accelerator independent MemOpTony Nguyen
Preparation for collapsing the two byte swaps, adjust_endianness and handle_bswap, along the I/O path. Target dependant attributes are conditionalized upon NEED_CPU_H. Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <81d9cd7d7f5aaadfa772d6c48ecee834e9cf7882.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-08-16Clean up inclusion of exec/cpu-common.hMarkus Armbruster
migration/qemu-file.h neglects to include it even though it needs ram_addr_t. Fix that. Drop a few superfluous inclusions elsewhere. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-14-armbru@redhat.com>
2019-07-14tcg: Fix constant folding of INDEX_op_extract2_i32Richard Henderson
On a 64-bit host, discard any replications of the 32-bit sign bit when performing the shift and merge. Fixes: https://bugs.launchpad.net/bugs/1834496 Tested-by: Christophe Lyon <christophe.lyon@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-06-12Include qemu-common.h exactly where neededMarkus Armbruster
No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]
2019-05-13tcg: Do not recreate INDEX_op_neg_vec unless supportedRichard Henderson
Use tcg_can_emit_vec_op instead of just TCG_TARGET_HAS_neg_vec, so that we check the type and vece for the actual operation. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24tcg: Add INDEX_op_extract2_{i32,i64}Richard Henderson
This will let backends implement the double-word shift operation. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17tcg: Drop nargs from tcg_op_insert_{before,after}Emilio G. Cota
It's unused since 75e8b9b7aa0b95a761b9add7e2f09248b101a392. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181209193749.12277-9-cota@braap.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-17tcg/optimize: Optimize bswapRichard Henderson
Somehow we forgot these operations, once upon a time. This will allow immediate stores to have their bswap optimized away. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-08-06tcg/optimize: Do not skip default processing of dup_vecRichard Henderson
If we do not opimize away dup_vec, we must mark its output as changed. Fixes: 170ba88f45b Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com> Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com> Message-id: 20180805233258.31892-1-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-02-08tcg/optimize: Handle vector opcodes during optimizeRichard Henderson
Trivial move and constant propagation. Some identity and constant function folding, but nothing that requires knowledge of the size of the vector element. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-12-29tcg: Generalize TCGOp parametersRichard Henderson
We had two fields specific to INDEX_op_call. Rename these and add some macros so that the fields may be reused for other opcodes. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-12-29tcg: Dynamically allocate TCGOpsRichard Henderson
With no fixed array allocation, we can't overflow a buffer. This will be important as optimizations related to host vectors may expand the number of ops used. Use QTAILQ to link the ops together. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: allocate optimizer temps with tcg_mallocEmilio G. Cota
Groundwork for supporting multiple TCG contexts. While at it, also allocate temps_used directly as a bitmap of the required size, instead of using a bitmap of TCG_MAX_TEMPS via TCGTempSet. Performance-wise we lose about 1.12% in a translation-heavy workload such as booting+shutting down debian-arm: Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \ -machine type=virt -nographic -smp 1 -m 4096 \ -netdev user,id=unet,hostfwd=tcp::2222-:22 \ -device virtio-net-device,netdev=unet \ -drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \ -device virtio-blk-device,drive=myblock \ -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \ -name arm,debug-threads=on -smp 1' (10 runs): exec time (s) Relative slowdown wrt original (%) --------------------------------------------------------------- original 20.213321616 0. tcg_malloc 20.441130078 1.1270214 TCGContext 20.477846517 1.3086662 g_malloc 20.780527895 2.8061013 The other two alternatives shown in the table are: - TCGContext: embed temps[TCG_MAX_TEMPS] and TCGTempSet used_temps in TCGContext. This is simple enough but it isn't faster than using tcg_malloc; moreover, it wastes memory. - g_malloc: allocate/deallocate both temps and used_temps every time tcg_optimize is executed. Suggested-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: Use per-temp state data in optimizeRichard Henderson
While we're touching many of the lines anyway, adjust the naming of the functions to better distinguish when "TCGArg" vs "TCGTemp" should be used. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24tcg: Add temp_global bit to TCGTempRichard Henderson
This avoids needing to test the index of a temp against nb_globals. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24tcg: Introduce arg_tempRichard Henderson
Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24tcg: Propagate args to op->args in optimizerRichard Henderson
Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-10-24tcg: Merge opcode arguments into TCGOpRichard Henderson
Rather than have a separate buffer of 10*max_ops entries, give each opcode 10 entries. The result is actually a bit smaller and should have slightly more cache locality. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10tcg: Add opcode for ctpopRichard Henderson
The number of actual invocations of ctpop itself does not warrent an opcode, but it is very helpful for POWER7 to use in generating an expansion for ctz. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-01-10tcg: Add clz and ctz opcodesRichard Henderson
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>