Age | Commit message (Collapse) | Author |
|
This changes the code generation for the tlb from e.g.
ldur x0, [x19, #0xffffffffffffffe0]
ldur x1, [x19, #0xffffffffffffffe8]
and x0, x0, x20, lsr #8
add x1, x1, x0
ldr x0, [x1]
ldr x1, [x1, #0x18]
to
ldp x0, x1, [x19, #-0x20]
and x0, x0, x20, lsr #8
add x1, x1, x0
ldr x0, [x1]
ldr x1, [x1, #0x18]
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
We have for some time had code within the tcg backends to
handle large positive offsets from env. This move makes
sure that need not happen. Indeed, we are able to assert
at build time that simple offsets suffice for all hosts.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Move all softmmu tlb data into this structure. Arrange the
members so that we are able to place mask+table together and
at a smaller absolute offset from ENV.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
The allows immediates to be used for ORR and BIC,
as well as the trivial inversions, ORC and AND.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Use MOVI+ORR or MVNI+BIC in order to build some vector constants,
as opposed to dropping them to the constant pool. This includes
all 16-bit constants and a similar set of 32-bit constants.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
The compliment of a subset of immediates can be computed
with a single instruction.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
There are several sub-classes of vector immediate, and only MOVI
can use them all. This will enable usage of MVNI and ORRI, which
use progressively fewer sub-classes.
This patch adds no new functionality, merely splits the function
and moves part of the logic into tcg_out_dupi_vec.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
The instruction set has 3 insns that perform the same operation,
only varying in which operand must overlap the destination. We
can represent the operation without overlap and choose based on
the operands seen.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Perform a per-element conditional move. This combination operation is
easier to implement on some host vector units than plain cmp+bitsel.
Omit the usual gvec interface, as this is intended to be used by
target-specific gvec expansion call-backs.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This operation performs d = (b & a) | (c & ~a), and is present
on a majority of host vector units. Include gvec expanders.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
The min/max instructions are not available for 64-bit elements.
Fixes: 93f332a50371
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Allow the backend to expand dup from memory directly, instead of
forcing the value into a temp first. This is especially important
if integer/vector register moves do not exist.
Note that officially tcg_out_dupm_vec is allowed to fail.
If it did, we could fix this up relatively easily:
VECE == 32/64:
Load the value into a vector register, then dup.
Both of these must work.
VECE == 8/16:
If the value happens to be at an offset such that an aligned
load would place the desired value in the least significant
end of the register, go ahead and load w/garbage in high bits.
Load the value w/INDEX_op_ld{8,16}_i32.
Attempt a move directly to vector reg, which may fail.
Store the value into the backing store for OTS.
Load the value into the vector reg w/TCG_TYPE_I32, which must work.
Duplicate from the vector reg into itself, which must work.
All of which is well and good, except that all supported
hosts can support dupm for all vece, so all of the failure
paths would be dead code and untestable.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
The LD1R instruction does all the work. Note that the only
useful addressing mode is a base register with no offset.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Currently stubbed out in all backends that support vectors.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This case is similar to INDEX_op_mov_* in that we need to do
different things depending on the current location of the source.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
v3: Added some commentary to the tcg_reg_alloc_* functions.
|
|
The i386 backend already has these functions, and the aarch64 backend
could easily split out one. Nothing is done with these functions yet,
but this will aid register allocation of INDEX_op_dup_vec in a later patch.
Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This patch merely changes the interface, aborting on all failures,
of which there are currently none.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This is part c of relocation overflow handling.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This will let backends implement the double-word shift operation.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB,
remove the define and the old code.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Disabled in all TCG backends for now.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-3-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
For now, defined universally as true, since we previously required
backends to implement swapped memory operations. Future patches
may now remove that support where it is onerous.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This does require an extra two checks within the slow paths
to replace the assert that we're moving.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This will move the assert for success from within (subroutines of)
patch_reloc into the callers. It will also let new code do something
different when a relocation is out of range.
For the moment, all backends are trivially converted to return true.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
There are one use apiece for these. There is no longer a need for
preserving branch offset operands, as we no longer re-translate.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
It is unused since b68686bd4bfeb70040b4099df993dfa0b4f37b03.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
In AdvSIMD we can only do 32x32 integer multiples although SVE is
capable of larger 64 bit multiples. As a result we can end up
generating invalid opcodes. Fix this by only reprting we can emit
mul vector ops if the size is small enough.
Fixes a crash on:
sve-all-short-v8.3+sve@vq3/insn_mul_z_zi___INC.risu.bin
When running on AArch64 hardware.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20180719154248.29669-1-alex.bennee@linaro.org>
[rth: Removed the tcg_debug_assert -- there are plenty of other
cases that we do not diagnose within the insn encoding helpers.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Also, assert that we don't overflow any of two different offsets into
the TB. Both unwind and goto_tb both record a uint16_t for later use.
This fixes an arm-softmmu test case utilizing NEON in which there is
a TB generated that runs to 7800 opcodes, and compiles to 96k on an
x86_64 host. This overflows the 16-bit offset in which we record the
goto_tb reset offset. Because of that overflow, we install a jump
destination that goes to neverland. Boom.
With this reduced op count, the same TB compiles to about 48k for
aarch64, ppc64le, and x86_64 hosts, and neither assertion fires.
Cc: qemu-stable@nongnu.org
Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
It's not even clear what the interface REG and VAL32 were supposed to mean.
All uses had REG = 0 and VAL32 was the bitset assigned to the destination.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Dispense with TCGBackendData, as it has never been used for more than
holding a single pointer. Use a define in the cpu/tcg-target.h to
signal requirement for TCGLabelQemuLdst, so that we can drop the no-op
tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump
boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional
function tb_target_set_jmp_target.
While we're touching all backends, add a parameter for tb->tc_ptr;
we're going to need it shortly for some backends.
Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c.
This opens the possibility for TCG_TARGET_HAS_direct_jump to be
a runtime decision -- based on host cpu capabilities, the size of
code_gen_buffer, or a future debugging switch.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20170829063313.10237-3-bobby.prani@gmail.com>
[rth: Dropped ia64 hunk]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This patch enables the indirect jump path using an LDR (literal)
instruction. It will be interesting to test and see which performs
better among the two paths.
CC: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20170630143614.31059-3-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
We use ADRP+ADD to compute the target address for goto_tb. This patch
introduces the NOP instruction which is used to align the above
instruction pair so that we can use one atomic instruction to patch
the destination offsets.
CC: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20170630143614.31059-2-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
We can use a branch to register instruction for exit_tb for offsets
greater than 128MB.
CC: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Message-Id: <20170630143614.31059-1-bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
The new placement of the TB means that we can use one insn
to load the return value for exit_tb returning the TB pointer.
Tested-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Measurements:
SPECint06 (test set), x86_64-linux-user. Host: APM 64-bit ARMv8 (Atlas/A57) @ 2.4 GHz
1.45x +-+-------------------------------------------------------------------------------------------------------------+-+
| ***** |
| +++ * * +goto-ptr |
1.4x +-+...*****............................*...*....................................................................+-+
| *+++* * * +++ |
1.35x +-+...*...*............................*...*...........................*****....................................+-+
| * * * * *+++* |
| * * * * * * |
1.3x +-+...*...*............................*...*...........................*...*....................................+-+
| * * * * * * |
| * * * * * * ***** |
1.25x +-+...*...*...........*****............*...*...........................*...*............*****...*...*...........+-+
| * * * * * * * * *+++* * * |
1.2x +-+...*...*...........*...*............*...*...........................*...*............*...*...*...*...........+-+
| * * * * * * * * * * * * |
| * * * * * * * * * * * * ***** |
1.15x +-+...*...*...........*...*............*...*...........................*...*............*...*...*...*...*...*...+-+
| * * * * * * * * +++ * * * * * * |
| * * * * * * * * ***** * * * * * * |
1.1x +-+...*...*...........*...*....*****...*...*...*****...................*...*...*...*....*...*...*...*...*...*...+-+
| * * * * * * * * * * * * * * * * * * * * |
1.05x +-+...*...*...........*...*....*...*...*...*...*...*...................*...*...*...*....*...*...*...*...*...*...+-+
| * * ***** * * * * * * * * * * * * * * * * * * |
| * * * * * * * * * * * * ***** ***** * * * * * * * * * * |
1x +-+---*****---*****---*****----*****---*****---*****---*****---*****---*****---*****----*****---*****---*****---+-+
astar bzip2 gcc gobmk h264ref hmmlibquantum mcf omnetpperlbench sjenxalancbmk hmean
png: http://imgur.com/en9HE8L
Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|