Age | Commit message (Collapse) | Author |
|
rotr_i32 calculates the amount to left shift and puts it into a
temporary, but then doesn't use it when doing the shift.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
(cherry picked from commit d1bdd3af49f227dd4a4b03b90cb020c55cbed440)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
|
|
add2_i64 was adding the lower double word to the upper double word
of each input. Fix this so we add the lower double words, then the
upper double words with carry propagation.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
(cherry picked from commit 84247357104044b8c4ec4a634e84769f432cbe52)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
|
|
If our input and output is in the same register, bswap64 tries to
undo a rotate of the input. This just ends up rotating the output.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
(cherry picked from commit 82e0f9170ac9307de4fc15bfb4d12d5534550322)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
|
|
The rldcl instruction doesn't have an sh field, so the minor opcode
is shifted 1 bit. We were using the XO30 macro which shifted the
minor opcode 2 bits.
Remove XO30 and add MD30 and MDS30 macros which match the
Power ISA categories.
Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
(cherry picked from commit 8a94cfb05ea9a8991c832236b4174d354025a7b7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
|
|
When setcond2 is rewritten into setcond, the state of the destination
temp should be reset, so that a copy of the previous value is not
used instead of the result.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
Avoid the mini constant pool for armv7, and avoid replicating
the test for pre-v7.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
Found by inspection, since the effect of the bug was simply to
send all memory ops through the slow path.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
Branches within a TB will always be within 16MB.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Move the slow path out of line, as the TODO's mention.
This allows the fast path to be unconditional, which can
speed up the fast path as well, depending on the core.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Work better with branch predition when we have movw+movt,
as the size of the code is the same. Perhaps re-evaluate
when we have a proper constant pool.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
After the previous patch, 's' and 'S' are the same.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
The schedule was fully serial, with no possibility for dual issue.
The old schedule had a minimal issue of 7 cycles; the new schedule
has a minimal issue of 5 cycles.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Share code between qemu_ld and qemu_st to process the tlb.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Use even more primitive helper functions to avoid lots of duplicated code.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Make the code more readable by only having one copy of the magic
numbers, swapping registers as needed prior to that. Speed the
compiler by not applying the rd == rn avoidance for v6 or later.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
R12 is call clobbered, while R8 is call saved. This change
gives tcg one more call saved register for real data.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Don't hard-code R8.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
An armv7 extension implements division, present on Cortex A15.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
We have BFI and BFC available for implementing it.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Try fully rotated arguments to mov and mvn before trying movt
or full decomposition. Begin decomposition with mvn when it
looks like it'll help. Examples include
-: mov r9, #0x00000fa0
-: orr r9, r9, #0x000ee000
-: orr r9, r9, #0x0ff00000
-: orr r9, r9, #0xf0000000
+: mvn r9, #0x0000005f
+: eor r9, r9, #0x00011000
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
We get to re-use the _rIN and _rIK subroutines to handle the various
combinations of add vs sub. Fold the << 21 into the opcode enum values
so that we can explicitly add TO_CPSR as desired.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
This allows us to emit CMN instructions.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
This allows the generation of RSB instructions.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
This greatly improves code generation for addition of small
negative constants.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
This greatly improves the code we can produce for deposit
without armv7 support.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
This makes it easier to verify changes to the code
generating the prologue.
[Aurelien: change the format from %i to %zu]
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
We were not allocating TCG_STATIC_CALL_ARGS_SIZE, so this meant that
any helper with more than 4 arguments would clobber the saved regs.
Realizing that we're supposed to have this memory pre-allocated means
we can clean up the tcg_out_arg functions, which were trying to do
more stack allocation.
Allocate stack memory for the TCG temporaries while we're at it.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
On 32-bit TCG targets, when emulating deposit_i64 with a mov_i32 +
deposit_i32, care should be taken to not overwrite the low part of
the second argument before the deposit when it is the same the
destination.
This fixes the shld instruction in qemu-system-x86_64, which in turns
fixes booting "system rescue CD version 2.8.0" on this target.
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
The TCG optimizer does great work when inserting constants, being able
to fold the open-coded deposit expansion to just an AND or an OR. Avoid
a bit the regression caused by having the deposit opcode by expanding
deposit of zero as an AND.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Glibc 2.16 includes an easy way to get feature bits previously
buried in /proc or the program startup auxiliary vector. Use it.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
There are a few simple special cases that should be handled first.
Break these out to subroutines to avoid code duplication.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
It takes half the cycles to read one CR register instead of all 8.
This is a backward compatible addition to the ISA, so chips prior
to Power 2.00 spec will simply continue to read the entire CR register.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Nothing else in the call chain ensures that these
constants don't have garbage in the high bits.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
The optimization/bug being fixed is that tcg_out_cmp was not applying the
right type to loading a constant, in the case it can't be implemented
directly. Rather than recomputing the TCGType enum from the arch64 bool,
pass around the original TCGType throughout.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
The mul_i32 pattern was loading non-16-bit constants into a register,
when we can get the middle-end to do that for us. The mul_i64 pattern
was not considering that MULLI takes 64-bit inputs.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Since we have special code to handle and/or/xor with a constant,
apply the same to andc/orc/eqv with a constant.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Mostly copied from the ppc32 port.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Using a table to look up insns of the right width and sign.
Include support for the Power 2.06 LDBRX and STDBRX insns.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
The enhancements to and immediate obviate this.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Use RLDICL and RLDICR.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Use RLWINM
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Handle constants in common code; we'll want to reuse that later.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Using SUBFIC for 16-bit signed constants.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|