aboutsummaryrefslogtreecommitdiff
path: root/target-arm/helper-a64.h
AgeCommit message (Collapse)Author
2016-10-26target-arm: emulate aarch64's LL/SC using cmpxchg helpersEmilio G. Cota
Emulating LL/SC with cmpxchg is not correct, since it can suffer from the ABA problem. Portable parallel code, however, is written assuming only cmpxchg--and not LL/SC--is available. This means that in practice emulating LL/SC with cmpxchg is a viable alternative. The appended emulates LL/SC pairs in aarch64 with cmpxchg helpers. This works in both user and system mode. In usermode, it avoids pausing all other CPUs to perform the LL/SC pair. The subsequent performance and scalability improvement is significant, as the plots below show. They plot the throughput of atomic_add-bench compiled for ARM and executed on a 64-core x86 machine. Hi-res plots: http://imgur.com/a/JVc8Y atomic_add-bench: 1000000 ops/thread, [0,1] range 18 ++---------+----------+---------+----------+----------+----------+---++ +cmpxchg +-E--+ + + + + + | 16 ++master +-H--+ ++ || | 14 ++ ++ | | | 12 ++| ++ | | | 10 ++++ ++ 8 ++E ++ |+++ | 6 ++ | ++ | | | 4 ++ | ++ | | | 2 +H++E+--- ++ + | +E++----+E+---+--+E+----++E+------+E+------+E++----+E+---+--+E| 0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 1000000 ops/thread, [0,2] range 18 ++---------+----------+---------+----------+----------+----------+---++ +cmpxchg +-E--+ + + + + + | 16 ++master +-H--+ ++ | | | 14 ++E ++ | | | 12 ++| ++ |+++ | 10 ++ | ++ 8 ++ | ++ | | | 6 ++ | ++ | | | 4 ++ | ++ | +E+--- | 2 +H+ +E+-----+++ +++ +++ ---+E+-----+E+------+++ +++ + +E+---+--+E+----++E+------+E+--- ++++ +++ + +E| 0 ++H-H----H-+-----H----+---------+----------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 1000000 ops/thread, [0,128] range 70 ++---------+----------+---------+----------+----------+----------+---++ +cmpxchg +-E--+ + + + + + | 60 ++master +-H--+ +++ ---+E+-----+E+------+E+ | +E+------E-------+E+--- | | --- +++ | 50 ++ +++--- ++ | -+E+ | 40 ++ +++---- ++ | E- | | --| | 30 ++ -- +++ ++ | +E+ | 20 ++E+ ++ |E+ | | | 10 ++ ++ + + + + + + + | 0 +HH-H----H-+-----H----+---------+----------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 1000000 ops/thread, [0,1024] range 160 ++---------+---------+----------+---------+----------+----------+---++ +cmpxchg +-E--+ + + + + + | 140 ++master +-H--+ +++ +++ | -+E+-----+E+-------E| 120 ++ +++ ---- +++ | +++ ----E-- | 100 ++ --E--- +++ ++ | +++ ---- +++ | 80 ++ --E-- ++ | ---- +++ | | -+E+ | 60 ++ ---- +++ ++ | +E+- | 40 ++ -- ++ | +E+ | 20 +EE+ ++ +++ + + + + + + | 0 +HH-H---H--+-----H---+----------+---------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads [rth: Rearrange 128-bit cmpxchg helper. Enforce alignment on LL.] Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1467054136-10430-28-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2014-06-09target-arm: A64: Implement CRC instructionsPeter Maydell
Implement the optional A64 CRC instructions. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1401458125-27977-6-git-send-email-peter.maydell@linaro.org
2014-06-09target-arm: add support for v8 VMULL.P64 instructionPeter Maydell
Add support for the VMULL.P64 polynomial 64x64 to 128 bit multiplication instruction in the A32/T32 instruction sets; this is part of the v8 Crypto Extensions. To do this we have to move the neon_pmull_64_{lo,hi} helpers from helper-a64.c into neon_helper.c so they can be used by the AArch32 translator. Inspired-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1401386724-26529-4-git-send-email-peter.maydell@linaro.org
2014-03-17target-arm: A64: Implement FCVTXNPeter Maydell
Implement the FCVTXN operation, which does a narrowing fp precision conversion using the "round to odd" (von Neumann) mode. This can conveniently be implemented as "do operation using round to zero; then set the LSB of the mantissa to 1 if the Inexact flag was set". Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1394822294-14837-24-git-send-email-peter.maydell@linaro.org
2014-03-17target-arm: A64: Add FRECPX (reciprocal exponent)Alex Bennée
These are fairly simple exponent only estimation functions using helpers. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1394822294-14837-14-git-send-email-peter.maydell@linaro.org
2014-03-17target-arm: A64: Implement SADDLP, UADDLP, SADALP, UADALPPeter Maydell
Implement the SADDLP, UADDLP, SADALP and UADALP instructions in the SIMD 2-reg misc category. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1394822294-14837-8-git-send-email-peter.maydell@linaro.org
2014-03-17target-arm: A64: Add remaining CLS/Z vector opsAlex Bennée
Implement the CLS, CLZ operations in the 2-reg-misc category. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1394822294-14837-6-git-send-email-peter.maydell@linaro.org
2014-03-17target-arm: A64: Implement PMULL instructionPeter Maydell
Implement the PMULL instruction; this is the last unimplemented insn in the three-reg-diff group. Note that PMULL with size 3 is considered part of the AES part of the crypto extensions (see the ID_AA64ISAR0_EL1 register definition in the v8 ARM ARM), so it isn't necessary to burn an extra feature bit on it, even though we're using more feature bits than a single "crypto extension present/not present" toggle. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1394822294-14837-2-git-send-email-peter.maydell@linaro.org
2014-02-20target-arm: A64: Implement remaining 3-same instructionsPeter Maydell
Implement the remaining instructions in the SIMD 3-reg-same and scalar-3-reg-same groups: FMULX, FRECPS, FRSQRTS, FACGE, FACGT, FMLA and FMLS. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2014-02-20target-arm: A64: Implement SIMD FP compare and set insnsAlex Bennée
This adds all forms of the SIMD floating point and set instructions: FCM(GT|GE|EQ|LE|LT) Most of the heavy lifting is done by either the existing neon helpers or some new helpers for the 64bit double cases. Most of the code paths are common although the 2misc versions are a little special as they compare against zero. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> [PMM: fixed some minor bugs, added the 2-misc-scalar encoding] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2014-02-20target-arm: A64: Implement plain vector SIMD indexed element insnsPeter Maydell
Implement all the SIMD vector x indexed element instructions in the subcategory which are not 'long' ops. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2014-01-31target-arm: A64: Add SIMD TBL/TBLXMichael Matz
Add support for the SIMD TBL/TBLX instructions (group C3.6.2). Signed-off-by: Michael Matz <matz@suse.de> [PMM: rewritten to do more of the decode in translate-a64.c, and to do only one 64 bit pass at a time in the helper] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2014-01-08target-arm: A64: Add support for floating point compareClaudio Fontana
Add decoding support for C3.6.22 Floating-point compare. Signed-off-by: Claudio Fontana <claudio.fontana@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2013-12-17target-arm: A64: add support for 1-src CLS insnClaudio Fontana
this patch adds support for the CLS instruction. Signed-off-by: Claudio Fontana <claudio.fontana@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2013-12-17target-arm: A64: add support for 1-src RBIT insnAlexander Graf
This adds support for the C5.6.147 RBIT instruction. Signed-off-by: Alexander Graf <agraf@suse.de> [claudio: adapted to new decoder, use bswap64, make RBIT part standalone from the rest of the patch, splitting REV into a separate patch] Signed-off-by: Claudio Fontana <claudio.fontana@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2013-12-17target-arm: A64: add support for 1-src data processing and CLZClaudio Fontana
This patch adds support for decoding 1-src data processing insns, and the first user, C5.6.40 CLZ (count leading zeroes). Signed-off-by: Claudio Fontana <claudio.fontana@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2013-12-17target-arm: A64: add support for 2-src data processing and DIVAlexander Graf
This patch adds support for decoding 2-src data processing insns, and the first users, UDIV and SDIV. Signed-off-by: Alexander Graf <agraf@suse.de> [claudio: adapted to new decoder adding the 2-src decoding level, always zero-extend result in 32bit mode] Signed-off-by: Claudio Fontana <claudio.fontana@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2013-12-17target-arm: A64: add stubs for a64 specific helpersAlexander Graf
We will need helpers that only make sense with AArch64. Add helper-a64.{c,h} files as stubs that we can fill with these helpers in the following patches. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>