aboutsummaryrefslogtreecommitdiff
path: root/target/arm/tcg/sme_helper.c
AgeCommit message (Collapse)Author
2024-08-01target/arm: Handle denormals correctly for FMOPA (widening)Peter Maydell
The FMOPA (widening) SME instruction takes pairs of half-precision floating point values, widens them to single-precision, does a two-way dot product and accumulates the results into a single-precision destination. We don't quite correctly handle the FPCR bits FZ and FZ16 which control flushing of denormal inputs and outputs. This is because at the moment we pass a single float_status value to the helper function, which then uses that configuration for all the fp operations it does. However, because the inputs to this operation are float16 and the outputs are float32 we need to use the fp_status_f16 for the float16 input widening but the normal fp_status for everything else. Otherwise we will apply the flushing control FPCR.FZ16 to the 32-bit output rather than the FPCR.FZ control, and incorrectly flush a denormal output to zero when we should not (or vice-versa). (In commit 207d30b5fdb5b we tried to fix the FZ handling but didn't get it right, switching from "use FPCR.FZ for everything" to "use FPCR.FZ16 for everything".) Pass the CPU env to the sme_fmopa_h helper instead of an fp_status pointer, and have the helper pass an extra fp_status into the f16_dotadd() function so that we can use the right status for the right parts of this operation. Cc: qemu-stable@nongnu.org Fixes: 207d30b5fdb5 ("target/arm: Use FPST_F16 for SME FMOPA (widening)") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2373 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-29target/arm: Fix UMOPA/UMOPS of 16-bit valuesPeter Maydell
The UMOPA/UMOPS instructions are supposed to multiply unsigned 8 or 16 bit elements and accumulate the products into a 64-bit element. In the Arm ARM pseudocode, this is done with the usual infinite-precision signed arithmetic. However our implementation doesn't quite get it right, because in the DEF_IMOP_64() macro we do: sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0); where NTYPE and MTYPE are uint16_t or int16_t. In the uint16_t case, the C usual arithmetic conversions mean the values are converted to "int" type and the multiply is done as a 32-bit multiply. This means that if the inputs are, for example, 0xffff and 0xffff then the result is 0xFFFE0001 as an int, which is then promoted to uint64_t for the accumulation into sum; this promotion incorrectly sign extends the multiply. Avoid the incorrect sign extension by casting to int64_t before the multiply, so we do the multiply as 64-bit signed arithmetic, which is a type large enough that the multiply can never overflow into the sign bit. (The equivalent 8-bit operations in DEF_IMOP_32() are fine, because the 8-bit multiplies can never overflow into the sign bit of a 32-bit integer.) Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2372 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240722172957.1041231-3-peter.maydell@linaro.org
2024-07-23target/arm: Use set/clear_helper_retaddr in SVE and SME helpersRichard Henderson
Avoid a race condition with munmap in another thread. Use around blocks that exclusively use "host_fn". Keep the blocks as small as possible, but without setting and clearing for every operation on one page. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-18target/arm: Use float_status copy in sme_fmopa_sDaniyal Khan
We made a copy above because the fp exception flags are not propagated back to the FPST register, but then failed to use the copy. Cc: qemu-stable@nongnu.org Fixes: 558e956c719 ("target/arm: Implement FMOPA, FMOPS (non-widening)") Signed-off-by: Daniyal Khan <danikhan632@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20240717060149.204788-2-richard.henderson@linaro.org [rth: Split from a larger patch] Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-07target/arm: Fix 32-bit SMOPARichard Henderson
While the 8-bit input elements are sequential in the input vector, the 32-bit output elements are not sequential in the output matrix. Do not attempt to compute 2 32-bit outputs at the same time. Cc: qemu-stable@nongnu.org Fixes: 23a5e3859f5 ("target/arm: Implement SME integer outer product") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240305163931.242795-1-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-02-15target/arm: Fix SVE/SME gross MTE suppression checksRichard Henderson
The TBI and TCMA bits are located within mtedesc, not desc. Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Gustavo Romero <gustavo.romero@linaro.org> Message-id: 20240207025210.8837-7-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-11-20target/arm: Fix SME FMOPA (16-bit), BFMOPARichard Henderson
Perform the loop increment unconditionally, not nested within the predication. Cc: qemu-stable@nongnu.org Fixes: 3916841ac75 ("target/arm: Implement FMOPA, FMOPS (widening)") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1985 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20231117193135.1180657-1-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-08-22target/arm: Fix SME ST1QRichard Henderson
A typo, noted in the bug report, resulting in an incorrect write offset. Cc: qemu-stable@nongnu.org Fixes: 7390e0e9ab8 ("target/arm: Implement SME LD1, ST1") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1833 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20230818214255.146905-1-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-02-27target/arm: move helpers to tcg/Claudio Fontana
Signed-off-by: Claudio Fontana <cfontana@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>