aboutsummaryrefslogtreecommitdiff
path: root/target/ppc/translate
AgeCommit message (Collapse)Author
2024-02-24target/ppc: Fix lxv/stxv MSR facility checkNicholas Piggin
The move to decodetree flipped the inequality test for the VEC / VSX MSR facility check. This caused application crashes under Linux, where these facility unavailable interrupts are used for lazy-switching of VEC/VSX register sets. Getting the incorrect interrupt would result in wrong registers being loaded, potentially overwriting live values and/or exposing stale ones. Cc: qemu-stable@nongnu.org Reported-by: Joel Stanley <joel@jms.id.au> Fixes: 70426b5bb738 ("target/ppc: moved stxvx and lxvx from legacy to decodtree") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1769 Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Tested-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> (cherry picked from commit 2cc0e449d17310877fb28a942d4627ad22bb68ea) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-03tcg: Rename cpu_env to tcg_envRichard Henderson
Allow the name 'cpu_env' to be used for something else. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-09-20ppc: spelling fixesMichael Tokarev
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Cédric Le Goater <clg@kaod.org>
2023-09-06target/ppc: Fix LQ, STQ register-pair order for big-endianNicholas Piggin
LQ, STQ have the same register-pair ordering as LQARX/STQARX., which is the even (lower) register contains the most significant bits. This is not implemented correctly for big-endian. do_ldst_quad() has variables low_addr_gpr and high_addr_gpr which is confusing because they are low and high addresses, whereas LQARX/STQARX. and most such things use the low and high values for lo/hi variables. The conversion to native 128-bit memory access functions missed this strangeness. Fix this by changing the if condition, and change the variable names to hi/lo to match convention. Cc: qemu-stable@nongnu.org Reported-by: Ivan Warren <ivan@vmfacility.fr> Fixes: 57b38ffd0c6f ("target/ppc: Use tcg_gen_qemu_{ld,st}_i128 for LQARX, LQ, STQ") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1836 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Cédric Le Goater <clg@kaod.org>
2023-08-24target/ppc: Use tcg_gen_negsetcond_*Richard Henderson
Tested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-05target/ppc: Inline gen_icount_io_start()Philippe Mathieu-Daudé
Now that gen_icount_io_start() is a simple wrapper to translator_io_start(), inline it. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230602095439.48102-1-philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-05-27target/ppc: Fix fallback to MFSS for MFFS* instructions on pre 3.0 ISAsRichard Purdie
The following commits changed the code such that the fallback to MFSS for MFFSCRN, MFFSCRNI, MFFSCE and MFFSL on pre 3.0 ISAs was removed and became an illegal instruction: bf8adfd88b547680aa857c46098f3a1e94373160 - target/ppc: Move mffscrn[i] to decodetree 394c2e2fda70da722f20fb60412d6c0ca4bfaa03 - target/ppc: Move mffsce to decodetree 3e5bce70efe6bd1f684efbb21fd2a316cbf0657e - target/ppc: Move mffsl to decodetree The hardware will handle them as a MFFS instruction as the code did previously. This means applications that were segfaulting under qemu when encountering these instructions which is used in glibc libm functions for example. The fallback for MFFSCDRN and MFFSCDRNI added in a later patch was also missing. This patch restores the fallback to MFSS for these instructions on pre 3.0s ISAs as the hardware decoder would, fixing the segfaulting libm code. It doesn't have the fallback for 3.0 onwards to match hardware behaviour. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org> Reviewed-by: Matheus Ferst <matheus.ferst@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230510111913.1718734-1-richard.purdie@linuxfoundation.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-05-23target/ppc: Use tcg_gen_qemu_{ld,st}_i128 for LQARX, LQ, STQRichard Henderson
No need to roll our own, as this is now provided by tcg. This was the last use of retxl, so remove that too. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-05-05tcg: ppc64: Fix mask generation for vextractdmShivaprasad G Bhat
In function do_extractm() the mask is calculated as dup_const(1 << (element_width - 1)). '1' being signed int works fine for MO_8,16,32. For MO_64, on PPC64 host this ends up becoming 0 on compilation. The vextractdm uses MO_64, and it ends up having mask as 0. Explicitly use 1ULL instead of signed int 1 like its used everywhere else. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1536 Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Lucas Mateus Castro <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <168319292809.1159309.5817546227121323288.stgit@ltc-boston1.aus.stglabs.ibm.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-03-13target/ppc: Rewrite trans_ADDG6SRichard Henderson
Compute all carry bits in parallel instead of a loop. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-13target/ppc: Avoid tcg_const_* in fp-impl.c.incRichard Henderson
All uses are strictly read-only. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-13target/ppc: Avoid tcg_const_* in vsx-impl.c.incRichard Henderson
All remaining uses are strictly read-only. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-13target/ppc: Avoid tcg_const_* in xxevalRichard Henderson
Initialize a new temp instead of tcg_const_*. Fix a pasto in a comment. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-13target/ppc: Avoid tcg_const_* in vmx-impl.c.incRichard Henderson
All remaining uses are strictly read-only. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-13target/ppc: Avoid tcg_const_i64 in do_vcntmbRichard Henderson
Compute both partial results separately and accumulate at the end, instead of accumulating in the middle. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-13target/ppc: Avoid tcg_const_i64 in do_vector_shift_quadRichard Henderson
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-13target/ppc: Split out gen_vx_vmul10Richard Henderson
Move the body out of this large macro. Use tcg_constant_i64. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-05target/ppc: Drop tcg_temp_freeRichard Henderson
Translators are no longer required to free tcg temporaries. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-01target/ppc: Don't use tcg_temp_local_newRichard Henderson
Since tcg_temp_new is now identical, use that. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-10-28target/ppc: Use gvec to decode XVTSTDC[DS]PLucas Mateus Castro (alqotel)
Used gvec to translate XVTSTDCSP and XVTSTDCDP. xvtstdcsp: rept loop imm master version prev version current version 25 4000 0 0,206200 0,040730 (-80.2%) 0,040740 (-80.2%) 25 4000 1 0,205120 0,053650 (-73.8%) 0,053510 (-73.9%) 25 4000 3 0,206160 0,058630 (-71.6%) 0,058570 (-71.6%) 25 4000 51 0,217110 0,191490 (-11.8%) 0,192320 (-11.4%) 25 4000 127 0,206160 0,191490 (-7.1%) 0,192640 (-6.6%) 8000 12 0 1,234719 0,418833 (-66.1%) 0,386365 (-68.7%) 8000 12 1 1,232417 1,435979 (+16.5%) 1,462792 (+18.7%) 8000 12 3 1,232760 1,766073 (+43.3%) 1,743990 (+41.5%) 8000 12 51 1,239281 1,319562 (+6.5%) 1,423479 (+14.9%) 8000 12 127 1,231708 1,315760 (+6.8%) 1,426667 (+15.8%) xvtstdcdp: rept loop imm master version prev version current version 25 4000 0 0,159930 0,040830 (-74.5%) 0,040610 (-74.6%) 25 4000 1 0,160640 0,053670 (-66.6%) 0,053480 (-66.7%) 25 4000 3 0,160020 0,063030 (-60.6%) 0,062960 (-60.7%) 25 4000 51 0,160410 0,128620 (-19.8%) 0,127470 (-20.5%) 25 4000 127 0,160330 0,127670 (-20.4%) 0,128690 (-19.7%) 8000 12 0 1,190365 0,422146 (-64.5%) 0,388417 (-67.4%) 8000 12 1 1,191292 1,445312 (+21.3%) 1,428698 (+19.9%) 8000 12 3 1,188687 1,980656 (+66.6%) 1,975354 (+66.2%) 8000 12 51 1,191250 1,264500 (+6.1%) 1,355083 (+13.8%) 8000 12 127 1,197313 1,266729 (+5.8%) 1,349156 (+12.7%) Overall, these instructions are the hardest ones to measure performance as the gvec implementation is affected by the immediate. Above there are 5 different scenarios when it comes to immediate and 2 when it comes to rept/loop combination. The immediates scenarios are: all bits are 0 therefore the target register should just be changed to 0, with 1 bit set, with 2 bits set in a combination the new implementation can deal with using gvec, 4 bits set and the new implementation can't deal with it using gvec and all bits set. The rept/loop scenarios are high loop and low rept (so it should spend more time executing it than translating it) and high rept low loop (so it should spend more time translating it than executing this code). These comparisons are between the upstream version, a previous similar implementation and a one with a cleaner code(this one). For a comparison with o previous different implementation: <20221010191356.83659-13-lucas.araujo@eldorado.org.br> Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-13-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: Moved XSTSTDC[QDS]P to decodetreeLucas Mateus Castro (alqotel)
Moved XSTSTDCSP, XSTSTDCDP and XSTSTDCQP to decodetree and moved some of its decoding away from the helper as previously the DCMX, XB and BF were calculated in the helper with the help of cpu_env, now that part was moved to the decodetree with the rest. xvtstdcsp: rept loop master patch 8 12500 1,85393600 1,94683600 (+5.0%) 25 4000 1,78779800 1,92479000 (+7.7%) 100 1000 2,12775000 2,28895500 (+7.6%) 500 200 2,99655300 3,23102900 (+7.8%) 2500 40 6,89082200 7,44827500 (+8.1%) 8000 12 17,50585500 18,95152100 (+8.3%) xvtstdcdp: rept loop master patch 8 12500 1,39043100 1,33539800 (-4.0%) 25 4000 1,35731800 1,37347800 (+1.2%) 100 1000 1,51514800 1,56053000 (+3.0%) 500 200 2,21014400 2,47906000 (+12.2%) 2500 40 5,39488200 6,68766700 (+24.0%) 8000 12 13,98623900 18,17661900 (+30.0%) xvtstdcdp: rept loop master patch 8 12500 1,35123800 1,34455800 (-0.5%) 25 4000 1,36441200 1,36759600 (+0.2%) 100 1000 1,49763500 1,54138400 (+2.9%) 500 200 2,19020200 2,46196400 (+12.4%) 2500 40 5,39265700 6,68147900 (+23.9%) 8000 12 14,04163600 18,19669600 (+29.6%) As some values are now decoded outside the helper and passed to it as an argument the number of arguments of the helper increased, the number of TCGop needed to load the arguments increased. I suspect that's why the slow-down in the tests with a high REPT but low LOOP. Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-12-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: Moved XVTSTDC[DS]P to decodetreeLucas Mateus Castro (alqotel)
Moved XVTSTDCSP and XVTSTDCDP to decodetree an restructured the helper to be simpler and do all decoding in the decodetree (so XB, XT and DCMX are all calculated outside the helper). Obs: The tests in this one are slightly different, these are the sum of these instructions with all possible immediate and those instructions are repeated 10 times. xvtstdcsp: rept loop master patch 8 12500 2,76402100 2,70699100 (-2.1%) 25 4000 2,64867100 2,67884100 (+1.1%) 100 1000 2,73806300 2,78701000 (+1.8%) 500 200 3,44666500 3,61027600 (+4.7%) 2500 40 5,85790200 6,47475500 (+10.5%) 8000 12 15,22102100 17,46062900 (+14.7%) xvtstdcdp: rept loop master patch 8 12500 2,11818000 1,61065300 (-24.0%) 25 4000 2,04573400 1,60132200 (-21.7%) 100 1000 2,13834100 1,69988100 (-20.5%) 500 200 2,73977000 2,48631700 (-9.3%) 2500 40 5,05067000 5,25914100 (+4.1%) 8000 12 14,60507800 15,93704900 (+9.1%) Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-11-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: Use gvec to decode XVCPSGN[SD]PLucas Mateus Castro (alqotel)
Moved XVCPSGNSP and XVCPSGNDP to decodetree and used gvec to translate them. xvcpsgnsp: rept loop master patch 8 12500 0,00561400 0,00537900 (-4.2%) 25 4000 0,00562100 0,00400000 (-28.8%) 100 1000 0,00696900 0,00416300 (-40.3%) 500 200 0,02211900 0,00840700 (-62.0%) 2500 40 0,09328600 0,02728300 (-70.8%) 8000 12 0,27295300 0,06867800 (-74.8%) xvcpsgndp: rept loop master patch 8 12500 0,00556300 0,00584200 (+5.0%) 25 4000 0,00482700 0,00431700 (-10.6%) 100 1000 0,00585800 0,00464400 (-20.7%) 500 200 0,01565300 0,00839700 (-46.4%) 2500 40 0,05766500 0,02430600 (-57.8%) 8000 12 0,19875300 0,07947100 (-60.0%) Like the previous instructions there seemed to be a improvement on translation time. Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-10-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: Use gvec to decode XV[N]ABS[DS]P/XVNEG[DS]PLucas Mateus Castro (alqotel)
Moved XVABSSP, XVABSDP, XVNABSSP,XVNABSDP, XVNEGSP and XVNEGDP to decodetree and used gvec to translate them. xvabssp: rept loop master patch 8 12500 0,00477900 0,00476000 (-0.4%) 25 4000 0,00442800 0,00353300 (-20.2%) 100 1000 0,00478700 0,00366100 (-23.5%) 500 200 0,00973200 0,00649400 (-33.3%) 2500 40 0,03165200 0,02226700 (-29.7%) 8000 12 0,09315900 0,06674900 (-28.3%) xvabsdp: rept loop master patch 8 12500 0,00475000 0,00474400 (-0.1%) 25 4000 0,00355600 0,00367500 (+3.3%) 100 1000 0,00444200 0,00366000 (-17.6%) 500 200 0,00942700 0,00732400 (-22.3%) 2500 40 0,02990000 0,02308500 (-22.8%) 8000 12 0,08770300 0,06683800 (-23.8%) xvnabssp: rept loop master patch 8 12500 0,00494500 0,00492900 (-0.3%) 25 4000 0,00397700 0,00338600 (-14.9%) 100 1000 0,00421400 0,00353500 (-16.1%) 500 200 0,01048000 0,00707100 (-32.5%) 2500 40 0,03251500 0,02238300 (-31.2%) 8000 12 0,08889100 0,06469800 (-27.2%) xvnabsdp: rept loop master patch 8 12500 0,00511000 0,00492700 (-3.6%) 25 4000 0,00398800 0,00381500 (-4.3%) 100 1000 0,00390500 0,00365900 (-6.3%) 500 200 0,00924800 0,00784600 (-15.2%) 2500 40 0,03138900 0,02391600 (-23.8%) 8000 12 0,09654200 0,05684600 (-41.1%) xvnegsp: rept loop master patch 8 12500 0,00493900 0,00452800 (-8.3%) 25 4000 0,00369100 0,00366800 (-0.6%) 100 1000 0,00371100 0,00380000 (+2.4%) 500 200 0,00991100 0,00652300 (-34.2%) 2500 40 0,03025800 0,02422300 (-19.9%) 8000 12 0,09251100 0,06457600 (-30.2%) xvnegdp: rept loop master patch 8 12500 0,00474900 0,00454400 (-4.3%) 25 4000 0,00353100 0,00325600 (-7.8%) 100 1000 0,00398600 0,00366800 (-8.0%) 500 200 0,01032300 0,00702400 (-32.0%) 2500 40 0,03125000 0,02422400 (-22.5%) 8000 12 0,09475100 0,06173000 (-34.9%) This one to me seemed the opposite of the previous instructions, as it looks like there was an improvement in the translation time (itself not a surprise as operations were done twice before so there was the need to translate twice as many TCGop) Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-9-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: Move VABSDU[BHW] to decodetree and use gvecLucas Mateus Castro (alqotel)
Moved VABSDUB, VABSDUH and VABSDUW to decodetree and use gvec to translate them. vabsdub: rept loop master patch 8 12500 0,03601600 0,00688500 (-80.9%) 25 4000 0,03651000 0,00532100 (-85.4%) 100 1000 0,03666900 0,00595300 (-83.8%) 500 200 0,04305800 0,01244600 (-71.1%) 2500 40 0,06893300 0,04273700 (-38.0%) 8000 12 0,14633200 0,12660300 (-13.5%) vabsduh: rept loop master patch 8 12500 0,02172400 0,00687500 (-68.4%) 25 4000 0,02154100 0,00531500 (-75.3%) 100 1000 0,02235400 0,00596300 (-73.3%) 500 200 0,02827500 0,01245100 (-56.0%) 2500 40 0,05638400 0,04285500 (-24.0%) 8000 12 0,13166000 0,12641400 (-4.0%) vabsduw: rept loop master patch 8 12500 0,01646400 0,00688300 (-58.2%) 25 4000 0,01454500 0,00475500 (-67.3%) 100 1000 0,01545800 0,00511800 (-66.9%) 500 200 0,02168200 0,01114300 (-48.6%) 2500 40 0,04571300 0,04138800 (-9.5%) 8000 12 0,12209500 0,12178500 (-0.3%) Same as VADDCUW and VSUBCUW, overall performance gain but it uses more TCGop (4 before the patch, 6 after). Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-8-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: Move VAVG[SU][BHW] to decodetree and use gvecLucas Mateus Castro (alqotel)
Moved the instructions VAVGUB, VAVGUH, VAVGUW, VAVGSB, VAVGSH, VAVGSW, to decodetree and use gvec with them. For these one the right shift had to be made before the sum as to avoid an overflow, so add 1 at the end if any of the entries had 1 in its LSB as to replicate the "+ 1" before the shift described by the ISA. vavgub: rept loop master patch 8 12500 0,02616600 0,00754200 (-71.2%) 25 4000 0,02530000 0,00637700 (-74.8%) 100 1000 0,02604600 0,00790100 (-69.7%) 500 200 0,03189300 0,01838400 (-42.4%) 2500 40 0,06006900 0,06851000 (+14.1%) 8000 12 0,13941000 0,20548500 (+47.4%) vavguh: rept loop master patch 8 12500 0,01818200 0,00780600 (-57.1%) 25 4000 0,01789300 0,00641600 (-64.1%) 100 1000 0,01899100 0,00787200 (-58.5%) 500 200 0,02527200 0,01828400 (-27.7%) 2500 40 0,05361800 0,06773000 (+26.3%) 8000 12 0,12886600 0,20291400 (+57.5%) vavguw: rept loop master patch 8 12500 0,01423100 0,00776600 (-45.4%) 25 4000 0,01780800 0,00638600 (-64.1%) 100 1000 0,02085500 0,00787000 (-62.3%) 500 200 0,02737100 0,01828800 (-33.2%) 2500 40 0,05572600 0,06774200 (+21.6%) 8000 12 0,13101700 0,20311600 (+55.0%) vavgsb: rept loop master patch 8 12500 0,03006000 0,00788600 (-73.8%) 25 4000 0,02882200 0,00637800 (-77.9%) 100 1000 0,02958000 0,00791400 (-73.2%) 500 200 0,03548800 0,01860400 (-47.6%) 2500 40 0,06360000 0,06850800 (+7.7%) 8000 12 0,13816500 0,20550300 (+48.7%) vavgsh: rept loop master patch 8 12500 0,01965900 0,00776600 (-60.5%) 25 4000 0,01875400 0,00638700 (-65.9%) 100 1000 0,01952200 0,00786900 (-59.7%) 500 200 0,02562000 0,01760300 (-31.3%) 2500 40 0,05384300 0,06742800 (+25.2%) 8000 12 0,13240800 0,20330000 (+53.5%) vavgsw: rept loop master patch 8 12500 0,01407700 0,00775600 (-44.9%) 25 4000 0,01762300 0,00640000 (-63.7%) 100 1000 0,02046500 0,00788500 (-61.5%) 500 200 0,02745600 0,01843000 (-32.9%) 2500 40 0,05375500 0,06820500 (+26.9%) 8000 12 0,13068300 0,20304900 (+55.4%) These results to me seems to indicate that with gvec the results have a slower translation but faster execution. Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-7-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: Move VPRTYB[WDQ] to decodetree and use gvecLucas Mateus Castro (alqotel)
Moved VPRTYBW and VPRTYBD to use gvec and both of them and VPRTYBQ to decodetree. VPRTYBW and VPRTYBD now also use .fni4 and .fni8, respectively. vprtybw: rept loop master patch 8 12500 0,01198900 0,00703100 (-41.4%) 25 4000 0,01070100 0,00571400 (-46.6%) 100 1000 0,01123300 0,00678200 (-39.6%) 500 200 0,01601500 0,01535600 (-4.1%) 2500 40 0,03872900 0,05562100 (43.6%) 8000 12 0,10047000 0,16643000 (65.7%) vprtybd: rept loop master patch 8 12500 0,00757700 0,00788100 (4.0%) 25 4000 0,00652500 0,00669600 (2.6%) 100 1000 0,00714400 0,00825400 (15.5%) 500 200 0,01211000 0,01903700 (57.2%) 2500 40 0,03483800 0,07021200 (101.5%) 8000 12 0,09591800 0,21036200 (119.3%) vprtybq: rept loop master patch 8 12500 0,00675600 0,00667200 (-1.2%) 25 4000 0,00619400 0,00643200 (3.8%) 100 1000 0,00707100 0,00751100 (6.2%) 500 200 0,01199300 0,01342000 (11.9%) 2500 40 0,03490900 0,04092900 (17.2%) 8000 12 0,09588200 0,11465100 (19.6%) I wasn't expecting such a performance lost in both VPRTYBD and VPRTYBQ, I'm not sure if it's worth to move those instructions. Comparing the assembly of the helper with the TCGop they are pretty similar, so I'm not sure why vprtybd took so much more time. Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-6-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: Move VNEG[WD] to decodtree and use gvecLucas Mateus Castro (alqotel)
Moved the instructions VNEGW and VNEGD to decodetree and used gvec to decode it. vnegw: rept loop master patch 8 12500 0,01053200 0,00548400 (-47.9%) 25 4000 0,01030500 0,00390000 (-62.2%) 100 1000 0,01096300 0,00395400 (-63.9%) 500 200 0,01472000 0,00712300 (-51.6%) 2500 40 0,03809000 0,02147700 (-43.6%) 8000 12 0,09957100 0,06202100 (-37.7%) vnegd: rept loop master patch 8 12500 0,00594600 0,00543800 (-8.5%) 25 4000 0,00575200 0,00396400 (-31.1%) 100 1000 0,00676100 0,00394800 (-41.6%) 500 200 0,01149300 0,00709400 (-38.3%) 2500 40 0,03441500 0,02169600 (-37.0%) 8000 12 0,09516900 0,06337000 (-33.4%) Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-5-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: Move V(ADD|SUB)CUW to decodetree and use gvecLucas Mateus Castro (alqotel)
This patch moves VADDCUW and VSUBCUW to decodtree with gvec using an implementation based on the helper, with the main difference being changing the -1 (aka all bits set to 1) result returned by cmp when true to +1. It also implemented a .fni4 version of those instructions and dropped the helper. vaddcuw: rept loop master patch 8 12500 0,01008200 0,00612400 (-39.3%) 25 4000 0,01091500 0,00471600 (-56.8%) 100 1000 0,01332500 0,00593700 (-55.4%) 500 200 0,01998500 0,01275700 (-36.2%) 2500 40 0,04704300 0,04364300 (-7.2%) 8000 12 0,10748200 0,11241000 (+4.6%) vsubcuw: rept loop master patch 8 12500 0,01226200 0,00571600 (-53.4%) 25 4000 0,01493500 0,00462100 (-69.1%) 100 1000 0,01522700 0,00455100 (-70.1%) 500 200 0,02384600 0,01133500 (-52.5%) 2500 40 0,04935200 0,03178100 (-35.6%) 8000 12 0,09039900 0,09440600 (+4.4%) Overall there was a gain in performance, but the TCGop code was still slightly bigger in the new version (it went from 4 to 5). Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-4-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: Move VMH[R]ADDSHS instruction to decodetreeLucas Mateus Castro (alqotel)
This patch moves VMHADDSHS and VMHRADDSHS to decodetree I couldn't find a satisfactory implementation with TCG inline. vmhaddshs: rept loop master patch 8 12500 0,02983400 0,02648500 (-11.2%) 25 4000 0,02946000 0,02518000 (-14.5%) 100 1000 0,03104300 0,02638000 (-15.0%) 500 200 0,04002000 0,03502500 (-12.5%) 2500 40 0,08090100 0,07562200 (-6.5%) 8000 12 0,19242600 0,18626800 (-3.2%) vmhraddshs: rept loop master patch 8 12500 0,03078600 0,02851000 (-7.4%) 25 4000 0,02793200 0,02746900 (-1.7%) 100 1000 0,02886000 0,02839900 (-1.6%) 500 200 0,03714700 0,03799200 (+2.3%) 2500 40 0,07948000 0,07852200 (-1.2%) 8000 12 0,19049800 0,18813900 (-1.2%) Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-3-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: Moved VMLADDUHM to decodetree and use gvecLucas Mateus Castro (alqotel)
This patch moves VMLADDUHM to decodetree a creates a gvec implementation using mul_vec and add_vec. rept loop master patch 8 12500 0,01810500 0,00903100 (-50.1%) 25 4000 0,01739400 0,00747700 (-57.0%) 100 1000 0,01843600 0,00901400 (-51.1%) 500 200 0,02574600 0,01971000 (-23.4%) 2500 40 0,05921600 0,07121800 (+20.3%) 8000 12 0,15326700 0,21725200 (+41.7%) The significant difference in performance when REPT is low and LOOP is high I think is due to the fact that the new implementation has a higher translation time, as when using a helper only 5 TCGop are used but with the patch a total of 10 TCGop are needed (Power lacks a direct mul_vec equivalent so this instruction is implemented with the help of 5 others, vmuleu, vmulou, vmrgh, vmrgl and vpkum). Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221019125040.48028-2-lucas.araujo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: move msgsync to decodetreeMatheus Ferst
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20221006200654.725390-7-matheus.ferst@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: move msgclrp/msgsndp to decodetreeMatheus Ferst
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20221006200654.725390-6-matheus.ferst@eldorado.org.br> [danielhb: ppc32 build fix in trans_(MSGCLRP|MSGSNDP)] Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-28target/ppc: move msgclr/msgsnd to decodetreeMatheus Ferst
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20221006200654.725390-5-matheus.ferst@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-10-17target/ppc: Fix xvcmp* clearing FI bitVíctor Colombo
Vector instructions in general are not supposed to change the FI bit. However, xvcmp* instructions are calling gen_helper_float_check_status, which is leading to a cleared FI flag where it should be kept unchanged. As helper_float_check_status only affects inexact, overflow and underflow, and the xvcmp* instructions don't change these flags, this issue can be fixed by removing the call to helper_float_check_status. By doing this, the FI bit in FPSCR will be preserved as expected. Fixes: 00084a25adf ("target/ppc: introduce separate VSX_CMP macro for xvcmp* instructions") Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221005121551.27957-1-victor.colombo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-09-20target/ppc: Merge fsqrt and fsqrts helpersVíctor Colombo
These two helpers are almost identical, differing only by the softfloat operation it calls. Merge them into one using a macro. Also, take this opportunity to capitalize the helper name as we moved the instruction to decodetree in a previous patch. Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220905123746.54659-4-victor.colombo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-09-20target/ppc: Move fsqrts to decodetreeVíctor Colombo
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220905123746.54659-3-victor.colombo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-09-20target/ppc: Move fsqrt to decodetreeVíctor Colombo
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220905123746.54659-2-victor.colombo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-09-20target/ppc: Implement hashstp and hashchkpVíctor Colombo
Implementation for instructions hashstp and hashchkp, the privileged versions of hashst and hashchk, which were added in Power ISA 3.1B. Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br> Reviewed-by: Lucas Mateus Castro <lucas.araujo@eldorado.org.br> Message-Id: <20220715205439.161110-4-victor.colombo@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-09-20target/ppc: Implement hashst and hashchkVíctor Colombo
Implementation for instructions hashst and hashchk, which were added in Power ISA 3.1B. It was decided to implement the hash algorithm from ground up in this patch exactly as described in Power ISA. Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br> Reviewed-by: Lucas Mateus Castro <lucas.araujo@eldorado.org.br> Message-Id: <20220715205439.161110-3-victor.colombo@eldorado.org.br> [danielhb: fix block comment in excp_helper.c] Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18target/ppc: Implement slbiagLucas Coutinho
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br> Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br> Message-Id: <20220701133507.740619-12-lucas.coutinho@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18target/ppc: Move slbsync to decodetreeLucas Coutinho
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br> Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br> Message-Id: <20220701133507.740619-11-lucas.coutinho@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18target/ppc: Move slbfee to decodetreeLucas Coutinho
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br> Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br> Message-Id: <20220701133507.740619-10-lucas.coutinho@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18target/ppc: Move slbmfee to decodetreeLucas Coutinho
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br> Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br> Message-Id: <20220701133507.740619-9-lucas.coutinho@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18target/ppc: Move slbmfev to decodetreeLucas Coutinho
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br> Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br> Message-Id: <20220701133507.740619-8-lucas.coutinho@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18target/ppc: Move slbmte to decodetreeLucas Coutinho
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br> Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br> Message-Id: <20220701133507.740619-7-lucas.coutinho@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18target/ppc: Move slbia to decodetreeLucas Coutinho
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br> Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br> Message-Id: <20220701133507.740619-6-lucas.coutinho@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18target/ppc: Move slbieg to decodetreeLucas Coutinho
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br> Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br> Message-Id: <20220701133507.740619-5-lucas.coutinho@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18target/ppc: Move slbie to decodetreeLucas Coutinho
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br> Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br> Message-Id: <20220701133507.740619-4-lucas.coutinho@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2022-07-18target/ppc: add macros to check privilege levelMatheus Ferst
Equivalent to CHK_SV and CHK_HV, but can be used in decodetree methods. Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br> Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br> Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br> Message-Id: <20220701133507.740619-3-lucas.coutinho@eldorado.org.br> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>