Age | Commit message (Collapse) | Author |
|
The move to decodetree flipped the inequality test for the VEC / VSX
MSR facility check.
This caused application crashes under Linux, where these facility
unavailable interrupts are used for lazy-switching of VEC/VSX register
sets. Getting the incorrect interrupt would result in wrong registers
being loaded, potentially overwriting live values and/or exposing
stale ones.
Cc: qemu-stable@nongnu.org
Reported-by: Joel Stanley <joel@jms.id.au>
Fixes: 70426b5bb738 ("target/ppc: moved stxvx and lxvx from legacy to decodtree")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1769
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Tested-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
(cherry picked from commit 2cc0e449d17310877fb28a942d4627ad22bb68ea)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
In function do_extractm() the mask is calculated as
dup_const(1 << (element_width - 1)). '1' being signed int
works fine for MO_8,16,32. For MO_64, on PPC64 host
this ends up becoming 0 on compilation. The vextractdm
uses MO_64, and it ends up having mask as 0.
Explicitly use 1ULL instead of signed int 1 like its
used everywhere else.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1536
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Lucas Mateus Castro <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-Id: <168319292809.1159309.5817546227121323288.stgit@ltc-boston1.aus.stglabs.ibm.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
(cherry picked from commit 6a5d81b17201ab8a95539bad94c8a6c08a42e076)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
Used gvec to translate XVTSTDCSP and XVTSTDCDP.
xvtstdcsp:
rept loop imm master version prev version current version
25 4000 0 0,206200 0,040730 (-80.2%) 0,040740 (-80.2%)
25 4000 1 0,205120 0,053650 (-73.8%) 0,053510 (-73.9%)
25 4000 3 0,206160 0,058630 (-71.6%) 0,058570 (-71.6%)
25 4000 51 0,217110 0,191490 (-11.8%) 0,192320 (-11.4%)
25 4000 127 0,206160 0,191490 (-7.1%) 0,192640 (-6.6%)
8000 12 0 1,234719 0,418833 (-66.1%) 0,386365 (-68.7%)
8000 12 1 1,232417 1,435979 (+16.5%) 1,462792 (+18.7%)
8000 12 3 1,232760 1,766073 (+43.3%) 1,743990 (+41.5%)
8000 12 51 1,239281 1,319562 (+6.5%) 1,423479 (+14.9%)
8000 12 127 1,231708 1,315760 (+6.8%) 1,426667 (+15.8%)
xvtstdcdp:
rept loop imm master version prev version current version
25 4000 0 0,159930 0,040830 (-74.5%) 0,040610 (-74.6%)
25 4000 1 0,160640 0,053670 (-66.6%) 0,053480 (-66.7%)
25 4000 3 0,160020 0,063030 (-60.6%) 0,062960 (-60.7%)
25 4000 51 0,160410 0,128620 (-19.8%) 0,127470 (-20.5%)
25 4000 127 0,160330 0,127670 (-20.4%) 0,128690 (-19.7%)
8000 12 0 1,190365 0,422146 (-64.5%) 0,388417 (-67.4%)
8000 12 1 1,191292 1,445312 (+21.3%) 1,428698 (+19.9%)
8000 12 3 1,188687 1,980656 (+66.6%) 1,975354 (+66.2%)
8000 12 51 1,191250 1,264500 (+6.1%) 1,355083 (+13.8%)
8000 12 127 1,197313 1,266729 (+5.8%) 1,349156 (+12.7%)
Overall, these instructions are the hardest ones to measure performance
as the gvec implementation is affected by the immediate. Above there are
5 different scenarios when it comes to immediate and 2 when it comes to
rept/loop combination. The immediates scenarios are: all bits are 0
therefore the target register should just be changed to 0, with 1 bit
set, with 2 bits set in a combination the new implementation can deal
with using gvec, 4 bits set and the new implementation can't deal with
it using gvec and all bits set. The rept/loop scenarios are high loop
and low rept (so it should spend more time executing it than translating
it) and high rept low loop (so it should spend more time translating it
than executing this code).
These comparisons are between the upstream version, a previous similar
implementation and a one with a cleaner code(this one).
For a comparison with o previous different implementation:
<20221010191356.83659-13-lucas.araujo@eldorado.org.br>
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-13-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Moved XSTSTDCSP, XSTSTDCDP and XSTSTDCQP to decodetree and moved some of
its decoding away from the helper as previously the DCMX, XB and BF were
calculated in the helper with the help of cpu_env, now that part was
moved to the decodetree with the rest.
xvtstdcsp:
rept loop master patch
8 12500 1,85393600 1,94683600 (+5.0%)
25 4000 1,78779800 1,92479000 (+7.7%)
100 1000 2,12775000 2,28895500 (+7.6%)
500 200 2,99655300 3,23102900 (+7.8%)
2500 40 6,89082200 7,44827500 (+8.1%)
8000 12 17,50585500 18,95152100 (+8.3%)
xvtstdcdp:
rept loop master patch
8 12500 1,39043100 1,33539800 (-4.0%)
25 4000 1,35731800 1,37347800 (+1.2%)
100 1000 1,51514800 1,56053000 (+3.0%)
500 200 2,21014400 2,47906000 (+12.2%)
2500 40 5,39488200 6,68766700 (+24.0%)
8000 12 13,98623900 18,17661900 (+30.0%)
xvtstdcdp:
rept loop master patch
8 12500 1,35123800 1,34455800 (-0.5%)
25 4000 1,36441200 1,36759600 (+0.2%)
100 1000 1,49763500 1,54138400 (+2.9%)
500 200 2,19020200 2,46196400 (+12.4%)
2500 40 5,39265700 6,68147900 (+23.9%)
8000 12 14,04163600 18,19669600 (+29.6%)
As some values are now decoded outside the helper and passed to it as an
argument the number of arguments of the helper increased, the number
of TCGop needed to load the arguments increased. I suspect that's why
the slow-down in the tests with a high REPT but low LOOP.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-12-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Moved XVTSTDCSP and XVTSTDCDP to decodetree an restructured the helper
to be simpler and do all decoding in the decodetree (so XB, XT and DCMX
are all calculated outside the helper).
Obs: The tests in this one are slightly different, these are the sum of
these instructions with all possible immediate and those instructions
are repeated 10 times.
xvtstdcsp:
rept loop master patch
8 12500 2,76402100 2,70699100 (-2.1%)
25 4000 2,64867100 2,67884100 (+1.1%)
100 1000 2,73806300 2,78701000 (+1.8%)
500 200 3,44666500 3,61027600 (+4.7%)
2500 40 5,85790200 6,47475500 (+10.5%)
8000 12 15,22102100 17,46062900 (+14.7%)
xvtstdcdp:
rept loop master patch
8 12500 2,11818000 1,61065300 (-24.0%)
25 4000 2,04573400 1,60132200 (-21.7%)
100 1000 2,13834100 1,69988100 (-20.5%)
500 200 2,73977000 2,48631700 (-9.3%)
2500 40 5,05067000 5,25914100 (+4.1%)
8000 12 14,60507800 15,93704900 (+9.1%)
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-11-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Moved XVCPSGNSP and XVCPSGNDP to decodetree and used gvec to translate
them.
xvcpsgnsp:
rept loop master patch
8 12500 0,00561400 0,00537900 (-4.2%)
25 4000 0,00562100 0,00400000 (-28.8%)
100 1000 0,00696900 0,00416300 (-40.3%)
500 200 0,02211900 0,00840700 (-62.0%)
2500 40 0,09328600 0,02728300 (-70.8%)
8000 12 0,27295300 0,06867800 (-74.8%)
xvcpsgndp:
rept loop master patch
8 12500 0,00556300 0,00584200 (+5.0%)
25 4000 0,00482700 0,00431700 (-10.6%)
100 1000 0,00585800 0,00464400 (-20.7%)
500 200 0,01565300 0,00839700 (-46.4%)
2500 40 0,05766500 0,02430600 (-57.8%)
8000 12 0,19875300 0,07947100 (-60.0%)
Like the previous instructions there seemed to be a improvement on
translation time.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-10-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Moved XVABSSP, XVABSDP, XVNABSSP,XVNABSDP, XVNEGSP and XVNEGDP to
decodetree and used gvec to translate them.
xvabssp:
rept loop master patch
8 12500 0,00477900 0,00476000 (-0.4%)
25 4000 0,00442800 0,00353300 (-20.2%)
100 1000 0,00478700 0,00366100 (-23.5%)
500 200 0,00973200 0,00649400 (-33.3%)
2500 40 0,03165200 0,02226700 (-29.7%)
8000 12 0,09315900 0,06674900 (-28.3%)
xvabsdp:
rept loop master patch
8 12500 0,00475000 0,00474400 (-0.1%)
25 4000 0,00355600 0,00367500 (+3.3%)
100 1000 0,00444200 0,00366000 (-17.6%)
500 200 0,00942700 0,00732400 (-22.3%)
2500 40 0,02990000 0,02308500 (-22.8%)
8000 12 0,08770300 0,06683800 (-23.8%)
xvnabssp:
rept loop master patch
8 12500 0,00494500 0,00492900 (-0.3%)
25 4000 0,00397700 0,00338600 (-14.9%)
100 1000 0,00421400 0,00353500 (-16.1%)
500 200 0,01048000 0,00707100 (-32.5%)
2500 40 0,03251500 0,02238300 (-31.2%)
8000 12 0,08889100 0,06469800 (-27.2%)
xvnabsdp:
rept loop master patch
8 12500 0,00511000 0,00492700 (-3.6%)
25 4000 0,00398800 0,00381500 (-4.3%)
100 1000 0,00390500 0,00365900 (-6.3%)
500 200 0,00924800 0,00784600 (-15.2%)
2500 40 0,03138900 0,02391600 (-23.8%)
8000 12 0,09654200 0,05684600 (-41.1%)
xvnegsp:
rept loop master patch
8 12500 0,00493900 0,00452800 (-8.3%)
25 4000 0,00369100 0,00366800 (-0.6%)
100 1000 0,00371100 0,00380000 (+2.4%)
500 200 0,00991100 0,00652300 (-34.2%)
2500 40 0,03025800 0,02422300 (-19.9%)
8000 12 0,09251100 0,06457600 (-30.2%)
xvnegdp:
rept loop master patch
8 12500 0,00474900 0,00454400 (-4.3%)
25 4000 0,00353100 0,00325600 (-7.8%)
100 1000 0,00398600 0,00366800 (-8.0%)
500 200 0,01032300 0,00702400 (-32.0%)
2500 40 0,03125000 0,02422400 (-22.5%)
8000 12 0,09475100 0,06173000 (-34.9%)
This one to me seemed the opposite of the previous instructions, as it
looks like there was an improvement in the translation time (itself not
a surprise as operations were done twice before so there was the need to
translate twice as many TCGop)
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-9-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Moved VABSDUB, VABSDUH and VABSDUW to decodetree and use gvec to
translate them.
vabsdub:
rept loop master patch
8 12500 0,03601600 0,00688500 (-80.9%)
25 4000 0,03651000 0,00532100 (-85.4%)
100 1000 0,03666900 0,00595300 (-83.8%)
500 200 0,04305800 0,01244600 (-71.1%)
2500 40 0,06893300 0,04273700 (-38.0%)
8000 12 0,14633200 0,12660300 (-13.5%)
vabsduh:
rept loop master patch
8 12500 0,02172400 0,00687500 (-68.4%)
25 4000 0,02154100 0,00531500 (-75.3%)
100 1000 0,02235400 0,00596300 (-73.3%)
500 200 0,02827500 0,01245100 (-56.0%)
2500 40 0,05638400 0,04285500 (-24.0%)
8000 12 0,13166000 0,12641400 (-4.0%)
vabsduw:
rept loop master patch
8 12500 0,01646400 0,00688300 (-58.2%)
25 4000 0,01454500 0,00475500 (-67.3%)
100 1000 0,01545800 0,00511800 (-66.9%)
500 200 0,02168200 0,01114300 (-48.6%)
2500 40 0,04571300 0,04138800 (-9.5%)
8000 12 0,12209500 0,12178500 (-0.3%)
Same as VADDCUW and VSUBCUW, overall performance gain but it uses more
TCGop (4 before the patch, 6 after).
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-8-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Moved the instructions VAVGUB, VAVGUH, VAVGUW, VAVGSB, VAVGSH, VAVGSW,
to decodetree and use gvec with them. For these one the right shift
had to be made before the sum as to avoid an overflow, so add 1 at the
end if any of the entries had 1 in its LSB as to replicate the "+ 1"
before the shift described by the ISA.
vavgub:
rept loop master patch
8 12500 0,02616600 0,00754200 (-71.2%)
25 4000 0,02530000 0,00637700 (-74.8%)
100 1000 0,02604600 0,00790100 (-69.7%)
500 200 0,03189300 0,01838400 (-42.4%)
2500 40 0,06006900 0,06851000 (+14.1%)
8000 12 0,13941000 0,20548500 (+47.4%)
vavguh:
rept loop master patch
8 12500 0,01818200 0,00780600 (-57.1%)
25 4000 0,01789300 0,00641600 (-64.1%)
100 1000 0,01899100 0,00787200 (-58.5%)
500 200 0,02527200 0,01828400 (-27.7%)
2500 40 0,05361800 0,06773000 (+26.3%)
8000 12 0,12886600 0,20291400 (+57.5%)
vavguw:
rept loop master patch
8 12500 0,01423100 0,00776600 (-45.4%)
25 4000 0,01780800 0,00638600 (-64.1%)
100 1000 0,02085500 0,00787000 (-62.3%)
500 200 0,02737100 0,01828800 (-33.2%)
2500 40 0,05572600 0,06774200 (+21.6%)
8000 12 0,13101700 0,20311600 (+55.0%)
vavgsb:
rept loop master patch
8 12500 0,03006000 0,00788600 (-73.8%)
25 4000 0,02882200 0,00637800 (-77.9%)
100 1000 0,02958000 0,00791400 (-73.2%)
500 200 0,03548800 0,01860400 (-47.6%)
2500 40 0,06360000 0,06850800 (+7.7%)
8000 12 0,13816500 0,20550300 (+48.7%)
vavgsh:
rept loop master patch
8 12500 0,01965900 0,00776600 (-60.5%)
25 4000 0,01875400 0,00638700 (-65.9%)
100 1000 0,01952200 0,00786900 (-59.7%)
500 200 0,02562000 0,01760300 (-31.3%)
2500 40 0,05384300 0,06742800 (+25.2%)
8000 12 0,13240800 0,20330000 (+53.5%)
vavgsw:
rept loop master patch
8 12500 0,01407700 0,00775600 (-44.9%)
25 4000 0,01762300 0,00640000 (-63.7%)
100 1000 0,02046500 0,00788500 (-61.5%)
500 200 0,02745600 0,01843000 (-32.9%)
2500 40 0,05375500 0,06820500 (+26.9%)
8000 12 0,13068300 0,20304900 (+55.4%)
These results to me seems to indicate that with gvec the results have a
slower translation but faster execution.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-7-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Moved VPRTYBW and VPRTYBD to use gvec and both of them and VPRTYBQ to
decodetree. VPRTYBW and VPRTYBD now also use .fni4 and .fni8,
respectively.
vprtybw:
rept loop master patch
8 12500 0,01198900 0,00703100 (-41.4%)
25 4000 0,01070100 0,00571400 (-46.6%)
100 1000 0,01123300 0,00678200 (-39.6%)
500 200 0,01601500 0,01535600 (-4.1%)
2500 40 0,03872900 0,05562100 (43.6%)
8000 12 0,10047000 0,16643000 (65.7%)
vprtybd:
rept loop master patch
8 12500 0,00757700 0,00788100 (4.0%)
25 4000 0,00652500 0,00669600 (2.6%)
100 1000 0,00714400 0,00825400 (15.5%)
500 200 0,01211000 0,01903700 (57.2%)
2500 40 0,03483800 0,07021200 (101.5%)
8000 12 0,09591800 0,21036200 (119.3%)
vprtybq:
rept loop master patch
8 12500 0,00675600 0,00667200 (-1.2%)
25 4000 0,00619400 0,00643200 (3.8%)
100 1000 0,00707100 0,00751100 (6.2%)
500 200 0,01199300 0,01342000 (11.9%)
2500 40 0,03490900 0,04092900 (17.2%)
8000 12 0,09588200 0,11465100 (19.6%)
I wasn't expecting such a performance lost in both VPRTYBD and VPRTYBQ,
I'm not sure if it's worth to move those instructions. Comparing the
assembly of the helper with the TCGop they are pretty similar, so
I'm not sure why vprtybd took so much more time.
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-6-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Moved the instructions VNEGW and VNEGD to decodetree and used gvec to
decode it.
vnegw:
rept loop master patch
8 12500 0,01053200 0,00548400 (-47.9%)
25 4000 0,01030500 0,00390000 (-62.2%)
100 1000 0,01096300 0,00395400 (-63.9%)
500 200 0,01472000 0,00712300 (-51.6%)
2500 40 0,03809000 0,02147700 (-43.6%)
8000 12 0,09957100 0,06202100 (-37.7%)
vnegd:
rept loop master patch
8 12500 0,00594600 0,00543800 (-8.5%)
25 4000 0,00575200 0,00396400 (-31.1%)
100 1000 0,00676100 0,00394800 (-41.6%)
500 200 0,01149300 0,00709400 (-38.3%)
2500 40 0,03441500 0,02169600 (-37.0%)
8000 12 0,09516900 0,06337000 (-33.4%)
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-5-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
This patch moves VADDCUW and VSUBCUW to decodtree with gvec using an
implementation based on the helper, with the main difference being
changing the -1 (aka all bits set to 1) result returned by cmp when
true to +1. It also implemented a .fni4 version of those instructions
and dropped the helper.
vaddcuw:
rept loop master patch
8 12500 0,01008200 0,00612400 (-39.3%)
25 4000 0,01091500 0,00471600 (-56.8%)
100 1000 0,01332500 0,00593700 (-55.4%)
500 200 0,01998500 0,01275700 (-36.2%)
2500 40 0,04704300 0,04364300 (-7.2%)
8000 12 0,10748200 0,11241000 (+4.6%)
vsubcuw:
rept loop master patch
8 12500 0,01226200 0,00571600 (-53.4%)
25 4000 0,01493500 0,00462100 (-69.1%)
100 1000 0,01522700 0,00455100 (-70.1%)
500 200 0,02384600 0,01133500 (-52.5%)
2500 40 0,04935200 0,03178100 (-35.6%)
8000 12 0,09039900 0,09440600 (+4.4%)
Overall there was a gain in performance, but the TCGop code was still
slightly bigger in the new version (it went from 4 to 5).
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-4-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
This patch moves VMHADDSHS and VMHRADDSHS to decodetree I couldn't find
a satisfactory implementation with TCG inline.
vmhaddshs:
rept loop master patch
8 12500 0,02983400 0,02648500 (-11.2%)
25 4000 0,02946000 0,02518000 (-14.5%)
100 1000 0,03104300 0,02638000 (-15.0%)
500 200 0,04002000 0,03502500 (-12.5%)
2500 40 0,08090100 0,07562200 (-6.5%)
8000 12 0,19242600 0,18626800 (-3.2%)
vmhraddshs:
rept loop master patch
8 12500 0,03078600 0,02851000 (-7.4%)
25 4000 0,02793200 0,02746900 (-1.7%)
100 1000 0,02886000 0,02839900 (-1.6%)
500 200 0,03714700 0,03799200 (+2.3%)
2500 40 0,07948000 0,07852200 (-1.2%)
8000 12 0,19049800 0,18813900 (-1.2%)
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-3-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
This patch moves VMLADDUHM to decodetree a creates a gvec implementation
using mul_vec and add_vec.
rept loop master patch
8 12500 0,01810500 0,00903100 (-50.1%)
25 4000 0,01739400 0,00747700 (-57.0%)
100 1000 0,01843600 0,00901400 (-51.1%)
500 200 0,02574600 0,01971000 (-23.4%)
2500 40 0,05921600 0,07121800 (+20.3%)
8000 12 0,15326700 0,21725200 (+41.7%)
The significant difference in performance when REPT is low and LOOP is
high I think is due to the fact that the new implementation has a higher
translation time, as when using a helper only 5 TCGop are used but with
the patch a total of 10 TCGop are needed (Power lacks a direct mul_vec
equivalent so this instruction is implemented with the help of 5 others,
vmuleu, vmulou, vmrgh, vmrgl and vpkum).
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221019125040.48028-2-lucas.araujo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221006200654.725390-7-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221006200654.725390-6-matheus.ferst@eldorado.org.br>
[danielhb: ppc32 build fix in trans_(MSGCLRP|MSGSNDP)]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20221006200654.725390-5-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Vector instructions in general are not supposed to change the FI bit.
However, xvcmp* instructions are calling gen_helper_float_check_status,
which is leading to a cleared FI flag where it should be kept
unchanged.
As helper_float_check_status only affects inexact, overflow and
underflow, and the xvcmp* instructions don't change these flags, this
issue can be fixed by removing the call to helper_float_check_status.
By doing this, the FI bit in FPSCR will be preserved as expected.
Fixes: 00084a25adf ("target/ppc: introduce separate VSX_CMP macro for xvcmp* instructions")
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221005121551.27957-1-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
These two helpers are almost identical, differing only by the softfloat
operation it calls. Merge them into one using a macro.
Also, take this opportunity to capitalize the helper name as we moved
the instruction to decodetree in a previous patch.
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220905123746.54659-4-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220905123746.54659-3-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220905123746.54659-2-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Implementation for instructions hashstp and hashchkp, the privileged
versions of hashst and hashchk, which were added in Power ISA 3.1B.
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Lucas Mateus Castro <lucas.araujo@eldorado.org.br>
Message-Id: <20220715205439.161110-4-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Implementation for instructions hashst and hashchk, which were added
in Power ISA 3.1B.
It was decided to implement the hash algorithm from ground up in this
patch exactly as described in Power ISA.
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Lucas Mateus Castro <lucas.araujo@eldorado.org.br>
Message-Id: <20220715205439.161110-3-victor.colombo@eldorado.org.br>
[danielhb: fix block comment in excp_helper.c]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-12-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-11-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-10-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-9-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-8-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-7-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-6-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-5-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-4-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Equivalent to CHK_SV and CHK_HV, but can be used in decodetree methods.
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-3-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
GEN_PRIV and related CHK_* macros just assumed that variable named
"ctx" would be in scope when they are used, and that it would be a
pointer to DisasContext. Change these macros to receive the pointer
explicitly.
Reviewed-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Signed-off-by: Lucas Coutinho <lucas.coutinho@eldorado.org.br>
Message-Id: <20220701133507.740619-2-lucas.coutinho@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
This initial version supports the invalidation of one or all
TLB entries. Flush by PID/LPID, or based in process/partition
scope is not supported, because it would make using the
generic QEMU TLB implementation hard. In these cases, all
entries are flushed.
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712193741.59134-3-leandro.lupori@eldorado.org.br>
[danielhb: moved 'set' declaration to TLBIE_RIC_PWC block]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Also decode RIC, PRS and R operands.
Signed-off-by: Leandro Lupori <leandro.lupori@eldorado.org.br>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220712193741.59134-2-leandro.lupori@eldorado.org.br>
[danielhb: mark bit 31 in @X_tlbie pattern as ignored]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Implements the Convert Declets To Binary Coded Decimal instruction.
Since libdecnumber doesn't expose the methods for direct conversion
(decDigitsFromDPD, DPD2BCD, etc), a positive decimal32 with zero
exponent is used as an intermediate value to convert the declets.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Message-Id: <20220629162904.105060-12-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Implements the Convert Binary Coded Decimal To Declets instruction.
Since libdecnumber doesn't expose the methods for direct conversion
(decDigitsToDPD, BCD2DPD, etc.), the BCD values are converted to
decimal32 format, from which the declets are extracted.
Where the behavior is undefined, we try to match the result observed in
a POWER9 DD2.3.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Message-Id: <20220629162904.105060-11-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Implements the following Power ISA v2.06 instruction:
addg6s: Add and Generate Sixes
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Message-Id: <20220629162904.105060-10-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220629162904.105060-7-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220629162904.105060-6-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220629162904.105060-5-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220629162904.105060-4-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Reviewed-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220629162904.105060-3-victor.colombo@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
And also move the insn to decodetree and remove the now unused
avr_qw_not, avr_qw_cmpu, and avr_qw_add methods.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Message-Id: <20220606150037.338931-8-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
And also move the insns to decodetree.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Message-Id: <20220606150037.338931-7-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
And also move the insn to decodetree
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Message-Id: <20220606150037.338931-6-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
And also move the insn to decodetree.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Message-Id: <20220606150037.338931-5-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
And also move the insns to decodetree and remove the now unused
avr_qw_addc method.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Message-Id: <20220606150037.338931-4-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|
|
And also move the insn to decodetree.
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Message-Id: <20220606150037.338931-3-matheus.ferst@eldorado.org.br>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
|