aboutsummaryrefslogtreecommitdiff
path: root/target-i386/helper.h
AgeCommit message (Collapse)Author
2016-10-26target-i386: remove helper_lock()Emilio G. Cota
It's been superseded by the atomic helpers. The use of the atomic helpers provides a significant performance and scalability improvement. Below is the result of running the atomic_add-test microbenchmark with: $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n , where $n is the number of threads and $r is the allowed range for the additions. The scenarios measured are: - atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset) - cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper - master: before this patchset Results sorted in ascending range, i.e. descending degree of contention. Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64 Opteron 6376 cores. atomic_add-bench: 5000000 ops/thread, [0,1] range 25 ++---------+----------+---------+----------+----------+----------+---++ + atomic +-E--+ + + + + + | |cmpxchg +-H--+ | 20 +Emaster +-N--+ ++ || | |++ | || | 15 +++ ++ |N| | |+| | 10 ++| ++ |+|+ | | | -+E+------ +++ ---+E+------+E+------+E+-----+E+------+E| |+E+E+- +++ +E+------+E+-- | 5 ++|+ ++ |+N+H+--- +++ | ++++N+--+H++----+++ + +++ --++H+------+H+------+H++----+H+---+--- | 0 ++---------+-----H----+---H-----+----------+----------+----------+---H+ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 5000000 ops/thread, [0,2] range 25 ++---------+----------+---------+----------+----------+----------+---++ ++atomic +-E--+ + + + + + | |cmpxchg +-H--+ | 20 ++master +-N--+ ++ |E| | |++ | ||E | 15 ++| ++ |N|| | |+|| ---+E+------+E+-----+E+------+E| 10 ++| | ---+E+------+E+-----+E+--- +++ +++ ||H+E+--+E+-- | |+++++ | | || | 5 ++|+H+-- +++ ++ |+N+ - ---+H+------+H+------ | + +N+--+H++----+H+---+--+H+----++H+--- + + +H+---+--+H| 0 ++---------+----------+---------+----------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 5000000 ops/thread, [0,8] range 40 ++---------+----------+---------+----------+----------+----------+---++ ++atomic +-E--+ + + + + + | 35 +cmpxchg +-H--+ ++ | master +-N--+ ---+E+------+E+------+E+-----+E+------+E| 30 ++| ---+E+-- +++ ++ | | -+E+--- | 25 ++E ---- +++ ++ |+++++ -+E+ | 20 +E+ E-- +++ ++ |H|+++ | |+| +H+------- | 15 ++H+ ---+++ +H+------ ++ |N++H+-- +++--- +H+------++| 10 ++ +++ - +++ ---+H+ +++ +H+ | | +H+-----+H+------+H+-- | 5 ++| +++ ++ ++N+N+--+N++ + + + + + | 0 ++---------+----------+---------+----------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 5000000 ops/thread, [0,128] range 160 ++---------+---------+----------+---------+----------+----------+---++ + atomic +-E--+ + + + + + | 140 +cmpxchg +-H--+ +++ +++ ++ | master +-N--+ E--------E------+E+------++| 120 ++ --| | +++ E+ | -- +++ +++ ++| 100 ++ - ++ | +++- +++ ++| 80 ++ -+E+ -+H+------+H+------H--------++ | ---- ---- +++ H| | ---+E+-----+E+- ---+H+ ++| 60 ++ +E+--- +++ ---+H+--- ++ | --+++ ---+H+-- | 40 ++ +E+-+H+--- ++ | +H+ | 20 +EE+ ++ +N+ + + + + + + | 0 ++N-N---N--+---------+----------+---------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads atomic_add-bench: 5000000 ops/thread, [0,1024] range 350 ++---------+---------+----------+---------+----------+----------+---++ + atomic +-E--+ + + + + + | 300 +cmpxchg +-H--+ +++ | master +-N--+ +++ || | +++ | ----E| 250 ++ | ----E---- ++ | ----E--- | ---+H| 200 ++ -+E+--- +++ ---+H+--- ++ | ---- -+H+-- | | +E+ +++ ---- +++ | 150 ++ ---+++ ---+H+- ++ | --- -+H+-- | 100 ++ ---+E+ ---- +++ ++ | +++ ---+E+-----+H+- | | -+E+------+H+-- | 50 ++ +E+ ++ +EE+ + + + + + + | 0 ++N-N---N--+---------+----------+---------+----------+----------+---++ 0 10 20 30 40 50 60 Number of threads hi-res: http://imgur.com/a/fMRmq For master I stopped measuring master after 8 threads, because there is little point in measuring the well-known performance collapse of a contended lock. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpersEmilio G. Cota
The diff here is uglier than necessary. All this does is to turn FOO into: if (s->prefix & PREFIX_LOCK) { BAR } else { FOO } where FOO is the original implementation of an unlocked cmpxchg. [rth: Adjust unlocked cmpxchg to use movcond instead of branches. Adjust helpers to use atomic helpers.] Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-03-24target-i386: implement PKE for TCGPaolo Bonzini
Tested with kvm-unit-tests. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-15target-i386: Implement FSGSBASERichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-15target-i386: Clear bndregs during legacy near jumpsRichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-15target-i386: Implement BNDLDX, BNDSTXRichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-15target-i386: Implement BNDCL, BNDCU, BNDCNRichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-13target-i386: Perform set/reset_inhibit_irq inlineRichard Henderson
With helpers that can be reused for other things. Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-13target-i386: Implement XSAVEOPTRichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-13target-i386: Add XSAVE extensionRichard Henderson
This includes XSAVE, XRSTOR, XGETBV, XSETBV, which are all related, as well as the associate cpuid bits. Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-13target-i386: Split fxsave/fxrstor implementationRichard Henderson
We will be able to reuse these pieces for XSAVE/XRSTOR. Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-02-09target-i386: Rewrite gen_enter inlineRichard Henderson
Use gen_lea_v_seg for centralized segment base knowledge. Unify code across 32- and 64-bit. Fix note about "must save state" before using the out-of-line helpers. Signed-off-by: Richard Henderson <rth@twiddle.net> Message-Id: <1450379966-28198-8-git-send-email-rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-23target-i386: Check CR4[DE] for processing DR4/DR5Richard Henderson
Introduce helper_get_dr so that we don't have to put CR4[DE] into the scarce HFLAGS resource. At the same time, rename helper_movl_drN_T0 to helper_set_dr and set the helper flags. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-10-23target-i386: Handle I/O breakpointsEduardo Habkost
Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2015-09-15target-i386: exception handling for seg_helper functionsPavel Dovgalyuk
This patch fixes exception handling for seg_helper functions. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-06-05target-i386: Use correct memory attributes for ioport accessesPaolo Bonzini
In order to do this, stop using the cpu_in*/out* helpers, and instead access address_space_io directly. cpu_in* and cpu_out* remain for usage in the monitor, in qtest, and in Xen. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-28tcg: Invert the inclusion of helper.hRichard Henderson
Rather than include helper.h with N values of GEN_HELPER, include a secondary file that sets up the macros to include helper.h. This minimizes the files that must be rebuilt when changing the macros for file N. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-21target-i386: yield to another VCPU on PAUSEPaolo Bonzini
After commit b1bbfe7 (aio / timers: On timer modification, qemu_notify or aio_notify, 2013-08-21) FreeBSD guests report a huge slowdown. The problem shows up as soon as FreeBSD turns out its periodic (~1 ms) tick, but the timers are only the trigger for a pre-existing problem. Before the offending patch, setting a timer did a timer_settime system call. After, setting the timer exits the event loop (which uses poll) and reenters it with a new deadline. This does not cause any slowdown; the difference is between one system call (timer_settime and a signal delivery (SIGALRM) before the patch, and two system calls afterwards (write to a pipe or eventfd + calling poll again when re-entering the event loop). Unfortunately, the exit/enter causes the main loop to grab the iothread lock, which in turns kicks the VCPU thread out of execution. This causes TCG to execute the next VCPU in its round-robin scheduling of VCPUS. When the second VCPU is mostly unused, FreeBSD runs a "pause" instruction in its idle loop which only burns cycles without any progress. As soon as the timer tick expires, the first VCPU runs the interrupt handler but very soon it sets it again---and QEMU then goes back doing nothing in the second VCPU. The fix is to make the pause instruction do "cpu_loop_exit". Cc: Richard Henderson <rth@twiddle.net> Reported-by: Luigi Rizzo <rizzo@iet.unipi.it> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1384948442-24217-1-git-send-email-pbonzini@redhat.com Signed-off-by: Anthony Liguori <aliguori@amazon.com>
2013-02-27target-i386: Use mulu2 and muls2Richard Henderson
These correspond very closely to the insns that we're emulating. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-19target-i386: Implement tzcnt and fix lzcntRichard Henderson
We weren't computing flags for lzcnt at all. At the same time, adjust the implementation of bsf/bsr to avoid the local branch, using movcond instead. Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-19target-i386: Use clz/ctz for bsf/bsr helpersRichard Henderson
And mark the helpers as NO_RWG_SE. Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18target-i386: Implement PDEP, PEXTRichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18target-i386: Implement MULXRichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18target-i386: Use CC_SRC2 for ADC and SBBRichard Henderson
Add another slot in ENV and store two of the three inputs. This lets us do less work when carry-out is not needed, and avoids the unpredictable CC_OP after translating these insns. Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-02-18target-i386: Make helper_cc_compute_{all,c} constRichard Henderson
Pass the data in explicitly, rather than indirectly via env. This avoids all sorts of unnecessary register spillage. Signed-off-by: Richard Henderson <rth@twiddle.net>
2012-12-19exec: move include files to include/exec/Paolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-10-28target-i386: rename helper flagsAurelien Jarno
Rename helper flags to the new ones. This is purely a mechanical change, it's possible to use better flags by looking at the helpers. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-01x86: Implement SMEP and SMAPH. Peter Anvin
This patch implements Supervisor Mode Execution Prevention (SMEP) and Supervisor Mode Access Prevention (SMAP) for x86. The purpose of the patch, obviously, is to help kernel developers debug the support for those features. A fair bit of the code relates to the handling of CPUID features. The CPUID code probably would get greatly simplified if all the feature bit words were unified into a single vector object, but in the interest of producing a minimal patch for SMEP/SMAP, and because I had very limited time for this project, I followed the existing style. [ v2: don't change the definition of the qemu64 CPU shorthand, since that breaks loading old snapshots. Per Anthony Liguori this can be fixed once the CPU feature set is snapshot. Change the coding style slightly to conform to checkpatch.pl. ] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-08-14x86: switch to AREG0 free modeBlue Swirl
Add an explicit CPUX86State parameter instead of relying on AREG0. Remove temporary wrappers and switch to AREG0 free mode. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14x86: avoid AREG0 in segmentation helpersBlue Swirl
Add an explicit CPUX86State parameter instead of relying on AREG0. Rename remains of op_helper.c to seg_helper.c. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14x86: avoid AREG0 for misc helpersBlue Swirl
Add an explicit CPUX86State parameter instead of relying on AREG0. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14x86: avoid AREG0 for SMM helpersBlue Swirl
Add an explicit CPUX86State parameter instead of relying on AREG0. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14x86: avoid AREG0 for SVM helpersBlue Swirl
Add an explicit CPUX86State parameter instead of relying on AREG0. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14x86: avoid AREG0 for integer helpersBlue Swirl
Add an explicit CPUX86State parameter instead of relying on AREG0. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14x86: avoid AREG0 for condition code helpersBlue Swirl
Add an explicit CPUX86State parameter instead of relying on AREG0. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-08-14x86: avoid AREG0 for FPU helpersBlue Swirl
Make FPU helpers take a parameter for CPUState instead of relying on global env. Introduce temporary wrappers for FPU load and store ops. Remove wrappers for non-AREG0 code. Don't call unconverted helpers directly. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-06-28x86: avoid AREG0 for exceptionsBlue Swirl
Add an explicit CPUX86State parameter instead of relying on AREG0. Merge raise_exception_env() to raise_exception(), likewise with raise_exception_err_env() and raise_exception_err(). Introduce cpu_svm_check_intercept_param() and cpu_vmexit() as wrappers. Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-01-11target-i386: fix SSE rounding and flush to zeroAurelien Jarno
SSE rounding and flush to zero control has never been implemented. However given that softfloat-native was using a single state for FPU and SSE and given that glibc is setting both FPU and SSE state in fesetround(), this was working correctly up to the switch to softfloat. Fix that by adding an update_sse_status() function similar to update_fpu_status(), and callin git on write to mxcsr. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2009-10-23target-i386: implement lzcnt emulationAndre Przywara
lzcnt is a AMD Phenom/Barcelona added instruction returning the number of leading zero bits in a word. As this is similar to the "bsr" instruction, reuse the existing code. There need to be some more changes, though, as lzcnt always returns a valid value (in opposite to bsr, which has a special case when the operand is 0). lzcnt is guarded by the ABM CPUID bit (Fn8000_0001:ECX_5). Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2009-10-04target-i386: add RDTSCP supportAndre Przywara
RDTSCP reads the time stamp counter and atomically also the content of a 32-bit MSR, which can be freely set by the OS. This allows CPU local data to be queried by userspace. Linux uses this to allow a fast implementation of the getcpu() syscall, which uses the vsyscall page to avoid a context switch. AMD CPUs since K8RevF and Intel CPUs since Nehalem support this instruction. RDTSCP is guarded by the RDTSCP CPUID bit (Fn8000_0001:EDX[27]). Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2009-05-22x86: Add support for resume flagJan Kiszka
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
2008-11-17TCG variable type checking.pbrook
Signed-off-by: Paul Brook <paul@codesourcery.com> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5729 c046a42c-6fe2-441c-8c8c-71466251a162
2008-09-25SYSENTER/SYSEXIT IA-32e implementation (Alexander Graf).balrog
On Intel CPUs, sysenter and sysexit are valid in 64-bit mode. This patch makes both 64-bit aware and enables them for Intel CPUs. Add cpu save/load for 64-bit wide sysenter variables. Signed-off-by: Alexander Graf <agraf@suse.de> git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5318 c046a42c-6fe2-441c-8c8c-71466251a162
2008-08-30Fix some warnings that would be generated by gcc -Wredundant-declsblueswir1
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5115 c046a42c-6fe2-441c-8c8c-71466251a162
2008-06-18HLT, MWAIT and MONITOR insn fixes (initial patch by Alexander Graf)bellard
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4746 c046a42c-6fe2-441c-8c8c-71466251a162
2008-06-04reworked SVM interrupt handling logic - fixed vmrun EIP saved value - ↵bellard
reworked cr8 handling - added CPUState.hflags2 git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4662 c046a42c-6fe2-441c-8c8c-71466251a162
2008-06-0432 bit SVM fixes - INVLPG and INVLPGA updatesbellard
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4660 c046a42c-6fe2-441c-8c8c-71466251a162
2008-05-28SVM reworkbellard
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4605 c046a42c-6fe2-441c-8c8c-71466251a162
2008-05-22proper helper definition registering (all targets must do that)bellard
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4530 c046a42c-6fe2-441c-8c8c-71466251a162
2008-05-22cmpxchg8b fix - added cmpxchg16bbellard
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4522 c046a42c-6fe2-441c-8c8c-71466251a162