We've currently got 18 architectures in QEMU, and thus 18 target-xxx
folders in the root folder of the QEMU source tree. More architectures
(e.g. RISC-V, AVR) are likely to be included soon, too, so the main
folder of the QEMU sources slowly gets quite overcrowded with the
target-xxx folders.
To disburden the main folder a little bit, let's move the target-xxx
folders into a dedicated target/ folder, so that target-xxx/ simply
becomes target/xxx/ instead.
Acked-by: Laurent Vivier <laurent@vivier.eu> [m68k part]
Acked-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> [tricore part]
Acked-by: Michael Walle <michael@walle.cc> [lm32 part]
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> [s390x part]
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [s390x part]
Acked-by: Eduardo Habkost <ehabkost@redhat.com> [i386 part]
Acked-by: Artyom Tarasenko <atar4qemu@gmail.com> [sparc part]
Acked-by: Richard Henderson <rth@twiddle.net> [alpha part]
Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa part]
Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ppc part]
Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> [cris&microblaze part]
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn> [unicore32 part]
Signed-off-by: Thomas Huth <thuth@redhat.com>
|
|
It's been superseded by the atomic helpers.
The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-bench microbenchmark with:
$ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
where $n is the number of threads and $r is the allowed range for the additions.
The scenarios measured are:
- atomic: implements x86's ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implements x86's ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset
Results are sorted in ascending range, i.e. descending degree of contention.
The Y axis is throughput in Mops/s. Tests were run on an AMD machine with 64
Opteron 6376 cores.
[Plot: atomic_add-bench, 5000000 ops/thread, [0,1] range; throughput (Mops/s) vs. number of threads (0-60); series: atomic, cmpxchg, master]
[Plot: atomic_add-bench, 5000000 ops/thread, [0,2] range; throughput (Mops/s) vs. number of threads (0-60); series: atomic, cmpxchg, master]
[Plot: atomic_add-bench, 5000000 ops/thread, [0,8] range; throughput (Mops/s) vs. number of threads (0-60); series: atomic, cmpxchg, master]
[Plot: atomic_add-bench, 5000000 ops/thread, [0,128] range; throughput (Mops/s) vs. number of threads (0-60); series: atomic, cmpxchg, master]
[Plot: atomic_add-bench, 5000000 ops/thread, [0,1024] range; throughput (Mops/s) vs. number of threads (0-60); series: atomic, cmpxchg, master]
hi-res: http://imgur.com/a/fMRmq
I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.
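As a rough host-side illustration of the "atomic" and "cmpxchg" scenarios, the sketch below contrasts a single atomic fetch-and-add with a compare-and-swap retry loop, using C11 <stdatomic.h>. The function names are hypothetical and this is not QEMU's TCG code; it only shows why the CAS variant can fail and retry under contention, which is one plausible contributor to the gap between the two curves.

```c
#include <stdatomic.h>
#include <stdint.h>

/* "atomic" scenario: one host fetch-and-add per guest ADDL.
 * Returns the new value, as the guest would see it in memory. */
static uint32_t addl_atomic(_Atomic uint32_t *mem, uint32_t val)
{
    return atomic_fetch_add(mem, val) + val;
}

/* "cmpxchg" scenario: load, add, then try to publish the sum with a
 * compare-and-swap. If another thread changed *mem in between, the CAS
 * fails, refreshes 'old' with the current value, and we retry. */
static uint32_t addl_cmpxchg(_Atomic uint32_t *mem, uint32_t val)
{
    uint32_t old = atomic_load(mem);

    while (!atomic_compare_exchange_weak(mem, &old, old + val)) {
        /* 'old' now holds the latest value; loop and try again. */
    }
    return old + val;
}
```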
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
The diff here is uglier than necessary. All this does is to turn
    FOO
into:
    if (s->prefix & PREFIX_LOCK) {
        BAR
    } else {
        FOO
    }
where FOO is the original implementation of an unlocked cmpxchg.
[rth: Adjust unlocked cmpxchg to use movcond instead of branches.
Adjust helpers to use atomic helpers.]
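The locked/unlocked split described above can be modelled in plain C. In this hypothetical sketch (not the QEMU code), the unlocked path does an ordinary load followed by branch-free selects in the spirit of TCG's movcond, while the locked path maps onto a single host compare-and-swap. Both follow x86 CMPXCHG semantics for a 32-bit operand: compare EAX with the destination; on a match store the source, otherwise load the destination into EAX; ZF reports whether they matched.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Unlocked CMPXCHG: plain load, compare, and unconditional write-back.
 * The ternaries act as selects (like movcond) rather than control flow;
 * the hardware also writes the old value back when the compare fails.
 * Returns the resulting ZF. */
static bool cmpxchg32_unlocked(uint32_t *mem, uint32_t *eax, uint32_t src)
{
    uint32_t old = *mem;
    bool eq = (old == *eax);

    *mem = eq ? src : old;
    *eax = old;          /* unchanged on success, since old == *eax then */
    return eq;
}

/* LOCK CMPXCHG: one atomic compare-and-swap does the whole job.
 * On failure, 'expected' is updated to the current memory value. */
static bool cmpxchg32_locked(_Atomic uint32_t *mem, uint32_t *eax, uint32_t src)
{
    uint32_t expected = *eax;
    bool eq = atomic_compare_exchange_strong(mem, &expected, src);

    *eax = expected;
    return eq;
}
```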
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-6-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Tested with kvm-unit-tests.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
With helpers that can be reused for other things.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
This includes XSAVE, XRSTOR, XGETBV, XSETBV, which are all related,
as well as the associated CPUID bits.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
We will be able to reuse these pieces for XSAVE/XRSTOR.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Use gen_lea_v_seg for centralized segment base knowledge. Unify
code across 32- and 64-bit. Fix note about "must save state"
before using the out-of-line helpers.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1450379966-28198-8-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Introduce helper_get_dr so that we don't have to put CR4[DE]
into the scarce HFLAGS resource. At the same time, rename
helper_movl_drN_T0 to helper_set_dr and set the helper flags.
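CR4.DE matters for debug-register reads because of an architectural rule: with CR4.DE clear, DR4 and DR5 alias DR6 and DR7, while with CR4.DE set, any access to DR4/DR5 raises #UD. Checking the bit at run time inside a helper avoids spending an HFLAGS bit on it. A minimal sketch of that decision, with hypothetical names and no QEMU types:

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_DE (UINT32_C(1) << 3)   /* CR4 bit 3: Debugging Extensions */

/* Hypothetical read of debug register 'reg', assumed to be in 0..7.
 * Returns false where the guest would take #UD; otherwise stores the value. */
static bool get_dr_sketch(const uint64_t dr[8], uint32_t cr4, int reg, uint64_t *out)
{
    if (reg == 4 || reg == 5) {
        if (cr4 & CR4_DE) {
            return false;        /* DR4/DR5 are reserved: #UD */
        }
        reg += 2;                /* legacy aliasing: DR4 -> DR6, DR5 -> DR7 */
    }
    *out = dr[reg];
    return true;
}
```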
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
|
|
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
|
|
This patch fixes exception handling for seg_helper functions.
Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
In order to do this, stop using the cpu_in*/out* helpers, and instead
access address_space_io directly.
cpu_in* and cpu_out* remain for usage in the monitor, in qtest, and
in Xen.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Rather than include helper.h with N values of GEN_HELPER, include a
secondary file that sets up the macros to include helper.h. This
minimizes the files that must be rebuilt when changing the macros
for file N.
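The underlying idea is the classic "X-macro" pattern: one list of helper definitions is expanded several times under different macro definitions (bodies, name tables, pointer tables, and so on). QEMU does this by re-including helper.h from small wrapper headers; since a self-contained example cannot span several files, the sketch below uses a list macro instead. All names here are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for helper.h: a single list, expanded by whatever DEF means
 * at each point of use. */
#define HELPER_LIST(DEF) \
    DEF(add, a + b)      \
    DEF(sub, a - b)      \
    DEF(mul, a * b)

/* Expansion 1: the helper bodies themselves. */
#define DEF_BODY(name, expr) \
    static int helper_##name(int a, int b) { return expr; }
HELPER_LIST(DEF_BODY)
#undef DEF_BODY

/* Expansion 2: a table of names, e.g. for debugging output. */
#define DEF_NAME(name, expr) #name,
static const char *const helper_names[] = { HELPER_LIST(DEF_NAME) };
#undef DEF_NAME

/* Expansion 3: a parallel table of function pointers. */
#define DEF_PTR(name, expr) helper_##name,
static int (*const helper_fns[])(int, int) = { HELPER_LIST(DEF_PTR) };
#undef DEF_PTR

#define N_HELPERS (sizeof(helper_names) / sizeof(helper_names[0]))

/* Look a helper up by name and call it; returns 0 if the name is unknown. */
static int call_helper(const char *name, int a, int b)
{
    for (size_t i = 0; i < N_HELPERS; i++) {
        if (strcmp(helper_names[i], name) == 0) {
            return helper_fns[i](a, b);
        }
    }
    return 0;
}
```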
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
After commit b1bbfe7 (aio / timers: On timer modification, qemu_notify
or aio_notify, 2013-08-21) FreeBSD guests report a huge slowdown.
The problem shows up as soon as FreeBSD turns out its periodic (~1 ms)
tick, but the timers are only the trigger for a pre-existing problem.
Before the offending patch, setting a timer did a timer_settime system call.
After, setting the timer exits the event loop (which uses poll) and
reenters it with a new deadline. This does not cause any slowdown; the
difference is between one system call (timer_settime) and a signal
delivery (SIGALRM) before the patch, and two system calls afterwards
(a write to a pipe or eventfd, plus calling poll again when re-entering the
event loop).
Unfortunately, the exit/enter causes the main loop to grab the iothread
lock, which in turn kicks the VCPU thread out of execution. This
causes TCG to execute the next VCPU in its round-robin scheduling of
VCPUs. When the second VCPU is mostly unused, FreeBSD runs a "pause"
instruction in its idle loop, which only burns cycles without making any
progress. As soon as the timer tick expires, the first VCPU runs
the interrupt handler, but very soon it sets the timer again, and QEMU
then goes back to doing nothing in the second VCPU.
The fix is to make the pause instruction do "cpu_loop_exit".
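A toy model may help show why this helps. The sketch below is hypothetical and is not QEMU's scheduler: it runs two VCPUs round-robin in fixed slices, where VCPU 0 has useful work and VCPU 1 sits in an idle loop executing only PAUSE. When pause_exits is true, PAUSE ends the slice early, standing in for cpu_loop_exit.

```c
#include <stdbool.h>

enum { SLICE = 1000 };   /* hypothetical instructions per round-robin slice */

/* Emulate 'budget' instructions in total, alternating between VCPU 0
 * (useful work) and VCPU 1 (an idle loop of PAUSE instructions).
 * Returns how many of those instructions were useful work. */
static long simulate_round_robin(long budget, bool pause_exits)
{
    long useful = 0, executed = 0;
    int cpu = 0;

    while (executed < budget) {
        for (int n = 0; n < SLICE && executed < budget; n++) {
            executed++;
            if (cpu == 0) {
                useful++;
            } else if (pause_exits) {
                break;   /* PAUSE leaves the execution loop immediately */
            }
        }
        cpu ^= 1;        /* hand over to the other VCPU */
    }
    return useful;
}
```

With these numbers, the idle VCPU consumes half of the budget when PAUSE does not exit, but only one instruction per round when it does.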
Cc: Richard Henderson <rth@twiddle.net>
Reported-by: Luigi Rizzo <rizzo@iet.unipi.it>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 1384948442-24217-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
|
|
These correspond very closely to the insns that we're emulating.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
We weren't computing flags for lzcnt at all. At the same time,
adjust the implementation of bsf/bsr to avoid the local branch,
using movcond instead.
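For reference, the flag and zero-input behaviour involved can be sketched in C. This is a hypothetical model, not QEMU's TCG code: LZCNT always produces a result (32 for a zero 32-bit input), setting CF when the input is zero and ZF when the result is zero; BSR sets ZF for a zero input and leaves the destination architecturally undefined, which this model handles by keeping the old destination through a branch-free select, similar in spirit to movcond. Other flags are left out.

```c
#include <stdbool.h>
#include <stdint.h>

struct bitscan {
    uint32_t dest;
    bool zf, cf;
};

/* Portable leading-zero count; returns 32 for x == 0. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;

    for (uint32_t bit = UINT32_C(1) << 31; bit != 0 && !(x & bit); bit >>= 1) {
        n++;
    }
    return n;
}

/* LZCNT r32: defined for every input. CF = (src == 0), ZF = (result == 0). */
static struct bitscan lzcnt32(uint32_t src)
{
    uint32_t n = clz32(src);

    return (struct bitscan){ .dest = n, .zf = (n == 0), .cf = (src == 0) };
}

/* BSR r32: index of the highest set bit, ZF = (src == 0).
 * For src == 0 the destination is undefined; this model keeps old_dest,
 * chosen by a select rather than a branch. CF is undefined; reported as 0. */
static struct bitscan bsr32(uint32_t old_dest, uint32_t src)
{
    bool zero = (src == 0);
    uint32_t idx = 31u - clz32(src);   /* wraps when src == 0; discarded below */

    return (struct bitscan){ .dest = zero ? old_dest : idx, .zf = zero, .cf = false };
}
```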
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
And mark the helpers as NO_RWG_SE.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Add another slot in ENV and store two of the three inputs. This lets us
do less work when carry-out is not needed, and avoids the unpredictable
CC_OP after translating these insns.
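Lazy flag evaluation records an operation's result and inputs and computes EFLAGS only when something reads them. ADC has three inputs (two operands plus the carry-in); storing the result, the second operand and the carry-in is enough to recover the first operand and compute CF on demand. A hypothetical sketch of that idea, with made-up names rather than QEMU's CC_OP machinery:

```c
#include <stdbool.h>
#include <stdint.h>

enum lazy_op { LAZY_ADD, LAZY_ADC };

/* Hypothetical lazy-flags state: the result plus two of the three inputs. */
struct lazy_flags {
    enum lazy_op op;
    uint32_t dst;    /* result */
    uint32_t src;    /* second operand */
    uint32_t src2;   /* carry-in (0 or 1); 0 for a plain ADD */
};

static uint32_t emit_add(struct lazy_flags *f, uint32_t a, uint32_t b)
{
    uint32_t r = a + b;

    *f = (struct lazy_flags){ LAZY_ADD, r, b, 0 };
    return r;
}

static uint32_t emit_adc(struct lazy_flags *f, uint32_t a, uint32_t b, uint32_t cin)
{
    uint32_t r = a + b + cin;

    *f = (struct lazy_flags){ LAZY_ADC, r, b, cin };
    return r;
}

/* Compute CF only when it is actually needed. */
static bool lazy_cf(const struct lazy_flags *f)
{
    uint32_t a = f->dst - f->src - f->src2;   /* recover the first operand */

    if (f->op == LAZY_ADC && f->src2) {
        return f->dst <= a;   /* a + b + 1 carried out iff result <= a */
    }
    return f->dst < a;        /* a + b carried out iff result < a */
}
```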
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Pass the data in explicitly, rather than indirectly via env.
This avoids all sorts of unnecessary register spillage.
Signed-off-by: Richard Henderson <rth@twiddle.net>
|
|
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Rename helper flags to the new ones. This is purely a mechanical change;
better flags could be chosen by looking at the individual helpers.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
This patch implements Supervisor Mode Execution Prevention (SMEP) and
Supervisor Mode Access Prevention (SMAP) for x86. The purpose of the
patch, obviously, is to help kernel developers debug the support for
those features.
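Roughly, the two features add these checks for supervisor-mode accesses to user-accessible pages: with CR4.SMEP (bit 20) set, instruction fetches fault; with CR4.SMAP (bit 21) set, data accesses fault unless EFLAGS.AC (bit 18) is set. The sketch below is a simplified, hypothetical model; it ignores implicit supervisor accesses (which SMAP checks regardless of AC) and the rest of the page walk.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_SMEP  (UINT32_C(1) << 20)
#define CR4_SMAP  (UINT32_C(1) << 21)
#define EFLAGS_AC (UINT32_C(1) << 18)

enum access_kind { ACCESS_DATA, ACCESS_FETCH };

/* Hypothetical check for an explicit access made at CPL < 3.
 * 'user_page' is true when the page tables mark the page user-accessible.
 * Returns true if the access should raise a page fault. */
static bool smep_smap_fault(uint32_t cr4, uint32_t eflags, bool user_page,
                            enum access_kind kind)
{
    if (!user_page) {
        return false;
    }
    if (kind == ACCESS_FETCH) {
        return (cr4 & CR4_SMEP) != 0;
    }
    return (cr4 & CR4_SMAP) != 0 && (eflags & EFLAGS_AC) == 0;
}
```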
A fair bit of the code relates to the handling of CPUID features. The
CPUID code probably would get greatly simplified if all the feature
bit words were unified into a single vector object, but in the
interest of producing a minimal patch for SMEP/SMAP, and because I had
very limited time for this project, I followed the existing style.
[ v2: don't change the definition of the qemu64 CPU shorthand, since
that breaks loading old snapshots. Per Anthony Liguori this can be
fixed once the CPU feature set is snapshot.
Change the coding style slightly to conform to checkpatch.pl. ]
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
Add an explicit CPUX86State parameter instead of relying on AREG0.
Remove temporary wrappers and switch to AREG0 free mode.
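The AREG0 conversions in this series follow one pattern: a helper that used to reach the CPU state through a fixed global (AREG0 named the host register reserved for env) instead receives it as an explicit first argument. A hypothetical before/after sketch with a cut-down state struct:

```c
#include <stdint.h>

/* Cut-down, hypothetical stand-in for CPUX86State. */
typedef struct {
    uint32_t eflags;
} DemoCPUState;

/* Before: the helper reads a global pointer (a plain global here; the
 * real code pinned env to the host register named by AREG0). */
static DemoCPUState *global_env;

static uint32_t helper_get_eflags_old(void)
{
    return global_env->eflags;
}

/* After: the state is passed explicitly, so the dependency is visible in
 * the signature and no host register has to be reserved for it. */
static uint32_t helper_get_eflags(DemoCPUState *env)
{
    return env->eflags;
}
```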
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
Add an explicit CPUX86State parameter instead of relying on AREG0.
Rename remains of op_helper.c to seg_helper.c.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
Add an explicit CPUX86State parameter instead of relying on AREG0.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
Add an explicit CPUX86State parameter instead of relying on AREG0.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
Add an explicit CPUX86State parameter instead of relying on AREG0.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
Add an explicit CPUX86State parameter instead of relying on AREG0.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
Add an explicit CPUX86State parameter instead of relying on AREG0.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
Make FPU helpers take a parameter for CPUState instead
of relying on global env.
Introduce temporary wrappers for FPU load and store ops. Remove
wrappers for non-AREG0 code. Don't call unconverted helpers
directly.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
Add an explicit CPUX86State parameter instead of relying on AREG0.
Merge raise_exception_env() to raise_exception(), likewise with
raise_exception_err_env() and raise_exception_err().
Introduce cpu_svm_check_intercept_param() and cpu_vmexit()
as wrappers.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
SSE rounding and flush-to-zero control has never been implemented. However,
given that softfloat-native used a single state for FPU and SSE, and
given that glibc sets both FPU and SSE state in fesetround(), this
worked correctly up to the switch to softfloat.
Fix that by adding an update_sse_status() function similar to
update_fpu_status(), and calling it on writes to mxcsr.
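For reference, MXCSR holds the SSE rounding control in bits 13-14 (00 nearest, 01 down, 10 up, 11 toward zero) and flush-to-zero in bit 15. A hypothetical sketch of an update_sse_status()-style function that copies these into a separate softfloat-like status on every MXCSR write; the types and names are made up and are not QEMU's softfloat API.

```c
#include <stdbool.h>
#include <stdint.h>

enum round_mode { ROUND_NEAREST_EVEN, ROUND_DOWN, ROUND_UP, ROUND_TO_ZERO };

/* Hypothetical softfloat-like status used for SSE arithmetic. */
struct sse_status {
    enum round_mode rounding;
    bool flush_to_zero;
};

#define MXCSR_RC_SHIFT 13
#define MXCSR_RC_MASK  (UINT32_C(3) << MXCSR_RC_SHIFT)
#define MXCSR_FTZ      (UINT32_C(1) << 15)

static void update_sse_status_sketch(struct sse_status *st, uint32_t mxcsr)
{
    /* Map the 2-bit RC field to a rounding mode. */
    static const enum round_mode rc_to_mode[4] = {
        ROUND_NEAREST_EVEN, ROUND_DOWN, ROUND_UP, ROUND_TO_ZERO
    };

    st->rounding = rc_to_mode[(mxcsr & MXCSR_RC_MASK) >> MXCSR_RC_SHIFT];
    st->flush_to_zero = (mxcsr & MXCSR_FTZ) != 0;
}

/* Every write to MXCSR (e.g. LDMXCSR) refreshes the derived status. */
static void write_mxcsr(uint32_t *mxcsr_reg, struct sse_status *st, uint32_t val)
{
    *mxcsr_reg = val;
    update_sse_status_sketch(st, val);
}
```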
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
lzcnt is an instruction added with AMD Phenom/Barcelona that returns the
number of leading zero bits in a word.
As this is similar to the "bsr" instruction, reuse the existing
code. Some more changes are needed, though, as lzcnt always
returns a valid value (as opposed to bsr, which has a special
case when the operand is 0).
lzcnt is guarded by the ABM CPUID bit (Fn8000_0001:ECX_5).
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
RDTSCP atomically reads both the time stamp counter and the content
of a 32-bit MSR (TSC_AUX), which can be freely set by the OS. This allows
CPU-local data to be queried by userspace.
Linux uses this to allow a fast implementation of the getcpu()
syscall, which uses the vsyscall page to avoid a context switch.
AMD CPUs since K8RevF and Intel CPUs since Nehalem support this
instruction.
RDTSCP is guarded by the RDTSCP CPUID bit (Fn8000_0001:EDX[27]).
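The instruction's register effects are simple to state: EDX:EAX receives the 64-bit TSC, ECX receives the low 32 bits of the TSC_AUX MSR, and CPUID Fn8000_0001 EDX bit 27 advertises support. A hypothetical sketch of just that data movement (the names are made up; this is not QEMU's helper):

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EXT2_RDTSCP (UINT32_C(1) << 27)   /* Fn8000_0001:EDX[27] */

struct rdtscp_result {
    uint32_t eax, edx, ecx;
};

/* Is RDTSCP advertised in the EDX value returned by CPUID leaf 0x80000001? */
static bool has_rdtscp(uint32_t cpuid_80000001_edx)
{
    return (cpuid_80000001_edx & CPUID_EXT2_RDTSCP) != 0;
}

/* Split the TSC across EDX:EAX and copy TSC_AUX[31:0] into ECX. */
static struct rdtscp_result rdtscp_regs(uint64_t tsc, uint64_t tsc_aux)
{
    return (struct rdtscp_result){
        .eax = (uint32_t)tsc,
        .edx = (uint32_t)(tsc >> 32),
        .ecx = (uint32_t)tsc_aux,
    };
}
```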
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
|
|
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
|
|
Signed-off-by: Paul Brook <paul@codesourcery.com>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5729 c046a42c-6fe2-441c-8c8c-71466251a162
|
|
On Intel CPUs, sysenter and sysexit are valid in 64-bit mode. This patch
makes both 64-bit aware and enables them for Intel CPUs.
Add cpu save/load for 64-bit wide sysenter variables.
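For context, SYSENTER takes its targets from three MSRs: the code selector from IA32_SYSENTER_CS (with SS at the next descriptor, CS + 8), and the entry stack and instruction pointers from IA32_SYSENTER_ESP and IA32_SYSENTER_EIP. Outside 64-bit mode only the low 32 bits of the latter two are used, which is why 64-bit wide fields are needed once long mode is supported. A hypothetical sketch of that target computation (names are made up):

```c
#include <stdbool.h>
#include <stdint.h>

struct sysenter_target {
    uint16_t cs, ss;
    uint64_t rip, rsp;
};

/* Compute where SYSENTER transfers control. 'long_mode' selects whether the
 * full 64-bit MSR values are used or only their low 32 bits. */
static struct sysenter_target compute_sysenter_target(uint64_t msr_cs, uint64_t msr_esp,
                                                      uint64_t msr_eip, bool long_mode)
{
    uint64_t mask = long_mode ? UINT64_MAX : UINT32_MAX;
    uint16_t cs = (uint16_t)(msr_cs & 0xfffc);   /* RPL forced to 0 */

    return (struct sysenter_target){
        .cs = cs,
        .ss = (uint16_t)(cs + 8),
        .rip = msr_eip & mask,
        .rsp = msr_esp & mask,
    };
}
```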
Signed-off-by: Alexander Graf <agraf@suse.de>
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5318 c046a42c-6fe2-441c-8c8c-71466251a162
|
|
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@5115 c046a42c-6fe2-441c-8c8c-71466251a162
|
|
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4746 c046a42c-6fe2-441c-8c8c-71466251a162
|
|
reworked cr8 handling - added CPUState.hflags2
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4662 c046a42c-6fe2-441c-8c8c-71466251a162
|
|
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4660 c046a42c-6fe2-441c-8c8c-71466251a162
|
|
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4605 c046a42c-6fe2-441c-8c8c-71466251a162
|
|
git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4530 c046a42c-6fe2-441c-8c8c-71466251a162
|