Age | Commit message (Collapse) | Author |
|
qemu_log_lock() now returns a handle and qemu_log_unlock() receives a
handle to unlock. This allows for changing the handle during logging
and ensures the lock() and unlock() are for the same file.
Also in target/tilegx/translate.c removed the qemu_log_lock()/unlock()
calls (and the log("\n")), since the translator can longjmp out of the
loop if it attempts to translate an instruction in an inaccessible page.
Signed-off-by: Robert Foley <robert.foley@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20191118211528.3221-5-robert.foley@linaro.org>
|
|
'remotes/stsquad/tags/pull-tcg-plugins-281019-4' into staging
TCG Plugins initial implementation
- use --enable-plugins @ configure
- low impact introspection (-plugin empty.so to measure overhead)
- plugins cannot alter guest state
- example plugins included in source tree (tests/plugins)
- -d plugin to enable plugin output in logs
- check-tcg runs extra tests when plugins enabled
- documentation in docs/devel/plugins.rst
# gpg: Signature made Mon 28 Oct 2019 15:13:23 GMT
# gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [full]
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* remotes/stsquad/tags/pull-tcg-plugins-281019-4: (57 commits)
travis.yml: enable linux-gcc-debug-tcg cache
MAINTAINERS: add me for the TCG plugins code
scripts/checkpatch.pl: don't complain about (foo, /* empty */)
.travis.yml: add --enable-plugins tests
include/exec: wrap cpu_ldst.h in CONFIG_TCG
accel/stubs: reduce headers from tcg-stub
tests/plugin: add hotpages to analyse memory access patterns
tests/plugin: add instruction execution breakdown
tests/plugin: add a hotblocks plugin
tests/tcg: enable plugin testing
tests/tcg: drop test-i386-fprem from TESTS when not SLOW
tests/tcg: move "virtual" tests to EXTRA_TESTS
tests/tcg: set QEMU_OPTS for all cris runs
tests/tcg/Makefile.target: fix path to config-host.mak
tests/plugin: add sample plugins
linux-user: support -plugin option
vl: support -plugin option
plugin: add qemu_plugin_outs helper
plugin: add qemu_plugin_insn_disas helper
plugin: expand the plugin_init function to include an info block
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
|
Plugins might allocate per-TB data that then they get passed each
time a TB is executed (via the *userdata pointer).
Notify plugin code every time a code cache flush occurs, so
that plugins can then reclaim the memory of the per-TB data.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
|
|
tb_flush will be called by the plugin module from a safe
work environment. Prepare for that.
Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Since 2ac01d6dafab, this function does only two things: assert a
lock is held, and call tcg_tb_alloc. It is used exactly once,
and its user has already done the assert.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Clement Deschamps <clement.deschamps@greensocs.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This fixes a segmentation fault in icount mode when executing
from an IO region.
TB is marked as CF_NOCACHE but tb->orig_tb is not initialized
(equals previous value in code_gen_buffer).
The issue happens in cpu_io_recompile() when it tries to invalidate orig_tb.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Clement Deschamps <clement.deschamps@greensocs.com>
Message-Id: <20191022140016.918371-1-clement.deschamps@greensocs.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Fixes the previous TLB_WATCHPOINT patches because we are currently
failing to set cpu->mem_io_pc with the call to cpu_check_watchpoint.
Pass down the retaddr directly because it's readily available.
Fixes: 50b107c5d61
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Rather than rely on cpu->mem_io_pc, pass retaddr down directly.
Within tb_invalidate_phys_page_range__locked, the is_cpu_write_access
parameter is non-zero exactly when retaddr would be non-zero, so that
is a simple replacement.
Recognize that current_tb_not_found is true only when mem_io_pc
(and now retaddr) are also non-zero, so remove a redundant test.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
All callers pass false to this argument. Remove it and pass the
constant on to tb_invalidate_phys_page_range__locked.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
No header includes qemu-common.h after this commit, as prescribed by
qemu-common.h's file comment.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-5-armbru@redhat.com>
[Rebased with conflicts resolved automatically, except for
include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c
block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c
target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h
target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h
target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h
target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and
net/tap-bsd.c fixed up]
|
|
Other accelerators have their own headers: sysemu/hax.h, sysemu/hvf.h,
sysemu/kvm.h, sysemu/whpx.h. Only tcg_enabled() & friends sit in
qemu-common.h. This necessitates inclusion of qemu-common.h into
headers, which is against the rules spelled out in qemu-common.h's
file comment.
Move tcg_enabled() & friends into their own header sysemu/tcg.h, and
adjust #include directives.
Cc: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190523143508.25387-2-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[Rebased with conflicts resolved automatically, except for
accel/tcg/tcg-all.c]
|
|
Amusingly, we had already ignored the comment to keep this value
at the end of CPUState. This restores the minimum negative offset
from TCG_AREG0 for code generation.
For the couple of uses within qom/cpu.c, without NEED_CPU_H, add
a pointer from the CPUState object to the IcountDecr object within
CPUNegativeOffsetState.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Now that we have both ArchCPU and CPUArchState, we can define
this generically instead of via macro in each target's cpu.h.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
If a TB generates too much code, try again with fewer insns.
Fixes: https://bugs.launchpad.net/bugs/1824853
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
In order to handle TB's that translate to too much code, we
need to place the control of the length of the translation
in the hands of the code gen master loop.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
dump_exec_info() takes an fprintf()-like callback and a FILE * to pass
to it.
Its only caller hmp_info_jit() passes monitor_fprintf() and the
current monitor cast to FILE *. monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf(). The
type-punning is ugly.
Drop the callback, and call qemu_printf() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-5-armbru@redhat.com>
|
|
dump_opcount_info() takes an fprintf()-like callback and a FILE * to
pass to it.
Its only caller hmp_info_opcount() passes monitor_fprintf() and the
current monitor cast to FILE *. monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf(). The
type-punning is ugly.
Drop the callback, and call qemu_printf() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-4-armbru@redhat.com>
|
|
It's either "GNU *Library* General Public version 2" or "GNU Lesser
General Public version *2.1*", but there was no "version 2.0" of the
"Lesser" library. So assume that version 2.1 is meant here.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1548252536-6242-5-git-send-email-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
|
|
Include the cluster number in the hash we use to look
up TBs. This is important because a TB that is valid
for one cluster at a given physical address and set
of CPU flags is not necessarily valid for another:
the two clusters may have different views of physical
memory, or may have different CPU features (eg FPU
present or absent).
We put the cluster number in the high 8 bits of the
TB cflags. This gives us up to 256 clusters, which should
be enough for anybody. If we ever need more, or need
more bits in cflags for other purposes, we could make
tb_hash_func() take more data (and expand qemu_xxhash7()
to qemu_xxhash8()).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 20190121152218.9592-4-peter.maydell@linaro.org
|
|
osdep.h will also define the available Windows API version for QEMU.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20181122110039.15972-2-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Our only statistic so far was "full" tlb flushes, where all mmu_idx
are flushed at the same time.
Now count "partial" tlb flushes where sets of mmu_idx are flushed,
but the set is not maximal. Account one per mmu_idx flushed, as
that is the unit of work performed.
We don't actually count elided flushes yet, but go ahead and change
the interface presented to the monitor all at once.
Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Consistently access u16.high with atomics to avoid
undefined behaviour in MTTCG.
Note that icount_decr.u16.low is only used in icount mode,
so regular accesses to it are OK.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181010144853.13005-2-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
The global cpu_single_env variable has been removed more than 5 years
ago, so apparently nobody used this dead debug code in that timeframe
anymore. Thus let's remove it completely now.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1537204134-15905-1-git-send-email-thuth@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Accessing the HT from an iterator results almost always
in a deadlock. Given that only one qht-internal function
uses this argument, drop it from the interface.
Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
If get_page_addr_code() returns -1, this indicates that there is no RAM
page we can read a full TB from. Instead we must create a TB which
contains a single instruction and which we do not cache, so it is
executed only once.
Since this means we can now have TBs which are not in any page list,
we also need to make tb_phys_invalidate() handle them (by not trying
to remove them from a nonexistent page list).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20180710160013.26559-5-peter.maydell@linaro.org
|
|
When we support execution from non-RAM MMIO regions, get_page_addr_code()
will return -1 to indicate that there is no RAM at the requested address.
Handle this in tb_check_watchpoint() -- if the exception happened for a
PC which doesn't correspond to RAM then there is no need to invalidate
any TBs, because the one-instruction TB will not have been cached.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20180710160013.26559-4-peter.maydell@linaro.org
|
|
The typo was found by codespell.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-Id: <20180712194454.26765-1-sw@weilnetz.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This fixes a record-replay regression introduced by 95590e2
("translate-all: discard TB when tb_link_page returns an existing
matching TB", 2018-06-15). The problem is that code using CF_NOCACHE
assumes that the TB returned from tb_gen_code is always a
newly-generated one. This assumption, however, was broken in
the aforementioned commit.
Fix it by honouring CF_NOCACHE, so that tb_gen_code always
returns a newly-generated TB when CF_NOCACHE is passed to it.
Do this by avoiding the TB hash table if CF_NOCACHE is set.
Reported-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Tested-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 1530806837-5416-1-git-send-email-cota@braap.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
|
Commit 0b5c91f ("translate-all: use per-page locking in !user-mode",
2018-06-15) introduced per-page locking. It assumed that the physical
pages corresponding to a TB (at most two pages) are always distinct,
which is wrong. For instance, an xtensa test provided by Max Filippov
is broken by the commit, since the test maps two virtual pages
to the same physical page:
virt1: 7fff, virt2: 8000
phys1 6000fff, phys2 6000000
Fix it by removing the assumption from page_lock_pair.
If the two physical page addresses are equal, we only lock
the PageDesc once. Note that the two callers of page_lock_pair,
namely page_unlock_tb and tb_link_page, are also updated so that
we do not try to unlock the same PageDesc twice.
Fixes: 0b5c91f74f3c83a36f37740969df8c775c997e69
Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1529944302-14186-1-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
* "info mtree" improvements (Alexey)
* fake VPD block limits for SCSI passthrough (Daniel Barboza)
* chardev and main loop fixes (Daniel Berrangé, Sergio, Stefan)
* help fixes (Eduardo)
* pc-dimm refactoring (David)
* tests improvements and fixes (Emilio, Thomas)
* SVM emulation fixes (Jan)
* MemoryRegionCache fix (Eric)
* WHPX improvements (Justin)
* ESP cleanup (Mark)
* -overcommit option (Michael)
* qemu-pr-helper fixes (me)
* "info pic" improvements for x86 (Peter)
* x86 TCG emulation fixes (Richard)
* KVM slot handling fix (Shannon)
* Next round of deprecation (Thomas)
* Windows dump format support (Viktor)
# gpg: Signature made Fri 29 Jun 2018 12:03:05 BST
# gpg: using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream: (60 commits)
tests/boot-serial: Do not delete the output file in case of errors
hw/scsi: add VPD Block Limits emulation
hw/scsi: centralize SG_IO calls into single function
hw/scsi: cleanups before VPD BL emulation
dump: add Windows live system dump
dump: add fallback KDBG using in Windows dump
dump: use system context in Windows dump
dump: add Windows dump format to dump-guest-memory
i386/cpu: make -cpu host support monitor/mwait
kvm: support -overcommit cpu-pm=on|off
hmp: obsolete "info ioapic"
ioapic: support "info irq"
ioapic: some proper indents when dump info
ioapic: support "info pic"
doc: another fix to "info pic"
target-i386: Mark cpu_vmexit noreturn
target-i386: Allow interrupt injection after STGI
target-i386: Add NMI interception to SVM
memory/hmp: Print owners/parents in "info mtree"
WHPX: register for unrecognized MSR exits
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
|
Place them in exec.c, exec-all.h and ram_addr.h. This removes
knowledge of translate-all.h (which is an internal header) from
several files outside accel/tcg and removes knowledge of
AddressSpace from translate-all.c (as it only operates on ram_addr_t).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Determining the size of a field is useful when you don't have a struct
variable handy. Open-coding this is ugly.
This patch adds the sizeof_field() macro, which is similar to
typeof_field(). Existing instances are updated to use the macro.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180614164431.29305-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
|
Use mmap_lock in user-mode to protect TCG state and the page descriptors.
In !user-mode, each vCPU has its own TCG state, so no locks needed.
Per-page locks are used to protect the page descriptors.
Per-TB locks are used in both modes to protect TB jumps.
Some notes:
- tb_lock is removed from notdirty_mem_write by passing a
locked page_collection to tb_invalidate_phys_page_fast.
- tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
so there is no need to further serialize access to them.
- do_tb_flush is run in a safe async context, meaning no other
vCPU threads are running. Therefore acquiring mmap_lock there
is just to please tools such as thread sanitizer.
- Not visible in the diff, but tb_invalidate_phys_page already
has an assert_memory_lock.
- cpu_io_recompile is !user-only, so no mmap_lock there.
- Added mmap_unlock()'s before all siglongjmp's that could
be called in user-mode while mmap_lock is held.
+ Added an assert for !have_mmap_lock() after returning from
the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.
Performance numbers before/after:
Host: AMD Opteron(tm) Processor 6376
ubuntu 17.04 ppc64 bootup+shutdown time
700 +-+--+----+------+------------+-----------+------------*--+-+
| + + + + + *B |
| before ***B*** ** * |
|tb lock removal ###D### *** |
600 +-+ *** +-+
| ** # |
| *B* #D |
| *** * ## |
500 +-+ *** ### +-+
| * *** ### |
| *B* # ## |
| ** * #D# |
400 +-+ ** ## +-+
| ** ### |
| ** ## |
| ** # ## |
300 +-+ * B* #D# +-+
| B *** ### |
| * ** #### |
| * *** ### |
200 +-+ B *B #D# +-+
| #B* * ## # |
| #* ## |
| + D##D# + + + + |
100 +-+--+----+------+------------+-----------+------------+--+-+
1 8 16 Guest CPUs 48 64
png: https://imgur.com/HwmBHXe
debian jessie aarch64 bootup+shutdown time
90 +-+--+-----+-----+------------+------------+------------+--+-+
| + + + + + + |
| before ***B*** B |
80 +tb lock removal ###D### **D +-+
| **### |
| **## |
70 +-+ ** # +-+
| ** ## |
| ** # |
60 +-+ *B ## +-+
| ** ## |
| *** #D |
50 +-+ *** ## +-+
| * ** ### |
| **B* ### |
40 +-+ **** # ## +-+
| **** #D# |
| ***B** ### |
30 +-+ B***B** #### +-+
| B * * # ### |
| B ###D# |
20 +-+ D ##D## +-+
| D# |
| + + + + + + |
10 +-+--+-----+-----+------------+------------+------------+--+-+
1 8 16 Guest CPUs 48 64
png: https://imgur.com/iGpGFtv
The gains are high for 4-8 CPUs. Beyond that point, however, unrelated
lock contention significantly hurts scalability.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
tb_lock was needed when the function did retranslation. However,
since fca8a500d519 ("tcg: Save insn data and use it in
cpu_restore_state_from_tb") we don't do retranslation.
Get rid of the comment.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This applies to both user-mode and !user-mode emulation.
Instead of relying on a global lock, protect the list of incoming
jumps with tb->jmp_lock. This lock also protects tb->cflags,
so update all tb->cflags readers outside tb->jmp_lock to use
atomic reads via tb_cflags().
In order to find the destination TB (and therefore its jmp_lock)
from the origin TB, we introduce tb->jmp_dest[].
I considered not using a linked list of jumps, which simplifies
code and makes the struct smaller. However, it unnecessarily increases
memory usage, which results in a performance decrease. See for
instance these numbers booting+shutting down debian-arm:
Time (s) Rel. err (%) Abs. err (s) Rel. slowdown (%)
------------------------------------------------------------------------------
before 20.88 0.74 0.154512 0.
after 20.81 0.38 0.079078 -0.33524904
GTree 21.02 0.28 0.058856 0.67049808
GHashTable + xxhash 21.63 1.08 0.233604 3.5919540
Using a hash table or a binary tree to keep track of the jumps
doesn't really pay off, not only due to the increased memory usage,
but also because most TBs have only 0 or 1 jumps to them. The maximum
number of jumps when booting debian-arm that I measured is 35, but
as we can see in the histogram below a TB with that many incoming jumps
is extremely rare; the average TB has 0.80 incoming jumps.
n_jumps: 379208; avg jumps/tb: 0.801099
dist: [0.0,1.0)|▄█▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁ ▁▁▁ ▁▁▁ ▁|[34.0,35.0]
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Use the recently-gained QHT feature of returning the matching TB if it
already exists. This allows us to get rid of the lookup we perform
right after acquiring tb_lock.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
The appended adds assertions to make sure we do not longjmp with page
locks held. Note that user-mode has nothing to check, since page_locks
are !user-mode only.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This is only compiled under CONFIG_DEBUG_TCG to avoid
bloating the binary.
In user-mode, assert_page_locked is equivalent to assert_mmap_lock.
Note: There are some tb_lock assertions left that will be
removed by later patches.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Groundwork for supporting parallel TCG generation.
Instead of using a global lock (tb_lock) to protect changes
to pages, use fine-grained, per-page locks in !user-mode.
User-mode stays with mmap_lock.
Sometimes changes need to happen atomically on more than one
page (e.g. when a TB that spans across two pages is
added/invalidated, or when a range of pages is invalidated).
We therefore introduce struct page_collection, which helps
us keep track of a set of pages that have been locked in
the appropriate locking order (i.e. by ascending page index).
This commit first introduces the structs and the function helpers,
to then convert the calling code to use per-page locking. Note
that tb_lock is not removed yet.
While at it, rename tb_alloc_page to tb_page_add, which pairs with
tb_page_remove.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This greatly simplifies next commit's diff.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
So that we pass a same-page range to tb_invalidate_phys_page_range,
instead of always passing an end address that could be on a different
page.
As discussed with Peter Maydell on the list [1], tb_invalidate_phys_page_range
doesn't actually do much with 'end', which explains why we have never
hit a bug despite going against what the comment on top of
tb_invalidate_phys_page_range requires:
> * Invalidate all TBs which intersect with the target physical address range
> * [start;end[. NOTE: start and end must refer to the *same* physical page.
The appended honours the comment, which avoids confusion.
While at it, rework the loop into a for loop, which is less error prone
(e.g. "continue" won't result in an infinite loop).
[1] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg09165.html
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Groundwork for supporting parallel TCG generation.
Move the hole to the end of the struct, so that a u32
field can be added there without bloating the struct.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Groundwork for supporting parallel TCG generation.
We never remove entries from the radix tree, so we can use cmpxchg
to implement lockless insertions.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This commit does several things, but to avoid churn I merged them all
into the same commit. To wit:
- Use uintptr_t instead of TranslationBlock * for the list of TBs in a page.
Just like we did in (c37e6d7e "tcg: Use uintptr_t type for
jmp_list_{next|first} fields of TB"), the rationale is the same: these
are tagged pointers, not pointers. So use a more appropriate type.
- Only check the least significant bit of the tagged pointers. Masking
with 3/~3 is unnecessary and confusing.
- Introduce the TB_FOR_EACH_TAGGED macro, and use it to define
PAGE_FOR_EACH_TB, which improves readability. Note that
TB_FOR_EACH_TAGGED will gain another user in a subsequent patch.
- Update tb_page_remove to use PAGE_FOR_EACH_TB. In case there
is a bug and we attempt to remove a TB that is not in the list, instead
of segfaulting (since the list is NULL-terminated) we will reach
g_assert_not_reached().
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Thereby making it per-TCGContext. Once we remove tb_lock, this will
avoid an atomic increment every time a TB is invalidated.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This paves the way for enabling scalable parallel generation of TCG code.
Instead of tracking TBs with a single binary search tree (BST), use a
BST for each TCG region, protecting it with a lock. This is as scalable
as it gets, since each TCG thread operates on a separate region.
The core of this change is the introduction of struct tcg_region_tree,
which contains a pointer to a GTree and an associated lock to serialize
accesses to it. We then allocate an array of tcg_region_tree's, adding
the appropriate padding to avoid false sharing based on
qemu_dcache_linesize.
Given a tc_ptr, we first find the corresponding region_tree. This
is done by special-casing the first and last regions first, since they
might be of size != region.size; otherwise we just divide the offset
by region.stride. I was worried about this division (several dozen
cycles of latency), but profiling shows that this is not a fast path.
Note that region.stride is not required to be a power of two; it
is only required to be a multiple of the host's page size.
Note that with this design we can also provide consistent snapshots
about all region trees at once; for instance, tcg_tb_foreach
acquires/releases all region_tree locks before/after iterating over them.
For this reason we now drop tb_lock in dump_exec_info().
As an alternative I considered implementing a concurrent BST, but this
can be tricky to get right, offers no consistent snapshots of the BST,
and performance and scalability-wise I don't think it could ever beat
having separate GTrees, given that our workload is insert-mostly (all
concurrent BST designs I've seen focus, understandably, on making
lookups fast, which comes at the expense of convoluted, non-wait-free
insertions/removals).
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
The meaning of "existing" is now changed to "matches in hash and
ht->cmp result". This is saner than just checking the pointer value.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
qht_lookup now uses the default cmp function. qht_lookup_custom is defined
to retain the old behaviour, that is a cmp function is explicitly provided.
qht_insert will gain use of the default cmp in the next patch.
Note that we move qht_lookup_custom's @func to be the last argument,
which makes the new qht_lookup as simple as possible.
Instead of this (i.e. keeping @func 2nd):
0000000000010750 <qht_lookup>:
10750: 89 d1 mov %edx,%ecx
10752: 48 89 f2 mov %rsi,%rdx
10755: 48 8b 77 08 mov 0x8(%rdi),%rsi
10759: e9 22 ff ff ff jmpq 10680 <qht_lookup_custom>
1075e: 66 90 xchg %ax,%ax
We get:
0000000000010740 <qht_lookup>:
10740: 48 8b 4f 08 mov 0x8(%rdi),%rcx
10744: e9 37 ff ff ff jmpq 10680 <qht_lookup_custom>
10749: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
As part of plumbing MemTxAttrs down to the IOMMU translate method,
add MemTxAttrs as an argument to address_space_translate()
and address_space_translate_cached(). Callers either have an
attrs value to hand, or don't care and can use MEMTXATTRS_UNSPECIFIED.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180521140402.23318-4-peter.maydell@linaro.org
|
|
As part of plumbing MemTxAttrs down to the IOMMU translate method,
add MemTxAttrs as an argument to tb_invalidate_phys_addr().
Its callers either have an attrs value to hand, or don't care
and can use MEMTXATTRS_UNSPECIFIED.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180521140402.23318-3-peter.maydell@linaro.org
|