aboutsummaryrefslogtreecommitdiff
path: root/include/exec/cpu-defs.h
AgeCommit message (Collapse)Author
2018-10-31cputlb: Remove tlb_c.pending_flushesRichard Henderson
This is essentially redundant with tlb_c.dirty. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31cputlb: Filter flushes on already clean tlbsRichard Henderson
Especially for guests with large numbers of tlbs, like ARM or PPC, we may well not use all of them in between flush operations. Remember which tlbs have been used since the last flush, and avoid any useless flushing. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31cputlb: Count "partial" and "elided" tlb flushesRichard Henderson
Our only statistic so far was "full" tlb flushes, where all mmu_idx are flushed at the same time. Now count "partial" tlb flushes where sets of mmu_idx are flushed, but the set is not maximal. Account one per mmu_idx flushed, as that is the unit of work performed. We don't actually count elided flushes yet, but go ahead and change the interface presented to the monitor all at once. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31cputlb: Move env->vtlb_index to env->tlb_d.vindexRichard Henderson
The rest of the tlb victim cache is per-tlb, the next use index should be as well. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31cputlb: Split large page tracking per mmu_idxRichard Henderson
The set of large pages in the kernel is probably not the same as the set of large pages in the application. Forcing one range to cover both will flush more often than necessary. This allows tlb_flush_page_async_work to flush just the one mmu_idx implicated, which in turn allows us to remove tlb_check_page_and_flush_by_mmuidx_async_work. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31cputlb: Move cpu->pending_tlb_flush to env->tlb_c.pending_flushRichard Henderson
Protect it with the tlb_lock instead of using atomics. The move puts it in or near the same cacheline as the lock; using the lock means we don't need a second atomic operation in order to perform the update. Which makes it cheap to also update pending_flush in tlb_flush_by_mmuidx_async_work. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31cputlb: Move tlb_lock to CPUTLBCommonRichard Henderson
This is the first of several moves to reduce the size of the CPU_COMMON_TLB macro and improve some locality of refernce. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18cputlb: serialize tlb updates with env->tlb_lockEmilio G. Cota
Currently we rely on atomic operations for cross-CPU invalidations. There are two cases that these atomics miss: cross-CPU invalidations can race with either (1) vCPU threads flushing their TLB, which happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB, which updates .addr_write with a regular store. This results in undefined behaviour, since we're mixing regular and atomic ops on concurrent accesses. Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table and the corresponding victim cache now hold the lock. The readers that do not hold tlb_lock must use atomic reads when reading .addr_write, since this field can be updated by other threads; the conversion to atomic reads is done in the next patch. Note that an alternative fix would be to expand the use of atomic ops. However, in the case of TLB flushes this would have a huge performance impact, since (1) TLB flushes can happen very frequently and (2) we currently use a full memory barrier to flush each TLB entry, and a TLB has many entries. Instead, acquiring the lock is barely slower than a full memory barrier since it is uncontended, and with a single lock acquisition we can flush the entire TLB. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181009174557.16125-6-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15cpu-defs.h: Document CPUIOTLBEntry 'addr' fieldPeter Maydell
The 'addr' field in the CPUIOTLBEntry struct has a rather non-obvious use; add a comment documenting it (reverse-engineered from what the code that sets it is doing). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180611125633.32755-2-peter.maydell@linaro.org
2017-10-10cputlb: bring back tlb_flush_count under !TLB_DEBUGEmilio G. Cota
Commit f0aff0f124 ("cputlb: add assert_cpu_is_self checks") buried the increment of tlb_flush_count under TLB_DEBUG. This results in "info jit" always (mis)reporting 0 TLB flushes when !TLB_DEBUG. Besides, under MTTCG tlb_flush_count is updated by several threads, so in order not to lose counts we'd either have to use atomic ops or distribute the counter, which is more scalable. This patch does the latter by embedding tlb_flush_count in CPUArchState. The global count is then easily obtained by iterating over the CPU list. Note that this change also requires updating the accessors to tlb_flush_count to use atomic_read/set whenever there may be conflicting accesses (as defined in C11) to it. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-07-05tcg: add CONFIG_TCG guards in headersYang Zhong
Add CONFIG_TCG around TLB-related functions and structure declarations. Some of these functions are defined in ./accel/tcg/cputlb.c, which will not be linked in if TCG is disabled, and have no stubs; therefore, their callers will also be compiled out for --disable-tcg. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19qemu-common: stop including qemu/host-utils.h from qemu-common.hPaolo Bonzini
Move it to the actual users. There are some inclusions of qemu/host-utils.h in headers, but they are all necessary. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-23include: Clean up includesPeter Maydell
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. NB: If this commit breaks compilation for your out-of-tree patchseries or fork, then you need to make sure you add #include "qemu/osdep.h" to any new .c files that you have. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-08-06cpu_defs: Simplify CPUTLB padding logicPeter Crosthwaite
There was a complicated subtractive arithmetic for determining the padding on the CPUTLBEntry structure. Simplify this with a union. Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <1436130533-18565-1-git-send-email-crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-26cpu-defs: Move out TB_JMP definesPeter Crosthwaite
These are not Architecture specific in any way so move them out of cpu-defs.h. tb-hash.h is an appropriate place as a leading user and their strong relationship to TB hashing and caching. Reviewed-by: Richard Henderson <rth@redhat.com> Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <43ceca65a3fa240efac49aa0bf604ad0442e1710.1433052532.git.crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-26include/exec: Move standard exceptions to cpu-all.hPeter Crosthwaite
These exception indicies are generic and don't have any reliance on the per-arch cpu.h defs. Move them to cpu-all.h so they can be used by core code that does not have access to cpu-defs.h. Reviewed-by: Richard Henderson <rth@redhat.com> Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <dbebd3062c7cd4332240891a3564e73f374ddfcd.1433052532.git.crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-26cpu-defs: Move CPU_TEMP_BUF_NLONGS to tcgPeter Crosthwaite
The usages of this define are pure TCG and there is no architecture specific variation of the value. Localise it to the TCG engine to remove another architecture agnostic piece from cpu-defs.h. This follows on from a28177820a868eafda8fab007561cc19f41941f4 where temp_buf was moved out of the CPU_COMMON obsoleting the need for the super early definition. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <498e8e5325c1a1aff79e5bcfc28cb760ef6b214e.1433052532.git.crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-03softmmu: support up to 12 MMU modesPaolo Bonzini
At 8k per TLB (for 64-bit host or target), 8 or more modes make the TLBs bigger than 64k, and some RISC TCG backends do not like that. On the affected hosts, cut the TLB size in half---there is still a measurable speedup on PPC with the next patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1424436345-37924-3-git-send-email-pbonzini@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-26Add MemTxAttrs to the IOTLBPeter Maydell
Add a MemTxAttrs field to the IOTLB, and allow target-specific code to set it via a new tlb_set_page_with_attrs() function; pass the attributes through to the device when making IO accesses. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2015-04-26Make CPU iotlb a structure rather than a plain hwaddrPeter Maydell
Make the CPU iotlb a structure rather than a plain hwaddr; this will allow us to add transaction attributes to it. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2014-09-01implementing victim TLB for QEMU system emulated TLBXin Tong
QEMU system mode page table walks are expensive. Taken by running QEMU qemu-system-x86_64 system mode on Intel PIN , a TLB miss and walking a 4-level page tables in guest Linux OS takes ~450 X86 instructions on average. QEMU system mode TLB is implemented using a directly-mapped hashtable. This structure suffers from conflict misses. Increasing the associativity of the TLB may not be the solution to conflict misses as all the ways may have to be walked in serial. A victim TLB is a TLB used to hold translations evicted from the primary TLB upon replacement. The victim TLB lies between the main TLB and its refill path. Victim TLB is of greater associativity (fully associative in this patch). It takes longer to lookup the victim TLB, but its likely better than a full page table walk. The memory translation path is changed as follows : Before Victim TLB: 1. Inline TLB lookup 2. Exit code cache on TLB miss. 3. Check for unaligned, IO accesses 4. TLB refill. 5. Do the memory access. 6. Return to code cache. After Victim TLB: 1. Inline TLB lookup 2. Exit code cache on TLB miss. 3. Check for unaligned, IO accesses 4. Victim TLB lookup. 5. If victim TLB misses, TLB refill 6. Do the memory access. 7. Return to code cache The advantage is that victim TLB can offer more associativity to a directly mapped TLB and thus potentially fewer page table walks while still keeping the time taken to flush within reasonable limits. However, placing a victim TLB before the refill path increase TLB refill path as the victim TLB is consulted before the TLB refill. The performance results demonstrate that the pros outweigh the cons. some performance results taken on SPECINT2006 train datasets and kernel boot and qemu configure script on an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine are shown in the Google Doc link below. https://docs.google.com/spreadsheets/d/1eiItzekZwNQOal_h-5iJmC4tMDi051m9qidi5_nwvH4/edit?usp=sharing In summary, victim TLB improves the performance of qemu-system-x86_64 by 11% on average on SPECINT2006, kernelboot and qemu configscript and with highest improvement of in 26% in 456.hmmer. And victim TLB does not result in any performance degradation in any of the measured benchmarks. Furthermore, the implemented victim TLB is architecture independent and is expected to benefit other architectures in QEMU as well. Although there are measurement fluctuations, the performance improvement is very significant and by no means in the range of noises. Signed-off-by: Xin Tong <trent.tong@gmail.com> Message-id: 1407202523-23553-1-git-send-email-trent.tong@gmail.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-03-13cpu: Move breakpoints field from CPU_COMMON to CPUStateAndreas Färber
Most targets were using offsetof(CPUFooState, breakpoints) to determine how much of CPUFooState to clear on reset. Use the next field after CPU_COMMON instead, if any, or sizeof(CPUFooState) otherwise. Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13cpu: Move watchpoint fields from CPU_COMMON to CPUStateAndreas Färber
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13cpu: Move opaque field from CPU_COMMON to CPUStateAndreas Färber
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13cpu: Move exception_index field from CPU_COMMON to CPUStateAndreas Färber
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13cpu: Move jmp_env field from CPU_COMMON to CPUStateAndreas Färber
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13cpu: Move tb_jmp_cache field from CPU_COMMON to CPUStateAndreas Färber
Clear it on reset. Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13cpu: Move icount_decr field from CPU_COMMON to CPUStateAndreas Färber
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13cpu: Move icount_extra field from CPU_COMMON to CPUStateAndreas Färber
Reset it. Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13cpu: Move can_do_io field from CPU_COMMON to CPUStateAndreas Färber
Rename can_do_io() to cpu_can_do_io() and change argument to CPUState. Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13cpu: Move mem_io_{pc,vaddr} fields from CPU_COMMON to CPUStateAndreas Färber
Reset them. Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-10target-arm: Implement WFE as a yield operationPeter Maydell
Implement WFE to yield our timeslice to the next CPU. This avoids slowdowns in multicore configurations caused by one core busy-waiting on a spinlock which can't possibly be unlocked until the other core has an opportunity to run. This speeds up my test case A15 dual-core boot by a factor of three (though it is still four or five times slower than a single-core boot). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1393339545-22111-1-git-send-email-peter.maydell@linaro.org Reviewed-by: Richard Henderson <rth@twiddle.net> Tested-by: Rob Herring <rob.herring@linaro.org>
2013-10-07cpu: Drop cpu_model_str from CPU_COMMONAndreas Färber
Since this is only read in cpu_copy() and linux-user has a global cpu_model, drop the field from generic code. Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-07-26Merge remote-tracking branch 'rth/tcg-next' into stagingAnthony Liguori
# By Claudio Fontana (1) and others # Via Richard Henderson * rth/tcg-next: tcg: Remove temp_buf tcg/aarch64: Implement tlb lookup fast path tcg/aarch64: implement ldst 12bit scaled uimm offset Message-id: 1373919944-8521-1-git-send-email-rth@twiddle.net Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-23cpu: Move gdb_regs field from CPU_COMMON to CPUStateAndreas Färber
Prepares for changing gdb_register_coprocessor() argument to CPUState. Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-07-23cpu: Move singlestep_enabled field from CPU_COMMON to CPUStateAndreas Färber
Prepares for changing cpu_single_step() argument to CPUState. Acked-by: Michael Walle <michael@walle.cc> (for lm32) Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-07-15tcg: Remove temp_bufRichard Henderson
All targets have been converted to allocating space for temporaries on the stack. No need to allocate space within the CPU_COMMON block. Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09cpu: Make first_cpu and next_cpu CPUStateAndreas Färber
Move next_cpu from CPU_COMMON to CPUState. Move first_cpu variable to qom/cpu.h. gdbstub needs to use CPUState::env_ptr for now. cpu_copy() no longer needs to save and restore cpu_next. Acked-by: Paolo Bonzini <pbonzini@redhat.com> [AF: Rebased, simplified cpu_copy()] Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-06-28hwaddr: Make hwaddr type usable beyond softmmuAndreas Färber
While not normally needed for *-user, it can safely be used there since always based on uint64_t, to avoid ifdeffery. To avoid accidental uses, move the guards from exec/hwaddr.h to its inclusion sites. No need for them in include/hw/. Prepares for hwaddr use in qom/cpu.h. Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-06-05tcg: Use QEMU_BUILD_BUG_ON for CPU_TLB_ENTRY_BITSRichard Henderson
Rather than a hand-coded version of the same thing. Reviewed-by: Andreas Färber <afaerber@suse.de> Reviewed-by: liguang <lig.fnst@cn.fujitsu.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-18elfload: use abi_llong/ullong instead of target_llong/ullongPaolo Bonzini
The alignment is a characteristic of the ABI, not the CPU. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
2013-04-18elfload: only give abi_long/ulong the alignment specified by the targetPaolo Bonzini
Previously, this was done for target_long/ulong, and propagated to abi_long/ulong via a typedef. But target_long/ulong should not have any specific alignment, it is never used to access guest memory. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
2013-04-18elfload: use abi_int/uint instead of target_int/uintPaolo Bonzini
The alignment is a characteristic of the ABI, not the CPU. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
2013-04-18elfload: use abi_short/ushort instead of target_short/ushortPaolo Bonzini
The alignment is a characteristic of the ABI, not the CPU. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
2013-03-12cpu: Move halted and interrupt_request fields to CPUStateAndreas Färber
Both fields are used in VMState, thus need to be moved together. Explicitly zero them on reset since they were located before breakpoints. Pass PowerPCCPU to kvmppc_handle_halt(). Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-02-23Replace all setjmp()/longjmp() with sigsetjmp()/siglongjmp()Peter Maydell
The setjmp() function doesn't specify whether signal masks are saved and restored; on Linux they are not, but on BSD (including MacOSX) they are. We want to have consistent behaviour across platforms, so we should always use "don't save/restore signal mask" (this is also generally going to be faster). This also works around a bug in MacOSX where the signal-restoration on longjmp() affects the signal mask for a completely different thread, not just the mask for the thread which did the longjmp. The most visible effect of this was that ctrl-C was ignored on MacOSX because the CPU thread did a longjmp which resulted in its signal mask being applied to every thread, so that all threads had SIGINT and SIGTERM blocked. The POSIX-sanctioned portable way to do a jump without affecting signal masks is to siglongjmp() to a sigjmp_buf which was created by calling sigsetjmp() with a zero savemask parameter, so change all uses of setjmp()/longjmp() accordingly. [Technically POSIX allows sigsetjmp(buf, 0) to save the signal mask; however the following siglongjmp() must not restore the signal mask, so the pair can be effectively considered as "sigjmp/longjmp which don't touch the mask".] For Windows we provide a trivial sigsetjmp/siglongjmp in terms of setjmp/longjmp -- this is OK because no user will ever pass a non-zero savemask. The setjmp() uses in tests/tcg/test-i386.c and tests/tcg/linux-test.c are left untouched because these are self-contained singlethreaded test programs intended to be run under QEMU's Linux emulation, so they have neither the portability nor the multithreading issues to deal with. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Tested-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-16cpu: Move current_tb field to CPUStateAndreas Färber
Explictly NULL it on CPU reset since it was located before breakpoints. Change vapic_report_tpr_access() argument to CPUState. This also resolves the use of void* for cpu.h independence. Change vAPIC patch_instruction() argument to X86CPU. Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-02-16cpu: Move exit_request field to CPUStateAndreas Färber
Since it was located before breakpoints field, it needs to be reset. Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-02-16cpu: Move running field to CPUStateAndreas Färber
Pass CPUState to cpu_exec_{start,end}() functions. Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-02-16cpu: Move host_tid field to CPUStateAndreas Färber
Change gdbstub's cpu_index() argument to CPUState now that CPUArchState is no longer used. Signed-off-by: Andreas Färber <afaerber@suse.de>