slackcoder/qemu - QEMU is a generic and open source machine & userspace emulator and virtualizer

Age	Commit message (Collapse)	Author
2018-10-31	cputlb: Remove tlb_c.pending_flushes	Richard Henderson
	This is essentially redundant with tlb_c.dirty. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31	cputlb: Filter flushes on already clean tlbs	Richard Henderson
	Especially for guests with large numbers of tlbs, like ARM or PPC, we may well not use all of them in between flush operations. Remember which tlbs have been used since the last flush, and avoid any useless flushing. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31	cputlb: Count "partial" and "elided" tlb flushes	Richard Henderson
	Our only statistic so far was "full" tlb flushes, where all mmu_idx are flushed at the same time. Now count "partial" tlb flushes where sets of mmu_idx are flushed, but the set is not maximal. Account one per mmu_idx flushed, as that is the unit of work performed. We don't actually count elided flushes yet, but go ahead and change the interface presented to the monitor all at once. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31	cputlb: Move env->vtlb_index to env->tlb_d.vindex	Richard Henderson
	The rest of the tlb victim cache is per-tlb, the next use index should be as well. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31	cputlb: Split large page tracking per mmu_idx	Richard Henderson
	The set of large pages in the kernel is probably not the same as the set of large pages in the application. Forcing one range to cover both will flush more often than necessary. This allows tlb_flush_page_async_work to flush just the one mmu_idx implicated, which in turn allows us to remove tlb_check_page_and_flush_by_mmuidx_async_work. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31	cputlb: Move cpu->pending_tlb_flush to env->tlb_c.pending_flush	Richard Henderson
	Protect it with the tlb_lock instead of using atomics. The move puts it in or near the same cacheline as the lock; using the lock means we don't need a second atomic operation in order to perform the update. Which makes it cheap to also update pending_flush in tlb_flush_by_mmuidx_async_work. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31	cputlb: Move tlb_lock to CPUTLBCommon	Richard Henderson
	This is the first of several moves to reduce the size of the CPU_COMMON_TLB macro and improve some locality of refernce. Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18	cputlb: serialize tlb updates with env->tlb_lock	Emilio G. Cota
	Currently we rely on atomic operations for cross-CPU invalidations. There are two cases that these atomics miss: cross-CPU invalidations can race with either (1) vCPU threads flushing their TLB, which happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB, which updates .addr_write with a regular store. This results in undefined behaviour, since we're mixing regular and atomic ops on concurrent accesses. Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table and the corresponding victim cache now hold the lock. The readers that do not hold tlb_lock must use atomic reads when reading .addr_write, since this field can be updated by other threads; the conversion to atomic reads is done in the next patch. Note that an alternative fix would be to expand the use of atomic ops. However, in the case of TLB flushes this would have a huge performance impact, since (1) TLB flushes can happen very frequently and (2) we currently use a full memory barrier to flush each TLB entry, and a TLB has many entries. Instead, acquiring the lock is barely slower than a full memory barrier since it is uncontended, and with a single lock acquisition we can flush the entire TLB. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181009174557.16125-6-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15	cpu-defs.h: Document CPUIOTLBEntry 'addr' field	Peter Maydell
	The 'addr' field in the CPUIOTLBEntry struct has a rather non-obvious use; add a comment documenting it (reverse-engineered from what the code that sets it is doing). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180611125633.32755-2-peter.maydell@linaro.org
2017-10-10	cputlb: bring back tlb_flush_count under !TLB_DEBUG	Emilio G. Cota
	Commit f0aff0f124 ("cputlb: add assert_cpu_is_self checks") buried the increment of tlb_flush_count under TLB_DEBUG. This results in "info jit" always (mis)reporting 0 TLB flushes when !TLB_DEBUG. Besides, under MTTCG tlb_flush_count is updated by several threads, so in order not to lose counts we'd either have to use atomic ops or distribute the counter, which is more scalable. This patch does the latter by embedding tlb_flush_count in CPUArchState. The global count is then easily obtained by iterating over the CPU list. Note that this change also requires updating the accessors to tlb_flush_count to use atomic_read/set whenever there may be conflicting accesses (as defined in C11) to it. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-07-05	tcg: add CONFIG_TCG guards in headers	Yang Zhong
	Add CONFIG_TCG around TLB-related functions and structure declarations. Some of these functions are defined in ./accel/tcg/cputlb.c, which will not be linked in if TCG is disabled, and have no stubs; therefore, their callers will also be compiled out for --disable-tcg. Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19	qemu-common: stop including qemu/host-utils.h from qemu-common.h	Paolo Bonzini
	Move it to the actual users. There are some inclusions of qemu/host-utils.h in headers, but they are all necessary. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-23	include: Clean up includes	Peter Maydell
	Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. NB: If this commit breaks compilation for your out-of-tree patchseries or fork, then you need to make sure you add #include "qemu/osdep.h" to any new .c files that you have. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com>
2015-08-06	cpu_defs: Simplify CPUTLB padding logic	Peter Crosthwaite
	There was a complicated subtractive arithmetic for determining the padding on the CPUTLBEntry structure. Simplify this with a union. Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <1436130533-18565-1-git-send-email-crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-26	cpu-defs: Move out TB_JMP defines	Peter Crosthwaite
	These are not Architecture specific in any way so move them out of cpu-defs.h. tb-hash.h is an appropriate place as a leading user and their strong relationship to TB hashing and caching. Reviewed-by: Richard Henderson <rth@redhat.com> Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <43ceca65a3fa240efac49aa0bf604ad0442e1710.1433052532.git.crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-26	include/exec: Move standard exceptions to cpu-all.h	Peter Crosthwaite
	These exception indicies are generic and don't have any reliance on the per-arch cpu.h defs. Move them to cpu-all.h so they can be used by core code that does not have access to cpu-defs.h. Reviewed-by: Richard Henderson <rth@redhat.com> Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <dbebd3062c7cd4332240891a3564e73f374ddfcd.1433052532.git.crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-26	cpu-defs: Move CPU_TEMP_BUF_NLONGS to tcg	Peter Crosthwaite
	The usages of this define are pure TCG and there is no architecture specific variation of the value. Localise it to the TCG engine to remove another architecture agnostic piece from cpu-defs.h. This follows on from a28177820a868eafda8fab007561cc19f41941f4 where temp_buf was moved out of the CPU_COMMON obsoleting the need for the super early definition. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <498e8e5325c1a1aff79e5bcfc28cb760ef6b214e.1433052532.git.crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-03	softmmu: support up to 12 MMU modes	Paolo Bonzini
	At 8k per TLB (for 64-bit host or target), 8 or more modes make the TLBs bigger than 64k, and some RISC TCG backends do not like that. On the affected hosts, cut the TLB size in half---there is still a measurable speedup on PPC with the next patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1424436345-37924-3-git-send-email-pbonzini@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Alexander Graf <agraf@suse.de>
2015-04-26	Add MemTxAttrs to the IOTLB	Peter Maydell
	Add a MemTxAttrs field to the IOTLB, and allow target-specific code to set it via a new tlb_set_page_with_attrs() function; pass the attributes through to the device when making IO accesses. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2015-04-26	Make CPU iotlb a structure rather than a plain hwaddr	Peter Maydell
	Make the CPU iotlb a structure rather than a plain hwaddr; this will allow us to add transaction attributes to it. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2014-09-01	implementing victim TLB for QEMU system emulated TLB	Xin Tong
	QEMU system mode page table walks are expensive. Taken by running QEMU qemu-system-x86_64 system mode on Intel PIN , a TLB miss and walking a 4-level page tables in guest Linux OS takes ~450 X86 instructions on average. QEMU system mode TLB is implemented using a directly-mapped hashtable. This structure suffers from conflict misses. Increasing the associativity of the TLB may not be the solution to conflict misses as all the ways may have to be walked in serial. A victim TLB is a TLB used to hold translations evicted from the primary TLB upon replacement. The victim TLB lies between the main TLB and its refill path. Victim TLB is of greater associativity (fully associative in this patch). It takes longer to lookup the victim TLB, but its likely better than a full page table walk. The memory translation path is changed as follows : Before Victim TLB: 1. Inline TLB lookup 2. Exit code cache on TLB miss. 3. Check for unaligned, IO accesses 4. TLB refill. 5. Do the memory access. 6. Return to code cache. After Victim TLB: 1. Inline TLB lookup 2. Exit code cache on TLB miss. 3. Check for unaligned, IO accesses 4. Victim TLB lookup. 5. If victim TLB misses, TLB refill 6. Do the memory access. 7. Return to code cache The advantage is that victim TLB can offer more associativity to a directly mapped TLB and thus potentially fewer page table walks while still keeping the time taken to flush within reasonable limits. However, placing a victim TLB before the refill path increase TLB refill path as the victim TLB is consulted before the TLB refill. The performance results demonstrate that the pros outweigh the cons. some performance results taken on SPECINT2006 train datasets and kernel boot and qemu configure script on an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine are shown in the Google Doc link below. https://docs.google.com/spreadsheets/d/1eiItzekZwNQOal_h-5iJmC4tMDi051m9qidi5_nwvH4/edit?usp=sharing In summary, victim TLB improves the performance of qemu-system-x86_64 by 11% on average on SPECINT2006, kernelboot and qemu configscript and with highest improvement of in 26% in 456.hmmer. And victim TLB does not result in any performance degradation in any of the measured benchmarks. Furthermore, the implemented victim TLB is architecture independent and is expected to benefit other architectures in QEMU as well. Although there are measurement fluctuations, the performance improvement is very significant and by no means in the range of noises. Signed-off-by: Xin Tong <trent.tong@gmail.com> Message-id: 1407202523-23553-1-git-send-email-trent.tong@gmail.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-03-13	cpu: Move breakpoints field from CPU_COMMON to CPUState	Andreas Färber
	Most targets were using offsetof(CPUFooState, breakpoints) to determine how much of CPUFooState to clear on reset. Use the next field after CPU_COMMON instead, if any, or sizeof(CPUFooState) otherwise. Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13	cpu: Move watchpoint fields from CPU_COMMON to CPUState	Andreas Färber
	Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13	cpu: Move opaque field from CPU_COMMON to CPUState	Andreas Färber
	Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13	cpu: Move exception_index field from CPU_COMMON to CPUState	Andreas Färber
	Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13	cpu: Move jmp_env field from CPU_COMMON to CPUState	Andreas Färber
	Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13	cpu: Move tb_jmp_cache field from CPU_COMMON to CPUState	Andreas Färber
	Clear it on reset. Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13	cpu: Move icount_decr field from CPU_COMMON to CPUState	Andreas Färber
	Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13	cpu: Move icount_extra field from CPU_COMMON to CPUState	Andreas Färber
	Reset it. Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13	cpu: Move can_do_io field from CPU_COMMON to CPUState	Andreas Färber
	Rename can_do_io() to cpu_can_do_io() and change argument to CPUState. Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13	cpu: Move mem_io_{pc,vaddr} fields from CPU_COMMON to CPUState	Andreas Färber
	Reset them. Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-10	target-arm: Implement WFE as a yield operation	Peter Maydell
	Implement WFE to yield our timeslice to the next CPU. This avoids slowdowns in multicore configurations caused by one core busy-waiting on a spinlock which can't possibly be unlocked until the other core has an opportunity to run. This speeds up my test case A15 dual-core boot by a factor of three (though it is still four or five times slower than a single-core boot). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1393339545-22111-1-git-send-email-peter.maydell@linaro.org Reviewed-by: Richard Henderson <rth@twiddle.net> Tested-by: Rob Herring <rob.herring@linaro.org>
2013-10-07	cpu: Drop cpu_model_str from CPU_COMMON	Andreas Färber
	Since this is only read in cpu_copy() and linux-user has a global cpu_model, drop the field from generic code. Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-07-26	Merge remote-tracking branch 'rth/tcg-next' into staging	Anthony Liguori
	# By Claudio Fontana (1) and others # Via Richard Henderson * rth/tcg-next: tcg: Remove temp_buf tcg/aarch64: Implement tlb lookup fast path tcg/aarch64: implement ldst 12bit scaled uimm offset Message-id: 1373919944-8521-1-git-send-email-rth@twiddle.net Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-23	cpu: Move gdb_regs field from CPU_COMMON to CPUState	Andreas Färber
	Prepares for changing gdb_register_coprocessor() argument to CPUState. Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-07-23	cpu: Move singlestep_enabled field from CPU_COMMON to CPUState	Andreas Färber
	Prepares for changing cpu_single_step() argument to CPUState. Acked-by: Michael Walle <michael@walle.cc> (for lm32) Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-07-15	tcg: Remove temp_buf	Richard Henderson
	All targets have been converted to allocating space for temporaries on the stack. No need to allocate space within the CPU_COMMON block. Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09	cpu: Make first_cpu and next_cpu CPUState	Andreas Färber
	Move next_cpu from CPU_COMMON to CPUState. Move first_cpu variable to qom/cpu.h. gdbstub needs to use CPUState::env_ptr for now. cpu_copy() no longer needs to save and restore cpu_next. Acked-by: Paolo Bonzini <pbonzini@redhat.com> [AF: Rebased, simplified cpu_copy()] Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-06-28	hwaddr: Make hwaddr type usable beyond softmmu	Andreas Färber
	While not normally needed for *-user, it can safely be used there since always based on uint64_t, to avoid ifdeffery. To avoid accidental uses, move the guards from exec/hwaddr.h to its inclusion sites. No need for them in include/hw/. Prepares for hwaddr use in qom/cpu.h. Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-06-05	tcg: Use QEMU_BUILD_BUG_ON for CPU_TLB_ENTRY_BITS	Richard Henderson
	Rather than a hand-coded version of the same thing. Reviewed-by: Andreas Färber <afaerber@suse.de> Reviewed-by: liguang <lig.fnst@cn.fujitsu.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-18	elfload: use abi_llong/ullong instead of target_llong/ullong	Paolo Bonzini
	The alignment is a characteristic of the ABI, not the CPU. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
2013-04-18	elfload: only give abi_long/ulong the alignment specified by the target	Paolo Bonzini
	Previously, this was done for target_long/ulong, and propagated to abi_long/ulong via a typedef. But target_long/ulong should not have any specific alignment, it is never used to access guest memory. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
2013-04-18	elfload: use abi_int/uint instead of target_int/uint	Paolo Bonzini
	The alignment is a characteristic of the ABI, not the CPU. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
2013-04-18	elfload: use abi_short/ushort instead of target_short/ushort	Paolo Bonzini
	The alignment is a characteristic of the ABI, not the CPU. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
2013-03-12	cpu: Move halted and interrupt_request fields to CPUState	Andreas Färber
	Both fields are used in VMState, thus need to be moved together. Explicitly zero them on reset since they were located before breakpoints. Pass PowerPCCPU to kvmppc_handle_halt(). Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-02-23	Replace all setjmp()/longjmp() with sigsetjmp()/siglongjmp()	Peter Maydell
	The setjmp() function doesn't specify whether signal masks are saved and restored; on Linux they are not, but on BSD (including MacOSX) they are. We want to have consistent behaviour across platforms, so we should always use "don't save/restore signal mask" (this is also generally going to be faster). This also works around a bug in MacOSX where the signal-restoration on longjmp() affects the signal mask for a completely different thread, not just the mask for the thread which did the longjmp. The most visible effect of this was that ctrl-C was ignored on MacOSX because the CPU thread did a longjmp which resulted in its signal mask being applied to every thread, so that all threads had SIGINT and SIGTERM blocked. The POSIX-sanctioned portable way to do a jump without affecting signal masks is to siglongjmp() to a sigjmp_buf which was created by calling sigsetjmp() with a zero savemask parameter, so change all uses of setjmp()/longjmp() accordingly. [Technically POSIX allows sigsetjmp(buf, 0) to save the signal mask; however the following siglongjmp() must not restore the signal mask, so the pair can be effectively considered as "sigjmp/longjmp which don't touch the mask".] For Windows we provide a trivial sigsetjmp/siglongjmp in terms of setjmp/longjmp -- this is OK because no user will ever pass a non-zero savemask. The setjmp() uses in tests/tcg/test-i386.c and tests/tcg/linux-test.c are left untouched because these are self-contained singlethreaded test programs intended to be run under QEMU's Linux emulation, so they have neither the portability nor the multithreading issues to deal with. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Tested-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-16	cpu: Move current_tb field to CPUState	Andreas Färber
	Explictly NULL it on CPU reset since it was located before breakpoints. Change vapic_report_tpr_access() argument to CPUState. This also resolves the use of void* for cpu.h independence. Change vAPIC patch_instruction() argument to X86CPU. Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-02-16	cpu: Move exit_request field to CPUState	Andreas Färber
	Since it was located before breakpoints field, it needs to be reset. Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-02-16	cpu: Move running field to CPUState	Andreas Färber
	Pass CPUState to cpu_exec_{start,end}() functions. Signed-off-by: Andreas Färber <afaerber@suse.de>
2013-02-16	cpu: Move host_tid field to CPUState	Andreas Färber
	Change gdbstub's cpu_index() argument to CPUState now that CPUArchState is no longer used. Signed-off-by: Andreas Färber <afaerber@suse.de>