aboutsummaryrefslogtreecommitdiff
path: root/cputlb.c
AgeCommit message (Collapse)Author
2017-05-11cputlb: handle first atomic write to the pageNikunj A Dadhania
In case where the conditional write is the first write to the page, TLB_NOTDIRTY will be set and stop_the_world is triggered. Handle this as a special case and set the dirty bit. After that fall through to the actual atomic instruction below. Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-02-28cputlb: Don't assume do_unassigned_access() never returnsPeter Maydell
In get_page_addr_code(), if the guest PC doesn't correspond to RAM then we currently run the CPU's do_unassigned_access() hook if it has one, and otherwise we give up and exit QEMU with a more-or-less useful message. This code assumes that the do_unassigned_access hook will never return, because if it does then we'll plough on attempting to use a non-RAM TLB entry to get a RAM address and will abort() in qemu_ram_addr_from_host_nofail(). Unfortunately some CPU implementations of this hook do return: Microblaze, SPARC and the ARM v7M. Change the code to call report_bad_exec() if the hook returns, as well as if it didn't have one. This means we can tidy it up to use the cpu_unassigned_access() function which wraps the "get the CPU class and call the hook if it has one" work, since we aren't trying to distinguish "no hook" from "hook existed and returned" any more. This brings the handling of this hook into line with the handling used for data accesses, where "hook returned" is treated the same as "no hook existed" and gets you the default behaviour. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24cputlb: introduce tlb_flush_*_all_cpus[_synced]Alex Bennée
This introduces support to the cputlb API for flushing all CPUs TLBs with one call. This avoids the need for target helpers to iterate through the vCPUs themselves. An additional variant of the API (_synced) will cause the source vCPUs work to be scheduled as "safe work". The result will be all the flush operations will be complete by the time the originating vCPU executes its safe work. The calling implementation can either end the TB straight away (which will then pick up the cpu->exit_request on entering the next block) or defer the exit until the architectural sync point (usually a barrier instruction). Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24cputlb: atomically update tlb fields used by tlb_reset_dirtyAlex Bennée
The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags in TLB entries to force the slow-path on writes. This is used to mark page ranges containing code which has been translated so it can be invalidated if written to. To do this safely we need to ensure the TLB entries in question for all vCPUs are updated before we attempt to run the code otherwise a race could be introduced. To achieve this we atomically set the flag in tlb_reset_dirty_range and take care when setting it when the TLB entry is filled. On 32 bit systems attempting to emulate 64 bit guests we don't even bother as we might not have the atomic primitives available. MTTCG is disabled in this case and can't be forced on. The copy_tlb_helper function helps keep the atomic semantics in one place to avoid confusion. The dirty helper function is made static as it isn't used outside of cputlb. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24cputlb: add tlb_flush_by_mmuidx async routinesAlex Bennée
This converts the remaining TLB flush routines to use async work when detecting a cross-vCPU flush. The only minor complication is having to serialise the var_list of MMU indexes into a form that can be punted to an asynchronous job. The pending_tlb_flush field on QOM's CPU structure also becomes a bitfield rather than a boolean. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmapAlex Bennée
While the vargs approach was flexible the original MTTCG ended up having munge the bits to a bitmap so the data could be used in deferred work helpers. Instead of hiding that in cputlb we push the change to the API to make it take a bitmap of MMU indexes instead. For ARM some the resulting flushes end up being quite long so to aid readability I've tended to move the index shifting to a new line so all the bits being or-ed together line up nicely, for example: tlb_flush_page_by_mmuidx(other_cs, pageaddr, (1 << ARMMMUIdx_S1SE1) | (1 << ARMMMUIdx_S1SE0)); Signed-off-by: Alex Bennée <alex.bennee@linaro.org> [AT: SPARC parts only] Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com> Reviewed-by: Richard Henderson <rth@twiddle.net> [PM: ARM parts only] Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2017-02-24cputlb: introduce tlb_flush_* async work.KONRAD Frederic
Some architectures allow to flush the tlb of other VCPUs. This is not a problem when we have only one thread for all VCPUs but it definitely needs to be an asynchronous work when we are in true multithreaded work. We take the tb_lock() when doing this to avoid racing with other threads which may be invalidating TB's at the same time. The alternative would be to use proper atomic primitives to clear the tlb entries en-mass. This patch doesn't do anything to protect other cputlb function being called in MTTCG mode making cross vCPU changes. Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> [AJB: remove need for g_malloc on defer, make check fixes, tb_lock] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24cputlb: tweak qemu_ram_addr_from_host_nofail reportingAlex Bennée
This moves the helper function closer to where it is called and updates the error message to report via error_report instead of the deprecated fprintf. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24cputlb: add assert_cpu_is_self checksAlex Bennée
For SoftMMU the TLB flushes are an example of a task that can be triggered on one vCPU by another. To deal with this properly we need to use safe work to ensure these changes are done safely. The new assert can be enabled while debugging to catch these cases. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>
2017-02-24tcg: drop global lock during TCG code executionJan Kiszka
This finally allows TCG to benefit from the iothread introduction: Drop the global mutex while running pure TCG CPU code. Reacquire the lock when entering MMIO or PIO emulation, or when leaving the TCG loop. We have to revert a few optimization for the current TCG threading model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not kicking it in qemu_cpu_kick. We also need to disable RAM block reordering until we have a more efficient locking mechanism at hand. Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here. These numbers demonstrate where we gain something: 20338 jan 20 0 331m 75m 6904 R 99 0.9 0:50.95 qemu-system-arm 20337 jan 20 0 331m 75m 6904 S 20 0.9 0:26.50 qemu-system-arm The guest CPU was fully loaded, but the iothread could still run mostly independent on a second core. Without the patch we don't get beyond 32206 jan 20 0 330m 73m 7036 R 82 0.9 1:06.00 qemu-system-arm 32204 jan 20 0 330m 73m 7036 S 21 0.9 0:17.03 qemu-system-arm We don't benefit significantly, though, when the guest is not fully loading a host CPU. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com> [FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex] Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com> [EGC: fixed iothread lock for cpu-exec IRQ handling] Signed-off-by: Emilio G. Cota <cota@braap.org> [AJB: -smp single-threaded fix, clean commit msg, BQL fixes] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> [PM: target-arm changes] Acked-by: Peter Maydell <peter.maydell@linaro.org>
2017-01-13cputlb: drop flush_global flag from tlb_flushAlex Bennée
We have never has the concept of global TLB entries which would avoid the flush so we never actually use this flag. Drop it and make clear that tlb_flush is the sledge-hammer it has always been. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> [DG: ppc portions] Acked-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-28clean-up: removed duplicate #includesAnand J
Some files contain multiple #includes of the same header file. Removed most of those unnecessary duplicate entries using scripts/clean-includes. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Anand J <anand.indukala@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2016-10-26tcg: Add CONFIG_ATOMIC64Richard Henderson
Allow qemu to build on 32-bit hosts without 64-bit atomic ops. Even if we only allow 32-bit hosts to multi-thread emulate 32-bit guests, we still need some way to handle the 32-bit guest using a 64-bit atomic operation. Do so by dropping back to single-step. Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26tcg: Add atomic128 helpersRichard Henderson
Force the use of cmpxchg16b on x86_64. Wikipedia suggests that only very old AMD64 (circa 2004) did not have this instruction. Further, it's required by Windows 8 so no new cpus will ever omit it. If we truely care about these, then we could check this at startup time and then avoid executing paths that use it. Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26tcg: Add atomic helpersRichard Henderson
Add all of cmpxchg, op_fetch, fetch_op, and xchg. Handle both endian-ness, and sizes up to 8. Handle expanding non-atomically, when emulating in serial. Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26cputlb: Tidy some macrosRichard Henderson
TGT_LE and TGT_BE are not size dependent and do not need to be redefined. The others are no longer used at all. Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26cputlb: Move most of iotlb code out of lineRichard Henderson
Saves 2k code size off of a cold path. Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26cputlb: Move probe_write out of softmmu_template.hRichard Henderson
Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-10-26cputlb: Replace SHIFT with DATA_SIZERichard Henderson
Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-09-16tcg: Merge GETPC and GETRARichard Henderson
The return address argument to the softmmu template helpers was confused. In the legacy case, we wanted to indicate that there is no return address, and so passed in NULL. However, we then immediately subtracted GETPC_ADJ from NULL, resulting in a non-zero value, indicating the presence of an (invalid) return address. Push the GETPC_ADJ subtraction down to the only point it's required: immediately before use within cpu_restore_state_from_tb, after all NULL pointer checks have been completed. This makes GETPC and GETRA identical. Remove GETRA as the lesser used macro, replacing all uses with GETPC. Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-07-08cputlb: Add address parameter to VICTIM_TLB_HITSamuel Damashek
[rth: Split out from the original patch.] Signed-off-by: Samuel Damashek <samuel.damashek@invincea.com> Message-Id: <20160706182652.16190-1-samuel.damashek@invincea.com> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-07-08cputlb: Move VICTIM_TLB_HIT out of lineRichard Henderson
There are currently 22 invocations of this function, and we're about to increase that number. Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-06-28cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAMPeter Maydell
In get_page_addr_code(), if the guest program counter turns out not to be in ROM or RAM, we can't handle executing from it, and we call cpu_abort(). This results in the message qemu: fatal: Trying to execute code outside RAM or ROM at 0x08000000 followed by a guest register dump, and then QEMU dumps core. This situation happens in one of two cases: (1) a guest kernel bug, where it jumped off into nowhere (2) a user command line mistake, where they tried to run an image for board A on a QEMU model of board B, or where they didn't provide an image at all, and QEMU executed through a ROM or RAM full of NOP instructions and then fell off the end In either case, a core dump of QEMU itself is entirely useless, and only confuses users into thinking that this is a bug in QEMU rather than a bug in the guest or a problem with their command line. (This is a variation on the general idea that we shouldn't assert() on something the user can accidentally provoke.) Replace the cpu_abort() with something that explains the situation a bit better and exits QEMU without dumping core. (See LP:1062220 for several examples of confused users.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1466442425-11885-1-git-send-email-peter.maydell@linaro.org
2016-05-29memory: split memory_region_from_host from qemu_ram_addr_from_hostPaolo Bonzini
Move the old qemu_ram_addr_from_host to memory_region_from_host and make it return an offset within the region. For qemu_ram_addr_from_host return the ram_addr_t directly, similar to what it was before commit 1b5ec23 ("memory: return MemoryRegion from qemu_ram_addr_from_host", 2013-07-04). Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-19cpu: move exec-all.h inclusion out of cpu.hPaolo Bonzini
exec-all.h contains TCG-specific definitions. It is not needed outside TCG-specific files such as translate.c, exec.c or *helper.c. One generic function had snuck into include/exec/exec-all.h; move it to include/qom/cpu.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-05-12tcg: Remove needless CPUState::current_tbSergey Fedorov
This field was used for telling cpu_interrupt() to unlink a chain of TBs being executed when it worked that way. Now, cpu_interrupt() don't do this anymore. So we don't need this field anymore. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Message-Id: <1462273462-14036-1-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2016-03-22cputlb: modernise the debug supportAlex Bennée
To avoid cluttering the code with #ifdef legs we wrap up the print statements into a tlb_debug() macro. As access to the virtual TLB can get quite heavy defining DEBUG_TLB_LOG will ensure all the logs go to the qemu_log target of CPU_LOG_MMU instead of stderr. This remains compile time optional as these debug statements haven't been considered for usefulness for user visible logging. I've also removed DEBUG_TLB_CHECK which wasn't used. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-Id: <1458052224-9316-11-git-send-email-alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-07memory: Drop MemoryRegion.ram_addrFam Zheng
All references to mr->ram_addr are replaced by memory_region_get_ram_addr(mr) (except for a few assertions that are replaced with mr->ram_block). Reviewed-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <1456813104-25902-5-git-send-email-famz@redhat.com> Acked-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-29exec: Clean up includesPeter Maydell
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1453832250-766-4-git-send-email-peter.maydell@linaro.org
2016-01-21exec.c: Pass MemTxAttrs to iotlb_to_region so it uses the right ASPeter Maydell
Pass the MemTxAttrs for the memory access to iotlb_to_region(); this allows it to determine the correct AddressSpace to use for the lookup. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2016-01-21cputlb.c: Use correct address space when looking up MemoryRegionSectionPeter Maydell
When looking up the MemoryRegionSection for the new TLB entry in tlb_set_page_with_attrs(), use cpu_asidx_from_attrs() to determine the correct address space index for the lookup, and pass it into address_space_translate_for_iotlb(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
2015-09-16cputlb: Change tlb_set_dirty() arg to cpuPeter Crosthwaite
Change tlb_set_dirty() to accept a CPU instead of an env pointer. This allows for removal of another CPUArchState usage from prototypes that need to be QOMified. Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <d2b1dcbe7945112989861d8ba7369449c11cc273.1441614289.git.crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-16cputlb: move CPU_LOOP() for tlb_reset() to exec.cPeter Crosthwaite
To prepare for multi-arch, cputlb.c should only have awareness of one single architecture. This means it should not have access to the full CPU lists which may be heterogeneous. Instead, push the CPU_LOOP() up to the one and only caller in exec.c. Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com> Message-Id: <db06dc6c49f8970caaf116d0385f00ee10a56f2f.1441614289.git.crosthwaite.peter@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-11tlb: Add "ifetch" argument to cpu_mmu_index()Benjamin Herrenschmidt
This is set to true when the index is for an instruction fetch translation. The core get_page_addr_code() sets it, as do the SOFTMMU_CODE_ACCESS acessors. All targets ignore it for now, and all other callers pass "false". This will allow targets who wish to split the mmu index between instruction and data accesses to do so. A subsequent patch will do just that for PowerPC. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Message-Id: <1439796853-4410-2-git-send-email-benh@kernel.crashing.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2015-08-25cputlb: Add functions for flushing TLB for a single MMU indexPeter Maydell
Guest CPU TLB maintenance operations may be sufficiently specialized to only need to flush TLB entries corresponding to a particular MMU index. Implement cputlb functions for this, to avoid the inefficiency of flushing TLB entries which we don't need to. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Message-id: 1439548879-1972-2-git-send-email-peter.maydell@linaro.org
2015-06-05memory: replace cpu_physical_memory_reset_dirty() with test-and-clearStefan Hajnoczi
The cpu_physical_memory_reset_dirty() function is sometimes used together with cpu_physical_memory_get_dirty(). This is not atomic since two separate accesses to the dirty memory bitmap are made. Turn cpu_physical_memory_reset_dirty() and cpu_physical_memory_clear_dirty_range_type() into the atomic cpu_physical_memory_test_and_clear_dirty(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <1417519399-3166-6-git-send-email-stefanha@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-05cputlb: remove useless arguments to tlb_unprotect_code_phys, renamePaolo Bonzini
These days modification of the TLB is done in notdirty_mem_write, so the virtual address and env pointer as unnecessary. The new name of the function, tlb_unprotect_code, is consistent with tlb_protect_code. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-04-26Add MemTxAttrs to the IOTLBPeter Maydell
Add a MemTxAttrs field to the IOTLB, and allow target-specific code to set it via a new tlb_set_page_with_attrs() function; pass the attributes through to the device when making IO accesses. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2015-04-26Make CPU iotlb a structure rather than a plain hwaddrPeter Maydell
Make the CPU iotlb a structure rather than a plain hwaddr; this will allow us to add transaction attributes to it. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2015-02-16exec: RCUify AddressSpaceDispatchPaolo Bonzini
Note that even after this patch, most callers of address_space_* functions must still be under the big QEMU lock, otherwise the memory region returned by address_space_translate can disappear as soon as address_space_translate returns. This will be fixed in the next part of this series. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-16exec: make iotlb RCU-friendlyPaolo Bonzini
After the previous patch, TLBs will be flushed on every change to the memory mapping. This patch augments that with synchronization of the MemoryRegionSections referred to in the iotlb array. With this change, it is guaranteed that iotlb_to_region will access the correct memory map, even once the TLB will be accessed outside the BQL. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-12-16qemu-log: add log category for MMU infoAntony Pavlov
Running barebox on qemu-system-mips* with '-d unimp' overloads stderr by very very many mips_cpu_handle_mmu_fault() messages: mips_cpu_handle_mmu_fault address=b80003fd ret 0 physical 00000000180003fd prot 3 mips_cpu_handle_mmu_fault address=a0800884 ret 0 physical 0000000000800884 prot 3 mips_cpu_handle_mmu_fault pc a080cd80 ad b80003fd rw 0 mmu_idx 0 So it's very difficult to find LOG_UNIMP message. The mips_cpu_handle_mmu_fault() messages appear on enabling ANY logging! It's not very handy. Adding separate log category for *_cpu_handle_mmu_fault() logging fixes the problem. Signed-off-by: Antony Pavlov <antonynpavlov@gmail.com> Acked-by: Alexander Graf <agraf@suse.de> Reviewed-by: Richard Henderson <rth@twiddle.net> Message-id: 1418489298-1184-1-git-send-email-antonynpavlov@gmail.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-09-01implementing victim TLB for QEMU system emulated TLBXin Tong
QEMU system mode page table walks are expensive. Taken by running QEMU qemu-system-x86_64 system mode on Intel PIN , a TLB miss and walking a 4-level page tables in guest Linux OS takes ~450 X86 instructions on average. QEMU system mode TLB is implemented using a directly-mapped hashtable. This structure suffers from conflict misses. Increasing the associativity of the TLB may not be the solution to conflict misses as all the ways may have to be walked in serial. A victim TLB is a TLB used to hold translations evicted from the primary TLB upon replacement. The victim TLB lies between the main TLB and its refill path. Victim TLB is of greater associativity (fully associative in this patch). It takes longer to lookup the victim TLB, but its likely better than a full page table walk. The memory translation path is changed as follows : Before Victim TLB: 1. Inline TLB lookup 2. Exit code cache on TLB miss. 3. Check for unaligned, IO accesses 4. TLB refill. 5. Do the memory access. 6. Return to code cache. After Victim TLB: 1. Inline TLB lookup 2. Exit code cache on TLB miss. 3. Check for unaligned, IO accesses 4. Victim TLB lookup. 5. If victim TLB misses, TLB refill 6. Do the memory access. 7. Return to code cache The advantage is that victim TLB can offer more associativity to a directly mapped TLB and thus potentially fewer page table walks while still keeping the time taken to flush within reasonable limits. However, placing a victim TLB before the refill path increase TLB refill path as the victim TLB is consulted before the TLB refill. The performance results demonstrate that the pros outweigh the cons. some performance results taken on SPECINT2006 train datasets and kernel boot and qemu configure script on an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz Linux machine are shown in the Google Doc link below. https://docs.google.com/spreadsheets/d/1eiItzekZwNQOal_h-5iJmC4tMDi051m9qidi5_nwvH4/edit?usp=sharing In summary, victim TLB improves the performance of qemu-system-x86_64 by 11% on average on SPECINT2006, kernelboot and qemu configscript and with highest improvement of in 26% in 456.hmmer. And victim TLB does not result in any performance degradation in any of the measured benchmarks. Furthermore, the implemented victim TLB is architecture independent and is expected to benefit other architectures in QEMU as well. Although there are measurement fluctuations, the performance improvement is very significant and by no means in the range of noises. Signed-off-by: Xin Tong <trent.tong@gmail.com> Message-id: 1407202523-23553-1-git-send-email-trent.tong@gmail.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-06-05softmmu: introduce cpu_ldst.hPaolo Bonzini
This will collect all load and store helpers soon. For now it is just a replacement for softmmu_exec.h, which this patch stops including directly, but we also include it where this will be necessary in order to simplify the next patch. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-05softmmu: move softmmu_template.h out of include/Paolo Bonzini
It is only included in cputlb.c now. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-05softmmu: commonize helper definitionsPaolo Bonzini
They do not need to be in op_helper.c. Because cputlb.c now includes softmmu_template.h twice for each size, io_readX must be elided the second time through. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-05cputlb: Fix regression with TCG interpreter (bug 1310324)Stefan Weil
Commit 0f842f8a246f2b5b51a11c13f933bf7a90ae8e96 replaced GETPC_EXT() which was derived from GETPC() by GETRA_EXT() without fixing cputlb.c. A later patch replaced GETRA_EXT() by GETRA() in exec/softmmu_template.h which is included in cputlb.c. The TCG interpreter failed because the values returned by GETRA() were no longer explicitly set to 0. The redefinition of GETRA() introduced here fixes this. In addition, GETPC_ADJ which is also used in exec/softmmu_template.h is set to 0. Both changes reduce the compiled code size for cputlb.c by more than 100 bytes, so the normal TCG without interpreter also profits from the reduced code size and slightly faster code. Cc: qemu-stable@nongnu.org Reported-by: Giovanni Mascellani <gio@debian.org> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-13cputlb: Change tlb_set_page() argument to CPUStateAndreas Färber
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13cputlb: Change tlb_flush() argument to CPUStateAndreas Färber
Signed-off-by: Andreas Färber <afaerber@suse.de>
2014-03-13cputlb: Change tlb_flush_page() argument to CPUStateAndreas Färber
Signed-off-by: Andreas Färber <afaerber@suse.de>