aboutsummaryrefslogtreecommitdiff
path: root/accel
AgeCommit message (Collapse)Author
2017-10-24tcg: take tb_ctx out of TCGContextEmilio G. Cota
Groundwork for supporting multiple TCG contexts. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24translate-all: report correct avg host TB sizeEmilio G. Cota
Since commit 6e3b2bfd6 ("tcg: allocate TB structs before the corresponding translated code") we are not fully utilizing code_gen_buffer for translated code, and therefore are incorrectly reporting the amount of translated code as well as the average host TB size. Address this by: - Making the conscious choice of misreporting the total translated code; doing otherwise would mislead users into thinking "-tb-size" is not honoured. - Expanding tb_tree_stats to accurately count the bytes of translated code on the host, and using this for reporting the average tb host size, as well as the expansion ratio. In the future we might want to consider reporting the accurate numbers for the total translated code, together with a "bookkeeping/overhead" field to account for the TB structs. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24exec-all: rename tb_free to tb_removeEmilio G. Cota
We don't really free anything in this function anymore; we just remove the TB from the binary search tree. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24translate-all: use a binary search tree to track TBs in TBContextEmilio G. Cota
This is a prerequisite for supporting multiple TCG contexts, since we will have threads generating code in separate regions of code_gen_buffer. For this we need a new field (.size) in struct tb_tc to keep track of the size of the translated code. This field uses a size_t to avoid adding a hole to the struct, although really an unsigned int would have been enough. The comparison function we use is optimized for the common case: insertions. Profiling shows that upon booting debian-arm, 98% of comparisons are between existing tb's (i.e. a->size and b->size are both !0), which happens during insertions (and removals, but those are rare). The remaining cases are lookups. From reading the glib sources we see that the first key is always the lookup key. However, the code does not assume this to always be the case because this behaviour is not guaranteed in the glib docs. However, we embed this knowledge in the code as a branch hint for the compiler. Note that tb_free does not free space in the code_gen_buffer anymore, since we cannot easily know whether the tb is the last one inserted in code_gen_buffer. The next patch in this series renames tb_free to tb_remove to reflect this. Performance-wise, lookups in tb_find_pc are the same as before: O(log n). However, insertions are O(log n) instead of O(1), which results in a small slowdown when booting debian-arm: Performance counter stats for 'build/arm-softmmu/qemu-system-arm \ -machine type=virt -nographic -smp 1 -m 4096 \ -netdev user,id=unet,hostfwd=tcp::2222-:22 \ -device virtio-net-device,netdev=unet \ -drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \ -device virtio-blk-device,drive=myblock \ -kernel img/arm/aarch32-current-linux-kernel-only.img \ -append console=ttyAMA0 root=/dev/vda1 \ -name arm,debug-threads=on -smp 1' (10 runs): - Before: 8048.598422 task-clock (msec) # 0.931 CPUs utilized ( +- 0.28% ) 16,974 context-switches # 0.002 M/sec ( +- 0.12% ) 0 cpu-migrations # 0.000 K/sec 10,125 page-faults # 0.001 M/sec ( +- 1.23% ) 35,144,901,879 cycles # 4.367 GHz ( +- 0.14% ) <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 65,758,252,643 instructions # 1.87 insns per cycle ( +- 0.33% ) 10,871,298,668 branches # 1350.707 M/sec ( +- 0.41% ) 192,322,212 branch-misses # 1.77% of all branches ( +- 0.32% ) 8.640869419 seconds time elapsed ( +- 0.57% ) - After: 8146.242027 task-clock (msec) # 0.923 CPUs utilized ( +- 1.23% ) 17,016 context-switches # 0.002 M/sec ( +- 0.40% ) 0 cpu-migrations # 0.000 K/sec 18,769 page-faults # 0.002 M/sec ( +- 0.45% ) 35,660,956,120 cycles # 4.378 GHz ( +- 1.22% ) <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 65,095,366,607 instructions # 1.83 insns per cycle ( +- 1.73% ) 10,803,480,261 branches # 1326.192 M/sec ( +- 1.95% ) 195,601,289 branch-misses # 1.81% of all branches ( +- 0.39% ) 8.828660235 seconds time elapsed ( +- 0.38% ) Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: Remove CF_IGNORE_ICOUNTRichard Henderson
Now that we have curr_cflags, we can include CF_USE_ICOUNT early and then remove it as necessary. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24cpu-exec: lookup/generate TB outside exclusive region during step_atomicEmilio G. Cota
Now that all code generation has been converted to check CF_PARALLEL, we can generate !CF_PARALLEL code without having yet set !parallel_cpus -- and therefore without having to be in the exclusive region during cpu_exec_step_atomic. While at it, merge cpu_exec_step into cpu_exec_step_atomic. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: check CF_PARALLEL instead of parallel_cpusEmilio G. Cota
Thereby decoupling the resulting translated code from the current state of the system. The tb->cflags field is not passed to tcg generation functions. So we add a field to TCGContext, storing there a copy of tb->cflags. Most architectures have <= 32 registers, which results in a 4-byte hole in TCGContext. Use this hole for the new field. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: convert tb->cflags reads to tb_cflags(tb)Emilio G. Cota
Convert all existing readers of tb->cflags to tb_cflags, so that we use atomic_read and therefore avoid undefined behaviour in C11. Note that the remaining setters/getters of the field are protected by tb_lock, and therefore do not need conversion. Luckily all readers access the field via 'tb->cflags' (so no foo.cflags, bar->cflags in the code base), which makes the conversion easily scriptable: FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h \ accel/tcg/translator.c | cut -f1 -d':' | sort | uniq) perl -pi -e 's/([^.>])tb->cflags/$1tb_cflags(tb)/g' $FILES perl -pi -e 's/([a-z->.]*)(->|\.)tb->cflags/tb_cflags($1$2tb)/g' $FILES Then manually fixed the few errors that checkpatch reported. Compile-tested for all targets. Suggested-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: Add CPUState cflags_next_tbRichard Henderson
We were generating code during tb_invalidate_phys_page_range, check_watchpoint, cpu_io_recompile, and (seemingly) discarding the TB, assuming that it would magically be picked up during the next iteration through the cpu_exec loop. Instead, record the desired cflags in CPUState so that we request the proper TB so that there is no more magic. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASKEmilio G. Cota
This will enable us to decouple code translation from the value of parallel_cpus at any given time. It will also help us minimize TB flushes when generating code via EXCP_ATOMIC. Note that the declaration of parallel_cpus is brought to exec-all.h to be able to define there the "curr_cflags" inline. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-20accel/tcg: allow to invalidate a write TLB entry immediatelyDavid Hildenbrand
Background: s390x implements Low-Address Protection (LAP). If LAP is enabled, writing to effective addresses (before any translation) 0-511 and 4096-4607 triggers a protection exception. So we have subpage protection on the first two pages of every address space (where the lowcore - the CPU private data resides). By immediately invalidating the write entry but allowing the caller to continue, we force every write access onto these first two pages into the slow path. we will get a tlb fault with the specific accessed addresses and can then evaluate if protection applies or not. We have to make sure to ignore the invalid bit if tlb_fill() succeeds. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20171016202358.3633-2-david@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
2017-10-19Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell
* TCG 8-byte atomic accesses bugfix (Andrew) * Report disk rotation rate (Daniel) * Report invalid scsi-disk block size configuration (Mark) * KVM and memory API MemoryListener fixes (David, Maxime, Peter Xu) * x86 CPU hotplug crash fix (Igor) * Load/store API documentation (Peter Maydell) * Small fixes by myself and Thomas * qdev DEVICE_DELETED deferral (Michael) # gpg: Signature made Wed 18 Oct 2017 10:56:24 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (29 commits) scsi: reject configurations with logical block size > physical block size qdev: defer DEVICE_DEL event until instance_finalize() Revert "qdev: Free QemuOpts when the QOM path goes away" qdev: store DeviceState's canonical path to use when unparenting qemu-pr-helper: use new libmultipath API watch_mem_write: implement 8-byte accesses notdirty_mem_write: implement 8-byte accesses memory: reuse section_from_flat_range() kvm: simplify kvm_align_section() kvm: region_add and region_del is not called on updates kvm: fix error message when failing to unregister slot kvm: tolerate non-existing slot for log_start/log_stop/log_sync kvm: fix alignment of ram address memory: call log_start after region_add target/i386: trap on instructions longer than >15 bytes target/i386: introduce x86_ld*_code tco: add trace events docs/devel/loads-stores.rst: Document our various load and store APIs nios2: define tcg_env build: remove CONFIG_LIBDECNUMBER ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-18kvm: simplify kvm_align_section()David Hildenbrand
Use ROUND_UP and simplify the code a bit. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20171016144302.24284-7-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-18kvm: region_add and region_del is not called on updatesDavid Hildenbrand
Attributes are not updated via region_add()/region_del(). Attribute changes lead to a delete first, followed by a new add. If this would ever not be the case, we would get an error when trying to register the new slot. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20171016144302.24284-6-david@redhat.com> Tested-by: Joe Clifford <joeclifford@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-18kvm: fix error message when failing to unregister slotDavid Hildenbrand
"overlapping" is a leftover, let's drop it. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20171016144302.24284-5-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-18kvm: tolerate non-existing slot for log_start/log_stop/log_syncDavid Hildenbrand
If we want to trap every access to a section, we might not have a slot. So let's just tolerate if we don't have one. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20171016144302.24284-4-david@redhat.com> Tested-by: Joe Clifford <joeclifford@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-18kvm: fix alignment of ram addressDavid Hildenbrand
Fix the wrong calculation of the delta, used to align the ram address. This only strikes if alignment has to be done. Reported-by: Joe Clifford <joeclifford@gmail.com> Fixes: 5ea69c2e3614 ("kvm: factor out alignment of memory section") Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20171016144302.24284-3-david@redhat.com> Tested-by: Joe Clifford <joeclifford@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-16tcg: Fix off-by-one in assert in page_set_flagsRichard Henderson
Most of the users of page_set_flags offset (page, page + len) as the end points. One might consider this an error, since the other users do supply an endpoint as the last byte of the region. However, the first thing that page_set_flags does is round end UP to the start of the next page. Which means computing page + len - 1 is in the end pointless. Therefore, accept this usage and do not assert when given the exact size of the vm as the endpoint. Signed-off-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170708025030.15845-2-rth@twiddle.net> Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2017-10-10exec-all: extract tb->tc_* into a separate struct tc_tbEmilio G. Cota
In preparation for adding tc.size to be able to keep track of TB's using the binary search tree implementation from glib. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10translate-all: define and use DEBUG_TB_CHECK_GATEEmilio G. Cota
This prevents bit rot by ensuring the debug code is compiled when building a user-mode target. Unfortunately the helpers are user-mode-only so we cannot fully get rid of the ifdef checks. Add a comment to explain this. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10translate-all: define and use DEBUG_TB_INVALIDATE_GATEEmilio G. Cota
This gets rid of an ifdef check while ensuring that the debug code is compiled, which prevents bit rot. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10exec-all: introduce TB_PAGE_ADDR_FMTEmilio G. Cota
And fix the following warning when DEBUG_TB_INVALIDATE is enabled in translate-all.c: CC mipsn32-linux-user/accel/tcg/translate-all.o /data/src/qemu/accel/tcg/translate-all.c: In function ‘tb_alloc_page’: /data/src/qemu/accel/tcg/translate-all.c:1201:16: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘tb_page_addr_t {aka unsigned int}’ [-Werror=format=] printf("protecting code page: 0x" TARGET_FMT_lx "\n", ^ cc1: all warnings being treated as errors /data/src/qemu/rules.mak:66: recipe for target 'accel/tcg/translate-all.o' failed make[1]: *** [accel/tcg/translate-all.o] Error 1 Makefile:328: recipe for target 'subdir-mipsn32-linux-user' failed make: *** [subdir-mipsn32-linux-user] Error 2 cota@flamenco:/data/src/qemu/build ((18f3fe1...) *$)$ Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10translate-all: define and use DEBUG_TB_FLUSH_GATEEmilio G. Cota
This gets rid of some ifdef checks while ensuring that the debug code is compiled, which prevents bit rot. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10exec-all: bring tb->invalid into tb->cflagsEmilio G. Cota
This gets rid of a hole in struct TranslationBlock. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10tcg: consolidate TB lookups in tb_lookup__cpu_stateEmilio G. Cota
This avoids duplicating code. cpu_exec_step will also use the new common function once we integrate parallel_cpus into tb->cflags. Note that in this commit we also fix a race, described by Richard Henderson during review. Think of this scenario with threads A and B: (A) Lookup succeeds for TB in hash without tb_lock (B) Sets the TB's tb->invalid flag (B) Removes the TB from tb_htable (B) Clears all CPU's tb_jmp_cache (A) Store TB into local tb_jmp_cache Given that order of events, (A) will keep executing that invalid TB until another flush of its tb_jmp_cache happens, which in theory might never happen. We can fix this by checking the tb->invalid flag every time we look up a TB from tb_jmp_cache, so that in the above scenario, next time we try to find that TB in tb_jmp_cache, we won't, and will therefore be forced to look it up in tb_htable. Performance-wise, I measured a small improvement when booting debian-arm. Note that inlining pays off: Performance counter stats for 'taskset -c 0 qemu-system-arm \ -machine type=virt -nographic -smp 1 -m 4096 \ -netdev user,id=unet,hostfwd=tcp::2222-:22 \ -device virtio-net-device,netdev=unet \ -drive file=jessie.qcow2,id=myblock,index=0,if=none \ -device virtio-blk-device,drive=myblock \ -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \ -name arm,debug-threads=on -smp 1' (10 runs): Before: 18714.917392 task-clock # 0.952 CPUs utilized ( +- 0.95% ) 23,142 context-switches # 0.001 M/sec ( +- 0.50% ) 1 CPU-migrations # 0.000 M/sec 10,558 page-faults # 0.001 M/sec ( +- 0.95% ) 53,957,727,252 cycles # 2.883 GHz ( +- 0.91% ) [83.33%] 24,440,599,852 stalled-cycles-frontend # 45.30% frontend cycles idle ( +- 1.20% ) [83.33%] 16,495,714,424 stalled-cycles-backend # 30.57% backend cycles idle ( +- 0.95% ) [66.66%] 76,267,572,582 instructions # 1.41 insns per cycle # 0.32 stalled cycles per insn ( +- 0.87% ) [83.34%] 12,692,186,323 branches # 678.186 M/sec ( +- 0.92% ) [83.35%] 263,486,879 branch-misses # 2.08% of all branches ( +- 0.73% ) [83.34%] 19.648474449 seconds time elapsed ( +- 0.82% ) After, w/ inline (this patch): 18471.376627 task-clock # 0.955 CPUs utilized ( +- 0.96% ) 23,048 context-switches # 0.001 M/sec ( +- 0.48% ) 1 CPU-migrations # 0.000 M/sec 10,708 page-faults # 0.001 M/sec ( +- 0.81% ) 53,208,990,796 cycles # 2.881 GHz ( +- 0.98% ) [83.34%] 23,941,071,673 stalled-cycles-frontend # 44.99% frontend cycles idle ( +- 0.95% ) [83.34%] 16,161,773,848 stalled-cycles-backend # 30.37% backend cycles idle ( +- 0.76% ) [66.67%] 75,786,269,766 instructions # 1.42 insns per cycle # 0.32 stalled cycles per insn ( +- 1.24% ) [83.34%] 12,573,617,143 branches # 680.708 M/sec ( +- 1.34% ) [83.33%] 260,235,550 branch-misses # 2.07% of all branches ( +- 0.66% ) [83.33%] 19.340502161 seconds time elapsed ( +- 0.56% ) After, w/o inline: 18791.253967 task-clock # 0.954 CPUs utilized ( +- 0.78% ) 23,230 context-switches # 0.001 M/sec ( +- 0.42% ) 1 CPU-migrations # 0.000 M/sec 10,563 page-faults # 0.001 M/sec ( +- 1.27% ) 54,168,674,622 cycles # 2.883 GHz ( +- 0.80% ) [83.34%] 24,244,712,629 stalled-cycles-frontend # 44.76% frontend cycles idle ( +- 1.37% ) [83.33%] 16,288,648,572 stalled-cycles-backend # 30.07% backend cycles idle ( +- 0.95% ) [66.66%] 77,659,755,503 instructions # 1.43 insns per cycle # 0.31 stalled cycles per insn ( +- 0.97% ) [83.34%] 12,922,780,045 branches # 687.702 M/sec ( +- 1.06% ) [83.34%] 261,962,386 branch-misses # 2.03% of all branches ( +- 0.71% ) [83.35%] 19.700174670 seconds time elapsed ( +- 0.56% ) Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10tcg: remove addr argument from lookup_tb_ptrEmilio G. Cota
It is unlikely that we will ever want to call this helper passing an argument other than the current PC. So just remove the argument, and use the pc we already get from cpu_get_tb_cpu_state. This change paves the way to having a common "tb_lookup" function. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_findEmilio G. Cota
Reusing the have_tb_lock name, which is also defined in translate-all.c, makes code reviewing unnecessarily harder. Avoid potential confusion by renaming the local have_tb_lock variable to something else. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10translate-all: make have_tb_lock staticEmilio G. Cota
It is only used by this object, and it's not exported to any other. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10tcg: fix corruption of code_time profiling counter upon tb_flushEmilio G. Cota
Whenever there is an overflow in code_gen_buffer (e.g. we run out of space in it and have to flush it), the code_time profiling counter ends up with an invalid value (that is, code_time -= profile_getclock(), without later on getting += profile_getclock() due to the goto). Fix it by using the ti variable, so that we only update code_time when there is no overflow. Note that in case there is an overflow we fail to account for the elapsed coding time, but this is quite rare so we can probably live with it. "info jit" before/after, roughly at the same time during debian-arm bootup: - before: Statistics: TB flush count 1 TB invalidate count 4665 TLB flush count 998 JIT cycles -615191529184601 (-256329.804 s at 2.4 GHz) translated TBs 302310 (aborted=0 0.0%) avg ops/TB 48.4 max=438 deleted ops/TB 8.54 avg temps/TB 32.31 max=38 avg host code/TB 361.5 avg search data/TB 24.5 cycles/op -42014693.0 cycles/in byte -121444900.2 cycles/out byte -5629031.1 cycles/search byte -83114481.0 gen_interm time -0.0% gen_code time 100.0% optim./code time -0.0% liveness/code time -0.0% cpu_restore count 6236 avg cycles 110.4 - after: Statistics: TB flush count 1 TB invalidate count 4665 TLB flush count 1010 JIT cycles 1996899624 (0.832 s at 2.4 GHz) translated TBs 297961 (aborted=0 0.0%) avg ops/TB 48.5 max=438 deleted ops/TB 8.56 avg temps/TB 32.31 max=38 avg host code/TB 361.8 avg search data/TB 24.5 cycles/op 138.2 cycles/in byte 398.4 cycles/out byte 18.5 cycles/search byte 273.1 gen_interm time 14.0% gen_code time 86.0% optim./code time 19.4% liveness/code time 10.3% cpu_restore count 6372 avg cycles 111.0 Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-10cputlb: bring back tlb_flush_count under !TLB_DEBUGEmilio G. Cota
Commit f0aff0f124 ("cputlb: add assert_cpu_is_self checks") buried the increment of tlb_flush_count under TLB_DEBUG. This results in "info jit" always (mis)reporting 0 TLB flushes when !TLB_DEBUG. Besides, under MTTCG tlb_flush_count is updated by several threads, so in order not to lose counts we'd either have to use atomic ops or distribute the counter, which is more scalable. This patch does the latter by embedding tlb_flush_count in CPUArchState. The global count is then easily obtained by iterating over the CPU list. Note that this change also requires updating the accessors to tlb_flush_count to use atomic_read/set whenever there may be conflicting accesses (as defined in C11) to it. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-02kvm: check KVM_CAP_NR_VCPUS with kvm_vm_check_extension()Greg Kurz
On a modern server-class ppc host with the following CPU topology: Architecture: ppc64le Byte Order: Little Endian CPU(s): 32 On-line CPU(s) list: 0,8,16,24 Off-line CPU(s) list: 1-7,9-15,17-23,25-31 Thread(s) per core: 1 If both KVM PR and KVM HV loaded and we pass: -machine pseries,accel=kvm,kvm-type=PR -smp 8 We expect QEMU to warn that this exceeds the number of online CPUs: Warning: Number of SMP cpus requested (8) exceeds the recommended cpus supported by KVM (4) Warning: Number of hotpluggable cpus requested (8) exceeds the recommended cpus supported by KVM (4) but nothing is printed... This happens because on ppc the KVM_CAP_NR_VCPUS capability is VM specific ndreally depends on the KVM type, but we currently use it as a global capability. And KVM returns a fallback value based on KVM HV being present. Maybe KVM on POWER shouldn't presume anything as long as it doesn't have a VM, but in all cases, we should call KVM_CREATE_VM first and use KVM_CAP_NR_VCPUS as a VM capability. This patch hence changes kvm_recommended_vcpus() accordingly and moves the sanity checking of smp_cpus after the VM creation. It is okay for the other archs that also implement KVM_CAP_NR_VCPUS, ie, mips, s390, x86 and arm, because they don't depend on the VM being created or not. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <150600966286.30533.10909862523552370889.stgit@bahia.lan> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-10-02kvm: check KVM_CAP_SYNC_MMU with kvm_vm_check_extension()Greg Kurz
On a server-class ppc host, this capability depends on the KVM type, ie, HV or PR. If both KVM are present in the kernel, we will always get the HV specific value, even if we explicitely requested PR on the command line. This can have an impact if we're using hugepages or a balloon device. Since we've already created the VM at the time any user calls kvm_has_sync_mmu(), switching to kvm_vm_check_extension() is enough to fix any potential issue. It is okay for the other archs that also implement KVM_CAP_SYNC_MMU, ie, mips, s390, x86 and arm, because they don't depend on the VM being created or not. While here, let's cache the state of this extension in a bool variable, since it has several users in the code, as suggested by Thomas Huth. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <150600965332.30533.14702405809647835716.stgit@bahia.lan> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-25accel/tcg/cputlb: avoid recursive BQL (fixes #1706296)Alex Bennée
The mmio path (see exec.c:prepare_mmio_access) already protects itself against recursive locking and it makes sense to do the same for io_readx/writex. Otherwise any helper running in the BQL context will assert when it attempts to write to device memory as in the case of the bug report. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> CC: Richard Jones <rjones@redhat.com> CC: Paolo Bonzini <bonzini@gnu.org> CC: qemu-stable@nongnu.org Message-Id: <20170921110625.9500-1-alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-21kvm: drop wrong assertion creating problems with pflashDavid Hildenbrand
pflash toggles mr->romd_mode. So this assert does not always hold. 1) a device was added with !mr->romd_mode, therefore effectively not creating a kvm slot as we want to trap every access (add = false). 2) mr->romd_mode was toggled on before remove it. There is now actually no slot to remove and the assert is wrong. So let's just drop the assert. Reported-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170920145025.19403-1-david@redhat.com> Tested-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19accel/hax: move hax-stub.c to accel/stubs/Philippe Mathieu-Daudé
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170913221149.30382-1-f4bug@amsat.org> Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2017-09-19Convert multi-line fprintf() to warn_report()Alistair Francis
Convert all the multi-line uses of fprintf(stderr, "warning:"..."\n"... to use warn_report() instead. This helps standardise on a single method of printing warnings to the user. All of the warnings were changed using these commands: find ./* -type f -exec sed -i \ 'N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N;N {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + find ./* -type f -exec sed -i \ 'N;N;N;N;N;N;N; {s|fprintf(.*".*warning[,:] \(.*\)\\n"\(.*\));|warn_report("\1"\2);|Ig}' \ {} + Indentation fixed up manually afterwards. Some of the lines were manually edited to reduce the line length to below 80 charecters. Some of the lines with newlines in the middle of the string were also manually edit to avoid checkpatch errrors. The #include lines were manually updated to allow the code to compile. Several of the warning messages can be improved after this patch, to keep this patch mechanical this has been moved into a later patch. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Anthony Perard <anthony.perard@citrix.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Yongbok Kim <yongbok.kim@imgtec.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Alexander Graf <agraf@suse.de> Cc: Jason Wang <jasowang@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <5def63849ca8f551630c6f2b45bcb1c482f765a6.1505158760.git.alistair.francis@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19kvm: kvm_log_sync() is only called with known memory sectionsDavid Hildenbrand
Flatview will make sure that we can only end up in this function with memory sections that correspond to exactly one slot. So we don't have to iterate multiple times. There won't be overlapping slots but only matching slots. Properly align the section and look up the corresponding slot. This heavily simplifies this function. We can now get rid of kvm_lookup_overlapping_slot(). Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170911174933.20789-7-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19kvm: kvm_log_start/stop are only called with known sectionsDavid Hildenbrand
Let's properly align the sections first and bail out if we would ever get called with a memory section we don't know yet. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170911174933.20789-6-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19kvm: we never have overlapping slots in kvm_set_phys_mem()David Hildenbrand
The way flatview handles memory sections, we will never have overlapping memory sections in kvm. address_space_update_topology_pass() will make sure that we will only get called for a) an existing memory section for which we only update parameters (log_start, log_stop). b) an existing memory section we want to delete (region_del) c) a brand new memory section we want to add (region_add) We cannot have overlapping memory sections in kvm as we will first remove the overlapping sections and then add the ones without conflicts. Therefore we can remove the complexity for handling prefix and suffix slots. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170911174933.20789-5-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19kvm: use start + size for memory rangesDavid Hildenbrand
Convert kvm_lookup_matching_slot(). Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170911174933.20789-4-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19kvm: factor out alignment of memory sectionDavid Hildenbrand
Factor it out, so we can reuse it later. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170911174933.20789-3-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-19kvm: require JOIN_MEMORY_REGIONS_WORKSDavid Hildenbrand
We already require DESTROY_MEMORY_REGION_WORKS, JOIN_MEMORY_REGIONS_WORKS was added just half a year later. In addition, with flatview overlapping memory regions are first removed before adding the changed one. So we can't really detect joining memory regions this way. Let's just get rid of this special handling. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170911174933.20789-2-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-09-17accel/tcg: move USER code to user-exec.cPhilippe Mathieu-Daudé
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170912211934.20919-1-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17accel/tcg: move atomic_template.h to accel/tcg/Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Thomas Huth <thuth@redhat.com> Message-Id: <20170911213328.9701-5-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17accel/tcg: move tcg-runtime to accel/tcg/Philippe Mathieu-Daudé
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170911213328.9701-4-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17accel/tcg: move user-exec to accel/tcg/Philippe Mathieu-Daudé
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170911213328.9701-3-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-17accel/tcg: move softmmu_template.h to accel/tcg/Thomas Huth
The header is only used by accel/tcg/cputlb.c so we can move it to the accel/tcg/ folder, too. Signed-off-by: Thomas Huth <thuth@redhat.com> [PMD: reword commit title to match series] Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20170911213328.9701-2-f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-09-07tcg: Infrastructure for managing constant poolsRichard Henderson
A new shared header tcg-pool.inc.c adds new_pool_label, for registering a tcg_target_ulong to be emitted after the generated code, plus relocation data to install a pointer to the data. A new pointer is added to the TCGContext, so that we dump the constant pool as data, not code. Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-07tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.hRichard Henderson
Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional function tb_target_set_jmp_target. While we're touching all backends, add a parameter for tb->tc_ptr; we're going to need it shortly for some backends. Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c. This opens the possibility for TCG_TARGET_HAS_direct_jump to be a runtime decision -- based on host cpu capabilities, the size of code_gen_buffer, or a future debugging switch. Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-09-06tcg: Add generic translation frameworkLluís Vilanova
Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu> Message-Id: <150002073981.22386.9870422422367410100.stgit@frigg.lan> [rth: Moved max_insns adjustment from tb_start to init_disas_context. Removed pc_next return from translate_insn. Removed tcg_check_temp_count from generic loop. Moved gen_io_end to exactly match gen_io_start. Use qemu_log instead of error_report for temporary leaks. Moved TB size/icount assignments before disas_log.] Signed-off-by: Richard Henderson <rth@twiddle.net>