aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-10-27sh4: r2d: use generic cpu_model parsingIgor Mammedov
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1507211474-188400-20-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27openrisc: use generic cpu_model parsingIgor Mammedov
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1507211474-188400-19-git-send-email-imammedo@redhat.com> Acked-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27openrisc: cleanup cpu type name compositionIgor Mammedov
use new OPENRISC_CPU_TYPE_NAME to compose CPU type name and get rid of intermediate OpenRISCCPUInfo/openrisc_cpu_register_types() which is replaced by static TypeInfo array. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1507211474-188400-18-git-send-email-imammedo@redhat.com> Acked-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27moxie: use generic cpu_model parsingIgor Mammedov
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1507211474-188400-17-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27moxie: cleanup cpu type name compositionIgor Mammedov
introduce MOXIE_CPU_TYPE_NAME macro and consistently use it to construct cpu type names. While at it replace dynamic cpu type name composition with static data. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1507211474-188400-16-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27moxie: fix qemu-system-moxie failing to start with CLI "-cpu MoxieLite"Igor Mammedov
It 'works' with default CPU only because of bug in moxie_cpu_class_by_name() where it treats cpu_model as type name and default cpu_model also happens to be type name. But specifying explicitly cpu on CLI, ex: '-cpu MoxieLite', makes QEMU fail since moxie_cpu_class_by_name() doesn't traslate cpu_model to cpu type and fails to find corresponding object class. Fix moxie_cpu_class_by_name() to do proper cpu_model -> cpu type translation and fix default cpu_model to be cpu_model instead of being typename. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1507211474-188400-15-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27m68k: mcf5208: use generic cpu_model parsingIgor Mammedov
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Thomas Huth <huth@tuxfamily.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <1507211474-188400-14-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27m68k: an5206: use generic cpu_model parsingIgor Mammedov
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Thomas Huth <huth@tuxfamily.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <1507211474-188400-13-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27m68k: cleanup cpu type name compositionIgor Mammedov
use new M68K_CPU_TYPE_NAME to compose CPU type names and get rid of intermediate M68kCPUInfo/register_cpu_type() which is replaced by static TypeInfo array. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Acked-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <1507211474-188400-12-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27lm32: lm32_boards: use generic cpu_model parsingIgor Mammedov
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Acked-by: Michael Walle <michael@walle.cc> Message-Id: <1507211474-188400-11-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27lm32: milkymist: use generic cpu_model parsingIgor Mammedov
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Acked-by: Michael Walle <michael@walle.cc> Message-Id: <1507211474-188400-10-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27lm32: cleanup cpu type name compositionIgor Mammedov
introduce LM32_CPU_TYPE_NAME macro and consistently use it to construct cpu type names. While at it replace dynamic cpu type name composition with static data. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Acked-by: Michael Walle <michael@walle.cc> Message-Id: <1507211474-188400-9-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27cris: use generic cpu_model parsingIgor Mammedov
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1507211474-188400-8-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27cris: cleanup cpu type name compositionIgor Mammedov
replace ambiguous TYPE macro with a new CRIS_CPU_TYPE_NAME and use it consistently in the code. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1507211474-188400-7-git-send-email-imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27alpha: use generic cpu_model parsingIgor Mammedov
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1507211474-188400-6-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-27alpha: cleanup cpu type name compositionIgor Mammedov
Introduce ALPHA_CPU_TYPE_NAME macro to replace rather ununique TYPE macro that alpha uses. With new macro it will follow the same naming convention as other targets. While at it put scattered TypeInfo into one array which places type desriptions at one place and reduces code a bit. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <1507211474-188400-5-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-10-26tcg: Avoid setting tcg_initialize if !CONFIG_TCGRichard Henderson
Fix the build for --disable-tcg. Fixes: 55c3ceef61fcf06fc98ddc752b7cce788ce7680b Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-id: 20171026135814.20773-1-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-26Merge remote-tracking branch ↵Peter Maydell
'remotes/stefanberger/tags/pull-tpm-2017-10-24-1' into staging Merge tpm 2017/10/24 v1 # gpg: Signature made Wed 25 Oct 2017 06:06:55 BST # gpg: using RSA key 0x75AD65802A0B4211 # gpg: Good signature from "Stefan Berger <stefanb@linux.vnet.ibm.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: B818 B9CA DF90 89C2 D5CE C66B 75AD 6580 2A0B 4211 * remotes/stefanberger/tags/pull-tpm-2017-10-24-1: tpm: print buffers received from TPM when debugging vl: remove unnecessary #ifdef CONFIG_TPM tpm: remove unnecessary #ifdef CONFIG_TPM tpm: add stubs tpm: add missing include Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-25Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20171025' into stagingPeter Maydell
TCG patch queue # gpg: Signature made Wed 25 Oct 2017 10:30:18 BST # gpg: using RSA key 0x64DF38E8AF7E215F # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/pull-tcg-20171025: (51 commits) translate-all: exit from tb_phys_invalidate if qht_remove fails tcg: Initialize cpu_env generically tcg: enable multiple TCG contexts in softmmu tcg: introduce regions to split code_gen_buffer translate-all: use qemu_protect_rwx/none helpers osdep: introduce qemu_mprotect_rwx/none tcg: allocate optimizer temps with tcg_malloc tcg: distribute profiling counters across TCGContext's tcg: introduce **tcg_ctxs to keep track of all TCGContext's gen-icount: fold exitreq_label into TCGContext tcg: define tcg_init_ctx and make tcg_ctx a pointer tcg: take tb_ctx out of TCGContext translate-all: report correct avg host TB size exec-all: rename tb_free to tb_remove translate-all: use a binary search tree to track TBs in TBContext tcg: Remove CF_IGNORE_ICOUNT tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK cpu-exec: lookup/generate TB outside exclusive region during step_atomic tcg: check CF_PARALLEL instead of parallel_cpus target/sparc: check CF_PARALLEL instead of parallel_cpus ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-25Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20171023' ↵Peter Maydell
into staging migration/next for 20171023 # gpg: Signature made Mon 23 Oct 2017 17:05:14 BST # gpg: using RSA key 0xF487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20171023: (21 commits) migration: Improve migration thread error handling qapi: Fix grammar in x-multifd-page-count descriptions migration: add bitmap for received page migration: introduce qemu_ufd_copy_ioctl helper migration: postcopy_place_page factoring out migration: new ram_init_bitmaps() migration: clean up xbzrle cache init/destroy migration: provide ram_state_cleanup migration: provide ram_state_init() migration: pause-before-switchover for postcopy migration: allow cancel to unpause migrate: HMP migate_continue migration: migrate-continue migration: Wait for semaphore before completing migration migration: Add 'pre-switchover' and 'device' statuses migration: Add 'pause-before-switchover' capability migration: Make cache_init() take an error parameter migration: Move xbzrle cache resize error handling to xbzrle_cache_resize migration: Make cache size elements use the right types migratiom: Remove max_item_age parameter ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-25tpm: print buffers received from TPM when debuggingStefan Berger
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
2017-10-25vl: remove unnecessary #ifdef CONFIG_TPMPhilippe Mathieu-Daudé
a stub is now provided. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
2017-10-25tpm: remove unnecessary #ifdef CONFIG_TPMPhilippe Mathieu-Daudé
Makefile.objs now checks for $(CONFIG_TPM). Suggested-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
2017-10-25tpm: add stubsPhilippe Mathieu-Daudé
Commit c37cacabf22 moved tpm_cleanup() in the main loop exit, however this function is not available when compiling with --disable-tpm. Provides necessary stubs to keep code clean of #ifdef'fery. Reported-by: BALATON Zoltan <balaton@eik.bme.hu> Message-Id: <20171023102903.256AF7456A0@zero.eik.bme.hu> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
2017-10-24translate-all: exit from tb_phys_invalidate if qht_remove failsEmilio G. Cota
Two or more threads might race while invalidating the same TB. We currently do not check for this at all despite taking tb_lock, which means we would wrongly invalidate the same TB more than once. This bug has actually been hit by users: I recently saw a report on IRC, although I have yet to see the corresponding test case. Fix this by using qht_remove as the synchronization point; if it fails, that means the TB has already been invalidated, and therefore there is nothing left to do in tb_phys_invalidate. Note that this solution works now that we still have tb_lock, and will continue working once we remove tb_lock. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1508445114-4717-1-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: Initialize cpu_env genericallyRichard Henderson
This is identical for each target. So, move the initialization to common code. Move the variable itself out of tcg_ctx and name it cpu_env to minimize changes within targets. This also means we can remove tcg_global_reg_new_{ptr,i32,i64}, since there are no longer global-register temps created by targets. Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: enable multiple TCG contexts in softmmuEmilio G. Cota
This enables parallel TCG code generation. However, we do not take advantage of it yet since tb_lock is still held during tb_gen_code. In user-mode we use a single TCG context; see the documentation added to tcg_region_init for the rationale. Note that targets do not need any conversion: targets initialize a TCGContext (e.g. defining TCG globals), and after this initialization has finished, the context is cloned by the vCPU threads, each of them keeping a separate copy. TCG threads claim one entry in tcg_ctxs[] by atomically increasing n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's of that variable and tcg_ctxs; they are there just to play nice with analysis tools such as thread sanitizer. Note that we do not allocate an array of contexts (we allocate an array of pointers instead) because when tcg_context_init is called, we do not know yet how many contexts we'll use since the bool behind qemu_tcg_mttcg_enabled() isn't set yet. Previous patches folded some TCG globals into TCGContext. The non-const globals remaining are only set at init time, i.e. before the TCG threads are spawned. Here is a list of these set-at-init-time globals under tcg/: Only written by tcg_context_init: - indirect_reg_alloc_order - tcg_op_defs Only written by tcg_target_init (called from tcg_context_init): - tcg_target_available_regs - tcg_target_call_clobber_regs - arm: arm_arch, use_idiv_instructions - i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt, have_movbe, have_popcnt - mips: use_movnz_instructions, use_mips32_instructions, use_mips32r2_instructions, got_sigill (tcg_target_detect_isa) - ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr - s390: tb_ret_addr, s390_facilities - sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines), use_vis3_instructions Only written by tcg_prologue_init: - 'struct jit_code_entry one_entry' - aarch64: tb_ret_addr - arm: tb_ret_addr - i386: tb_ret_addr, guest_base_flags - ia64: tb_ret_addr - mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: introduce regions to split code_gen_bufferEmilio G. Cota
This is groundwork for supporting multiple TCG contexts. The naive solution here is to split code_gen_buffer statically among the TCG threads; this however results in poor utilization if translation needs are different across TCG threads. What we do here is to add an extra layer of indirection, assigning regions that act just like pages do in virtual memory allocation. (BTW if you are wondering about the chosen naming, I did not want to use blocks or pages because those are already heavily used in QEMU). We use a global lock to serialize allocations as well as statistics reporting (we now export the size of the used code_gen_buffer with tcg_code_size()). Note that for the allocator we could just use a counter and atomic_inc; however, that would complicate the gathering of tcg_code_size()-like stats. So given that the region operations are not a fast path, a lock seems the most reasonable choice. The effectiveness of this approach is clear after seeing some numbers. I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark. Note that I'm evaluating this after enabling per-thread TCG (which is done by a subsequent commit). * -smp 1, 1 region (entire buffer): qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357 qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363 qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364 qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373 qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373 qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360 qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370 qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367 That is, 8 flushes. * -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]: qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356 qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361 qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361 qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375 qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375 qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360 qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365 qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368 Again, 8 flushes. Note how buffer utilization is not 100%, but it is close. Smaller region sizes would yield higher utilization, but we want region allocation to be rare (it acquires a lock), so we do not want to go too small. * -smp 8, static partitioning of 8 regions (10 MB per region): qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354 qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370 qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365 qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377 qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358 qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367 qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364 qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358 qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362 qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372 qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374 qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376 qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374 qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372 qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359 qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362 qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368 qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378 qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367 qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364 That is, 20 flushes. Note how a static partitioning approach uses the code buffer poorly, leading to many unnecessary flushes. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24translate-all: use qemu_protect_rwx/none helpersEmilio G. Cota
The helpers require the address and size to be page-aligned, so do that before calling them. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24osdep: introduce qemu_mprotect_rwx/noneEmilio G. Cota
Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: allocate optimizer temps with tcg_mallocEmilio G. Cota
Groundwork for supporting multiple TCG contexts. While at it, also allocate temps_used directly as a bitmap of the required size, instead of using a bitmap of TCG_MAX_TEMPS via TCGTempSet. Performance-wise we lose about 1.12% in a translation-heavy workload such as booting+shutting down debian-arm: Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \ -machine type=virt -nographic -smp 1 -m 4096 \ -netdev user,id=unet,hostfwd=tcp::2222-:22 \ -device virtio-net-device,netdev=unet \ -drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \ -device virtio-blk-device,drive=myblock \ -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \ -name arm,debug-threads=on -smp 1' (10 runs): exec time (s) Relative slowdown wrt original (%) --------------------------------------------------------------- original 20.213321616 0. tcg_malloc 20.441130078 1.1270214 TCGContext 20.477846517 1.3086662 g_malloc 20.780527895 2.8061013 The other two alternatives shown in the table are: - TCGContext: embed temps[TCG_MAX_TEMPS] and TCGTempSet used_temps in TCGContext. This is simple enough but it isn't faster than using tcg_malloc; moreover, it wastes memory. - g_malloc: allocate/deallocate both temps and used_temps every time tcg_optimize is executed. Suggested-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: distribute profiling counters across TCGContext'sEmilio G. Cota
This is groundwork for supporting multiple TCG contexts. To avoid scalability issues when profiling info is enabled, this patch makes the profiling info counters distributed via the following changes: 1) Consolidate profile info into its own struct, TCGProfile, which TCGContext also includes. Note that tcg_table_op_count is brought into TCGProfile after dropping the tcg_ prefix. 2) Iterate over the TCG contexts in the system to obtain the total counts. This change also requires updating the accessors to TCGProfile fields to use atomic_read/set whenever there may be conflicting accesses (as defined in C11) to them. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: introduce **tcg_ctxs to keep track of all TCGContext'sEmilio G. Cota
Groundwork for supporting multiple TCG contexts. Note that having n_tcg_ctxs is unnecessary. However, it is convenient to have it, since it will simplify iterating over the array: we'll have just a for loop instead of having to iterate over a NULL-terminated array (which would require n+1 elems) or having to check with ifdef's for usermode/softmmu. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24gen-icount: fold exitreq_label into TCGContextEmilio G. Cota
Groundwork for supporting multiple TCG contexts. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: define tcg_init_ctx and make tcg_ctx a pointerEmilio G. Cota
Groundwork for supporting multiple TCG contexts. The core of this patch is this change to tcg/tcg.h: > -extern TCGContext tcg_ctx; > +extern TCGContext tcg_init_ctx; > +extern TCGContext *tcg_ctx; Note that for now we set *tcg_ctx to whatever TCGContext is passed to tcg_context_init -- in this case &tcg_init_ctx. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: take tb_ctx out of TCGContextEmilio G. Cota
Groundwork for supporting multiple TCG contexts. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24translate-all: report correct avg host TB sizeEmilio G. Cota
Since commit 6e3b2bfd6 ("tcg: allocate TB structs before the corresponding translated code") we are not fully utilizing code_gen_buffer for translated code, and therefore are incorrectly reporting the amount of translated code as well as the average host TB size. Address this by: - Making the conscious choice of misreporting the total translated code; doing otherwise would mislead users into thinking "-tb-size" is not honoured. - Expanding tb_tree_stats to accurately count the bytes of translated code on the host, and using this for reporting the average tb host size, as well as the expansion ratio. In the future we might want to consider reporting the accurate numbers for the total translated code, together with a "bookkeeping/overhead" field to account for the TB structs. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24exec-all: rename tb_free to tb_removeEmilio G. Cota
We don't really free anything in this function anymore; we just remove the TB from the binary search tree. Suggested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24translate-all: use a binary search tree to track TBs in TBContextEmilio G. Cota
This is a prerequisite for supporting multiple TCG contexts, since we will have threads generating code in separate regions of code_gen_buffer. For this we need a new field (.size) in struct tb_tc to keep track of the size of the translated code. This field uses a size_t to avoid adding a hole to the struct, although really an unsigned int would have been enough. The comparison function we use is optimized for the common case: insertions. Profiling shows that upon booting debian-arm, 98% of comparisons are between existing tb's (i.e. a->size and b->size are both !0), which happens during insertions (and removals, but those are rare). The remaining cases are lookups. From reading the glib sources we see that the first key is always the lookup key. However, the code does not assume this to always be the case because this behaviour is not guaranteed in the glib docs. However, we embed this knowledge in the code as a branch hint for the compiler. Note that tb_free does not free space in the code_gen_buffer anymore, since we cannot easily know whether the tb is the last one inserted in code_gen_buffer. The next patch in this series renames tb_free to tb_remove to reflect this. Performance-wise, lookups in tb_find_pc are the same as before: O(log n). However, insertions are O(log n) instead of O(1), which results in a small slowdown when booting debian-arm: Performance counter stats for 'build/arm-softmmu/qemu-system-arm \ -machine type=virt -nographic -smp 1 -m 4096 \ -netdev user,id=unet,hostfwd=tcp::2222-:22 \ -device virtio-net-device,netdev=unet \ -drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \ -device virtio-blk-device,drive=myblock \ -kernel img/arm/aarch32-current-linux-kernel-only.img \ -append console=ttyAMA0 root=/dev/vda1 \ -name arm,debug-threads=on -smp 1' (10 runs): - Before: 8048.598422 task-clock (msec) # 0.931 CPUs utilized ( +- 0.28% ) 16,974 context-switches # 0.002 M/sec ( +- 0.12% ) 0 cpu-migrations # 0.000 K/sec 10,125 page-faults # 0.001 M/sec ( +- 1.23% ) 35,144,901,879 cycles # 4.367 GHz ( +- 0.14% ) <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 65,758,252,643 instructions # 1.87 insns per cycle ( +- 0.33% ) 10,871,298,668 branches # 1350.707 M/sec ( +- 0.41% ) 192,322,212 branch-misses # 1.77% of all branches ( +- 0.32% ) 8.640869419 seconds time elapsed ( +- 0.57% ) - After: 8146.242027 task-clock (msec) # 0.923 CPUs utilized ( +- 1.23% ) 17,016 context-switches # 0.002 M/sec ( +- 0.40% ) 0 cpu-migrations # 0.000 K/sec 18,769 page-faults # 0.002 M/sec ( +- 0.45% ) 35,660,956,120 cycles # 4.378 GHz ( +- 1.22% ) <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 65,095,366,607 instructions # 1.83 insns per cycle ( +- 1.73% ) 10,803,480,261 branches # 1326.192 M/sec ( +- 1.95% ) 195,601,289 branch-misses # 1.81% of all branches ( +- 0.39% ) 8.828660235 seconds time elapsed ( +- 0.38% ) Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: Remove CF_IGNORE_ICOUNTRichard Henderson
Now that we have curr_cflags, we can include CF_USE_ICOUNT early and then remove it as necessary. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASKRichard Henderson
These flags are used by target/*/translate.c, and affect code generation. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24cpu-exec: lookup/generate TB outside exclusive region during step_atomicEmilio G. Cota
Now that all code generation has been converted to check CF_PARALLEL, we can generate !CF_PARALLEL code without having yet set !parallel_cpus -- and therefore without having to be in the exclusive region during cpu_exec_step_atomic. While at it, merge cpu_exec_step into cpu_exec_step_atomic. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24tcg: check CF_PARALLEL instead of parallel_cpusEmilio G. Cota
Thereby decoupling the resulting translated code from the current state of the system. The tb->cflags field is not passed to tcg generation functions. So we add a field to TCGContext, storing there a copy of tb->cflags. Most architectures have <= 32 registers, which results in a 4-byte hole in TCGContext. Use this hole for the new field. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24target/sparc: check CF_PARALLEL instead of parallel_cpusEmilio G. Cota
Thereby decoupling the resulting translated code from the current state of the system. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24target/sh4: check CF_PARALLEL instead of parallel_cpusEmilio G. Cota
Thereby decoupling the resulting translated code from the current state of the system. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24target/s390x: check CF_PARALLEL instead of parallel_cpusEmilio G. Cota
Thereby decoupling the resulting translated code from the current state of the system. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24target/m68k: check CF_PARALLEL instead of parallel_cpusEmilio G. Cota
Thereby decoupling the resulting translated code from the current state of the system. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24target/i386: check CF_PARALLEL instead of parallel_cpusEmilio G. Cota
Thereby decoupling the resulting translated code from the current state of the system. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24target/hppa: check CF_PARALLEL instead of parallel_cpusEmilio G. Cota
Thereby decoupling the resulting translated code from the current state of the system. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2017-10-24target/arm: check CF_PARALLEL instead of parallel_cpusEmilio G. Cota
Thereby decoupling the resulting translated code from the current state of the system. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>