aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-06-06Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.10-20170606' ↵Peter Maydell
into staging ppc patch queue 2017-06-06 Accumulated patches for ppc targets and the pseries machine type. The big thing in this batch is a start on a substantial cleanup of the pseries hotplug mechanisms, which were pretty confusing. For now these shouldn't cause substantial behavioural changes, but I am hoping these lead to clearer code and eventually to fixes for the bugs we have in hotplug handling, particularly when hotplug and migration are combined. The remaining patches are mostly bugfixes. # gpg: Signature made Tue 06 Jun 2017 03:48:50 BST # gpg: using RSA key 0x6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-2.10-20170606: spapr: Remove some non-useful properties on DRC objects spapr: Eliminate spapr_drc_get_type_str() spapr: Move configure-connector state into DRC spapr: Clean up spapr_dr_connector_by_*() spapr: Introduce DRC subclasses spapr/drc: don't migrate DRC of cold-plugged CPUs and LMBs spapr: Allow boot from vhost-*-scsi backends ppc/pnv: check the return value of fdt_setprop() spapr_nvram: Check return value from blk_getlength() target/ppc: Fixup set_spr error in h_register_process_table target-ppc: Fix openpic timer read register offset spapr: Make DRC get_index and get_type methods into plain functions spapr: Abolish DRC set_configured method spapr: Abolish DRC get_fdt method spapr: Move DRC RTAS calls into spapr_drc.c migration: Mark CPU states dirty before incoming migration/loadvm migration: remove register_savevm() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-06Merge remote-tracking branch ↵Peter Maydell
'remotes/ehabkost/tags/x86-and-machine-pull-request' into staging x86 and machine queue, 2017-06-05 # gpg: Signature made Mon 05 Jun 2017 19:58:01 BST # gpg: using RSA key 0x2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/x86-and-machine-pull-request: scripts: Test script to look for -device crashes qemu.py: Add QEMUMachine.exitcode() method qemu.py: Don't set _popen=None on error/shutdown spapr: cleanup spapr_fixup_cpu_numa_dt() usage numa: move numa_node from CPUState into target specific classes numa: make hmp 'info numa' fetch numa nodes from qmp_query_cpus() result numa: make sure that all cpus have has_node_id set if numa is enabled numa: move default mapping init to machine numa: consolidate cpu_preplug fixups/checks for pc/arm/spapr pc: Use "min-[x]level" on compat_props Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-06spapr: Remove some non-useful properties on DRC objectsDavid Gibson
* 'connector_type' is easily derived from the 'index' property, so there's no point to it (it's also implicit in the QOM type of the DRC) * 'isolation-state', 'indicator-state' and 'allocation-state' are part of the transaction between qemu and guest during PAPR hotplug operations, and outside tools really have no business looking at it (especially not changing, and these were RW properties) * 'entity-sense' is basically just a weird PAPR encoding of whether there is a device connected to this DRC Strictly speaking removing these properties is breaking the qemu interface. However, I'm pretty sure no management tools have ever used these. For debugging there are better alternatives. Therefore, I think removing these broken interfaces is the better option. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06spapr: Eliminate spapr_drc_get_type_str()David Gibson
This function was used in generating the device tree. However, now that we have different QOM types for different DRC types we can easily store the information we need in the class structure and avoid this specialized lookup function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06spapr: Move configure-connector state into DRCDavid Gibson
Currently the sPAPRMachineState contains a list of sPAPRConfigureConnector structures which store intermediate state for the ibm,configure-connector RTAS call. This was an attempt to separate this state from the core of the DRC state. However the configure connector process is intimately tied to the DRC model, so there's really no point trying to have two levels of interface here. Moving the configure-connector state into its corresponding DRC allows removal of a number of helpers for maintaining the anciliary list. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06spapr: Clean up spapr_dr_connector_by_*()David Gibson
* Change names to something less ludicrously verbose * Now that we have QOM subclasses for the different DRC types, use a QOM typename instead of a PAPR type value parameter The latter allows removal of the get_type_shift() helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06spapr: Introduce DRC subclassesDavid Gibson
Currently we only have a single QOM type for all DRCs, but lots of places where we switch behaviour based on the DRC's PAPR defined type. This is a poor use of our existing type system. So, instead create QOM subclasses for each PAPR defined DRC type. We also introduce intermediate subclasses for physical and logical DRCs, a division which will be useful later on. Instead of being stored in the DRC object itself, the PAPR type is now stored in the class structure. There are still many places where we switch directly on the PAPR type value, but this at least provides the basis to start to remove those. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com> Acked-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2017-06-06spapr/drc: don't migrate DRC of cold-plugged CPUs and LMBsGreg Kurz
As explained in commit 5c0139a8c2f0 ("spapr: fix default DRC state for coldplugged LMBs"), guests expect cold-plugged LMBs to be pre-allocated and unisolated. The same goes for cold-plugged CPUs. While here, let's convert g_assert(false) to the better self documenting g_assert_not_reached(). Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-06spapr: Allow boot from vhost-*-scsi backendsFelipe Franciosi
The current implementation of spapr_get_fw_dev_path() doesn't take into consideration vhost-*-scsi devices. This makes said devices unbootable on PPC as SLOF is unable to work out the path to scan boot disks. This makes VMs bootable on spapr when using vhost-*-scsi by implementing a disk path for VHostSCSICommon (which currently includes both vhost-user-scsi and vhost-scsi). Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Signed-off-by: Mike Cui <cui@nutanix.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-06ppc/pnv: check the return value of fdt_setprop()Cédric Le Goater
Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Correct typo in commit message] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-06spapr_nvram: Check return value from blk_getlength()Peter Maydell
The blk_getlength() function can return an error value if the image size cannot be determined. Check for this rather than ploughing on and trying to g_malloc0() a negative number. (Spotted by Coverity, CID 1288484.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-06target/ppc: Fixup set_spr error in h_register_process_tableSuraj Jitindar Singh
set_spr is used in the function h_register_process_table() to update the LPCR_GTSE and LPCR_UPRT values based on the flags passed by the guest. The set_spr function takes the last two arguments mask and value used to mask and set the value of the spr respectively. The current call site passes these arguments in the wrong order and thus bot GTSE and UPRT will be set irrespective, which is obviously incorrect. Rearrange the function call so that these arguments are passed in the correct order and the correct behaviour is exhibited. It is worth noting that this wasn't detected earlier since these were always both set in all cases where this H_CALL was made. Fixes: 6de833070ca2 ("target/ppc: Set UPRT and GTSE on all cpus in H_REGISTER_PROCESS_TABLE") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-06target-ppc: Fix openpic timer read register offsetAaron Larson
openpic_tmr_read() is incorrectly computing register offset of the TCCR, TBCR, TVPR, and TDR registers when accessing the open pic timer registers. Specifically the offset of timer registers for openpic_tmr_read() is not accounting for the timer frequency reporting register (TFFR) which is the first register in the "tmr" memory region. openpic_tmr_write() *is* correctly computing the offset by adding 0x10f0 to the address prior to computing the register index. This patch instead subtracts 0x10 in both the read and write routines and eliminates some other gratuitous differences between the functions. Signed-off-by: Aaron Larson <alarson@ddci.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-06spapr: Make DRC get_index and get_type methods into plain functionsDavid Gibson
These two methods only have one implementation, and the spec they're implementing means any other implementation is unlikely, verging on impossible. So replace them with simple functions. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-06-06spapr: Abolish DRC set_configured methodDavid Gibson
DRConnectorClass has a set_configured method, however: * There is only one implementation, and only ever likely to be one * There's exactly one caller, and that's (now) local * The implementation is very straightforward So abolish the method entirely, and just open-code what we need. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-06-06spapr: Abolish DRC get_fdt methodDavid Gibson
The DRConnectorClass includes a get_fdt method. However * There's only one implementation, and there's only likely to ever be one * Both callers are local to spapr_drc * Each caller only uses one half of the actual implementation So abolish get_fdt() entirely, and just open-code what we need. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-06-06spapr: Move DRC RTAS calls into spapr_drc.cDavid Gibson
Currently implementations of the RTAS calls related to DRCs are in spapr_rtas.c. They belong better in spapr_drc.c - that way they're closer to related code, and we'll be able to make some more things local. spapr_rtas.c was intended to contain the RTAS infrastructure and core calls that don't belong anywhere else, not every RTAS implementation. Code motion only. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Daniel Barboza <danielhb@linux.vnet.ibm.com>
2017-06-06migration: Mark CPU states dirty before incoming migration/loadvmDavid Gibson
As a rule, CPU internal state should never be updated when !cpu->kvm_vcpu_dirty (or the HAX equivalent). If that is done, then subsequent calls to cpu_synchronize_state() - usually safe and idempotent - will clobber state. However, we routinely do this during a loadvm or incoming migration. Usually this is called shortly after a reset, which will clear all the cpu dirty flags with cpu_synchronize_all_post_reset(). Nothing is expected to set the dirty flags again before the cpu state is loaded from the incoming stream. This means that it isn't safe to call cpu_synchronize_state() from a post_load handler, which is non-obvious and potentially inconvenient. We could cpu_synchronize_all_state() before the loadvm, but that would be overkill since a) we expect the state to already be synchronized from the reset and b) we expect to completely rewrite the state with a call to cpu_synchronize_all_post_init() at the end of qemu_loadvm_state(). To clear this up, this patch introduces cpu_synchronize_pre_loadvm() and associated helpers, which simply marks the cpu state as dirty without actually changing anything. i.e. it says we want to discard any existing KVM (or HAX) state and replace it with what we're going to load. Cc: Juan Quintela <quintela@redhat.com> Cc: Dave Gilbert <dgilbert@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Juan Quintela <quintela@redhat.com>
2017-06-06migration: remove register_savevm()Laurent Vivier
We can replace the four remaining calls of register_savevm() by calls to register_savevm_live(). So we can remove the function and as we don't allocate anymore the ops pointer with g_new0() we don't have to free it then. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2017-06-05scripts: Test script to look for -device crashesEduardo Habkost
Test code to check if we can crash QEMU using -device. It will test all accel/machine/device combinations by default, which may take a few hours (it's more than 90k test cases). There's a "-r" option that makes it test a random sample of combinations. The scripts contains a whitelist for: 1) known error messages that make QEMU exit cleanly; 2) known QEMU crashes. This is the behavior when the script finds a failure: * Known clean (exitcode=1) errors generate DEBUG messages (hidden by default) * Unknown clean (exitcode=1) errors will generate INFO messages (visible by default) * Known crashes generate error messages, but are not fatal (unless --strict mode is used) * Unknown crashes generate fatal error messages Having an updated whitelist of known clean errors is useful to make the script less verbose and run faster when in --quick mode, but the whitelist doesn't need to be always up to date. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170526181200.17227-4-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05qemu.py: Add QEMUMachine.exitcode() methodEduardo Habkost
Allow the exit code of QEMU to be queried by scripts. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170526181200.17227-3-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05qemu.py: Don't set _popen=None on error/shutdownEduardo Habkost
Keep the Popen object around to we can query its exit code later. To keep the existing 'self._popen is None' checks working, add a is_running() method, that will check if the process is still running. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20170526181200.17227-2-ehabkost@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05spapr: cleanup spapr_fixup_cpu_numa_dt() usageIgor Mammedov
even though spapr_fixup_cpu_numa_dt() has no effect on FDT if numa is disabled, don't call it uselessly. It makes it obvious at call sites that function is needed only when numa is enabled. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-7-git-send-email-imammedo@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05numa: move numa_node from CPUState into target specific classesIgor Mammedov
Move vcpu's associated numa_node field out of generic CPUState into inherited classes that actually care about cpu<->numa mapping, i.e: ARMCPU, PowerPCCPU, X86CPU. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-6-git-send-email-imammedo@redhat.com> [ehabkost: s/CPU is belonging to/CPU belongs to/ on comments] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05numa: make hmp 'info numa' fetch numa nodes from qmp_query_cpus() resultIgor Mammedov
HMP command 'info numa' is the last external user that access CPUState::numa_node field directly. In order to move it to CPU classes that actually use it, eliminate direct access and use an alternative approach by using result of qmp_query_cpus(), which provides topology properties CPU threads are associated with (including node-id). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-5-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05numa: make sure that all cpus have has_node_id set if numa is enabledIgor Mammedov
It fixes/add missing _PXM object for non mapped CPU (x86) and missing fdt node (virt-arm). It ensures that possible_cpus contains complete mapping if numa is enabled by the time machine_init() is executed. As result non completely mapped CPUs: 1) appear in ACPI/fdt blobs 2) QMP query-hotpluggable-cpus command shows bound nodes for such CPUs 3) allows to drop checks for has_node_id in numa only code, reducing number of invariants incomplete mapping could produce 4) moves fixup/implicit node init from runtime numa_cpu_pre_plug() (when CPU object is created) to machine_numa_finish_init() which helps to fix [1, 2] and make possible_cpus complete source of numa mapping available even before CPUs are created. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-4-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05numa: move default mapping init to machineIgor Mammedov
there is no need use cpu_index_to_instance_props() for setting default cpu -> node mapping. Generic machine code can do it without cpu_index by just enabling already preset defaults in possible_cpus. PS: as bonus it makes one less user of cpu_index_to_instance_props() Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1496161442-96665-3-git-send-email-imammedo@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05numa: consolidate cpu_preplug fixups/checks for pc/arm/spaprIgor Mammedov
Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <1496161442-96665-2-git-send-email-imammedo@redhat.com> [ehabkost: Fix indentation] Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05pc: Use "min-[x]level" on compat_propsEduardo Habkost
Since the automatic cpuid-level code was introduced in commit c39c0edf9bb3b968ba95484465a50c7b19f4aa3a ("target-i386: Automatically set level/xlevel/xlevel2 when needed"), the CPU model tables just define the default CPUID level code (set using "min-level"). Setting "[x]level" forces CPUID level to a specific value and disable the automatic-level logic. But the PC compat code was not updated and the existing "[x]level" compat properties broke compatibility for people using features that triggered the auto-level code. To keep previous behavior, we should set "min-[x]level" instead of "[x]level" on compat_props. This was not a problem for most cases, because old machine-types don't have full-cpuid-auto-level enabled. The only common use case it broke was the CPUID[7] auto-level code, that was already enabled since the first CPUID[7] feature was introduced (in QEMU 1.4.0). This causes the regression reported at: https://bugzilla.redhat.com/show_bug.cgi?id=1454641 Change the PC compat code to use "min-[x]level" instead of "[x]level" on compat_props, and add new test cases to ensure we don't break this again. Reported-by: "Guo, Zhiyi" <zhguo@redhat.com> Fixes: c39c0edf9bb ("target-i386: Automatically set level/xlevel/xlevel2 when needed") Cc: qemu-stable@nongnu.org Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2017-06-05Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20170605' into stagingPeter Maydell
Queued TCG patches # gpg: Signature made Mon 05 Jun 2017 17:48:42 BST # gpg: using RSA key 0xAD1270CC4DD0279B # gpg: Good signature from "Richard Henderson <rth7680@gmail.com>" # gpg: aka "Richard Henderson <rth@redhat.com>" # gpg: aka "Richard Henderson <rth@twiddle.net>" # Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC 16A4 AD12 70CC 4DD0 279B * remotes/rth/tags/pull-tcg-20170605: (26 commits) target/alpha: Use goto_tb for fallthru between TBs target/alpha: Implement WTINT inline target/mips: optimize indirect branches target/mips: optimize cross-page direct jumps in softmmu target/aarch64: optimize indirect branches target/aarch64: optimize cross-page direct jumps in softmmu target/hppa: Use tcg_gen_lookup_and_goto_ptr target/s390: Use tcg_gen_lookup_and_goto_ptr tcg/mips: implement goto_ptr tcg/arm: Implement goto_ptr tcg/arm: Clarify tcg_out_bx for arm4 host tcg/s390: Implement goto_ptr tcg/sparc: Implement goto_ptr tcg/aarch64: Implement goto_ptr tcg/ppc: Implement goto_ptr tb-hash: improve tb_jmp_cache hash function in user mode target/i386: optimize indirect branches target/i386: optimize cross-page direct jumps in softmmu target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr target/arm: optimize indirect branches ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-06-05target/alpha: Use goto_tb for fallthru between TBsRichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05target/alpha: Implement WTINT inlineRichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05target/mips: optimize indirect branchesAurelien Jarno
Cc: Yongbok Kim <yongbok.kim@imgtec.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <20170430145254.25616-4-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05target/mips: optimize cross-page direct jumps in softmmuAurelien Jarno
Cc: Yongbok Kim <yongbok.kim@imgtec.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <20170430145254.25616-3-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05target/aarch64: optimize indirect branchesEmilio G. Cota
Measurements: [Baseline performance is that before applying this and the previous commit] - NBench, aarch64-softmmu. Host: Intel i7-4790K @ 4.00GHz 1.7x +-+--------------------------------------------------------------------------------------------------------------+-+ | | | cross | 1.6x +cross+jr.................................................####...................................................+-+ | #++# | | # # | 1.5x +-+...................................................*****..#...................................................+-+ | *+++* # | | * * # | 1.4x +-+...................................................*...*..#...................................................+-+ | * * # | | ##### * * # | 1.3x +-+................................****+++#...........*...*..#...................................................+-+ | *++* # * * # | | * * # * * # | 1.2x +-+................................*..*...#...........*...*..#...................................................+-+ | * * # * * # | | #### * * # * * # | 1.1x +-+.......................+++#..#..*..*...#...........*...*..#...................................................+-+ | **** # * * # * * # ****#### | | * * # * * # * * # ****### +++#### ****### * * # | 1x +-++-++++++-++++****###++-*++*++#++*++*+-+#++****+++++*+++*++#++*++*-+#++*****++#++****###-++*++*-+#++*+-*+++#+-++-+ | *****### * * # * * # * * # *++*### * * # * * # * * # * *++# * * # * * # | | * *++# * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # | 0.9x +-+---*****###--****###---****###--****####--****###--*****###--****###--*****###--****###---****###--****####---+-+ ASSIGNMENT BITFIELD FOURFP EMULATION HUFFMAN LU DECOMPOSITIONNEURAL NUMERIC SORSTRING SORT hmean png: http://imgur.com/qO9ubtk NB. cross here represents the previous commit. - SPECint06 (test set), aarch64-linux-user. Host: Intel i7-4790K @ 4.00GHz 1.5x +-+--------------------------------------------------------------------------------------------------------------+-+ | ***** | | *+++* jr | | * * | 1.4x +-+.....................................................................*...*.....................+++............+-+ | * * | | | ***** * * | | | * * * * ***** | 1.3x +-+....................................*...*............................*...*....................*.|.*...........+-+ | +++ * * * * * | * | | ***** * * * * *+++* | | * * * * * * * * | 1.2x +-+....................*...*...........*...*............................*...*...........*****....*...*...........+-+ | ***** * * * * * * * * * * +++ | | * * * * * * * * * * * * ***** | | * * * * ***** * * * * * * * * * * | 1.1x +-+...*...*............*...*...*...*...*...*............................*...*....+++....*...*....*...*...*...*...+-+ | * * * * * * * * * * ***** * * * * * * | | * * * * * * * * ***** * * * * * * * * * * | | * * ***** * * * * * * * * ****** * * * * * * * * * * | 1x +-++-+*+++*-++*+++*++++*+-+*+++*-++*+++*-++*+++*+++*++-*++++*-++*****+++*++-*+++*++-*+++*+-+*++++*+++*++-*+++*+-++-+ | * * * * * * * * * * * * * * *+++* * * * * * * * * * * | | * * * * * * * * * * * * * * * * * * * * * * * * * * | | * * * * * * * * * * * * * * * * * * * * * * * * * * | 0.9x +-+---*****---*****----*****---*****---*****---*****---******---*****---*****---*****---*****----*****---*****---+-+ astar bzip2 gcc gobmk h264ref hmmlibquantum mcf omnetpperlbench sjengxalancbmk hmean png: http://imgur.com/3Dp4vvq - SPECint06 (train set), aarch64-linux-user. Host: Intel i7-4790K @ 4.00GHz 1.7x +-+--------------------------------------------------------------------------------------------------------------+-+ | | | jr | 1.6x +-+...............................................................................................+++............+-+ | ***** | | *+++* | | * * | 1.5x +-+..............................................................................................*...*...........+-+ | +++ * * | | ***** * * | 1.4x +-+.....................................................................*+++*....................*...*...........+-+ | * * * * | | ***** * * * * | | * * * * ***** * * | 1.3x +-+....................................*...*............................*...*...*...*............*...*...........+-+ | +++ * * * * * * * * | | ***** * * * * * * ***** * * | 1.2x +-+....................*...*...........*...*............................*...*...*...*...*+++*....*...*...*****...+-+ | * * * * * * * * * * * * *+++* | | ***** * * ***** * * * * * * * * * * * * | | * * * * *+++* * * * * * * * * * * * * | 1.1x +-+...*...*............*...*...*...*...*...*............................*...*...*...*...*...*....*...*...*...*...+-+ | * * ***** * * * * * * ***** * * * * * * * * * * | | * * * * * * * * * * +++ ****** *+++* * * * * * * * * * * | 1x +-+---*****---*****----*****---*****---*****---*****---******---*****---*****---*****---*****----*****---*****---+-+ astar bzip2 gcc gobmk h264ref hmmlibquantum mcf omnetpperlbench sjengxalancbmk hmean png: http://imgur.com/vRrdc9j Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05target/aarch64: optimize cross-page direct jumps in softmmuEmilio G. Cota
Perf numbers in next commit's log. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05target/hppa: Use tcg_gen_lookup_and_goto_ptrRichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05target/s390: Use tcg_gen_lookup_and_goto_ptrRichard Henderson
Tested-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05tcg/mips: implement goto_ptrAurelien Jarno
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <20170430145254.25616-2-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05tcg/arm: Implement goto_ptrRichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05tcg/arm: Clarify tcg_out_bx for arm4 hostRichard Henderson
In theory this would re-enable usage of QEMU on an armv4 host. Whether this is worthwhile is debatable -- we've been unconditionally issuing the armv5t BX instruction in the prologue since 2011 without complaint. Possibly we should simply require an armv6 host. Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05tcg/s390: Implement goto_ptrRichard Henderson
Tested-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05tcg/sparc: Implement goto_ptrRichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05tcg/aarch64: Implement goto_ptrRichard Henderson
Measurements: SPECint06 (test set), x86_64-linux-user. Host: APM 64-bit ARMv8 (Atlas/A57) @ 2.4 GHz 1.45x +-+-------------------------------------------------------------------------------------------------------------+-+ | ***** | | +++ * * +goto-ptr | 1.4x +-+...*****............................*...*....................................................................+-+ | *+++* * * +++ | 1.35x +-+...*...*............................*...*...........................*****....................................+-+ | * * * * *+++* | | * * * * * * | 1.3x +-+...*...*............................*...*...........................*...*....................................+-+ | * * * * * * | | * * * * * * ***** | 1.25x +-+...*...*...........*****............*...*...........................*...*............*****...*...*...........+-+ | * * * * * * * * *+++* * * | 1.2x +-+...*...*...........*...*............*...*...........................*...*............*...*...*...*...........+-+ | * * * * * * * * * * * * | | * * * * * * * * * * * * ***** | 1.15x +-+...*...*...........*...*............*...*...........................*...*............*...*...*...*...*...*...+-+ | * * * * * * * * +++ * * * * * * | | * * * * * * * * ***** * * * * * * | 1.1x +-+...*...*...........*...*....*****...*...*...*****...................*...*...*...*....*...*...*...*...*...*...+-+ | * * * * * * * * * * * * * * * * * * * * | 1.05x +-+...*...*...........*...*....*...*...*...*...*...*...................*...*...*...*....*...*...*...*...*...*...+-+ | * * ***** * * * * * * * * * * * * * * * * * * | | * * * * * * * * * * * * ***** ***** * * * * * * * * * * | 1x +-+---*****---*****---*****----*****---*****---*****---*****---*****---*****---*****----*****---*****---*****---+-+ astar bzip2 gcc gobmk h264ref hmmlibquantum mcf omnetpperlbench sjenxalancbmk hmean png: http://imgur.com/en9HE8L Tested-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05tcg/ppc: Implement goto_ptrRichard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05tb-hash: improve tb_jmp_cache hash function in user modeEmilio G. Cota
Optimizations to cross-page chaining and indirect branches make performance more sensitive to the hit rate of tb_jmp_cache. The constraint of reserving some bits for the page number lowers the achievable quality of the hashing function. However, user-mode does not have this requirement. Thus, with this change we use for user-mode a hashing function that is both faster and of better quality than the previous one. Measurements: Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0. - SPECint06 (test set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz 2.2x +-+--------------------------------------------------------------------------------------------------------------+-+ | | | jr | 2x +jr+multhash +....................................................+++++...................................+-+ | jr+hash |$$$ | | |$+$ | | ### $ | 1.8x +-+......................................................................#|#.$...................................+-+ | ++#+# $ | | |# # $ | 1.6x +-+....................................................................***.#.$....................++$$$..........+-+ | $$$ *+* # $ |$+$ | | ++$$$ ### $ * * # $ +++|$ $ | | ++###+$ # # $ * * # $ ### ****## $ | 1.4x +-+...................***+#.$.........***.#.$..........................*.*.#.$...........#+#$$.*++*|#.$..........+-+ | *+* # $ * * # $ * * # $ # # $ * *+# $ | | * * # $ +++++ * * # $ * * # $ *** # $ * * # $ ###$$ | 1.2x +-+...................*.*.#.$.***##$$.*.*.#.$..........................*.*.#.$.........*.*.#.$.*..*.#.$.***+#+$..+-+ | * * # $ *+* # $ * * # $ +++ * * # $ ++###$$ * * # $ * * # $ * * # $ | | ***##$$ * * # $ * * # $ * * # $ ***##$$ ++### * * # $ *** #+$ * * # $ * * # $ * * # $ | | *+*+#+$ ***##$$$ * * # $ * * # $ * * # $ *+* # $ ++####$$ ***+# * * # $ * * # $ * * # $ * * # $ * * # $ | 1x +-++-*+*+#+$+*+*+#-+$+*+*-#+$+*+*+#+$+*+*+#+$+*-*+#+$+***++#+$+*+*+#$$+*+*+#+$+*+*+#+$+*+*-#+$+*+-*+#+$+*+*+#+$-++-+ | * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ | | * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ | 0.8x +-+--***##$$-***##$$$-***##$$-***##$$-***##$$-***##$$-***###$$-***##$$-***##$$-***##$$-***##$$-****##$$-***##$$--+-+ astar bzip2 gcc gobmk h264ref hmmlibquantum mcf omnetpperlbench sjengxalancbmk hmean png: http://imgur.com/4UXTrEc Here I also tried the hash function suggested by Paolo ("multhash"): return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1); As you can see it is just as good as the other new function ("hash"), which is what I ended up going with. - SPECint06 (train set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz 2.6x +-+--------------------------------------------------------------------------------------------------------------+-+ | | | jr ### | 2.4x +jr+hash...........................................................................................#.#...........+-+ | # # | | # # | 2.2x +-+................................................................................................#.#...........+-+ | # # | | # # | 2x +-+................................................................................................#.#...........+-+ | **** # | | * * # | 1.8x +-+.............................................................................................*..*.#...........+-+ | +++ * * # | | #### #### * * # | 1.6x +-+......................................####.............................#..#.****..#..........*..*.#...........+-+ | +++ #++# **** # * * # #### * * # | | ### # # * * # * * # # # * * # | 1.4x +-+...................****+#..........****..#..........................*..*..#.*..*..#....#..#..*..*.#...........+-+ | *++* # * * # * * # * * # *** # * * # #### | | * * # #### * * # * * # * * # * * # * * # **** # | 1.2x +-+...................*..*.#..****++#.*..*..#..........................*..*..#.*..*..#..*.*..#..*..*.#..*..*..#..+-+ | ****### * * # * * # * * # * * # * * # * * # * * # * * # | | * * # ***### * * # * * # * * # ****## * * # * * # * * # * * # * * # | 1x +-+--****###--***###--****##--****###-****###--***###--***###--****##--****###-****###--***###--****##--****###--+-+ astar bzip2 gcc gobmk h264ref hmmlibquantum mcf omnetpperlbench sjengxalancbmk hmean png: http://imgur.com/ArCbHqo - NBench, x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz 1.12x +-+-------------------------------------------------------------------------------------------------------------+-+ | | | jr +++ | 1.1x +jr+hash...........................................................####.........................................+-+ | +++#| # | | | #++# | 1.08x +-+................................+++................+++.+++..*****..#.........................................+-+ | | +++ | | * | * # | | | | | | *+++* # | 1.06x +-+................................****###.............|...|...*...*..#.........................+++.............+-+ | *| * |# ****### * * # | | | *| *++# *| * |# * * # #### | 1.04x +-+................................*++*..#............*|.*.|#..*...*..#........................#.|#.............+-+ | * * # *++*++# * * # +++#++# | | * * # * * # * * # | # # +++#### | 1.02x +-+................................*..*..#......+++...*..*..#..*...*..#.....................****..#..*****++#...+-+ | +++ * * # +++ | * * # * * # +++ *| * # *+++* # | | +++ | +++ +++ ++++++ * * # *****### * * # * * # | +++ ++++++ *++* # * * # | 1x +-++-+++++####++****###++++-+####+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-+++####-+*****###++*++*++#++*+-+*++#+-++-+ | *****| # *++* |# *****| # * * # * *++# * * # * * # **** |# * * # * * # * * # | | * | *| # * *++# * | *++# * * # * * # * * # * * # *| *++# * * # * * # * * # | 0.98x +-+...*.|.*++#..*..*..#..*+++*..#..*..*..#..*...*..#..*..*..#..*...*..#..*++*..#..*...*..#..*..*..#..*...*..#...+-+ | *+++* # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # | | * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # | 0.96x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+ ASSIGNMENT BITFIELD FOURFP EMULATION HUFFMAN LU DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT hmean png: http://imgur.com/ZXFX0hJ - NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz 1.3x +-+-------------------------------------------------------------------------------------------------------------+-+ | #### | | jr # # +++ | 1.25x +jr+hash.....................#..#...........................................####................................+-+ | # # # # | | # # # # | 1.2x +-+..........................#..#...........................................#..#................................+-+ | # # # # | | # # # # | 1.15x +-+..........................#..#...........................................#..#................................+-+ | # # #### # # | | # # # # # # | 1.1x +-+..........................#..#..................................#..#.....#..#................................+-+ | # # # # # # +++ | | # # #### # # # # #### | 1.05x +-+..........................#..#...............#..#.....####......#..#.....#..#.........................#..#...+-+ | # # # # # # # # # # +++ # # | | +++ ***** # #### ***** # # # +++# # **** # ****### # # | 1x +-++-+*****###++****+++++*+-+*++#+-****++#-+*+++*-+#+++++#++#++*****++#+-*++*++#-+*****-++++*++*++#++*****++#+-++-+ | * * # * * | * * # * * # * * # **** # * * # * * # * *### * *++# * * # | | * * # * *### * * # * * # * * # * * # * * # * * # * * # * * # * * # | 0.95x +-+...*...*..#..*..*.|#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#...+-+ | * * # * * |# * * # * * # * * # * * # * * # * * # * * # * * # * * # | | * * # * * |# * * # * * # * * # * * # * * # * * # * * # * * # * * # | 0.9x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+ ASSIGNMENT BITFIELD FOURFP EMULATION HUFFMAN LU DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT hmean png: http://imgur.com/FfD27ey Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1493263764-18657-12-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05target/i386: optimize indirect branchesEmilio G. Cota
Speed up indirect branches by jumping to the target if it is valid. Softmmu measurements (see later commit for user-mode numbers): Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0. - SPECint06 (test set), x86_64-softmmu (Ubuntu 16.04 guest). Host: Intel i7-4790K @ 4.00GHz 2.4x +-+--------------------------------------------------------------------------------------------------------------+-+ | | | cross | 2.2x +cross+jr..........................................................................+++...........................+-+ | | | | +++ | | 2x +-+..............................................................................|..|............................+-+ | | | | | | | | 1.8x +-+..............................................................................|####...........................+-+ | |# |# | | **** |# | 1.6x +-+............................................................................*.|*.|#...........................+-+ | * |* |# | | * |* |# | 1.4x +-+.......................................................................+++..*.|*.|#...........................+-+ | ++++++ #### * |*++# +++ | | +++ | | #++# *++* # +++ | | 1.2x +-+......................###.....####....+++............|..|...........****..#.*..*..#....####...|.###.....####..+-+ | +++ **** # **** # #### ***### *++* # * * # #++# ****|# +++#++# | | ****### +++ *++* # *++* # ++# # #### *|* |# +++ * * # * * # *** # *| *|# **** # | 1x +-++-*++*++#++***###++*++*+#++*+-*++#+****++#++***++#+-*+*++#-+****##++*++*-+#+*++*-+#++*+*++#++*-+*+#++*++*++#-++-+ | * * # * * # * * # * * # * * # * * # *|* |# *++* # * * # * * # * * # * * # * * # | | * * # * * # * * # * * # * * # * * # *+*++# * * # * * # * * # * * # * * # * * # | 0.8x +-+--****###--***###--****##--****###-****###--***###--***###--****##--****###-****###--***###--****##--****###--+-+ astar bzip2 gcc gobmk h264ref hmmlibquantum mcf omnetpperlbench sjengxalancbmk hmean png: http://imgur.com/DU36YFU NB. 'cross' represents the previous commit. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1493263764-18657-11-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05target/i386: optimize cross-page direct jumps in softmmuEmilio G. Cota
Instead of unconditionally exiting to the exec loop, use the gen_jr helper to jump to the target if it is valid. Perf impact: see next commit's log. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1493263764-18657-10-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05target/i386: introduce gen_jr helper to generate lookup_and_goto_ptrEmilio G. Cota
This helper will be used by subsequent changes. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1493263764-18657-9-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>
2017-06-05target/arm: optimize indirect branchesEmilio G. Cota
Speed up indirect branches by jumping to the target if it is valid. Softmmu measurements (see later commit for user-mode results): Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0. - Impact on Boot time | setup | ARM debian jessie boot+shutdown time | stddev | |--------+--------------------------------------+--------| | v2.9.0 | 8.84 | 0.07 | | +cross | 8.85 | 0.03 | | +jr | 8.83 | 0.06 | - NBench, arm-softmmu (debian jessie guest). Host: Intel i7-4790K @ 4.00GHz 1.3x +-+-------------------------------------------------------------------------------------------------------------+-+ | | | cross #### | 1.25x +cross+jr..........................................................#++#.........................................+-+ | #### # # | | +++# # # # | | +++ **** # # # | 1.2x +-+...................................####............*..*..#......#..#.........................................+-+ | **** # * * # # # #### | | * * # * * # # # # # | 1.15x +-+................................*..*..#............*..*..#......#..#.....#..#................................+-+ | * * # * * # # # # # | | * * # #### * * # # # # # | | * * # # # * * # # # # # #### | 1.1x +-+................................*..*..#......#..#..*..*..#......#..#.....#..#.........................#..#...+-+ | * * # # # * * # # # # # # # | | * * # # # * * # # # # # # # | 1.05x +-+..........................####..*..*..#......#..#..*..*..#......#..#.....#..#......+++............*****..#...+-+ | ***** # * * # # # * * # ***** # # # +++ | ****### * * # | | *+++* # * * # # # * * # *+++* # **** # *****### * * # * * # | | *****### +++#### * * # * * # ***** # * * # * * # * * # * | *++# * * # * * # | 1x +-++-+*+++*-+#++****++#++*+-+*++#+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-++-+ | * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # | | * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # | 0.95x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+ ASSIGNMENT BITFIELD FOURFP EMULATION HUFFMAN LU DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT hmean png: http://imgur.com/eOLmZNR NB. 'cross' represents the previous commit. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1493263764-18657-8-git-send-email-cota@braap.org> [rth: Replace gen_jr global variable with DISAS_EXIT state.] Signed-off-by: Richard Henderson <rth@twiddle.net>