path: root/include
2018-06-22  hw/arm/iotkit: Instantiate MPC  (Peter Maydell)
Wire up the one MPC that is part of the IoTKit itself. For the moment we don't wire up its interrupt line. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180620132032.28865-7-peter.maydell@linaro.org
2018-06-22  hw/misc/iotkit-secctl.c: Implement SECMPCINTSTATUS  (Peter Maydell)
Implement the SECMPCINTSTATUS register. This is the only register in the security controller that deals with Memory Protection Controllers, and it simply provides a read-only view of the interrupt lines from the various MPCs in the system. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20180620132032.28865-6-peter.maydell@linaro.org
2018-06-22  hw/misc/tz-mpc.c: Implement registers  (Peter Maydell)
Implement the missing registers for the TZ MPC. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-id: 20180620132032.28865-3-peter.maydell@linaro.org
2018-06-22  hw/misc/tz-mpc.c: Implement the Arm TrustZone Memory Protection Controller  (Peter Maydell)
Implement the Arm TrustZone Memory Protection Controller, which sits in front of RAM and allows secure software to configure it to either pass through or reject transactions. We implement the MPC as a QEMU IOMMU, which will direct transactions either through to the devices and memory behind it or to a special "never works" AddressSpace if they are blocked.

This initial commit implements the skeleton of the device:
 * it always permits accesses
 * it doesn't implement most of the registers
 * it doesn't implement the interrupt or other behaviour for blocked transactions

Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-id: 20180620132032.28865-2-peter.maydell@linaro.org
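A minimal illustrative sketch of the idea, not the actual hw/misc/tz-mpc.c code: an IOMMU translate hook that either maps a transaction through 1:1 or denies it, standing in for the "divert to a blocked AddressSpace" behaviour described above. Function and variable names here are invented for illustration.

#include "qemu/osdep.h"
#include "exec/memory.h"

static IOMMUTLBEntry mpc_translate_sketch(IOMMUMemoryRegion *iommu,
                                          hwaddr addr,
                                          IOMMUAccessFlags flags,
                                          int iommu_idx)
{
    bool blocked = false;            /* skeleton behaviour: always permit */
    IOMMUTLBEntry entry = {
        .iova = addr & ~(hwaddr)0xfff,
        .translated_addr = addr & ~(hwaddr)0xfff,
        .addr_mask = 0xfff,          /* 4K translation granule */
        .perm = IOMMU_RW,
        /* a real device must also fill .target_as with the downstream
         * AddressSpace, or the special "never works" one when blocked */
    };

    if (blocked) {
        entry.perm = IOMMU_NONE;
    }
    return entry;
}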
2018-06-22  hw/arm/virt: Use 256MB ECAM region by default  (Eric Auger)
With this patch, virt-3.0 machine uses a new 256MB ECAM region by default instead of the legacy 16MB one, if highmem is set (LPAE supported by the guest) and (!firmware_loaded || aarch64). Indeed aarch32 mode FW may not support this high ECAM region. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-id: 1529072910-16156-11-git-send-email-eric.auger@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-22  hw/arm/virt: Add a new 256MB ECAM region  (Eric Auger)
This patch defines a new ECAM region located after the 256GB limit. The virt machine state is augmented with a new highmem_ecam field which guards the usage of this new ECAM region instead of the legacy 16MB one. With the highmem ECAM region, up to 256 PCIe buses can be used. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-id: 1529072910-16156-9-git-send-email-eric.auger@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-22  hw/arm/virt: GICv3 DT node with one or two redistributor regions  (Eric Auger)
This patch allows the creation of a GICv3 node with 1 or 2 redistributor regions depending on the number of smp_cpus. The second redistributor region is located just after the existing RAM region, at 256GB, and can contain up to 512 vcpus. Please refer to the kernel documentation for further node details: Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.txt Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-id: 1529072910-16156-6-git-send-email-eric.auger@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-22  hw/intc/arm_gicv3: Introduce redist-region-count array property  (Eric Auger)
To prepare for multiple redistributor regions, we introduce an array of uint32_t properties that stores the redistributor count of each redistributor region. Non-accelerated VGICv3 only supports a single redistributor region. The capacity of all redist regions is checked against the number of vcpus. Machvirt is updated to set those properties, i.e. a single redistributor region with its count set to the number of vcpus, capped at 123. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Message-id: 1529072910-16156-4-git-send-email-eric.auger@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
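A hedged sketch of how a board model might set such an array property, following QEMU's usual qdev array-property convention (a "len-..." property sizes the array, then each element is set by index). The exact property names and this convention are assumptions based on the commit text, not verified against the patch.

#include "qemu/osdep.h"
#include "hw/qdev-properties.h"

static void sketch_set_redist_regions(DeviceState *gicdev, unsigned smp_cpus)
{
    /* single region, holding at most 123 redistributors (assumption) */
    uint32_t redist0_count = MIN(smp_cpus, 123);

    qdev_prop_set_uint32(gicdev, "len-redist-region-count", 1);
    qdev_prop_set_uint32(gicdev, "redist-region-count[0]", redist0_count);
}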
2018-06-22  linux-headers: Update to kernel mainline commit b357bf602  (Eric Auger)
Update our kernel headers to mainline commit b357bf6023a948cf6a9472f07a1b0caac0e4f8e8 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm") Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-id: 1529072910-16156-2-git-send-email-eric.auger@redhat.com [PMM: clarified commit message] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-22  Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-06-20-v2' into staging  (Peter Maydell)
nbd patches for 2018-06-20

Add experimental x-nbd-server-add-bitmap to expose a disabled bitmap over NBD, in preparation for a pull model incremental backup scheme. Also fix a corner case protocol issue with NBD_CMD_BLOCK_STATUS, and add new NBD_CMD_CACHE.

- Eric Blake: tests: Simplify .gitignore
- Eric Blake: nbd/server: Reject 0-length block status request
- Vladimir Sementsov-Ogievskiy: 0/6 NBD export bitmaps
- Vladimir Sementsov-Ogievskiy: nbd/server: introduce NBD_CMD_CACHE

# gpg: Signature made Thu 21 Jun 2018 15:53:55 BST
# gpg: using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2018-06-20-v2:
  nbd/server: introduce NBD_CMD_CACHE
  docs/interop: add nbd.txt
  qapi: new qmp command nbd-server-add-bitmap
  nbd/server: implement dirty bitmap export
  nbd/server: add nbd_meta_empty_or_pattern helper
  nbd/server: refactor NBDExportMetaContexts
  nbd/server: fix trace
  nbd/server: Reject 0-length block status request
  tests: Simplify .gitignore

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-21  Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180615' into staging  (Peter Maydell)
TCG patch queue:

Workaround macos assembler lossage.
Eliminate tb_lock.
Fix TB code generation overflow.

# gpg: Signature made Fri 15 Jun 2018 20:40:56 BST
# gpg: using RSA key 64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20180615:
  tcg: Reduce max TB opcode count
  tcg: remove tb_lock
  translate-all: remove tb_lock mention from cpu_restore_state_from_tb
  cputlb: remove tb_lock from tlb_flush functions
  translate-all: protect TB jumps with a per-destination-TB lock
  translate-all: discard TB when tb_link_page returns an existing matching TB
  translate-all: introduce assert_no_pages_locked
  translate-all: add page_locked assertions
  translate-all: use per-page locking in !user-mode
  translate-all: move tb_invalidate_phys_page_range up in the file
  translate-all: work page-by-page in tb_invalidate_phys_range_1
  translate-all: remove hole in PageDesc
  translate-all: make l1_map lockless
  translate-all: iterate over TBs in a page with PAGE_FOR_EACH_TB
  tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx
  tcg: track TBs with per-region BST's
  qht: return existing entry when qht_insert fails
  qht: require a default comparison function
  tcg/i386: Use byte form of xgetbv instruction

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-21  nbd/server: introduce NBD_CMD_CACHE  (Vladimir Sementsov-Ogievskiy)
Handle the NBD CACHE command: perform the read, but do not send the read data back. The actual caching is expected to be done by the exported node's driver chain. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180413143156.11409-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix two missing case labels in switch statements] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-06-21  nbd/server: implement dirty bitmap export  (Vladimir Sementsov-Ogievskiy)
Handle a new NBD meta namespace: "qemu", and corresponding queries: "qemu:dirty-bitmap:<export bitmap name>". With the new metadata context negotiated, BLOCK_STATUS query will reply with dirty-bitmap data, converted to extents. The new public function nbd_export_bitmap selects which bitmap to export. For now, only one bitmap may be exported. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180609151758.17343-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: wording tweaks, minor cleanups, additional tracing] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-06-19  Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging  (Peter Maydell)
Block layer patches:

- Active mirror (blockdev-mirror copy-mode=write-blocking)
- bdrv_drain_*() fixes and test cases
- Fix crash with scsi-hd and drive_del

# gpg: Signature made Mon 18 Jun 2018 17:44:10 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (35 commits)
  iotests: Add test for active mirroring
  block/mirror: Add copy mode QAPI interface
  block/mirror: Add active mirroring
  job: Add job_progress_increase_remaining()
  block/mirror: Add MirrorBDSOpaque
  block/dirty-bitmap: Add bdrv_dirty_iter_next_area
  test-hbitmap: Add non-advancing iter_next tests
  hbitmap: Add @advance param to hbitmap_iter_next()
  block: Generalize should_update_child() rule
  block/mirror: Use source as a BdrvChild
  block/mirror: Wait for in-flight op conflicts
  block/mirror: Use CoQueue to wait on in-flight ops
  block/mirror: Convert to coroutines
  block/mirror: Pull out mirror_perform()
  block: fix QEMU crash with scsi-hd and drive_del
  test-bdrv-drain: Test graph changes in drain_all section
  block: Allow graph changes in bdrv_drain_all_begin/end sections
  block: ignore_bds_parents parameter for drain functions
  block: Move bdrv_drain_all_begin() out of coroutine context
  block: Allow AIO_WAIT_WHILE with NULL ctx
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-19  Merge remote-tracking branch 'remotes/kraxel/tags/vga-20180618-pull-request' into staging  (Peter Maydell)
vga: add ramfb, print virglrenderer version

# gpg: Signature made Mon 18 Jun 2018 10:57:38 BST
# gpg: using RSA key 4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/vga-20180618-pull-request:
  Add ramfb MAINTAINERS entry
  hw/display: add standalone ramfb device
  hw/display: add ramfb, a simple boot framebuffer living in guest ram
  configure: print virglrenderer version

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-18  block/mirror: Add copy mode QAPI interface  (Max Reitz)
This patch allows the user to specify whether to use active or only background mode for mirror block jobs. Currently, this setting will remain constant for the duration of the entire block job. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-14-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18  job: Add job_progress_increase_remaining()  (Max Reitz)
Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180613181823.13618-12-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18  block/dirty-bitmap: Add bdrv_dirty_iter_next_area  (Max Reitz)
This new function allows looking for a consecutively dirty area in a dirty bitmap. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180613181823.13618-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18  hbitmap: Add @advance param to hbitmap_iter_next()  (Max Reitz)
This new parameter allows the caller to just query the next dirty position without moving the iterator. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180613181823.13618-8-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
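A short usage sketch, assuming hbitmap_iter_next() gained a bool @advance parameter as described (the exact signature is inferred from the commit text): peek at the next dirty position without consuming it, then advance past it on a second call.

#include "qemu/osdep.h"
#include "qemu/hbitmap.h"

static void sketch_peek_then_advance(HBitmapIter *iter)
{
    int64_t next = hbitmap_iter_next(iter, false);  /* query only, iterator stays put */

    if (next >= 0) {
        /* ... decide what to do with the dirty area starting at 'next' ... */
        hbitmap_iter_next(iter, true);              /* now actually move the iterator */
    }
}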
2018-06-18  block: Allow graph changes in bdrv_drain_all_begin/end sections  (Kevin Wolf)
bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and did a subtree drain for each of them. This works fine as long as the graph is static, but sadly, reality looks different. If the graph changes so that root nodes are added or removed, we would have to compensate for this. bdrv_next() returns each root node only once even if it's the root node for multiple BlockBackends or for a monitor-owned block driver tree, which would only complicate things. The much easier and more obviously correct way is to fundamentally change the way the functions work: Iterate over all BlockDriverStates, no matter who owns them, and drain them individually. Compensation is only necessary when a new BDS is created inside a drain_all section. Removal of a BDS doesn't require any action because it's gone afterwards anyway. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18  block: ignore_bds_parents parameter for drain functions  (Kevin Wolf)
In the future, bdrv_drained_all_begin/end() will drain all individual nodes separately rather than whole subtrees. This means that we don't want to propagate the drain to all parents any more: If the parent is a BDS, it will already be drained separately. Recursing to all parents is unnecessary work and would make it an O(n²) operation. Prepare the drain function for the changed drain_all by adding an ignore_bds_parents parameter to the internal implementation that prevents the propagation of the drain to BDS parents. We still (have to) propagate it to non-BDS parents like BlockBackends or Jobs because those are not drained separately. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18  block: Allow AIO_WAIT_WHILE with NULL ctx  (Kevin Wolf)
bdrv_drain_all() wants to have a single polling loop for draining the in-flight requests of all nodes. This means that the AIO_WAIT_WHILE() condition relies on activity in multiple AioContexts, which is polled from the mainloop context. We must therefore call AIO_WAIT_WHILE() from the mainloop thread and use the AioWait notification mechanism. Just randomly picking the AioContext of any non-mainloop thread would work, but instead of bothering to find such a context in the caller, we can just as well accept NULL for ctx. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18  block: Don't poll in parent drain callbacks  (Kevin Wolf)
bdrv_do_drained_begin() is only safe if we have a single BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow that parent callbacks introduce a nested polling loop that could cause graph changes while we're traversing the graph. Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single node without waiting for its requests to complete. These requests will be waited for in the BDRV_POLL_WHILE() call down the call chain. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18  block: Drain recursively with a single BDRV_POLL_WHILE()  (Kevin Wolf)
Anything can happen inside BDRV_POLL_WHILE(), including graph changes that may interfere with its callers (e.g. child list iteration in recursive callers of bdrv_do_drained_begin). Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the end of bdrv_do_drained_begin() to avoid such effects. The recursion happens now inside the loop condition. As the graph can only change between bdrv_drain_poll() calls, but not inside of it, doing the recursion here is safe. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18  block: Really pause block jobs on drain  (Kevin Wolf)
We already requested that block jobs be paused in .bdrv_drained_begin, but no guarantee was made that the job was actually inactive at the point where bdrv_drained_begin() returned. This introduces a new callback BdrvChildRole.bdrv_drained_poll() and uses it to make bdrv_drain_poll() consider block jobs using the node to be drained. For the test case to work as expected, we have to switch from block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even considered active and must be waited for when draining the node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18  block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()  (Kevin Wolf)
Commit 91af091f923 added an additional aio_poll() to BDRV_POLL_WHILE() in order to make sure that all pending BHs are executed on drain. This was the wrong place to make the fix, as it is useless overhead for all other users of the macro and unnecessarily complicates the mechanism. This patch effectively reverts said commit (the context has changed a bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the loop condition for drain. The effect is probably hard to measure in any real-world use case because actual I/O will dominate, but if I run only the initialisation part of 'qemu-img convert' where it calls bdrv_block_status() for the whole image to find out how much data there is to copy, this phase actually needs only roughly half the time after this patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18  hw/display: add standalone ramfb device  (Gerd Hoffmann)
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Message-id: 20180613122948.18149-3-kraxel@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2018-06-18  hw/display: add ramfb, a simple boot framebuffer living in guest ram  (Gerd Hoffmann)
The boot framebuffer is expected to be configured by the firmware, so it uses fw_cfg as interface. Initialization goes as follows:
 (1) Check whether etc/ramfb is present.
 (2) Allocate framebuffer from RAM.
 (3) Fill struct RAMFBCfg, write it to etc/ramfb.
Done. You can write stuff to the framebuffer now, and it should appear automagically on the screen.

Note that this isn't very efficient because it does a full display update on each refresh. No dirty tracking. Dirty tracking would have to be active for the whole ram slot, so that wouldn't be very efficient either. For a boot display which is active for a short time only this isn't a big deal. As a permanent guest display something better should be used (if possible).

This is the ramfb core code. Some wiring up is needed for display devices which want to have a ramfb boot display.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Message-id: 20180613122948.18149-2-kraxel@redhat.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
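A sketch of the firmware-side view of this interface. The RAMFBCfg field layout shown here is an assumption for illustration; the authoritative definition lives in QEMU's ramfb code, and firmware must match it, writing the values big-endian into the "etc/ramfb" fw_cfg file.

#include <stdint.h>

struct RAMFBCfg {
    uint64_t addr;     /* guest-physical address of the allocated framebuffer */
    uint32_t fourcc;   /* pixel format code, e.g. an XRGB8888 fourcc */
    uint32_t flags;
    uint32_t width;    /* in pixels */
    uint32_t height;   /* in pixels */
    uint32_t stride;   /* bytes per scanline */
} __attribute__((packed));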
2018-06-16  target/ppc, spapr: Move VPA information to machine_data  (David Gibson)
CPUPPCState currently contains a number of fields containing the state of the VPA. The VPA is a PAPR specific concept covering several guest/host shared memory areas used to communicate some information with the hypervisor. As a PAPR concept this is really machine specific information, although it is per-cpu, so it doesn't really belong in the core CPU state structure. There's also other information that's per-cpu, but platform/machine specific. So create a (void *)machine_data in PowerPCCPU which can be used by the machine to locate per-cpu data. Initialization, lifetime and cleanup of machine_data is entirely up to the machine type. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org>
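A sketch of the resulting pattern with a hypothetical per-CPU struct, not the actual spapr code; it assumes QEMU's target/ppc CPU definitions are in scope. The machine type owns allocation and cleanup of whatever hangs off machine_data.

#include "qemu/osdep.h"
/* assumes PowerPCCPU from the target/ppc cpu headers is visible here */

typedef struct SketchVPAState {
    uint64_t vpa_addr;          /* hypothetical fields, for illustration only */
    uint64_t slb_shadow_addr;
    uint64_t dtl_addr;
} SketchVPAState;

/* called by the machine when it creates a CPU */
static void sketch_cpu_machine_data_init(PowerPCCPU *cpu)
{
    cpu->machine_data = g_new0(SketchVPAState, 1);
}

/* called by the machine when the CPU goes away */
static void sketch_cpu_machine_data_free(PowerPCCPU *cpu)
{
    g_free(cpu->machine_data);
    cpu->machine_data = NULL;
}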
2018-06-16  pnv_core: Allocate cpu thread objects individually  (David Gibson)
Currently, we allocate space for all the cpu objects within a single core in one big block. This was copied from an older version of the spapr code and requires some ugly pointer manipulation to extract the individual objects. This design was due to a misunderstanding of qemu lifetime conventions and has already been changed in spapr (in 94ad93bd "spapr_cpu_core: instantiate CPUs separately"). Make an equivalent change in pnv_core to get rid of the nasty pointer arithmetic. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>
2018-06-16  mos6522: expose mos6522_update_irq() through MOS6522DeviceClass  (Mark Cave-Ayland)
In the case where we have an interrupt generated externally from inputs to bits 1 and 2 of port A and/or port B, it is necessary to expose mos6522_update_irq() so it can be called by the interrupt source. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16  mac_newworld: add PMU device  (Mark Cave-Ayland)
The PMU device supersedes the CUDA device found on older New World Macs and is supported by a larger number of guest OSs from OS 9 to OS X 10.5. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16  adb: add property to disable direct reg 3 writes  (Mark Cave-Ayland)
MacOS 9 has a bug in its PMU driver whereby after configuring the ADB bus devices it sends another write to reg 3 on both devices, resetting them both back to the same address. Add a new disable_direct_reg3_writes property to ADBDevice to disable these direct writes; it can be enabled just for the upcoming pmu-adb support. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16  mac_newworld: add gpios to macio devices with PMU enabled  (Mark Cave-Ayland)
PMU-enabled New World Macs expose their GPIOs via a separate memory region within the macio device. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-16  mac_newworld: add via machine option to control mac99 VIA/ADB configuration  (Mark Cave-Ayland)
This option allows the VIA configuration to be controlled between 3 different possible setups: cuda, pmu-adb and pmu with USB rather than ADB keyboard/mouse. For the moment we don't do anything with the configuration except to pass it to the macio device (the via-cuda parent) and also to the firmware via the fw_cfg interface so that it can present the correct device tree. The default is cuda which is the current default and so will have no change in behaviour. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2018-06-15  tcg: remove tb_lock  (Emilio G. Cota)
Use mmap_lock in user-mode to protect TCG state and the page descriptors. In !user-mode, each vCPU has its own TCG state, so no locks needed. Per-page locks are used to protect the page descriptors. Per-TB locks are used in both modes to protect TB jumps.

Some notes:
- tb_lock is removed from notdirty_mem_write by passing a locked page_collection to tb_invalidate_phys_page_fast.
- tcg_tb_lookup/remove/insert/etc have their own internal lock(s), so there is no need to further serialize access to them.
- do_tb_flush is run in a safe async context, meaning no other vCPU threads are running. Therefore acquiring mmap_lock there is just to please tools such as thread sanitizer.
- Not visible in the diff, but tb_invalidate_phys_page already has an assert_memory_lock.
- cpu_io_recompile is !user-only, so no mmap_lock there.
- Added mmap_unlock()'s before all siglongjmp's that could be called in user-mode while mmap_lock is held.
  + Added an assert for !have_mmap_lock() after returning from the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.

Performance numbers before/after:
Host: AMD Opteron(tm) Processor 6376

[ASCII plot: ubuntu 17.04 ppc64 bootup+shutdown time vs. number of guest CPUs (1 to 64), "before" vs. "tb lock removal"; png: https://imgur.com/HwmBHXe]

[ASCII plot: debian jessie aarch64 bootup+shutdown time vs. number of guest CPUs (1 to 64), "before" vs. "tb lock removal"; png: https://imgur.com/iGpGFtv]

The gains are high for 4-8 CPUs. Beyond that point, however, unrelated lock contention significantly hurts scalability.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15  translate-all: protect TB jumps with a per-destination-TB lock  (Emilio G. Cota)
This applies to both user-mode and !user-mode emulation. Instead of relying on a global lock, protect the list of incoming jumps with tb->jmp_lock. This lock also protects tb->cflags, so update all tb->cflags readers outside tb->jmp_lock to use atomic reads via tb_cflags(). In order to find the destination TB (and therefore its jmp_lock) from the origin TB, we introduce tb->jmp_dest[].

I considered not using a linked list of jumps, which simplifies code and makes the struct smaller. However, it unnecessarily increases memory usage, which results in a performance decrease. See for instance these numbers booting+shutting down debian-arm:

                       Time (s)  Rel. err (%)  Abs. err (s)  Rel. slowdown (%)
 before                   20.88          0.74      0.154512          0.
 after                    20.81          0.38      0.079078         -0.33524904
 GTree                    21.02          0.28      0.058856          0.67049808
 GHashTable + xxhash      21.63          1.08      0.233604          3.5919540

Using a hash table or a binary tree to keep track of the jumps doesn't really pay off, not only due to the increased memory usage, but also because most TBs have only 0 or 1 jumps to them. The maximum number of jumps when booting debian-arm that I measured is 35, but as we can see in the histogram below a TB with that many incoming jumps is extremely rare; the average TB has 0.80 incoming jumps.

n_jumps: 379208; avg jumps/tb: 0.801099
dist: [0.0,1.0)|▄█▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁ ▁▁▁ ▁▁▁ ▁|[34.0,35.0]

Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15  translate-all: introduce assert_no_pages_locked  (Emilio G. Cota)
This patch adds assertions to make sure we do not longjmp with page locks held. Note that user-mode has nothing to check, since page locks are !user-mode only. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15  translate-all: use per-page locking in !user-mode  (Emilio G. Cota)
Groundwork for supporting parallel TCG generation. Instead of using a global lock (tb_lock) to protect changes to pages, use fine-grained, per-page locks in !user-mode. User-mode stays with mmap_lock. Sometimes changes need to happen atomically on more than one page (e.g. when a TB that spans across two pages is added/invalidated, or when a range of pages is invalidated). We therefore introduce struct page_collection, which helps us keep track of a set of pages that have been locked in the appropriate locking order (i.e. by ascending page index). This commit first introduces the structs and the function helpers, to then convert the calling code to use per-page locking. Note that tb_lock is not removed yet. While at it, rename tb_alloc_page to tb_page_add, which pairs with tb_page_remove. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15  translate-all: iterate over TBs in a page with PAGE_FOR_EACH_TB  (Emilio G. Cota)
This commit does several things, but to avoid churn I merged them all into the same commit. To wit: - Use uintptr_t instead of TranslationBlock * for the list of TBs in a page. Just like we did in (c37e6d7e "tcg: Use uintptr_t type for jmp_list_{next|first} fields of TB"), the rationale is the same: these are tagged pointers, not pointers. So use a more appropriate type. - Only check the least significant bit of the tagged pointers. Masking with 3/~3 is unnecessary and confusing. - Introduce the TB_FOR_EACH_TAGGED macro, and use it to define PAGE_FOR_EACH_TB, which improves readability. Note that TB_FOR_EACH_TAGGED will gain another user in a subsequent patch. - Update tb_page_remove to use PAGE_FOR_EACH_TB. In case there is a bug and we attempt to remove a TB that is not in the list, instead of segfaulting (since the list is NULL-terminated) we will reach g_assert_not_reached(). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
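A generic illustration of the tagged-pointer idea described above, not the actual QEMU macros: the low bit of each uintptr_t link records which of the node's two page slots holds the next link, so a TB spanning two pages can sit on two per-page lists at once.

#include <stdint.h>

struct node {
    uintptr_t page_next[2];   /* tagged links, one per page the TB spans */
};

static void sketch_walk(uintptr_t head)
{
    uintptr_t link = head;    /* list head is itself a tagged pointer; 0 terminates */

    while (link != 0) {
        unsigned slot = link & 1;                           /* which slot to follow next */
        struct node *n = (struct node *)(link & ~(uintptr_t)1);

        /* ... visit n ... */
        link = n->page_next[slot];
    }
}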
2018-06-15  tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx  (Emilio G. Cota)
Thereby making it per-TCGContext. Once we remove tb_lock, this will avoid an atomic increment every time a TB is invalidated. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15  tcg: track TBs with per-region BST's  (Emilio G. Cota)
This paves the way for enabling scalable parallel generation of TCG code. Instead of tracking TBs with a single binary search tree (BST), use a BST for each TCG region, protecting it with a lock. This is as scalable as it gets, since each TCG thread operates on a separate region. The core of this change is the introduction of struct tcg_region_tree, which contains a pointer to a GTree and an associated lock to serialize accesses to it. We then allocate an array of tcg_region_tree's, adding the appropriate padding to avoid false sharing based on qemu_dcache_linesize. Given a tc_ptr, we first find the corresponding region_tree. This is done by special-casing the first and last regions first, since they might be of size != region.size; otherwise we just divide the offset by region.stride. I was worried about this division (several dozen cycles of latency), but profiling shows that this is not a fast path. Note that region.stride is not required to be a power of two; it is only required to be a multiple of the host's page size. Note that with this design we can also provide consistent snapshots about all region trees at once; for instance, tcg_tb_foreach acquires/releases all region_tree locks before/after iterating over them. For this reason we now drop tb_lock in dump_exec_info(). As an alternative I considered implementing a concurrent BST, but this can be tricky to get right, offers no consistent snapshots of the BST, and performance and scalability-wise I don't think it could ever beat having separate GTrees, given that our workload is insert-mostly (all concurrent BST designs I've seen focus, understandably, on making lookups fast, which comes at the expense of convoluted, non-wait-free insertions/removals). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15  qht: return existing entry when qht_insert fails  (Emilio G. Cota)
The meaning of "existing" is now changed to "matches in hash and ht->cmp result". This is saner than just checking the pointer value. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-06-15  qht: require a default comparison function  (Emilio G. Cota)
qht_lookup now uses the default cmp function. qht_lookup_custom is defined to retain the old behaviour, that is a cmp function is explicitly provided. qht_insert will gain use of the default cmp in the next patch. Note that we move qht_lookup_custom's @func to be the last argument, which makes the new qht_lookup as simple as possible.

Instead of this (i.e. keeping @func 2nd):

0000000000010750 <qht_lookup>:
   10750:       89 d1                   mov    %edx,%ecx
   10752:       48 89 f2                mov    %rsi,%rdx
   10755:       48 8b 77 08             mov    0x8(%rdi),%rsi
   10759:       e9 22 ff ff ff          jmpq   10680 <qht_lookup_custom>
   1075e:       66 90                   xchg   %ax,%ax

We get:

0000000000010740 <qht_lookup>:
   10740:       48 8b 4f 08             mov    0x8(%rdi),%rcx
   10744:       e9 37 ff ff ff          jmpq   10680 <qht_lookup_custom>
   10749:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
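A caller-side sketch of the two lookup flavours, with argument order as described above (@func last in qht_lookup_custom); the comparison callback here is a trivial stand-in.

#include "qemu/osdep.h"
#include "qemu/qht.h"

static bool sketch_cmp(const void *candidate, const void *userp)
{
    return candidate == userp;   /* trivial comparison, for the sketch only */
}

static void *sketch_lookups(struct qht *ht, const void *key, uint32_t hash)
{
    void *hit = qht_lookup(ht, key, hash);          /* uses the table's default cmp */

    if (!hit) {
        hit = qht_lookup_custom(ht, key, hash, sketch_cmp);   /* explicit cmp, last arg */
    }
    return hit;
}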
2018-06-15  Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20180615a' into staging  (Peter Maydell)
Migration pull 2018-06-15

# gpg: Signature made Fri 15 Jun 2018 16:13:17 BST
# gpg: using RSA key 0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20180615a:
  migration: calculate expected_downtime with ram_bytes_remaining()
  migration/postcopy: Wake rate limit sleep on postcopy request
  migration: Wake rate limiting for urgent requests
  migration/postcopy: Add max-postcopy-bandwidth parameter
  migration: introduce migration_update_rates
  migration: fix counting xbzrle cache_miss_rate
  migration/block-dirty-bitmap: fix dirty_bitmap_load
  migration: Poison ramblock loops in migration
  migration: Fixes for non-migratable RAMBlocks
  typedefs: add QJSON

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-15  Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging  (Peter Maydell)
Block layer patches:

- Fix options that work only with -drive or -blockdev, but not with both, because of QDict type confusion
- rbd: Add options 'auth-client-required' and 'key-secret'
- Remove deprecated -drive options serial/addr/cyls/heads/secs/trans
- rbd, iscsi: Remove deprecated 'filename' option
- Fix 'qemu-img map' crash with unaligned image size
- Improve QMP documentation for jobs

# gpg: Signature made Fri 15 Jun 2018 15:20:03 BST
# gpg: using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (26 commits)
  block: Remove dead deprecation warning code
  block: Remove deprecated -drive option serial
  block: Remove deprecated -drive option addr
  block: Remove deprecated -drive geometry options
  rbd: New parameter key-secret
  rbd: New parameter auth-client-required
  block: Fix -blockdev / blockdev-add for empty objects and arrays
  check-block-qdict: Cover flattening of empty lists and dictionaries
  check-block-qdict: Rename qdict_flatten()'s variables for clarity
  block-qdict: Simplify qdict_is_list() some
  block-qdict: Clean up qdict_crumple() a bit
  block-qdict: Tweak qdict_flatten_qdict(), qdict_flatten_qlist()
  block-qdict: Simplify qdict_flatten_qdict()
  block: Make remaining uses of qobject input visitor more robust
  block: Factor out qobject_input_visitor_new_flat_confused()
  block: Clean up a misuse of qobject_to() in .bdrv_co_create_opts()
  block: Fix -drive for certain non-string scalars
  block: Fix -blockdev for certain non-string scalars
  qobject: Move block-specific qdict code to block-qdict.c
  block: Add block-specific QDict header
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-15  exec.c: Handle IOMMUs in address_space_translate_for_iotlb()  (Peter Maydell)
Currently we don't support board configurations that put an IOMMU in the path of the CPU's memory transactions, and instead just assert() if the memory region found in address_space_translate_for_iotlb() is an IOMMUMemoryRegion. Remove this limitation by having the function handle IOMMUs. This is mostly straightforward, but we must make sure we have a notifier registered for every IOMMU that a transaction has passed through, so that we can flush the TLB appropriately when any of the IOMMUs change their mappings. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180604152941.20374-5-peter.maydell@linaro.org
2018-06-15  iommu: Add IOMMU index argument to translate method  (Peter Maydell)
Add an IOMMU index argument to the translate method of IOMMUs. Since all of our current IOMMU implementations support only a single IOMMU index, this has no effect on the behaviour. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180604152941.20374-4-peter.maydell@linaro.org
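An illustrative translate hook after this change (a sketch, not code from the patch): the new iommu_idx argument selects which of the IOMMU's translation contexts to use; the two bases below are made up purely to show the idea, and single-index IOMMUs can simply ignore the argument.

#include "qemu/osdep.h"
#include "exec/memory.h"

static IOMMUTLBEntry sketch_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
                                      IOMMUAccessFlags flag, int iommu_idx)
{
    /* e.g. index 0 for secure and index 1 for non-secure transactions */
    hwaddr base = iommu_idx ? 0x80000000 : 0x0;   /* made-up per-index bases */

    IOMMUTLBEntry entry = {
        .iova = addr & ~(hwaddr)0xfff,
        .translated_addr = (base + addr) & ~(hwaddr)0xfff,
        .addr_mask = 0xfff,
        .perm = IOMMU_RW,
        /* a real device must also fill .target_as with the downstream AddressSpace */
    };
    return entry;
}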
2018-06-15  iommu: Add IOMMU index argument to notifier APIs  (Peter Maydell)
Add support for multiple IOMMU indexes to the IOMMU notifier APIs. When initializing a notifier with iommu_notifier_init(), the caller must pass the IOMMU index that it is interested in. When a change happens, the IOMMU implementation must pass memory_region_notify_iommu() the IOMMU index that has changed and that notifiers must be called for. IOMMUs which support only a single index don't need to change. Callers which only really support working with IOMMUs with a single index can use the result of passing MEMTXATTRS_UNSPECIFIED to memory_region_iommu_attrs_to_index(). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180604152941.20374-3-peter.maydell@linaro.org
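A caller-side sketch of the pattern described above (signatures assumed from the commit text): pick the IOMMU index for "no particular attributes" and register a notifier covering the whole address range for that index only.

#include "qemu/osdep.h"
#include "exec/memory.h"

static void sketch_register_notifier(IOMMUMemoryRegion *iommu_mr,
                                     IOMMUNotifier *n, IOMMUNotify fn)
{
    /* callers that only handle a single index ask which one matches
     * "unspecified" transaction attributes */
    int idx = memory_region_iommu_attrs_to_index(iommu_mr,
                                                 MEMTXATTRS_UNSPECIFIED);

    iommu_notifier_init(n, fn, IOMMU_NOTIFIER_ALL,
                        0, HWADDR_MAX, idx);
    memory_region_register_iommu_notifier(MEMORY_REGION(iommu_mr), n);
}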
2018-06-15  iommu: Add IOMMU index concept to IOMMU API  (Peter Maydell)
If an IOMMU supports mappings that care about the memory transaction attributes, then it no longer has a unique address -> output mapping, but more than one. We can represent these using an IOMMU index, analogous to TCG's mmu indexes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180604152941.20374-2-peter.maydell@linaro.org